Three weeks for the entire project, including approval, planning, launch, analytics and reporting. Six members on the research team.
Diary study: 32 respondents recording every trip — voice notes, photos, geolocation — in Telegram for 7 days. Without automation, two researchers would spend an hour per diary every day. In one day, the entire infrastructure was built: Telegram bot, Whisper transcription, auto-upload to Google Drive.
Day 1 of the "Diary Navigation" project. A research agency is preparing to launch a diary study: 32 respondents will document every trip around the city in Telegram groups over the course of 7 days. The methodology is complex, the team is distributed, and there will be a lot of data.
The project manager is not a developer. Coding experience: zero. Tool: Cursor (an IDE with an AI assistant). The goal: build a working infrastructure in a single day that will operate throughout the entire week of fieldwork.
The entire dialogue is ~5,800 lines of conversation in Cursor. The project manager described tasks via voice messages and text; the AI assistant wrote code, explained every step, and debugged errors from screenshots and logs. Every decision was made collaboratively: AI proposed options, the human chose.
Key principles that emerged along the way:
This is not a story about a programmer building a bot. This is a story about a researcher who had never written code building a working infrastructure for fieldwork in a single day.
The diary study methodology spans dozens of pages: how to describe problems and drivers, what to say at the kickoff meeting, how to document findings, entry formats, checklists, examples and anti-examples. The team is distributed: researchers, recruiters, the client.
What was inconvenient:
What was needed:
Architecture:
/kb <question> -> answer from the knowledge base
Model choice: gpt-4.1-mini. For a FAQ bot with retrieval this is sufficient, and it can be switched with a single line later.
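A minimal sketch of what that command path could look like, assuming a plain-text methodology file and naive keyword retrieval — none of the names below are the project's real code:

```python
# Hypothetical /kb handler: naive keyword retrieval over the methodology file,
# then an answer from gpt-4.1-mini constrained to quote the source.
from openai import OpenAI

client = OpenAI()  # OPENAI_API_KEY in the environment

def load_chunks(path="methodology.md", size=1500):
    text = open(path, encoding="utf-8").read()
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(question, chunks, top_n=3):
    words = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: -len(words & set(c.lower().split())))
    return scored[:top_n]

def answer_kb(question):
    context = "\n---\n".join(retrieve(question, load_chunks()))
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",  # the model named above; swappable in one line
        messages=[
            {"role": "system",
             "content": "Answer only from the excerpts below and quote them verbatim.\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```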
For researchers — ask "how do we describe a barrier?" and get an exact excerpt from the methodology in 3 seconds, without opening Google Drive.
For the project manager — no more answering the same questions. The bot answers more precisely because it quotes the document rather than paraphrasing from memory.
The bot's code was written, but it was sitting on a local computer. The project manager had never deployed an application, never worked with GitHub.
A step-by-step process: GitHub -> Git config -> Push -> Render -> Webhook -> Debug. Two bugs were resolved in minutes (an outdated SDK version and an unpaid API account).
Solution: Render Free. Wake-up delays are not critical for a research bot.
Google Drive for Desktop for methodology (two-way sync) + rclone for diaries (Drive -> local disk, copy only).
Automatic collection of all content from 32 chats: text, photos, audio, video -> Google Drive in structured folders. Transcription via Whisper.
Two key commands: "I updated the methodology" and "sync the diaries." A pipeline cheat sheet for the team.
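As a sketch, "sync the diaries" can wrap a single rclone call; the remote name and local path below are placeholders:

```python
# Hypothetical wrapper for the one-way diary sync (Drive -> local disk, copy only).
import subprocess

def sync_diaries(remote="gdrive:Diaries", local="./diaries"):
    # `rclone copy` never deletes local files, matching the "copy only" decision.
    subprocess.run(["rclone", "copy", remote, local, "--progress"], check=True)

if __name__ == "__main__":
    sync_diaries()
```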
32 respondents will keep diaries in Telegram groups: text, voice messages, photos, video circles. Each in a separate chat with a researcher. That's ~150-200 entries per day, scattered across 32 groups.
What was inconvenient:
What was needed:
1. Bot polls chat history 3 times a day
Pros: you can set a schedule, nothing gets missed. Cons: The Telegram API does not allow a bot to read chat history. A bot only receives what arrives in real time via webhook.
Decision: rejected. This is a Telegram API limitation — the bot only sees new messages.
2. Real-time collection via webhook (chosen)
Pros: nothing gets lost, every message is processed immediately. Cons: if the server goes down — messages are lost (Telegram retries the webhook several times, but not indefinitely).
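Registering the webhook is one Bot API call; a hedged sketch (the Render URL and environment variable are placeholders):

```python
# Point Telegram at the Render-hosted bot so every new message arrives as a webhook.
import os
import requests

TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
WEBHOOK_URL = "https://example-diary-bot.onrender.com/webhook"  # placeholder Render URL

resp = requests.post(
    f"https://api.telegram.org/bot{TOKEN}/setWebhook",
    json={"url": WEBHOOK_URL},
)
print(resp.json())  # {"ok": true, ...} once Telegram accepts the webhook
```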
An unexpected difficulty: the Google service account only sees "My Drive," while Google Drive for Desktop syncs files to the "Computers" section. These are different locations.
The project manager was surprised: "The bot we have running — it will save to Google Drive, to my drive, right?" — correct, but not to "Computers," to "My Drive."
Solution: create a folder for diaries in "My Drive," share it with the service account. But the service account couldn't write to a personal Drive (no Shared Drive on a free account).
Final solution: OAuth access instead of a service account. Obtained a refresh_token via an authorization script, added it to Render.
1. Disabled Privacy Mode
In BotFather: /setprivacy -> Disable. Without this, the bot only sees commands (/kb) and ignores regular messages.
2. Created the folder structure
33 folders: one per respondent + one test folder. In Latin characters — Cyrillic caused errors. Subfolders with dates are created automatically on the first message.
3. Set up collection for all formats
| Format | What the bot does | What gets saved |
|---|---|---|
| Text | Saves as .md | file with the entry text |
| Photo | Downloads the file | photo in .jpg format |
| Voice message | Downloads + transcribes | audio + text transcript |
| Video circle | Downloads + transcribes audio track | video + text transcript |
| Video | Downloads the file | video in .mp4 format |
Transcription — OpenAI Whisper (whisper-1).
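A minimal sketch of that transcription step, assuming the voice file has already been downloaded from Telegram:

```python
# Transcribe a downloaded voice message or video soundtrack with OpenAI whisper-1.
from openai import OpenAI

client = OpenAI()

def transcribe(path):
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text  # saved next to the original file as the text transcript

print(transcribe("voice_message.ogg"))
```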
4. Format evolved during the process
5. File naming
First version: message_1.md, message_2.md. The project manager asked: "respondent name first, then date, number, and time." Result: structured names with respondent identifier, date, and time.
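The naming rule reduces to a tiny helper; the exact pattern below is illustrative rather than the project's actual format string:

```python
from datetime import datetime

def entry_filename(respondent, number, ext="md", when=None):
    # e.g. "R07_2024-05-14_03_1842.md": respondent, date, entry number, time
    when = when or datetime.now()
    return f"{respondent}_{when:%Y-%m-%d}_{number:02d}_{when:%H%M}.{ext}"
```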
Tested on a test group where the project manager was the "respondent":
1. Respondent writes in a Telegram group (text/photo/audio/video circle)
↓
2. Telegram sends a webhook to Render
↓
3. Bot identifies the respondent by chat_id
↓
4. Creates the day folder if it doesn't exist
↓
5. Downloads the media file from Telegram servers
↓
6. If audio/video → sends to Whisper → saves transcript
↓
7. Uploads everything to Google Drive in the respondent's folder
↓
8. On the "sync diaries" command → rclone copies to the local disk
For researchers — all of a respondent's entries in one folder, with transcriptions. No need to re-listen to voice messages — the text is already there.
For the project manager — a structured archive for analysis. Every trip is in a file with a clear name.
For analysis — based on this data, monitoring cards, trip tracking tables, and analytical summaries will later be built.
The recruiter had 208 candidate applications — manual review took 5 minutes each, and the top 60 needed deep scoring at 20 minutes each. Meanwhile, every Telegram bot bug required an hour to explain context to the developer. Automated scoring processed all 208 in 15 minutes. Bug context transfer dropped from an hour to 4 minutes.
By the end of the first day, the AI assistant's context window in Cursor was full: ~5,800 lines of dialogue. The assistant started "forgetting" earlier agreements. A new session was needed — but how to pass everything that had been done?
A structured summary of 7 blocks: knowledge base, bot, diaries, Render ENV, OAuth, rclone, rules. The project manager copied the summary into the beginning of a new dialogue — and the new agent immediately understood the context.
This "shift handoff" pattern later became a permanent system: a MEMORY.md file (loaded automatically at the start of every Claude Code session) and CLAUDE.md (project instructions).
4 sequential commits, each for a specific error from Render logs:
UnboundLocalError: media_paths
Invalid file format
The entire "bug -> fix -> deploy -> verify" cycle took 3-5 minutes per bug.
The decision "not to fix right now" saved time for more important tasks. Three optimization options were ready, but instead of overcomplicating the architecture prematurely, we chose monitoring (YAGNI).
In Cursor, the project manager was an operator: AI said "click here," she clicked. In Claude Code, the project manager became a conductor: she says "update the monitoring" — and everything happens.
| Aspect | Cursor | Claude Code |
|---|---|---|
| Commit and push | AI writes code, asks to commit | AI does commit + push itself |
| Running scripts | "Paste this command into PowerShell" | AI runs the script directly |
| Context between sessions | Manual summary copying | MEMORY.md loads automatically |
Author: project manager + Claude Code (Opus 4.6)
A diary study with 32 respondents. Recruiters send a stream of candidates from a screening questionnaire — dozens of people, each needing to be evaluated across several parameters: geography, behavioral profile, frequency of contact with the research subject, diversity of situations, tools used, and engagement.
What was inconvenient:
What was needed:
Pros: fast, no tools needed. An experienced recruiter has a feel for the "right" candidate. Cons: not reproducible (two recruiters will evaluate the same person differently), no justification for the client, classification errors go undetected, fatigue with high volume.
Decision: rejected as the sole method. But the recruiter's live expertise is retained in subsequent stages (phone screening).
Pros:
Cons: requires time to develop the rubric and script. The script doesn't see what a live recruiter sees (tone, motivation, nuances). Therefore, this is a supplement to the recruiter, not a replacement.
Each criterion is tied to the research objectives, not to an abstract "candidate quality": geography, behavioral profile, contact frequency, diversity of situations, tools used, usage frequency, variability of conditions, engagement. Each criterion's weight reflects its significance for the project goals.
Key decision: situational users (who use the tool not always, but depending on context) receive the maximum behavioral profile score. Not "always" (predictable, no barriers) and not "never" (too little material), but specifically the one who sometimes uses it and sometimes doesn't. They have the most interesting switches and triggers.
A Python script (+ openpyxl):
Thresholds: >=18 = recommended, 15-17 = conditional (clarify during the call), <15 = not recommended.
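A hedged sketch of how such a scoring pass over the screening spreadsheet might look with openpyxl — the file name and column layout are assumptions; only the thresholds come from the text above:

```python
# Illustrative candidate scoring; thresholds match the case
# (>=18 recommended, 15-17 conditional, <15 not recommended).
from openpyxl import load_workbook

def recommendation(score):
    if score >= 18:
        return "recommended"
    if score >= 15:
        return "conditional (clarify during the call)"
    return "not recommended"

wb = load_workbook("candidates.xlsx")          # placeholder file name
ws = wb.active
for row in ws.iter_rows(min_row=2, max_col=12):
    # assume the 8 criterion scores sit in fixed columns, e.g. C..J
    criteria = [c.value or 0 for c in row[2:10]]
    total = sum(criteria)
    row[10].value = total                      # numerical score
    row[11].value = recommendation(total)      # recommendation
wb.save("candidates_scored.xlsx")
```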
Via update scripts, scoring results were written to the recruiter's working spreadsheet — the interview schedule file. Five columns: GEO segment, quota, comment with justification, recommendation, numerical score. The recruiter opens the familiar document and sees for each candidate: why they received that score, what to pay attention to during the call.
Scoring is the first filter. Then the recruiter calls candidates from the "recommended" and "conditional" zones and checks what the script can't catch:
If the scoring flagged something as "clarify during the call" — the recruiter specifically addresses that point.
1. Candidate fills out the screening questionnaire
↓
2. Data goes into Excel (Preliminary Selection)
↓
3. Script runs 8 criteria + screeners
↓
4. Excel receives: score, recommendation, comment by criteria
↓
5. Results are transferred to the interview schedule
↓
6. Recruiter sees score and flags → calls the candidate with specific questions
↓
7. Final decision: scoring + recruiter's impression from the call
The day with the largest savings. Diary monitoring: two researchers spent an hour per day on each of 32 respondents — checking entries, assessing quality, taking notes. Trip tracking: 45 minutes to manually enter each trip into a spreadsheet. Two solutions eliminated 789 hours of routine work.
Author: project manager + Claude Code (Opus 4.6)
On the second day of the fieldwork phase, we have 6 active respondents (+ 1 test), each documenting 1-5 trips per day. Over the week, that's ~150-200 entries in Telegram groups.
What was inconvenient:
What was needed:
Pros: fully automatic, no human involvement. Cons: GPT-4.1-mini is too weak a model for quality research analysis. Superficial conclusions, no depth, won't catch subtle patterns. At minimum GPT-4 or Claude Opus is needed — and that requires an API key and additional costs.
Decision: deferred. Not the right quality level for research analysis.
Pros: high-quality analysis (Claude Opus), automatic on a schedule. Cons: requires an Anthropic API key (separate payment), prompt configuration, additional cost per request.
Decision: deferred. No API key, and at this stage it's overkill.
Pros:
Cons: not fully automatic — needs to be launched manually. But it takes 1 minute: open Claude Code -> type "update the monitoring" -> done.
Initially planned to upload cards to Google Drive. But the Google service account cannot create files (no storage quota).
Final solution: cards are stored in the bot's repository; on push to main, Render automatically deploys the updated site. Additionally, a copy is saved locally.
Two new modes added:
Mode 8 — Respondent Monitoring Card. A compact living document: who this is, what we see (patterns, barriers, drivers, discrepancies), what it means for the business, what to follow up on, trip chronology. Updated daily; each update is a delta.
Mode 9 — Daily Insight Summary. A Telegram post for the team: cross-respondent analysis for the day, key findings, trends, tasks for researchers. (Not yet technically implemented; the format is ready.)
A monitoring script:
A protected monitoring page:
Each respondent has their own monitoring card. It contains: profile, accumulated patterns, barriers and drivers, key insights for each day, and questions for the in-depth interview. Cards are updated daily as new entries come in.
For researchers — to see accumulated understanding of a respondent before asking questions. No need to re-read all entries — open the card and understand the picture in 2 minutes.
For the project manager — to quickly assess progress: who has interesting patterns, who has little data, where there are discrepancies with the quota.
For interview preparation — the "What to follow up on" section linked to specific trips = a ready foundation for the in-depth interview guide.
For the client — a section in each card shows what the findings mean for the product.
1. Respondents write in Telegram groups
↓
2. Bot saves entries to Google Drive
↓
3. Claude Code syncs diaries to the local computer
↓
4. Claude Code (Opus 4.6) reads entries and generates/updates cards
↓
5. Cards are saved locally and in the repository for the website
↓
6. Git push → Render auto-deploy → site updated
↓
7. Team opens the monitoring page
Monitoring cards give an overall picture of the respondent, but for the final report and interview preparation, you need to work with data at the level of individual trips.
What was inconvenient:
What was needed:
1. Local Excel/CSV
Pros: easy to create, familiar format. Cons: no shared access, researchers can't see or comment. Need to resend the file after every update.
Decision: doesn't work. Team collaboration is more important.
2. Google Sheets with manual entry
Pros: shared access, comments, filters. Cons: manually filling ~30 columns per trip — unrealistic at 5-10 trips per day. The analytical layer requires expert filling according to a taxonomy.
Decision: doesn't work. Too labor-intensive manually.
3. Google Sheets + automatic filling via Claude Code (chosen)
Pros:
Cons: filling is not fully automatic — needs to be launched manually. But it's the same model as with monitoring: command -> result.
Created a separate Claude Code skill (diary-tracking)
The skill is separated from the analytical one — different roles, different rhythms. The analyst deeply examines entries. The data engineer structures them into a table.
Three operating modes:
Designed the table structure
One tab = one respondent. Three zones per tab:
| Zone | Columns | Contents |
|---|---|---|
| Portrait (sidebar) | A-D | Respondent card: segment, city, quota, style, researcher |
| Trip table | E-V | 18 columns: route, purpose, context, navigation, transcription, full text |
| Analytical layer | W-AD | 8 columns: insight type, formulation, strength, feature group, interview question |
Plus a respondent summary (patterns, barriers, drivers, discrepancies) — in separate columns on the right.
Tested on one respondent
Iterated the layout based on feedback
First version: portrait on top (10 rows), then empty rows, then the table. Problem: the table doesn't fit on screen, inconvenient.
Second version: portrait as a compact sidebar on the left (columns A-D, frozen). Trip table starts from the first row. Header row with filters frozen. Everything visible immediately on opening.
Notified the team
Sent instructions to the team chat: what's in the table, how it's structured, how to leave comments, the daily update schedule.
Daily cycle (~12:00 PM Moscow time):
For the final report — structured data for each trip with an analytical layer. No need to re-read 500+ messages — everything in one table with filters.
For interview preparation — respondent summary + prioritized questions linked to specific trips. Open the tab — ready for the interview.
For researchers — the ability to see analytics and influence it through comments. Not a "black box," but collaborative work.
For cross-respondent analysis — a uniform structure across all respondents allows comparing, grouping, and building typologies at the end of the project.
Difference from monitoring — monitoring gives a quick picture of "what's happening." The table gives complete data on "what exactly and why" — for deep analysis and the report.
Three operational solutions in a day. Daily analytical summaries in 5 minutes instead of 30. Instant recruiter quota checks in Telegram. Express batch candidate analysis — 5 minutes instead of 30.
Author: project manager + Claude Code (Opus 4.6)
On the fourth day of the fieldwork phase, 16 of 32 diaries are launched. Recruiters are selecting respondents according to a quota matrix with multiple dimensions — a target number of respondents in each cell.
What was inconvenient:
What was needed:
/quotas only
Pros: quick to implement, accessible to the entire team in Telegram, doesn't require Claude Code. Cons: shows only the current status. Doesn't help evaluate a candidate, create a profile, or find discrepancies. The report is static — it parses a ready-made file, doesn't analyze.
Decision: partially — implemented as a quick tool for the team.
Pros: deep analysis, scoring, profile creation, cross-checking — all in one place. Cons: accessible only to whoever has Claude Code running. Recruiters and administrators won't see the report without the project manager.
Decision: partially — implemented for analytical tasks.
Pros:
Two levels of access: quick (/quotas -> HTML report in 3 seconds) and deep (skill, 5 analysis modes).
Cons: two tools instead of one — need to remember when to use which.
/quotas
A new module for the Telegram bot:
5 existing files modified: command handler, command recognition, Google Drive config, download method, dependencies (openpyxl).
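As an illustration of the module's core, a sketch that counts filled quota cells and flags deficits — the profile format and cell names are invented for the example:

```python
# Hypothetical core of /quotas: count filled slots per quota cell and report deficits.
from collections import Counter

TARGETS = {"pedestrian / large city": 8, "driver / large city": 8,
           "pedestrian / medium-small": 8, "driver / medium-small": 8}  # illustrative matrix

def quota_report(profiles_md: str) -> str:
    cells = Counter()
    for line in profiles_md.splitlines():
        if "|" in line and not line.startswith("|--"):
            parts = [p.strip() for p in line.strip("|").split("|")]
            if len(parts) >= 2 and parts[1] in TARGETS:
                cells[parts[1]] += 1
    rows = [f"{cell}: {cells[cell]}/{target}"
            + ("  <b>deficit</b>" if cells[cell] < target else "")
            for cell, target in TARGETS.items()]
    return "<b>Quota status</b>\n" + "\n".join(rows)  # sent back to the chat as HTML
```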
recruitment-analyst Skill (5 modes)
| Mode | Trigger | What it does |
|---|---|---|
| 1. Candidate scoring | "evaluate the candidate" | 8 criteria from the scoring script, score table, recommendation by thresholds |
| 2. Quota analysis | "check quotas" | Parses profiles, matches against the matrix, finds deficits and imbalances |
| 3. Profile creation | "create profile" | Takes data from Excel/recruiter, fills in the template, updates quota control |
| 4. Cross-check | "cross-check" | Compares diary folders with profiles: who exists, who's missing |
| 5. Post generation | "write a post for recruiters" | A human-friendly HTML post for the team chat |
4 reference files: scoring system (25-point), quota matrix (target), profile template, Excel file structure.
write-case-study Skill
Discovered that the case study writing skill was in the project root, not in .claude/skills/. Moved it — the skill was picked up automatically.
The command table expanded from 5 to 10 rows + a separate section for Telegram bot commands.
The whole team can check quota status (/quotas) without access to Claude Code.
/quotas in Telegram
1. Team member types /quotas in the team chat
↓
2. Bot downloads "Respondent Profiles.md" from Google Drive
↓
3. Parses the table, classifies by quota matrix
↓
4. Generates an HTML report with deficits
↓
5. Sends to the chat (3 seconds)
1. Project manager says "evaluate the candidate" / "check quotas" / ...
↓
2. Claude Code loads the recruitment-analyst skill
↓
3. Reads data from Excel / profiles / diary folders
↓
4. Applies the 25-point scoring / quota matrix / profile template
↓
5. Returns a structured result
Day 4 of the fieldwork phase. The recruiter sends a list of 8 candidates in the work chat asking to check if "all geos are suitable." Seems simple — but in reality you need to keep several things in mind at once:
What's hard to do manually:
Effort estimate: ~30 minutes of manual work — open profiles, quota matrix, target audience portrait, count, double-check, write a response.
Pros: instant, no tools needed. Cons: easy to make mistakes. This is exactly how the recruiter classified two cities as "large" — they felt big. Classification errors lead to quota skew that's only discovered at the end.
Decision: rejected. Too high a risk of error with 12 cells and 16 variables.
Pros: accurate, reliable. Cons: ~30 minutes of work. The project manager spends time on mechanical work instead of research.
Decision: this is how it was done before.
Pros:
Cons: the result needs to be verified — AI can make mistakes in nuances. But reviewing a ready analysis is faster than doing it from scratch.
Copied the recruiter's message from Telegram into Claude Code: "Look at the respondents, give a recommendation."
Claude Code automatically:
The recruiter had classified all four candidates as "large city" — but two of them fall into "medium/small" according to our grid. This is counterintuitive: both are cities with populations over a million.
Without the check: 4 candidates would have gone into one cell, 2 of them into the wrong one.
After the analysis was ready and posts sent, the head of recruiting noticed something AI hadn't accounted for: gender and age balance of the sample.
Formally, there was no requirement for equal gender and age distribution in the client agreement. So neither the quota matrix, nor the recruiter skill, nor Claude Code tracked this parameter. But the head of recruiting — an experienced recruiter — noticed the skew intuitively.
What the review showed:
The current sample had a noticeable gender skew — a significant predominance of one sex. By age: one age group was not represented at all, while another was overrepresented.
This wasn't an error — it happened naturally through screening. But if left unchecked, the skew would grow.
Claude Code quickly found ages for all 8 candidates in the screening Excel and built a forecast.
What changed in the recommendation:
One of the backup respondents was elevated for consideration, as they fell into a unique quota cell that had no representation in the sample until then.
Why this matters for the case:
AI completed the task within formalized criteria (quotas, geo, cells). But the human — the head of recruiting — saw what wasn't in the brief: demographic balance. This is a classic example of how the human brain sets the direction, and AI accelerates execution. Without the head of recruiting, we wouldn't have looked at gender and age. Without Claude Code, the check would have taken another 30 minutes instead of 5.
1. Recruiter sends a candidate list in the work chat
|
2. Project manager copies the message into Claude Code
|
3. Claude Code reads quota matrix + profiles + target audience portrait
|
4. Analysis: city classification, slot counting, conflict detection
|
5. Project manager reviews and adjusts recommendations
|
6. Team adds: gender, age, other non-formalized criteria
|
7. Ready posts are sent to the work chat
━━━ Calculation D4-EXPRESS ━━━━━━━━━━━━━━━━━━━━━━━━━━━
Before: ~35 min x 3 batches = 105 min ~ 1.75 hrs (researcher)
After: ~5 min x 3 batches = 15 min = 0.25 hrs (manager + Claude Code)
Frequency: 3 times per project
Savings: 1.75 - 0.25 = 1.5 hrs
Bonus: caught a city classification error
that would have been missed manually.
→ HRS saved: 1.5 hrs
→ PPL involved: 1 (researcher — freed)
→ Confidence: high
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
14 respondents have been keeping diaries for 1-2 days. Monitoring cards are updated, the tracking table is filled in. But the team lacks the big picture: what is happening in the study overall, what patterns are forming, and what it means for the product.
What was inconvenient:
What was needed:
Launched Mode 9 and went through 4 rounds of edits with the lead
| Iteration | Before | After | Principle |
|---|---|---|---|
| 1 | "days 1-3" | "days 1-2 of diaries" | Precision: project day ≠ diary day |
| 2 | "Trends" section | "Initial observations and hypotheses" | On days 1-2 there is not enough data for trends |
| 3 | "Quota mismatch is systemic" | "The concept of navigation is vague" | Framing: insight, not a problem |
| 4 | "Notes for researchers" section | Link to monitoring | Don't duplicate monitoring cards |
Established 3 new principles in the prompt:
Updated the prompt for the daily brief mode: format template, "Early days" section, principles #6-8.
For the team — see the overall picture of the study every day in 3 minutes of reading.
For the lead — 10 minutes to review the brief instead of 60 minutes writing it. Calibration rules are locked in — subsequent briefs are more accurate right away.
For the project — a unified communication rhythm: daily brief + instant access to resources.
So many messages were pinned in the working chat that finding the right link (table, instructions, schedule) became a quest. Site passwords, links to Google Drive documents, roles — all scattered across the chat history.
What was needed:
5 new bot commands — simple, no API calls, serving static HTML.
Security constraint: commands only work in the working chat. In respondent groups the bot silently ignores them — passwords and internal links won't leak.
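That constraint reduces to a single guard before replying; a sketch with a placeholder working-chat id:

```python
# Hypothetical guard: static-link commands answer only in the working chat.
import os

WORK_CHAT_ID = int(os.environ.get("WORK_CHAT_ID", "-1001234567890"))  # placeholder id

def handle_links_command(chat_id: int, send) -> None:
    if chat_id != WORK_CHAT_ID:
        return  # silently ignore the command in respondent groups
    send(chat_id,
         "Project links:\n- Monitoring: ...\n- Tracking table: ...",
         parse_mode="HTML")
```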
For the whole team — get any project link in 3 seconds right in Telegram instead of searching through pinned messages.
For security — monitoring passwords and internal documents are accessible only in the working chat, not in respondent groups.
Total bot commands: 6 for the team + 3 service commands.
The recruiter reported a respondent quota mismatch. Turned out some quotas were based on preliminary data. Audit of 26 respondents, cascading update. Plus extracting a feature request map from 26 diaries — 15 clusters of product requests in 30 minutes instead of 10 hours.
On the fifth day of the field stage, the recruiter reported that one respondent's quota didn't match what we had in our system.
We started investigating and discovered a systemic problem: we had been pulling quotas from one column, but that column was filled before the voice screening. After the call, data was updated on a different sheet, while the column remained outdated.
Hypothesis: some of our quotas were based on preliminary rather than final data.
1. Trust our scoring algorithm
Pros: the algorithm is objective, operates on formulas from survey data. Cons: the algorithm analyzes survey text, not a live conversation. Navigation behavior is not a binary characteristic: a person may sincerely answer differently depending on phrasing.
Decision: rejected — the algorithm is useful for initial screening but cannot be the source of truth.
2. Trust the recruiter's column
Pros: data is in a structured format, easy to parse. Cons: filled before the voice screening. The head recruiter explicitly said: "I filled this column before the screening, it's unreliable."
Decision: rejected.
3. The "Quotas" sheet as the single source of truth (chosen)
Pros: filled after the final voice screening, reflects the recruiter's final decision. Structured by quota cells — immediately visible who is in which group. Cons: requires opening the Excel file and cross-checking manually each time.
Decision: accepted. We formalized the rule and built it into the pipeline.
Audited all 26 respondents. Cross-checked every launched respondent's quota against the quotas sheet.
Cascading update. Corrected quotas everywhere they were stored.
Formalized the rule. Added a "Source of Truth" section to the project configuration and skill:
For data reliability — a respondent's quota determines which analytical group they fall into. Wrong quota → wrong interpretation of their diary behavior.
For the team — now everyone working with quotas has a definitive answer: "Where do I look up the correct quota?" No need to guess which of three sources is current.
For future projects — lesson learned: in recruiting, data passes through several stages (survey → preliminary analysis → voice screening → final decision). The assessment can change at each stage. You need to know exactly at which stage the "truth" is established.
By the third day of the field stage (26 respondents, ~100+ trip entries), implicit feature requests were accumulating in the data: people described inconveniences, workarounds, app-switching — but these signals were scattered across entries.
1. Manual analysis by researchers
Pros: deep understanding of each respondent's context. Cons: with 26 respondents and ~100 entries — days of work. Hard to maintain a cross-respondent picture.
Decision: rejected. Too labor-intensive for a mid-field snapshot on day 3.
2. GPT analysis on the server (automatic)
Pros: no human involvement, runs on schedule. Cons: GPT-4.1-mini can't handle nuanced classification (feature request vs problem vs barrier). Doesn't see screening context. Can't distinguish "unaware of value" from a real request.
Decision: rejected. Analysis quality is critical — this goes to the client.
3. Claude Code + analysis page on the site (chosen)
Pros: maximum analysis quality (Claude Opus), linked to profiles and screening, flexible format (markdown → HTML with tabs), scalability (new .md = new tab).
Cons: triggered manually. But analysis is not a daily task — it's on demand.
Extracted feature requests from all diaries. Analyzed diaries of 26 respondents in parallel (5 agents, ~370 files). For each request: who, what they want, why, what happens without it, quote.
Clustered into 15 themes.
Created an analysis page on the site. Password-protected entry, tab navigation, responsive design. Color scheme — teal (#0d9488). Auto-detection: new .md in the analysis directory → new tab.
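The auto-detection rule can be a few lines of static generation; a sketch under the assumption that the page is rebuilt on each deploy:

```python
# Hypothetical tab builder: every .md in analysis/ becomes a tab on the page.
from pathlib import Path
import markdown  # third-party "markdown" package, assumed available

def build_tabs(directory="analysis"):
    tabs = []
    for md_file in sorted(Path(directory).glob("*.md")):
        title = md_file.stem.replace("_", " ").title()
        html = markdown.markdown(md_file.read_text(encoding="utf-8"))
        tabs.append((title, html))
    return tabs  # rendered into the password-protected page template on deploy
```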
Set up the workflow. The project lead requests an analysis → Claude Code analyzes → shows the result → if approved → .md, commit, push, auto-deploy.
For the client — a structured request map with context and quotes. You can see what's widespread, what's isolated, and what already exists but isn't being found.
For researchers — a cross-respondent view by topic that doesn't exist in individual monitoring cards.
For final interviews — a list of features to test for "unaware of value" (Guide Block 4).
For the lead — the ability to quickly request analysis on any topic and immediately see the result on the site.
A new .md in the analysis directory → git push → Render auto-deploy → new tab on the analysis page.
Selecting respondents for in-depth interviews: three researchers spent 7 hours each manually evaluating every diary. Diary status check: the project lead spent 30 minutes per respondent. A 100-point selection skill cut the work to 1.25 hours. Status — 20 minutes instead of 16 hours.
Author: project lead + Claude Code (Opus 4.6)
On the sixth day of a diary study field stage, 32 respondents are keeping trip diaries across 32 separate Telegram groups. The diary lasts 7 calendar days, but each respondent started on a different date. In 1-2 days the first ones will finish — and we need to plan the selection for in-depth interviews.
What was inconvenient:
What was needed:
Pros: familiar format, the team can edit. Cons: requires manually counting days and trips for each respondent. With 32 people that's 2+ hours of work, and it needs to be redone every day. Human error — easy to miscount.
Decision: rejected. Doesn't scale with 32 respondents and daily updates.
Pros:
Cons: depends on diary sync from Google Drive (need to sync before running). The trip detector is approximate (~80% accuracy) — counts based on text patterns, not manual markup.
The key task — automatically determine how many trips a respondent described in each diary message. The challenge: respondents write in free form — some number their entries, some describe in text, some send voice messages.
We implemented three detection methods applied sequentially.
Accuracy: ~80%. For the task of "overall progress picture" this is sufficient.
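A sketch of the sequential detector; the regex, marker words, and keywords below are illustrative, not the project's real lists:

```python
# Hypothetical trip counter: numbering pattern -> explicit markers -> keyword fallback.
import re

TRIP_MARKERS = ["trip", "route"]                                   # illustrative markers
TRIP_KEYWORDS = ["drove", "walked", "metro", "bus", "navigator"]   # illustrative keywords

def count_trips(text: str) -> int:
    lower = text.lower()
    numbered = re.findall(r"^\s*\d+[.)]", text, flags=re.MULTILINE)
    if numbered:                              # 1) the respondent numbers their entries
        return len(numbered)
    marked = sum(lower.count(m) for m in TRIP_MARKERS)
    if marked:                                # 2) explicit "trip"/"route" markers
        return marked
    # 3) fallback: any trip-like keyword counts the message as one trip
    return 1 if any(k in lower for k in TRIP_KEYWORDS) else 0
```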
Key insight from the project lead: "Thanks to the researchers actively asking follow-up questions, even from a single trip we were able to get a great deal of important context. This is valuable for the research objectives and the business goals."
Out of 32 respondents, 24 need to be selected for the final in-depth interview. The script automatically maps them to the quota grid and identifies three levels: who cannot be dropped (only one in a segment), who should preferably stay, and where there is room to choose. The lead makes the decision seeing the full picture.
A compact ASCII table in the terminal: all 32 respondents on one screen, grouped by type (pedestrians / drivers). Columns: respondent code, city, day N/7, trip count, end date, days remaining, trips by day.
A critically important UX decision: !! next to "days remaining" for those finishing within 2 days or fewer. This is a visual urgency marker.
1. Sync diaries from Google Drive → local folder
↓
2. Script scans all respondent subfolders
↓
3. Trip detector: numbering pattern → markers → keywords
↓
4. Determining Day 1: first day with ≥1 trip
↓
5. Calculation: diary day, end date, trips by day
↓
6. Quota grid: mapping 32 respondents to segments
↓
7. Output: ASCII table + interview selection section
Author: project lead + Claude Code (Opus 4.6)
Diary study on navigation: 32 respondents keep diaries for 7 days, 24 of them are selected for final 70-minute interviews. The interview format is a detailed walkthrough of specific trips from the diary. This means we need to select not "good respondents" but those whose diaries will yield the most material for a productive conversation.
What was inconvenient:
What was needed:
Pros: fast, the lead already knows the material. Cons: not transparent for the client.
Decision: rejected. The client expects a reasoned justification.
Pros: objective metric, easy to calculate. Cons: quantity != quality. 20 entries of "everything was fine, got there" are less useful than 5 entries with reflection and screenshots. Doesn't account for quotas — you could lose an entire segment. Penalizes "infrequent" users who are valuable precisely because of their non-use of navigation.
Decision: rejected. Contradicts the principle "value for the interview > formal activity."
Pros:
Cons: labor-intensive — full scoring of one respondent takes 5-10 minutes. Requires an up-to-date monitoring card as input.
100-point system, each dimension evaluates a specific aspect of value for the interview:
| Dimension | Max | What it evaluates |
|---|---|---|
| Activity | 25 | How many days and trips, consistency of diary-keeping |
| Trip diversity | 20 | Types of transport, purposes, contexts, route familiarity |
| Comment quality | 20 | Reflection, "why", emotions, screenshots, voice messages |
| Research value | 25 | Insights, barriers, discrepancies, feature requests |
| Interview potential | 10 | Material for a 70-minute walkthrough: contradictions, hypotheses |
Key decision: research value weighs as much as activity (25 points). Three deep entries with insights are worth more than twenty formal ones.
Two respondents were scored first — as scale anchors: upper (high activity, many insights) and lower (minimal data). Each subsequent scoring is calibrated against these two.
Rule: score a respondent only when they are on day 5 of the diary or later. 5 days is enough for a reliable assessment, with 2 days remaining for interview planning.
Described the 5 dimensions, the scale, the decision-making logic — in plain language, without internal details. Sent to the team's working chat so everyone understands how the selection works.
1. Monitoring card updated (fresh diary data)
|
2. Check diary day: >=5? If not — wait
|
3. Launch scoring: "evaluate the respondent for interview"
|
4. Claude Code reads the card + diary + quota grid
|
5. Evaluates across 5 dimensions, checks for red flags
|
6. Determines quota position (mandatory / competitive)
|
7. Produces a scoring card with a score and decision
|
8. Result is saved to the results file
|
9. When the entire cell is scored — final cut-off by score
The final stretch. Interview guide v2.0 — complete methodology rework in 2 hours instead of 19.5. Bot crash: Google OAuth token expired, diagnosed and fixed in 20 minutes instead of 4 hours. Analysis page with feature requests and hypotheses for the team.
Author: project lead + Claude Code (Opus 4.6)
On the 6th day of the diary study field stage (32 respondents, 32 Telegram groups), the bot stopped saving entries to Google Drive. Respondents kept writing, but files weren't appearing.
What happened:
RefreshError: invalid_grant: Token has been expired or revoked
Why this was critical:
Root cause: The OAuth consent screen in Google Cloud Console was in "Testing" mode. In this mode, Google automatically revokes the refresh token after 7 days. The project launched a week ago — the token expired exactly after 7 days.
Pros: service accounts don't require a refresh token, they don't expire.
Cons: service accounts have no storage quota on Google Drive and cannot upload files to a user's personal drive. You get a storageQuotaExceeded error. Only works with Shared Drives (Team Drives).
Decision: rejected after an attempt — got a 403 from the Google Drive API.
Pros:
Cons: requires access to Google Cloud Console and the hosting panel.
Pros: can use a service account, tokens don't expire. Cons: need to migrate ~2,000 files, change all scripts (rclone, sync, bot), risk of data loss during migration.
Decision: deferred. Disproportionate amount of work for an urgent situation.
Chain of checks:
getWebhookInfo → 500 Internal Server Error, 224 pending updates → the bot is crashing
Server logs: RefreshError: invalid_grant → OAuth token expired
Wrapped OAuth authorization in try/except. On OAuth failure — the bot logs a warning and tries the service account. This is insurance for the future: if OAuth fails again, the bot won't crash with a 500 but will attempt an alternative path.
# Fragment from the bot's Drive auth helper; `creds` is the user's OAuth credential,
# `Request` comes from google.auth.transport.requests, `logging` is the stdlib logger.
try:
    if not creds.valid:
        creds.refresh(Request())  # refresh the expired OAuth access token
except Exception:
    # e.g. invalid_grant: log the failure and fall back to the service-account path
    logging.warning("OAuth refresh failed, falling back to service account")
    return None
After deploying the code with fallback — got a new error: storageQuotaExceeded. Service accounts cannot create files on a personal Google Drive. Option 1 doesn't work — OAuth is needed.
Wrote a token regeneration script:
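A hedged sketch of such a regeneration script using google-auth-oauthlib — the client-secret file name and scope are assumptions:

```python
# Re-run the local OAuth consent flow and print a fresh refresh token for the server.
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/drive"]  # assumed Drive scope

flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
creds = flow.run_local_server(port=0)  # opens the browser consent screen locally
print("GOOGLE_OAUTH_REFRESH_TOKEN =", creds.refresh_token)
```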
Updated the environment variable on the hosting platform → Manual Deploy.
In Google Cloud Console → OAuth consent screen → Audience → clicked "Publish App". Status changed from "Testing" to "In production". The refresh token is now permanent.
| Metric | Value |
|---|---|
| Time from detection to fix | ~20 minutes |
| Messages in queue | 224 |
| Messages lost | 0 |
| Queue processing speed | ~30 messages/minute |
| Queue cleared in | ~7 minutes |
1. Detection: sync showed 0 new files
|
2. Drive check: no files → problem on the bot side
|
3. Webhook check: getWebhookInfo → 500, N pending
|
4. Server logs: error identified
|
5. Fix: token regeneration + deploy
|
6. Telegram resends the queue → files on Drive
GOOGLE_OAUTH_CLIENT_ID, GOOGLE_OAUTH_CLIENT_SECRET, GOOGLE_OAUTH_REFRESH_TOKEN — all three are needed on the server.
getWebhookInfo — if pending_update_count > 0 and last_error_message contains 500 — the bot is down.
Author: project lead + Claude Code (Opus 4.6)
Deadline: approval of the final interview guide with the client. First interviews are the next day. Available: a draft guide for 70 minutes, 7 blocks, written at the start of the project. But over the week of fieldwork, much had changed:
What was needed: over the weekend, turn the draft into a working tool that guarantees data collection for ALL research tasks, and test it on real material.
Pros: fast, the structure already exists. Cons: critical gaps will remain. Without features from the client, Block 4 (navigation value) is non-functional. Without prioritization, Task 4 is not covered. Questions are not linked to diary data. Decision: rejected.
Pros: can incorporate everything we know from diaries. Cons: risk of "researcher perfectionism" — endless refinement. The draft structure works — no reason to break it. No time. Decision: rejected.
Pros:
Cons: labor-intensive — need to extract data from 30 tabs, conduct the review, refine, and test. But the result is a verified tool.
Used a review skill — evaluation across 8 blocks (introduction, screening, link to tasks, phrasing, structure, tools, timing, pilot). Built a coverage matrix: research tasks x guide questions.
Overall assessment: "Needs revision."
Critical gaps:
Strengths (preserved):
This was the key challenge. The client hadn't provided a feature list, and it was unknown whether they would. The first version of Block 4 was dead without it.
Solution — flip the logic: instead of "top-down" (features → respondent), go "bottom-up" (respondent → their mental model):
This way we capture the respondent's mental model without prompting — and then compare it with actual product capabilities during analysis.
Prioritizing directions is important but consumes 15-20 minutes of interview time. With a 70-minute limit, this is critical.
Solution: the prioritization exercise is completed on the last diary day, NOT during the interview:
Cards were created from real diary data, separately for drivers and pedestrians. Phrasing uses respondents' language, not research jargon.
To formulate hypotheses and cards, we needed to consolidate data from all 30 respondents. We wrote a script that extracts the analytical layer from the Google Sheets tracking table — patterns, barriers, drivers, feature requests, interview questions.
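A sketch of that extraction step with gspread; the column range (W-AD) follows the table structure described earlier, while the spreadsheet key, credential file, and output name are placeholders:

```python
# Hypothetical export of the analytical layer (columns W-AD) from every respondent tab.
import json
import gspread

gc = gspread.service_account(filename="service_account.json")
sh = gc.open_by_key("SPREADSHEET_KEY")  # placeholder key of the tracking table

export = {}
for ws in sh.worksheets():
    rows = ws.get_all_values()
    # keep only the analytical-layer columns (W..AD = indices 22..29), skip the header
    export[ws.title] = [r[22:30] for r in rows[1:] if any(r[22:30])]

with open("analytical_layer.json", "w", encoding="utf-8") as f:
    json.dump(export, f, ensure_ascii=False, indent=2)
```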
Result: a structured JSON with data across 8 sections x 30 respondents. From this data:
Targeted edits based on review results:
| Block | What changed |
|---|---|
| Block 1 (warm-up) | Added recording request, time check |
| Block 3 | MUST/IF TIME markers, removed leading phrasing, added 2 patterns from data |
| Block 4 (value) | Complete redesign: bottom-up, 4 steps, linked to exercise |
| Block 5 (hypotheses) | 6 hypotheses from diaries instead of empty TODO. Three MUST, three IF TIME |
| Block 7 (closing) | Added a key question |
| Timing | Priorities for each block: what is mandatory, what can be skipped |
The most unconventional decision of the session. Instead of waiting for the first interview to test the guide, we conducted a "virtual interview" — walked through the guide using data from one real diary.
For each block: what question we ask → what we can predict from the diary → what's missing and what only a live interview will provide.
What the test showed:
What the guide doesn't cover: testing revealed unique behavioral patterns that the guide doesn't cover — they are specific to the particular respondent and go into the researcher's individual preparation before the interview.
Main takeaway: the diary provides "what," the interview will provide "why" and "what if." The guide is designed precisely for this transition.
Author: project lead + Claude Code (Opus 4.6)
By day 7 of the project, a large volume of analytical data had accumulated from diaries of 32 respondents (~400+ trips). Two key artifacts — the feature request map and hypotheses for final interviews — existed but in different places:
What was needed: a single access point for feature requests and hypotheses, current across all 32 respondents, with an evidence base.
1. Update feature requests in the existing file, leave hypotheses in the guide
Pros: fast, minimal changes. Cons: the team won't see the hypotheses without reading the 10-page guide. Quick access is needed before interviews. Decision: rejected.
2. Create two separate documents on the site (chosen)
Pros:
The analysis page already auto-detects new .md files — just add a second file.
Cons: labor-intensive — need to review data from all 32 respondents and 87 work sessions.
Step 1. Updated the feature request map.
Source data: diaries of 32 respondents over 7 days, monitoring cards, analytical layer of the tracking table, cross-respondent patterns.
Result:
Step 2. Created the hypotheses file.
Sources: interview guide v2.0, tracking table, monitoring cards, quantitative layer.
10 hypotheses with an evidence base from the full diary stage. Each hypothesis: summary, evidence table, explanation of "why it matters," interview question formulation.
Step 3. Deploy and notification.
Analysis page on the project site — two tabs:
A single link for the team to the analysis page on the project site.
Diaries ended — interviews begin. Researchers can't read hundreds of files per respondent in one evening. Narrative analytics system: respondent portrait, dialogues, interview zones — three files instead of hours of reading. Pipeline from diary to business article, reusable across projects.
Author: project manager + Claude Code (Opus 4.6)
Diaries are ending — interviews begin. Researchers need to prepare: immerse themselves in each respondent's data, understand their behavior, know what to clarify during the interview.
Input: 7 days of diary entries — dozens of text messages, voice notes, photos, screenshots, dialogues with researchers. No researcher can read it all in one evening before the interview.
The monitoring card (already on the site) is a good tool for daily tracking, but it's a structured summary: patterns, barriers, drivers, chronology. Interview preparation requires something different — understanding the person: how they think, why they behave this way, what's behind the numbers. Specific quotes, specific situations, specific questions.
After the interview, something is needed for the client. Not just "the respondent uses navigation in 33% of trips," but an answer to the question: how does this person's behavior relate to business goals? What works, what doesn't, and why.
What's needed: a system for creating analytical articles about respondents that works at all stages of research and is reusable across projects.
1. Extend the monitoring card
Add a narrative block, quotes, and interview questions to the existing format. Pros: one file, one place. Cons: the card would become enormous, two formats would mix (structured summary and narrative), the card is updated daily — the narrative would interfere. Decision: rejected.
2. Create separate analytical articles per respondent (chosen)
Three separate files in each respondent's personal folder: narrative analytics, dialogues, interview zones. After the interview — enrichment with transcript + business article. Pros: each artifact is a separate file with a clear purpose, doesn't pollute monitoring, scales well. Cons: more files. Decision: chosen.
3. Prepare for each interview manually
The researcher reads the diary themselves, extracts quotes, formulates questions. Pros: deep immersion. Cons: 24 respondents x hours of reading = unrealistic under tight deadlines. Starting from scratch every time. Decision: rejected.
Not a day-by-day chronology (respondents didn't do anything unusual — they lived ordinary lives). Not a dry summary (that already exists in monitoring). But analytics through narrative: the skeleton is mechanisms, causes, behavioral structure; the tone is specific situations, quotes, context.
Focus:
Length — exactly enough to reveal behavior and its causes. Not 3 paragraphs and not 30 — as much as needed.
32 personal folders — one per respondent. Each contains:
Sources: 177 diary files (7 days), monitoring card, analytical layer from tracking spreadsheet.
Result — three files:
Defined three stages applicable to any research project:
| Stage | When | Output | Input |
|---|---|---|---|
| Narrative | Before or after interview | Narrative analytics | Diary and/or transcript |
| Preparation | Before interview | Dialogues + dig zones | Any pre-data |
| Business | After narrative | Business article | Narrative + business goals |
Key decision: the skill adapts to available data, not to the research method. One command covers:
The researcher doesn't choose "diary mode" or "interview mode." They say "write narrative" — and the skill determines what data is available.
For researchers:
For the project:
For future projects: