HRS 1,212
PPL 6
Reddite Caesari quae sunt Caesaris, et Deo quae sunt Dei

Give AI the routine, return to the human what is human

Start

32 people keep a detailed diary in Telegram for 7 days.

Three weeks for the entire project, including approval, planning, launch, analytics and reporting. Six members on the research team.

Day 1

Day 1: Full stack in one day

Diary study: 32 respondents recording every trip — voice notes, photos, geolocation — in Telegram for 7 days. Without automation, two researchers would spend an hour per diary every day. In one day, the entire infrastructure was built: Telegram bot, Whisper transcription, auto-upload to Google Drive.

223h saved
2 people
Full stack in one day
foundational

Full Stack in One Day: From Idea to a Working Bot

Context

Day 1 of the "Diary Navigation" project. A research agency is preparing to launch a diary study: 32 respondents will document every trip around the city in Telegram groups over the course of 7 days. The methodology is complex, the team is distributed, and there will be a lot of data.

The project manager is not a developer. Coding experience: zero. Tool: Cursor (an IDE with an AI assistant). The goal: build a working infrastructure in a single day that will operate throughout the entire week of fieldwork.


What We Started With

  • A voice message describing the task: "I want a little bot in Telegram that answers questions based on the methodology knowledge base"
  • A Google Drive folder with the methodology
  • Not a single line of code
  • Not a single configured service

What We Had After 8 Hours

  • A Telegram bot answering methodology questions (/kb)
  • A bot collecting diary entries from chats (text, photos, audio, video circles)
  • Automatic transcription of voice messages
  • Saving everything to Google Drive in structured folders
  • Syncing diaries with the local computer
  • Auto-updating the knowledge base on a schedule
  • Deployment on Render with auto-deploy from GitHub
  • A system of quick commands to manage all of this

How This Was Possible

The entire dialogue is ~5,800 lines of conversation in Cursor. The project manager described tasks via voice messages and text, the AI assistant wrote code, explained every step, and debugged errors from screenshots and logs. Every decision was made collaboratively: AI proposed options, the human chose.

Key principles that emerged along the way:

  • "Everything you can create yourself — create yourself" — the human provides only secrets (tokens, keys), AI does the rest
  • Step-by-step guidance — every screenshot from the user leads to an explanation and the next step
  • Commit + push first, then deploy — a rule born after confusion about the order of operations

Why This Case Matters

This is not a story about a programmer building a bot. This is a story about a researcher who had never written code building a working infrastructure for fieldwork in a single day.

Built a Methodology Knowledge Base Bot

Problem

The diary study methodology spans dozens of pages: how to describe problems and drivers, what to say at the kickoff meeting, how to document findings, entry formats, checklists, examples and anti-examples. The team is distributed: researchers, recruiters, the client.

What was inconvenient:

  • To recall the problem description format, you'd have to open Google Drive, find the right file, find the right section
  • A researcher in the field won't do that — they'll ask in a chat and get an approximate answer from a colleague
  • There was no single place to quickly get an exact excerpt from the methodology
  • New team members had to read everything from scratch

What was needed:

  • A Telegram bot you can ask a question in natural language
  • An answer from the knowledge base, not an AI fabrication
  • A link to the specific section/document

What We Did

Architecture:

  • Telegram Bot (BotFather -> token)
  • FastAPI backend in Python
  • OpenAI Responses API + vector store with methodology
  • Command /kb <question> -> answer from the knowledge base

Model choice: gpt-4.1-mini. For a FAQ bot with retrieval it is sufficient, and the model can be swapped later by changing a single line.
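To make the architecture concrete, here is a minimal sketch of the /kb flow (webhook in, file_search over the vector store, reply out), assuming FastAPI and the OpenAI Python SDK; the environment variable names and route path are illustrative, not the project's actual ones:

```python
# Sketch of the /kb flow: FastAPI webhook -> OpenAI Responses API (file_search) -> Telegram reply.
# TELEGRAM_TOKEN, KB_VECTOR_STORE_ID and the route path are assumptions, not the project's real values.
import os
import requests
from fastapi import FastAPI, Request
from openai import OpenAI

TELEGRAM_TOKEN = os.environ["TELEGRAM_TOKEN"]
VECTOR_STORE_ID = os.environ["KB_VECTOR_STORE_ID"]  # vector store with the methodology files

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment


@app.post("/webhook")
async def telegram_webhook(request: Request):
    update = await request.json()
    message = update.get("message") or {}
    text = message.get("text", "")
    if not text.startswith("/kb"):
        return {"ok": True}  # diary messages are handled by a separate branch

    question = text.removeprefix("/kb").strip()
    # Answer strictly from the knowledge base: file_search over the methodology vector store.
    response = client.responses.create(
        model="gpt-4.1-mini",
        input=question,
        tools=[{"type": "file_search", "vector_store_ids": [VECTOR_STORE_ID]}],
    )
    requests.post(
        f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage",
        json={"chat_id": message["chat"]["id"], "text": response.output_text},
        timeout=30,
    )
    return {"ok": True}
```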


Why This Is Needed

For researchers — ask "how do we describe a barrier?" and get an exact excerpt from the methodology in 3 seconds, without opening Google Drive.

For the project manager — no more answering the same questions. The bot answers more precisely because it quotes the document rather than paraphrasing from memory.

Deployed the Bot: From "What Is GitHub" to a Working Service

Problem

The bot's code was written, but it was sitting on a local computer. The project manager had never deployed an application, never worked with GitHub.

What We Did

A step-by-step process: GitHub -> Git config -> Push -> Render -> Webhook -> Debug. Two bugs were resolved in minutes: an outdated SDK version and an unpaid API account.

Solution: Render Free. Wake-up delays are not critical for a research bot.

Set Up File Synchronization Between Google Drive and Local Disk

Google Drive for Desktop for methodology (two-way sync) + rclone for diaries (Drive -> local disk, copy only).
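A minimal sketch of the diary half of this setup, the one-way rclone copy behind the "sync the diaries" step; the rclone remote name and paths are assumptions:

```python
# Sketch of the "sync the diaries" step: copy the diaries folder from Google Drive to the local disk.
# The rclone remote name ("gdrive") and the paths are assumptions.
import subprocess

def sync_diaries() -> None:
    # One-way copy only: Drive stays the source of truth, nothing is deleted locally.
    subprocess.run(
        ["rclone", "copy", "gdrive:Diaries", "./diaries", "--progress"],
        check=True,
    )

if __name__ == "__main__":
    sync_diaries()
```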

Taught the Bot to Collect Diaries from Telegram

Automatic collection of all content from 32 chats: text, photos, audio, video -> Google Drive in structured folders. Transcription via Whisper.

Built a Command System and Documented the Pipeline

Two key commands: "I updated the methodology" and "sync the diaries." A pipeline cheat sheet for the team.

Diary collection from Telegram
Before 224h
After 1.2h

Taught the Bot to Collect Diaries from Telegram

Problem

32 respondents will keep diaries in Telegram groups: text, voice messages, photos, video circles. Each in a separate chat with a researcher. That's ~150-200 entries per day, scattered across 32 groups.

What was inconvenient:

  • Information lives only in Telegram — if not saved, it gets lost in the message stream
  • Manual saving is impossible: 32 groups x 7 days = hundreds of files
  • Voice messages need to be re-listened to — no text transcription
  • No structure: where is the entry from a specific respondent for a specific day?

What was needed:

  • Automatic collection of all content from chats
  • Saving to Google Drive in structured folders by respondent and date
  • Automatic transcription of audio and video circles
  • Structured file names

Options We Considered

1. Bot polls chat history 3 times a day

Pros: you can set a schedule, nothing gets missed. Cons: The Telegram API does not allow a bot to read chat history. A bot only receives what arrives in real time via webhook.

Decision: rejected. This is a Telegram API limitation — the bot only sees new messages.

2. Real-time collection via webhook (chosen)

Pros: nothing gets lost, every message is processed immediately. Cons: if the server goes down — messages are lost (Telegram retries the webhook several times, but not indefinitely).


The Google Drive Problem: "My Drive" vs "Computers"

An unexpected difficulty: the Google service account only sees "My Drive," while Google Drive for Desktop syncs files to the "Computers" section. These are different locations.

The project manager was surprised: "The bot we have running — it will save to Google Drive, to my drive, right?" — correct, but not to "Computers," to "My Drive."

Solution: create a folder for diaries in "My Drive," share it with the service account. But the service account couldn't write to a personal Drive (no Shared Drive on a free account).

Final solution: OAuth access instead of a service account. Obtained a refresh_token via an authorization script, added it to Render.
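A sketch of what such an authorization script can look like, assuming the google-auth-oauthlib library and an OAuth client secret downloaded from Google Cloud Console; file and variable names are illustrative:

```python
# Sketch of the one-off authorization script used to obtain a refresh_token for Drive uploads.
# Assumes google-auth-oauthlib is installed and client_secret.json was downloaded from Google Cloud Console.
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/drive"]

flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
creds = flow.run_local_server(port=0)  # opens a browser window for the project manager to log in

# The refresh_token goes into the Render environment variables; the bot uses it to mint access tokens.
print("GOOGLE_REFRESH_TOKEN =", creds.refresh_token)
```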


What We Did

1. Disabled Privacy Mode in BotFather: /setprivacy -> Disable. Without this, the bot only sees commands (/kb) and ignores regular group messages.

2. Created the folder structure: 33 folders, one per respondent plus one test folder. Folder names are in Latin characters, since Cyrillic caused errors. Subfolders with dates are created automatically on the first message.

3. Set up collection for all formats

Format -> what the bot does -> what gets saved:
  • Text -> saves the entry text -> .md file with the entry text
  • Photo -> downloads the file -> photo in .jpg format
  • Voice message -> downloads + transcribes -> audio file + text transcript
  • Video circle -> downloads + transcribes the audio track -> video file + text transcript
  • Video -> downloads the file -> video in .mp4 format

Transcription — OpenAI Whisper (whisper-1).
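As an illustration, a hedged sketch of the voice-message branch (download from Telegram, transcribe with whisper-1, save a Markdown transcript); the helper name and paths are assumptions:

```python
# Sketch of the voice-message branch: download the .ogg from Telegram, transcribe with whisper-1,
# and keep both the audio and a Markdown transcript. The helper name and paths are illustrative.
import requests
from openai import OpenAI

client = OpenAI()

def transcribe_voice(telegram_token: str, file_id: str, out_stem: str) -> str:
    # Telegram getFile returns a file_path that can be downloaded from the file API.
    meta = requests.get(
        f"https://api.telegram.org/bot{telegram_token}/getFile",
        params={"file_id": file_id}, timeout=30,
    ).json()
    file_path = meta["result"]["file_path"]
    audio = requests.get(
        f"https://api.telegram.org/file/bot{telegram_token}/{file_path}", timeout=60
    ).content

    ogg_path = f"{out_stem}.ogg"
    with open(ogg_path, "wb") as f:
        f.write(audio)

    # OpenAI Whisper transcription.
    with open(ogg_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    with open(f"{out_stem}.md", "w", encoding="utf-8") as f:
        f.write(transcript.text)
    return transcript.text
```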

4. Format evolved during the process

  • First version: JSON. Project manager: "is JSON the best format for our needs?" -> no, researchers need a human-readable format
  • Second version: .txt. Project manager: "I don't like txt, let's use markdown"
  • Final version: .md for text and transcripts, media as-is

5. File naming. First version: message_1.md, message_2.md. The project manager asked for "respondent name first, then date, number, and time." Result: structured names with respondent identifier, date, and time.


Testing

Tested on a test group where the project manager was the "respondent":

  1. Sent text -> a folder with the date appeared, but files were empty -> logs -> Drive write error -> OAuth not configured -> configured -> worked
  2. Sent audio -> file + transcript saved
  3. Tested a video circle -> saved

How It Works

1. Respondent writes in a Telegram group (text/photo/audio/video circle)
        ↓
2. Telegram sends a webhook to Render
        ↓
3. Bot identifies the respondent by chat_id
        ↓
4. Creates the day folder if it doesn't exist
        ↓
5. Downloads the media file from Telegram servers
        ↓
6. If audio/video → sends to Whisper → saves transcript
        ↓
7. Uploads everything to Google Drive in the respondent's folder
        ↓
8. On the "sync diaries" command → rclone copies to the local disk
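For step 7 of this chain, a hedged sketch of the Drive upload using the OAuth refresh_token from the previous section; environment variable names, folder IDs, and the helper signature are illustrative:

```python
# Sketch of step 7: upload a saved file into the respondent's day folder on Google Drive using the
# OAuth refresh_token obtained earlier. Env var names, folder IDs and the helper are illustrative.
import os
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

creds = Credentials(
    token=None,
    refresh_token=os.environ["GOOGLE_REFRESH_TOKEN"],
    token_uri="https://oauth2.googleapis.com/token",
    client_id=os.environ["GOOGLE_CLIENT_ID"],
    client_secret=os.environ["GOOGLE_CLIENT_SECRET"],
)
drive = build("drive", "v3", credentials=creds)

def upload_to_drive(local_path: str, filename: str, day_folder_id: str) -> str:
    media = MediaFileUpload(local_path, resumable=True)
    created = drive.files().create(
        body={"name": filename, "parents": [day_folder_id]},
        media_body=media,
        fields="id",
    ).execute()
    return created["id"]  # Drive file id, handy for logging
```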

Why This Is Needed

For researchers — all of a respondent's entries in one folder, with transcriptions. No need to re-listen to voice messages — the text is already there.

For the project manager — a structured archive for analysis. Every trip is in a file with a clear name.

For analysis — based on this data, monitoring cards, trip tracking tables, and analytical summaries will later be built.

Day 2

Day 2: Scoring and debugging

The recruiter had 208 candidate applications — manual review took 5 minutes each, and the top 60 needed deep scoring at 20 minutes each. Meanwhile, every Telegram bot bug required an hour to explain context to the developer. Automated scoring processed all 208 in 15 minutes. Bug context transfer dropped from an hour to 4 minutes.

40.7h saved
2 people
Context transfer and debugging
Before 4h
After 16 min

Passing Context Between AI Sessions

Problem

By the end of the first day, the AI assistant's context window in Cursor was full: ~5,800 lines of dialogue. The assistant started "forgetting" earlier agreements. A new session was needed — but how to pass everything that had been done?

What We Did

A structured summary of 7 blocks: knowledge base, bot, diaries, Render ENV, OAuth, rclone, rules. The project manager copied the summary into the beginning of a new dialogue — and the new agent immediately understood the context.

This "shift handoff" pattern later became a permanent system: a MEMORY.md file (loaded automatically at the start of every Claude Code session) and CLAUDE.md (project instructions).

Fixed Diary Collection: 4 Bugs, 4 Commits

4 sequential commits, each for a specific error from Render logs:

  1. JSON instead of MD on Drive
  2. Photos/audio being ignored
  3. UnboundLocalError: media_paths
  4. Whisper: Invalid file format

The entire "bug -> fix -> deploy -> verify" cycle took 3-5 minutes per bug.

Delays on Free Render: A Deliberate Choice "Not to Fix"

The decision "not to fix right now" saved time for more important tasks. Three optimization options were ready, but instead of overcomplicating the architecture prematurely, we chose monitoring (YAGNI).

Transition from Cursor to Claude Code: New Horizons for the Conductor

In Cursor, the project manager was an operator: AI said "click here," she clicked. In Claude Code, the project manager became a conductor: she says "update the monitoring" — and everything happens.

Aspect -> Cursor -> Claude Code:
  • Commit and push -> AI writes code and asks the human to commit -> AI does commit + push itself
  • Running scripts -> "Paste this command into PowerShell" -> AI runs the script directly
  • Context between sessions -> manual summary copying -> MEMORY.md loads automatically
Candidate scoring system
Before 37.3h
After 15 min

Author: project manager + Claude Code (Opus 4.6)


Problem

A diary study with 32 respondents. Recruiters send a stream of candidates from a screening questionnaire — dozens of people, each needing to be evaluated across several parameters: geography, behavioral profile, frequency of contact with the research subject, diversity of situations, tools used, and engagement.

What was inconvenient:

  • "Accept / reject" decisions were made by gut feeling — the recruiter looked at the questionnaire and gave an intuitive recommendation
  • When there are many candidates, criteria start to drift: you evaluate the first ones strictly, by the twentieth you're tired and miss nuances
  • No documented justification: if the client asks "why was this one accepted and that one rejected?" — the answer only exists in the recruiter's head
  • Classification errors: the recruiter listed Perm and Rostov as "large cities" — they feel big intuitively, but on the project grid they're "medium." Without formalization, such errors go undetected
  • Evaluation results were not stored in a structured way — impossible to go back and compare candidates

What was needed:

  • A formalized rubric: what exactly we evaluate, on what scale, with what weight
  • Automatic score calculation from questionnaire data
  • A text justification for each candidate — breakdown by criteria
  • Results in the interview schedule spreadsheet: the recruiter and project manager see the score, recommendation, and comment right in the working document

Options We Considered

1. Manual evaluation by the recruiter

Pros: fast, no tools needed. An experienced recruiter has a feel for the "right" candidate. Cons: not reproducible (two recruiters will evaluate the same person differently), no justification for the client, classification errors go undetected, fatigue with high volume.

Decision: rejected as the sole method. But the recruiter's live expertise is retained in subsequent stages (phone screening).

2. Scoring rubric + automatic scoring script (chosen)

Pros:

  • Every candidate is evaluated against the same criteria with the same thresholds
  • Numerical score + text justification — can be shown to the client
  • Screeners (heavy users, professional use, conflict of interest) are checked automatically — none slip through
  • Classification errors are caught: the script knows that Perm = "medium" and won't let it pass
  • Results are written to Excel — the recruiter sees them in the familiar working document

Cons: requires time to develop the rubric and script. The script doesn't see what a live recruiter sees (tone, motivation, nuances). Therefore, this is a supplement to the recruiter, not a replacement.


What We Did

1. Developed a 25-Point Rubric with 8 Criteria

Each criterion is tied to the research objectives, not to an abstract "candidate quality": geography, behavioral profile, contact frequency, diversity of situations, tools used, usage frequency, variability of conditions, engagement. Each criterion's weight reflects its significance for the project goals.

Key decision: situational users (who use the tool not always, but depending on context) receive the maximum behavioral profile score. Not "always" (predictable, no barriers) and not "never" (too little material), but specifically the one who sometimes uses it and sometimes doesn't. They have the most interesting switches and triggers.

2. Wrote an Automatic Scoring Script

A Python script (+ openpyxl):

  • Reads questionnaires from Excel
  • Applies all 8 criteria
  • Checks hard screeners: professional panelists, professional use of the research subject, client employees, unwillingness to participate in the format
  • Writes three columns to Excel: comment (breakdown by criteria), recommendation, score

Thresholds: >=18 = recommended, 15-17 = conditional (clarify during the call), <15 = not recommended.
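A minimal sketch of what such a scoring pass can look like, assuming openpyxl and a deliberately simplified rubric; the criteria weights, screener list, and column letters are illustrative rather than the project's real ones:

```python
# Sketch of the scoring pass over the screening questionnaire, assuming openpyxl and a simplified
# rubric; the real 8 criteria, weights, screeners and column layout differ from what is shown here.
from openpyxl import load_workbook

def score_candidate(city_size: str, usage_pattern: str, is_panelist: bool) -> tuple[int, str, str]:
    # Hard screeners first: any hit rejects the candidate regardless of the score.
    if is_panelist:
        return 0, "not recommended", "screener: professional panelist"
    score, notes = 0, []
    score += {"large": 4, "medium": 3, "small": 2}.get(city_size, 0)            # geography
    score += {"situational": 5, "always": 3, "never": 1}.get(usage_pattern, 0)  # behavioral profile
    notes.append(f"geo={city_size}, profile={usage_pattern}")
    # ...the remaining criteria (contact frequency, diversity, tools, engagement...) add up to 25...
    if score >= 18:
        rec = "recommended"
    elif score >= 15:
        rec = "conditional (clarify during the call)"
    else:
        rec = "not recommended"
    return score, rec, "; ".join(notes)

wb = load_workbook("Preliminary Selection.xlsx")   # illustrative file name
ws = wb.active
for row in range(2, ws.max_row + 1):
    score, rec, comment = score_candidate(
        ws[f"D{row}"].value, ws[f"E{row}"].value, bool(ws[f"F{row}"].value)  # illustrative columns
    )
    ws[f"X{row}"], ws[f"Y{row}"], ws[f"Z{row}"] = comment, rec, score        # comment, recommendation, score
wb.save("Preliminary Selection (scored).xlsx")
```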

3. Transferred Results to the Interview Schedule

Via update scripts, scoring results were written to the recruiter's working spreadsheet — the interview schedule file. Five columns: GEO segment, quota, comment with justification, recommendation, numerical score. The recruiter opens the familiar document and sees for each candidate: why they received that score, what to pay attention to during the call.

4. Connected Automation with Live Expertise

Scoring is the first filter. Then the recruiter calls candidates from the "recommended" and "conditional" zones and checks what the script can't catch:

  • Does the questionnaire match what the person says on the phone
  • Motivation: is the person interested in the experience or only in the reward
  • Articulateness: can they describe their experience in their own words

If the scoring flagged something as "clarify during the call" — the recruiter specifically addresses that point.


Why This Is Needed

  1. For the recruiter — no more guessing "accept or reject," but instead seeing a numerical benchmark and specific flags for the phone call. Focusing time on candidates in the "conditional" zone rather than reviewing everyone.
  2. For the project manager — explaining to the client why the sample looks the way it does. Showing the rubric, scores, justifications — instead of "we think these ones fit."
  3. For the project — catching classification errors (cities, behavioral patterns) before a candidate enters the sample. One such error = a broken quota.
  4. For the team — a common language: "recommended 21/25" is clearer than "seems fine."

How It Works

1. Candidate fills out the screening questionnaire
        ↓
2. Data goes into Excel (Preliminary Selection)
        ↓
3. Script runs 8 criteria + screeners
        ↓
4. Excel receives: score, recommendation, comment by criteria
        ↓
5. Results are transferred to the interview schedule
        ↓
6. Recruiter sees score and flags → calls the candidate with specific questions
        ↓
7. Final decision: scoring + recruiter's impression from the call
Day 3

Day 3: Monitoring and tracking

The heaviest savings day. Diary monitoring: two researchers would have spent an hour per day on each of 32 respondents: checking entries, assessing quality, taking notes. Trip tracking: 45 minutes to manually enter each trip into a spreadsheet. Two solutions eliminated 789 hours of routine work.

789h saved
3 people
Diary monitoring cards
Before 228h
After 2.1h

Author: project manager + Claude Code (Opus 4.6)


Problem

On the second day of the fieldwork phase, we have 6 active respondents (+ 1 test), each documenting 1-5 trips per day. Over the week, that's ~150-200 entries in Telegram groups.

What was inconvenient:

  • Entries are scattered across 32 separate Telegram groups
  • To understand what's happening with a specific respondent, you need to re-read all their messages across all days
  • There is no single place where a researcher can see the full picture of a respondent: what has already been understood, what's new today, what should be followed up on
  • The project manager cannot quickly assess progress across all respondents
  • Preparing for an in-depth interview requires a manual summary for each person

What was needed:

  • A compact card for each respondent, updated daily
  • Visible patterns, barriers, drivers, discrepancies with the quota
  • Tied to the business objectives of the research
  • Interview questions — specific, linked to actual trips
  • Access for the entire team, but closed to outsiders

Options We Considered

1. Automatic generation on the server (GPT-4.1-mini on Render)

Pros: fully automatic, no human involvement. Cons: GPT-4.1-mini is too weak a model for quality research analysis. Superficial conclusions, no depth, won't catch subtle patterns. At minimum GPT-4 or Claude Opus is needed — and that requires an API key and additional costs.

Decision: deferred. Not the right quality level for research analysis.

2. Claude API on the server (Anthropic API on Render)

Pros: high-quality analysis (Claude Opus), automatic on a schedule. Cons: requires an Anthropic API key (separate payment), prompt configuration, additional cost per request.

Decision: deferred. No API key, and at this stage it's overkill.

3. Claude Code on the researcher's computer (chosen)

Pros:

  • Maximum analysis quality (Claude Opus 4.6 — the strongest model)
  • No additional costs (included in the Claude subscription)
  • Full control: the human sees and can correct the result
  • Flexibility: can request analysis for one respondent or for all

Cons: not fully automatic — needs to be launched manually. But it takes 1 minute: open Claude Code -> type "update the monitoring" -> done.

Card Storage: Google Drive vs Git

Initially planned to upload cards to Google Drive. But the Google service account cannot create files (no storage quota).

Final solution: cards are stored in the bot's repository; on push to main, Render automatically deploys the updated site. Additionally, a copy is saved locally.


What We Did

1. Expanded the UX Analyst Skill (7 -> 9 modes)

Two new modes added:

  • Mode 8 — Respondent Monitoring Card. A compact living document: who this is, what we see (patterns, barriers, drivers, discrepancies), what it means for the business, what to follow up on, trip chronology. Updated daily; each update is a delta.

  • Mode 9 — Daily Insight Summary. A Telegram post for the team: cross-respondent analysis for the day, key findings, trends, tasks for researchers. (Not yet technically implemented; the format is ready.)

2. Created Monitoring Utilities

A monitoring script:

  • Syncs diaries from Google Drive
  • Reads entries by respondent and date
  • Saves cards in two locations (locally + for the website)
  • CLI interface for debugging
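A minimal sketch of such a utility with hypothetical paths; in the real pipeline the card text itself is written by Claude Code, the script only collects entries and saves files:

```python
# Sketch of the monitoring utility: gather a respondent's entries by day and save the card to both
# locations. Folder layout and file names are assumptions; the card text comes from Claude Code.
import argparse
from pathlib import Path

DIARIES = Path("diaries")             # local copy synced from Google Drive
CARDS_LOCAL = Path("cards")           # working copy for the researchers
CARDS_SITE = Path("site/monitoring")  # copy committed to the repo and deployed by Render

def collect_entries(respondent: str) -> dict[str, list[str]]:
    """Return {date: [entry texts]} for one respondent from the synced diary folders."""
    entries: dict[str, list[str]] = {}
    for day_dir in sorted((DIARIES / respondent).iterdir()):
        if day_dir.is_dir():
            entries[day_dir.name] = [
                p.read_text(encoding="utf-8") for p in sorted(day_dir.glob("*.md"))
            ]
    return entries

def save_card(respondent: str, card_markdown: str) -> None:
    # The same card goes to two places: the local working copy and the repo folder served by Render.
    for target in (CARDS_LOCAL, CARDS_SITE):
        target.mkdir(parents=True, exist_ok=True)
        (target / f"{respondent}.md").write_text(card_markdown, encoding="utf-8")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Debug CLI: list one respondent's entries by day")
    parser.add_argument("respondent")  # e.g. R-01
    args = parser.parse_args()
    for date, texts in collect_entries(args.respondent).items():
        print(date, f"{len(texts)} entries")
```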

3. Created a Monitoring Web Page

A protected monitoring page:

  • Password-protected login (separate from the respondent password)
  • Tabs by respondent: "R-01, pedestrian, Voronezh"
  • Cards rendered from Markdown to HTML
  • Responsive design for mobile
  • Direct link to a respondent by identifier

4. Generated Cards for Respondents

Each respondent has their own monitoring card. It contains: profile, accumulated patterns, barriers and drivers, key insights for each day, and questions for the in-depth interview. Cards are updated daily as new entries come in.


Why This Is Needed

  1. For researchers — to see accumulated understanding of a respondent before asking questions. No need to re-read all entries — open the card and understand the picture in 2 minutes.

  2. For the project manager — to quickly assess progress: who has interesting patterns, who has little data, where there are discrepancies with the quota.

  3. For interview preparation — the "What to follow up on" section linked to specific trips = a ready foundation for the in-depth interview guide.

  4. For the client — a section in each card shows what the findings mean for the product.


How It Works (Technical Chain)

1. Respondents write in Telegram groups
        ↓
2. Bot saves entries to Google Drive
        ↓
3. Claude Code syncs diaries to the local computer
        ↓
4. Claude Code (Opus 4.6) reads entries and generates/updates cards
        ↓
5. Cards are saved locally and in the repository for the website
        ↓
6. Git push → Render auto-deploy → site updated
        ↓
7. Team opens the monitoring page
Trip tracking spreadsheet
Before 575h
After 11h

Built a Trip Tracking Table in Google Sheets

Problem

Monitoring cards give an overall picture of the respondent, but for the final report and interview preparation, you need to work with data at the level of individual trips.

What was inconvenient:

  • No single repository where each trip is a separate row with full context
  • Can't quickly filter: "show all trips without navigation on familiar routes" or "show all barriers"
  • Information about a single trip can accumulate over 1-2 days (respondent recorded the trip, researcher asked questions, respondent answered the next morning) — nowhere to bring it all together
  • Researchers can't see analytics and leave comments in one place
  • No structured analytical layer: insight type, influence strength, feature group, interview questions

What was needed:

  • A table with complete data for each trip + an analytical layer
  • Filtering by any parameter (route type, navigation yes/no, app, insight type)
  • Access for researchers with commenting capability
  • Daily updates: new trips, additions to old ones, processing comments
  • A "Full text" column — ALL information about the trip without abbreviations

Options We Considered

1. Local Excel/CSV

Pros: easy to create, familiar format. Cons: no shared access, researchers can't see or comment. Need to resend the file after every update.

Decision: doesn't work. Team collaboration is more important.

2. Google Sheets with manual entry

Pros: shared access, comments, filters. Cons: manually filling ~30 columns per trip — unrealistic at 5-10 trips per day. The analytical layer requires expert filling according to a taxonomy.

Decision: doesn't work. Too labor-intensive manually.

3. Google Sheets + automatic filling via Claude Code (chosen)

Pros:

  • Shared access for the team (comments)
  • Automatic filling from raw diaries (Claude Code reads entries, breaks them into trips, fills all columns)
  • Analytical layer filled according to the insight taxonomy (8 types) tied to the project's analytical framework
  • Daily update with one command: "track the table"
  • Researcher comments are processed interactively

Cons: filling is not fully automatic — needs to be launched manually. But it's the same model as with monitoring: command -> result.


What We Did

Created a separate Claude Code skill (diary-tracking)

The skill is separated from the analytical one — different roles, different rhythms. The analyst deeply examines entries. The data engineer structures them into a table.

Three operating modes:

  • Mode 1 — Initial creation of a respondent's tab (one-time)
  • Mode 2 — Daily update (~12:00 PM Moscow time): new trips + additions + comments
  • Mode 3 — Interview preparation (end of the week): finalize summary, discrepancies, priority questions

Designed the table structure

One tab = one respondent. Three zones per tab:

Zone -> columns -> contents:
  • Portrait (sidebar) -> A-D -> respondent card: segment, city, quota, style, researcher
  • Trip table -> E-V -> 18 columns: route, purpose, context, navigation, transcription, full text
  • Analytical layer -> W-AD -> 8 columns: insight type, formulation, strength, feature group, interview question

Plus a respondent summary (patterns, barriers, drivers, discrepancies) — in separate columns on the right.
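For illustration, a hedged sketch of appending one parsed trip to a respondent's tab, assuming the gspread library and a service account; the spreadsheet key, tab name, and row contents are placeholders:

```python
# Sketch of writing one parsed trip into a respondent's tab, assuming gspread and a service-account
# credential; the spreadsheet key, tab name, column order and row values are placeholders.
import gspread

gc = gspread.service_account(filename="service_account.json")
sheet = gc.open_by_key("SPREADSHEET_KEY")   # the trip-tracking spreadsheet
tab = sheet.worksheet("R-01")               # one tab = one respondent

trip_row = [
    "day 2", "home -> office", "work trip", "familiar route",          # trip table columns...
    "no navigation used", "full text of the entry ...",
    "barrier", "navigation feels unnecessary on familiar routes", "medium",  # analytical layer...
]
# In the real layout the trip table starts at column E; append_row here is only to show the mechanics.
tab.append_row(trip_row, value_input_option="USER_ENTERED")
```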

Tested on one respondent

  • 2 days of diary data, 7 trips
  • 51 entry files (text, voice transcriptions, dialogues with the researcher)
  • All data parsed and written to the table, including full voice message texts

Iterated the layout based on feedback

First version: portrait on top (10 rows), then empty rows, then the table. Problem: the table doesn't fit on screen, inconvenient.

Second version: portrait as a compact sidebar on the left (columns A-D, frozen). Trip table starts from the first row. Header row with filters frozen. Everything visible immediately on opening.

Notified the team

Sent instructions to the team chat: what's in the table, how it's structured, how to leave comments, the daily update schedule.


How It Works

  1. Respondents write in Telegram -> bot saves to Google Drive
  2. Claude Code syncs diaries to the local computer
  3. Claude Code reads entries, breaks them into trips (9-item template + voice messages + dialogues)
  4. Each trip -> a row in Google Sheets (all 26 columns)
  5. The analytical layer is filled according to the project's insight taxonomy
  6. Researchers open the table, filter, leave comments
  7. On the next update, comments are processed interactively (the project manager decides what to do with each one)

Daily cycle (~12:00 PM Moscow time):

  • Command "track the table" -> sync diaries -> parse new trips -> update rows (including additions to old trips) -> update analytics -> process comments -> report

Why This Is Needed

For the final report — structured data for each trip with an analytical layer. No need to re-read 500+ messages — everything in one table with filters.

For interview preparation — respondent summary + prioritized questions linked to specific trips. Open the tab — ready for the interview.

For researchers — the ability to see analytics and influence it through comments. Not a "black box," but collaborative work.

For cross-respondent analysis — a uniform structure across all respondents allows comparing, grouping, and building typologies at the end of the project.

Difference from monitoring — monitoring gives a quick picture of "what's happening." The table gives complete data on "what exactly and why" — for deep analysis and the report.

Day 4

Day 4: Analytics and quotas

Three operational solutions in a day. Daily analytical summaries in 5 minutes instead of 30. Instant recruiter quota checks in Telegram. Express batch candidate analysis — 5 minutes instead of 30.

6.5h saved
2 people
Quota control and recruiter skill
Before 1.7h
After 1 min

Author: project manager + Claude Code (Opus 4.6)


Problem

On the fourth day of the fieldwork phase, 16 of 32 diaries are launched. Recruiters are selecting respondents according to a quota matrix with multiple dimensions — a target number of respondents in each cell.

What was inconvenient:

  • To understand the current quota status, you need to open the "Respondent Profiles" file, manually count how many people are in each matrix cell, and compare with target numbers
  • Recruiters have no quick way to find out who's missing — they need to ask the project manager
  • Candidate scoring (25-point system, 8 criteria) was done by a script once during the selection phase. Evaluating a new candidate on the fly is impossible — you'd need to set up the script, figure out Excel columns
  • Creating a new respondent profile is manual work: find data in Excel, fill in the template, update the summary table, recalculate quotas
  • No check that every launched diary has a profile (and vice versa)

What was needed:

  • An instant quota report on command — in Telegram, for the entire team
  • The ability to evaluate a candidate against 8 criteria in 1 minute
  • Automation of profile creation and quota control updates
  • Cross-check: diary folders <-> profiles <-> tracking table

Options We Considered

1. Bot command /quotas only

Pros: quick to implement, accessible to the entire team in Telegram, doesn't require Claude Code. Cons: shows only the current status. Doesn't help evaluate a candidate, create a profile, or find discrepancies. The report is static — it parses a ready-made file, doesn't analyze.

Decision: partially — implemented as a quick tool for the team.

2. Claude Code skill only

Pros: deep analysis, scoring, profile creation, cross-checking — all in one place. Cons: accessible only to whoever has Claude Code running. Recruiters and administrators won't see the report without the project manager.

Decision: partially — implemented for analytical tasks.

3. Bot command + Claude Code skill (chosen)

Pros:

  • Two levels: quick (bot, /quotas -> HTML report in 3 seconds) and deep (skill, 5 analysis modes)
  • Bot accessible to all 6 team members in Telegram
  • Skill handles tasks the bot can't: scoring, profiling, cross-checking
  • Single data source — the "Respondent Profiles.md" file on Google Drive

Cons: two tools instead of one — need to remember when to use which.


What We Did

1. Bot command /quotas

A new module for the Telegram bot:

  • Downloads "Respondent Profiles.md" from Google Drive (file_id from config)
  • Parses the Markdown table: RESP, name, segment, city, GEO, quota, status
  • Compares with the target matrix (12 cells: 2x2x3)
  • Generates an HTML report with emoji, color coding (green / yellow / red), and a deficit list
  • Sends it to the chat where the command was called

5 existing files modified: command handler, command recognition, Google Drive config, download method, dependencies (openpyxl).
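A minimal sketch of the /quotas logic (parse the Markdown profile table, compare each cell with its target); the cell labels and targets below are illustrative, not the project's real 2x2x3 matrix:

```python
# Sketch of the /quotas logic: parse the Markdown profile table and compare each quota cell with its
# target. Column positions, cell labels and targets are illustrative, not the project's real matrix.
from collections import Counter

TARGETS = {("pedestrian", "large", "often"): 3, ("pedestrian", "large", "rarely"): 3}  # ...12 cells

def parse_profiles(markdown: str) -> Counter:
    filled: Counter = Counter()
    for line in markdown.splitlines():
        if not line.startswith("|") or line.startswith("|-") or "RESP" in line:
            continue  # skip non-table lines, the separator row and the header
        cols = [c.strip() for c in line.strip("|").split("|")]
        segment, geo, quota = cols[2], cols[4], cols[5]  # illustrative column positions
        filled[(segment, geo, quota)] += 1
    return filled

def quota_report(markdown: str) -> str:
    filled = parse_profiles(markdown)
    lines = []
    for cell, target in TARGETS.items():
        have = filled.get(cell, 0)
        mark = "OK" if have >= target else f"need {target - have} more"
        lines.append(f"{' / '.join(cell)}: {have}/{target} ({mark})")
    return "\n".join(lines)
```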

2. recruitment-analyst Skill (5 modes)

Mode -> trigger -> what it does:
  • 1. Candidate scoring -> "evaluate the candidate" -> 8 criteria from the scoring script, a score table, a recommendation by thresholds
  • 2. Quota analysis -> "check quotas" -> parses profiles, matches against the matrix, finds deficits and imbalances
  • 3. Profile creation -> "create profile" -> takes data from Excel/the recruiter, fills in the template, updates quota control
  • 4. Cross-check -> "cross-check" -> compares diary folders with profiles: who exists, who's missing
  • 5. Post generation -> "write a post for recruiters" -> a human-friendly HTML post for the team chat

4 reference files: scoring system (25-point), quota matrix (target), profile template, Excel file structure.

3. Connecting the write-case-study Skill

Discovered that the case study writing skill was in the project root, not in .claude/skills/. Moved it — the skill was picked up automatically.

4. Updating the Command Reference

The command table expanded from 5 to 10 rows + a separate section for Telegram bot commands.


Why This Is Needed

  1. For recruiters — see quota deficits right in Telegram in 3 seconds (/quotas), without access to Claude Code.
  2. For the project manager — evaluate new candidates against 8 criteria in 1 minute instead of manual Excel analysis.
  3. For researchers — cross-check: quickly find respondents without a profile or profiles without entries.
  4. For the team — a unified reference of 10 commands + 5 bot commands. No need to remember what's where — just say "remind me of the commands."

How It Works

/quotas in Telegram

1. Team member types /quotas in the team chat
        ↓
2. Bot downloads "Respondent Profiles.md" from Google Drive
        ↓
3. Parses the table, classifies by quota matrix
        ↓
4. Generates an HTML report with deficits
        ↓
5. Sends to the chat (3 seconds)

Recruiter Skill in Claude Code

1. Project manager says "evaluate the candidate" / "check quotas" / ...
        ↓
2. Claude Code loads the recruitment-analyst skill
        ↓
3. Reads data from Excel / profiles / diary folders
        ↓
4. Applies the 25-point scoring / quota matrix / profile template
        ↓
5. Returns a structured result
Express candidate analysis
Before 1.8h
After 15 min

Express Analysis of Recruiter's Candidates

Problem

Day 4 of the fieldwork phase. The recruiter sends a list of 8 candidates in the work chat asking to check if "all geos are suitable." Seems simple — but in reality you need to keep several things in mind at once:

What's hard to do manually:

  • Remember which cities count as "large" according to our grid
  • Count how many slots are free in each of the 6 cells of the quota matrix for pedestrians
  • Check geographic balance: which cities already have respondents, where there's an excess, where there's a gap
  • Formulate a clear recommendation: who to take, who not, what else needs to be found

Effort estimate: ~30 minutes of manual work — open profiles, quota matrix, target audience portrait, count, double-check, write a response.


Options We Considered

1. Answer "by gut feeling"

Pros: instant, no tools needed. Cons: easy to make mistakes. This is exactly how the recruiter classified two cities as "large" — they felt big. Classification errors lead to quota skew that's only discovered at the end.

Decision: rejected. Too high a risk of error with 12 cells and 16 variables.

2. Manual analysis: open all documents, count

Pros: accurate, reliable. Cons: ~30 minutes of work. The project manager spends time on mechanical work instead of research.

Decision: this is how it was done before.

3. Claude Code: load context and get analysis (chosen)

Pros:

  • All project context is already in the working directory: quota matrix, profiles, target audience portrait
  • Analysis in 5 minutes instead of 30
  • Finds non-obvious conflicts (city duplicates, competition for a single slot)
  • Immediately drafts a post for the recruiter

Cons: the result needs to be verified — AI can make mistakes in nuances. But reviewing a ready analysis is faster than doing it from scratch.


What We Did

1. Asked Claude Code to analyze the candidate list

Copied the recruiter's message from Telegram into Claude Code: "Look at the respondents, give a recommendation."

Claude Code automatically:

  • Read the quota matrix from the quotas module
  • Read current profiles from the profiles file
  • Read the target audience portrait from the knowledge base
  • Counted fill rate for each cell
  • Classified candidates' cities
  • Found conflicts

2. Discovered a classification error

The recruiter had classified all four candidates as "large city" — but two of them fall into "medium/small" according to our grid. This is counterintuitive: both cities are million-plus populations.

Without the check: 4 candidates would have gone into one cell, 2 of them into the wrong one.

3. Human brain enriches AI: gender and age

After the analysis was ready and posts sent, the head of recruiting noticed something AI hadn't accounted for: gender and age balance of the sample.

Formally, there was no requirement for equal gender and age distribution in the client agreement. So neither the quota matrix, nor the recruiter skill, nor Claude Code tracked this parameter. But the head of recruiting — an experienced recruiter — noticed the skew intuitively.

What the review showed:

The current sample had a noticeable gender skew — a significant predominance of one sex. By age: one age group was not represented at all, while another was overrepresented.

This wasn't an error — it happened naturally through screening. But if left unchecked, the skew would grow.

Claude Code quickly found ages for all 8 candidates in the screening Excel and built a forecast.

What changed in the recommendation:

One of the backup respondents was elevated for consideration, as they fell into a unique quota cell that had no representation in the sample until then.

Why this matters for the case:

AI completed the task within formalized criteria (quotas, geo, cells). But the human — the head of recruiting — saw what wasn't in the brief: demographic balance. This is a classic example of how the human brain sets the direction, and AI accelerates execution. Without the head of recruiting, we wouldn't have looked at gender and age. Without Claude Code, the check would have taken another 30 minutes instead of 5.


Why This Is Needed

  1. For the project manager — 5 minutes instead of 30 for analyzing a batch of candidates. Time goes to reviewing the result, not mechanical counting.
  2. For the recruiter — a clear answer: who to take, who not, why. Plus an educational post about city classification — the error won't repeat.
  3. For the project — a quota error was prevented. Two candidates from medium cities didn't end up in the "large" cell — quotas remained valid.
  4. For the team — the head of recruiting's comment about gender and age revealed a blind spot: formalized criteria don't cover all aspects of sample quality.

How It Works

1. Recruiter sends a candidate list in the work chat
        |
2. Project manager copies the message into Claude Code
        |
3. Claude Code reads quota matrix + profiles + target audience portrait
        |
4. Analysis: city classification, slot counting, conflict detection
        |
5. Project manager reviews and adjusts recommendations
        |
6. Team adds: gender, age, other non-formalized criteria
        |
7. Ready posts are sent to the work chat

Metrics (interview 2026-02-16)

━━━ Calculation D4-EXPRESS ━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Before: ~35 min x 3 batches = 105 min ~ 1.75 hrs (researcher)
  After:  ~5 min x 3 batches = 15 min = 0.25 hrs (manager + Claude Code)
  Frequency: 3 times per project

  Savings: 1.75 - 0.25 = 1.5 hrs
  Bonus: caught a city classification error
         that would have been missed manually.

  → HRS saved: 1.5 hrs
  → PPL involved: 1 (researcher — freed)
  → Confidence: high

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Daily summary and quick access commands
Before 3.9h
After 35 min

Daily Analytical Brief (Mode 9)

Problem

14 respondents have been keeping diaries for 1-2 days. Monitoring cards are updated, the tracking table is filled in. But the team lacks the big picture: what is happening in the study overall, what patterns are forming, and what it means for the product.

What was inconvenient:

  • No regular analytical brief across all respondents — only individual cards
  • The lead sees the details, but the team doesn't see cross-respondent conclusions
  • Unclear what level of confidence is appropriate in early days (hypothesis vs trend)
  • Recommendations for researchers are duplicated across different places

What was needed:

  • A daily post with cross-respondent analysis for the team
  • A format calibrated to the study stage (days 1-3 vs days 5-7)
  • Tied to the business objectives of the study, not just listing trips

What we did

Launched Mode 9 and went through 4 rounds of edits with the lead

Iteration -> before -> after -> principle:
  • 1 -> "days 1-3" -> "days 1-2 of diaries" -> precision: project day ≠ diary day
  • 2 -> "Trends" section -> "Initial observations and hypotheses" -> on days 1-2 there is not enough data for trends
  • 3 -> "Quota mismatch is systemic" -> "The concept of navigation is vague" -> framing: insight, not a problem
  • 4 -> "Notes for researchers" section -> link to monitoring -> don't duplicate monitoring cards

Established 3 new principles in the prompt:

  • Confidence calibration: days 1-3 = hypotheses, days 4-5 = confirming, days 6-7 = stable pattern
  • Audience-appropriate framing: the same finding can sound like a problem or an insight — choose the frame that conveys the essence
  • Don't duplicate monitoring: per-respondent recommendations go in site cards, only a link goes in the post

Updated the prompt for the daily brief mode: format template, "Early days" section, principles #6-8.


Why this matters

For the team — see the overall picture of the study every day in 3 minutes of reading.

For the lead — 10 minutes to review the brief instead of 60 minutes writing it. Calibration rules are locked in — subsequent briefs are more accurate right away.

For the project — a unified communication rhythm: daily brief + instant access to resources.

Quick Access Bot Commands

Problem

So many messages were pinned in the working chat that finding the right link (table, instructions, schedule) became a quest. Site passwords, links to Google Drive documents, roles — all scattered across the chat history.

What was needed:

  • Instant access to any project link via a bot command
  • Security: internal links and passwords must not leak into respondent groups

What we did

5 new bot commands — simple, no API calls, serving static HTML.

Security constraint: commands only work in the working chat. In respondent groups the bot silently ignores them — passwords and internal links won't leak.


Why this matters

For the whole team — get any project link in 3 seconds right in Telegram instead of searching through pinned messages.

For security — monitoring passwords and internal documents are accessible only in the working chat, not in respondent groups.

Total bot commands: 6 for the team + 3 service commands.

Day 5

Day 5: Source of truth

The recruiter reported a respondent quota mismatch. Turned out some quotas were based on preliminary data. Audit of 26 respondents, cascading update. Plus extracting a feature request map from 26 diaries — 15 clusters of product requests in 30 minutes instead of 10 hours.

10.8h saved
2 people
Source of truth
Before 1.8h
After 30 min

Source of Truth — how we verified whether we were relying on reliable data

Problem

On the fifth day of the field stage, the recruiter reported that one respondent's quota didn't match what we had in our system.

We started investigating and discovered a systemic problem: we had been pulling quotas from one column, but that column was filled before the voice screening. After the call, data was updated on a different sheet, while the column remained outdated.

Hypothesis: some of our quotas were based on preliminary rather than final data.


Options considered

1. Trust our scoring algorithm

Pros: the algorithm is objective, operates on formulas from survey data. Cons: the algorithm analyzes survey text, not a live conversation. Navigation behavior is not a binary characteristic: a person may sincerely answer differently depending on phrasing.

Decision: rejected — the algorithm is useful for initial screening but cannot be the source of truth.

2. Trust the recruiter's column

Pros: data is in a structured format, easy to parse. Cons: filled before the voice screening. The head recruiter explicitly said: "I filled this column before the screening, it's unreliable."

Decision: rejected.

3. The "Quotas" sheet as the single source of truth (chosen)

Pros: filled after the final voice screening, reflects the recruiter's final decision. Structured by quota cells — immediately visible who is in which group. Cons: requires opening the Excel file and cross-checking manually each time.

Decision: accepted. We formalized the rule and built it into the pipeline.


What we did

Audited all 26 respondents. Cross-checked every launched respondent's quota against the quotas sheet.

Cascading update. Corrected quotas everywhere they were stored.

Formalized the rule. Added a "Source of Truth" section to the project configuration and skill:

  • When creating a new respondent profile → find the name on the quotas sheet → take the quota from there
  • If the name is not found → notify the lead

Why this matters

For data reliability — a respondent's quota determines which analytical group they fall into. Wrong quota → wrong interpretation of their diary behavior.

For the team — now everyone working with quotas has a definitive answer: "Where do I look up the correct quota?" No need to guess which of three sources is current.

For future projects — lesson learned: in recruiting, data passes through several stages (survey → preliminary analysis → voice screening → final decision). The assessment can change at each stage. You need to know exactly at which stage the "truth" is established.


How it works

  1. The recruiter conducts a voice screening of the candidate
  2. Based on the results, fills in the "Quotas" sheet in the interview schedule file
  3. When creating a respondent profile, Claude Code reads the "Quotas" sheet via openpyxl
  4. Quota from sheet → profile → monitoring card → tracking table
  5. Periodic reconciliation: profiles vs. quotas sheet (manual or on demand)
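A minimal sketch of the lookup rule from step 3, assuming openpyxl; the file name, sheet layout, and column positions are illustrative:

```python
# Sketch of the "source of truth" lookup: when creating a profile, take the quota only from the
# "Quotas" sheet. The file name and column positions are illustrative.
from openpyxl import load_workbook

def quota_from_source_of_truth(respondent_name: str, path: str = "interview_schedule.xlsx") -> str:
    ws = load_workbook(path, data_only=True)["Quotas"]
    for row in ws.iter_rows(min_row=2, values_only=True):
        name, quota_cell = row[0], row[3]  # illustrative columns: name in A, quota in D
        if name and name.strip().lower() == respondent_name.strip().lower():
            return quota_cell
    raise LookupError(f"{respondent_name} not found on the Quotas sheet - notify the lead")
```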
Feature request map from diaries
Before 10h
After 30 min

Feature Request Map from Diaries — how to extract product requests from qualitative data

Problem

By the third day of the field stage (26 respondents, ~100+ trip entries), implicit feature requests were accumulating in the data: people described inconveniences, workarounds, app-switching — but these signals were scattered across entries.

  • Feature requests are not stated explicitly — the respondent describes a trip, and the feature request is implied
  • Entries from 26 respondents across 26 different diaries — impossible to see which requests recur
  • No distinction between what needs to be built and what already exists but the user doesn't know about
  • Context is lost
  • The team has no single place for analytical materials (monitoring is per respondent, not per topic)

Options considered

1. Manual analysis by researchers

Pros: deep understanding of each respondent's context. Cons: with 26 respondents and ~100 entries — days of work. Hard to maintain a cross-respondent picture.

Decision: rejected. Too labor-intensive for a mid-field snapshot on day 3.

2. GPT analysis on the server (automatic)

Pros: no human involvement, runs on schedule. Cons: GPT-4.1-mini can't handle nuanced classification (feature request vs problem vs barrier). Doesn't see screening context. Can't distinguish "unaware of value" from a real request.

Decision: rejected. Analysis quality is critical — this goes to the client.

3. Claude Code + analysis page on the site (chosen)

Pros: maximum analysis quality (Claude Opus), linked to profiles and screening, flexible format (markdown → HTML with tabs), scalability (new .md = new tab). Cons: triggered manually. But analysis is not a daily task — it's on demand.


What we did

Extracted feature requests from all diaries. Analyzed diaries of 26 respondents in parallel (5 agents, ~370 files). For each request: who, what they want, why, what happens without it, quote.

Clustered into 15 themes.

Created an analysis page on the site. Password-protected entry, tab navigation, responsive design. Color scheme — teal (#0d9488). Auto-detection: new .md in the analysis directory → new tab.

Set up the workflow. The project lead requests an analysis → Claude Code analyzes → shows the result → if approved → .md, commit, push, auto-deploy.


Why this matters

For the client — a structured request map with context and quotes. You can see what's widespread, what's isolated, and what already exists but isn't being found.

For researchers — a cross-respondent view by topic that doesn't exist in individual monitoring cards.

For final interviews — a list of features to test for "unaware of value" (Guide Block 4).

For the lead — the ability to quickly request analysis on any topic and immediately see the result on the site.


How it works

  1. The project lead: "analyze feature requests / barriers / patterns"
  2. Claude Code reads diaries of all respondents (in parallel)
  3. Clustering + linking to context + quotes
  4. Result in chat → the project lead decides
  5. If approved → .md in the analysis directory → git push → Render auto-deploy → new tab on the analysis page
Day 6

Day 6: Selection and status

Selecting respondents for in-depth interviews: three researchers spent 7 hours each manually evaluating every diary. Diary status check: the project lead spent 30 minutes per respondent. A 100-point selection skill cut the work to 1.25 hours. Status — 20 minutes instead of 16 hours.

82.5h saved
3 people
Diary status dashboard
Before 72h
After

Author: project lead + Claude Code (Opus 4.6)


Problem

On the sixth day of a diary study field stage, 32 respondents are keeping trip diaries across 32 separate Telegram groups. The diary lasts 7 calendar days, but each respondent started on a different date. In 1-2 days the first ones will finish — and we need to plan the selection for in-depth interviews.

What was inconvenient:

  • Unclear who is on which diary day: 32 respondents with 5 different start dates, impossible to track mentally
  • Can't see who finishes tomorrow — and interview prep for them is urgent
  • No overall picture of trip counts — who is writing actively and who is silent
  • Three respondents were unclear whether they had even started
  • The decision to select 24 out of 32 for interviews needs to be made 1-2 days before diary completion, but there was no tool for it
  • Manual check of each respondent — open the Telegram group, scroll through entries, count days — takes ~30 minutes per respondent, 32 × 30 min = 16 hours. With automation — ~20 minutes

What was needed:

  • A single screen showing the status of all 32 respondents: diary day, trip count, end date
  • Automatic calculation: who finishes in 1-2 days (urgent) vs 3-5 days (can wait)
  • Trip detector: how many actual trips recorded, by day
  • Interview selection section: who is mandatory (only one in a quota segment), who can be chosen
  • Launched with a single command, result in 5 seconds

Options considered

1. Google Sheets with manual updates

Pros: familiar format, the team can edit. Cons: requires manually counting days and trips for each respondent. With 32 people that's 2+ hours of work, and it needs to be redone every day. Human error — easy to miscount.

Decision: rejected. Doesn't scale with 32 respondents and daily updates.

2. Python script with automatic diary scanning (chosen)

Pros:

  • Scans the local diary folder in 2-3 seconds
  • Automatically determines the diary day, counts trips, calculates deadlines
  • Interview selection section with quota logic — no need to remember who is in which segment
  • Single terminal command

Cons: depends on diary sync from Google Drive (need to sync before running). The trip detector is approximate (~80% accuracy) — counts based on text patterns, not manual markup.


What we did

1. Trip detector (three methods)

The key task — automatically determine how many trips a respondent described in each diary message. The challenge: respondents write in free form — some number their entries, some describe in text, some send voice messages.

We implemented three detection methods applied sequentially.

Accuracy: ~80%. For the task of "overall progress picture" this is sufficient.
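A minimal sketch of the three methods applied in order; the patterns and keywords below are illustrative, the real detector was tuned on actual diary entries:

```python
# Sketch of the trip detector: three methods applied in order (numbering pattern -> markers -> keywords).
# Patterns and keyword lists are illustrative; the real detector was tuned on actual diary entries.
import re

TRIP_KEYWORDS = ("trip", "route", "went", "drove", "walked")  # illustrative keyword list

def count_trips(entry_text: str) -> int:
    # Method 1: explicit numbering ("1.", "2)", "Trip 3:") - trust the respondent's own count.
    numbered = re.findall(r"^\s*(?:trip\s*)?\d+[.):]", entry_text, flags=re.IGNORECASE | re.MULTILINE)
    if len(numbered) >= 2:
        return len(numbered)
    # Method 2: structural markers - paragraphs that read like separate trip descriptions.
    blocks = [b for b in entry_text.split("\n\n") if any(k in b.lower() for k in TRIP_KEYWORDS)]
    if len(blocks) >= 2:
        return len(blocks)
    # Method 3: keyword fallback - count at least one trip if anything trip-like is mentioned.
    return 1 if any(k in entry_text.lower() for k in TRIP_KEYWORDS) else 0
```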

Key insight from the project lead: "Thanks to the researchers actively asking follow-up questions, even from a single trip we were able to get a great deal of important context. This is valuable for the research objectives and the business goals."

2. Interview selection section

Out of 32 respondents, 24 need to be selected for the final in-depth interview. The script automatically maps them to the quota grid and identifies three levels: who cannot be dropped (only one in a segment), who should preferably stay, and where there is room to choose. The lead makes the decision seeing the full picture.

3. Output format

A compact ASCII table in the terminal: all 32 respondents on one screen, grouped by type (pedestrians / drivers). Columns: respondent code, city, day N/7, trip count, end date, days remaining, trips by day.

A critically important UX decision: !! next to "days remaining" for those finishing within 2 days or fewer. This is a visual urgency marker.


Why this matters

  1. For the project lead — see the progress of all 32 respondents in 5 seconds instead of 16 hours of manual checking. Make interview selection decisions backed by data.
  2. For researchers — understand who finishes in 1-2 days and who urgently needs interview questions prepared.
  3. For group admins — see who is silent (0 trips for the day) and who needs a nudge.
  4. For the selection process — the quota grid automatically shows who cannot be dropped. Without it, it's easy to accidentally remove the only representative of a segment.

How it works

1. Sync diaries from Google Drive → local folder
        ↓
2. Script scans all respondent subfolders
        ↓
3. Trip detector: numbering pattern → markers → keywords
        ↓
4. Determining Day 1: first day with ≥1 trip
        ↓
5. Calculation: diary day, end date, trips by day
        ↓
6. Quota grid: mapping 32 respondents to segments
        ↓
7. Output: ASCII table + interview selection section
Interview selection skill
Before 14h
After 1.5h

Author: project lead + Claude Code (Opus 4.6)


Problem

Diary study on navigation: 32 respondents keep diaries for 7 days, 24 of them are selected for final 70-minute interviews. The interview format is a detailed walkthrough of specific trips from the diary. This means we need to select not "good respondents" but those whose diaries will yield the most material for a productive conversation.

What was inconvenient:

  • Selection criteria were in the lead's head, no formalized system
  • 32 respondents x 5-7 days x 2-6 trips/day = hundreds of entries. Impossible to keep in memory when comparing
  • Quota constraints (type x geo x navigation frequency) make selection non-linear — you can't just take the top 24 by "quality"
  • Respondents start diaries on different dates. Those who started later have less data — unfair advantage for early starters
  • The client needs transparent justification: why these 24 and not others

What was needed:

  • A formalized evaluation system: what exactly we assess and on what scale
  • Each decision tied to research objectives, not to abstract "quality"
  • Quota logic: proportional selection maintaining balance by city and navigation frequency
  • Fairness for respondents with a late start
  • Justification for the client for each inclusion and exclusion

Options considered

1. Expert selection without formalization

Pros: fast, the lead already knows the material. Cons: not transparent for the client.

Decision: rejected. The client expects a reasoned justification.

2. Simple ranking by trip count

Pros: objective metric, easy to calculate. Cons: quantity != quality. 20 entries of "everything was fine, got there" are less useful than 5 entries with reflection and screenshots. Doesn't account for quotas — you could lose an entire segment. Penalizes "infrequent" users who are valuable precisely because of their non-use of navigation.

Decision: rejected. Contradicts the principle "value for the interview > formal activity."

3. 100-point system with quota logic and wave scoring (chosen)

Pros:

  • 5 dimensions cover all aspects of value for the interview
  • Quota logic guarantees sample preservation
  • Calibration anchors ensure scoring stability between sessions
  • Wave scoring (day 5+) eliminates unfairness for "late starters"
  • The result is a ready-made justification for the client

Cons: labor-intensive — full scoring of one respondent takes 5-10 minutes. Requires an up-to-date monitoring card as input.


What we did

1. Created a skill with 5 evaluation dimensions

100-point system, each dimension evaluates a specific aspect of value for the interview:

Dimension | Max | What it evaluates
Activity | 25 | How many days and trips, consistency of diary-keeping
Trip diversity | 20 | Types of transport, purposes, contexts, route familiarity
Comment quality | 20 | Reflection, "why", emotions, screenshots, voice messages
Research value | 25 | Insights, barriers, discrepancies, feature requests
Interview potential | 10 | Material for a 70-minute walkthrough: contradictions, hypotheses

Key decision: research value weighs as much as activity (25 points). Three deep entries with insights are worth more than twenty formal ones.
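
A minimal sketch of the scoring arithmetic under these caps (only the dimension maxima come from the table above; the example sub-scores are invented):

# Clamp each dimension to its cap so a single generous rating cannot skew the total.
DIMENSION_CAPS = {
    "activity": 25,
    "trip_diversity": 20,
    "comment_quality": 20,
    "research_value": 25,
    "interview_potential": 10,
}

def total_score(scores: dict) -> int:
    return sum(min(scores.get(dim, 0), cap) for dim, cap in DIMENSION_CAPS.items())

# Three deep, insight-rich entries can outscore twenty formal ones:
reflective = {"activity": 10, "trip_diversity": 12, "comment_quality": 18,
              "research_value": 23, "interview_potential": 9}   # 72 points
formal = {"activity": 25, "trip_diversity": 10, "comment_quality": 6,
          "research_value": 8, "interview_potential": 3}        # 52 points
assert total_score(reflective) > total_score(formal)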

2. Built in calibration anchors

Two respondents were scored first — as scale anchors: upper (high activity, many insights) and lower (minimal data). Each subsequent scoring is calibrated against these two.

3. Implemented wave logic for "day 5+"

Rule: score a respondent only when they are on day 5 of the diary or later. 5 days is enough for a reliable assessment, with 2 days remaining for interview planning.

4. Created a methodology post for the team

Described the 5 dimensions, the scale, the decision-making logic — in plain language, without internal details. Sent to the team's working chat so everyone understands how the selection works.


Why this matters

  1. For the research lead — a formalized system instead of gut feeling. Scoring can be delegated, and the result will be the same.
  2. For the client — transparent justification: each inclusion and exclusion is tied to research objectives.
  3. For the project team — clear criteria: what makes a "good diary" and why a respondent with 1 trip can be more valuable than one with 20.
  4. For respondents (indirectly) — fairness: a late start doesn't mean an automatic loss. Wave scoring levels the playing field.
  5. For the agency — a reusable skill. On the next diary project there's no need to reinvent the selection system.

How it works

1. Monitoring card updated (fresh diary data)
        ↓
2. Check diary day: ≥5? If not — wait
        ↓
3. Launch scoring: "evaluate the respondent for interview"
        ↓
4. Claude Code reads the card + diary + quota grid
        ↓
5. Evaluates across 5 dimensions, checks for red flags
        ↓
6. Determines quota position (mandatory / competitive)
        ↓
7. Produces a scoring card with a score and decision
        ↓
8. Result is saved to the results file
        ↓
9. When the entire cell is scored — final cut-off by score
Day 7

Day 7: Guide, bot, and analysis

The final stretch. Interview guide v2.0 — complete methodology rework in 2 hours instead of 19.5. Bot crash: Google OAuth token expired, diagnosed and fixed in 20 minutes instead of 4 hours. Analysis page with feature requests and hypotheses for the team.

21.2h saved
3 people
Bot fix — OAuth token
Before 4h
After 20 min

Author: project lead + Claude Code (Opus 4.6)


Problem

On the 6th day of the diary study field stage (32 respondents, 32 Telegram groups), the bot stopped saving entries to Google Drive. Respondents kept writing, but files weren't appearing.

What happened:

  • Ran diary sync — downloaded 1 file instead of the expected dozens
  • Google Drive check showed: only 5 files for the entire day (all before 04:00)
  • Telegram webhook returned 500 Internal Server Error
  • 224 messages from respondents stuck in the Telegram queue
  • Server logs: RefreshError: invalid_grant: Token has been expired or revoked

Why this was critical:

  • Telegram stores the message queue for ~24 hours and then deletes it — the data was at risk of being lost
  • 32 respondents kept writing all day, unaware of the problem
  • The field stage lasts 7 days — losing even one day = irreparable damage

Root cause: The OAuth consent screen in Google Cloud Console was in "Testing" mode. In this mode, Google automatically revokes the refresh token after 7 days. The project launched a week ago — the token expired exactly after 7 days.


Options considered

1. Switch the bot to a service account instead of OAuth

Pros: service accounts don't require a refresh token, they don't expire. Cons: service accounts have no storage quota on Google Drive and cannot upload files to a user's personal drive. You get a storageQuotaExceeded error. Only works with Shared Drives (Team Drives).

Decision: rejected after an attempt — got a 403 from the Google Drive API.

2. Regenerate the OAuth refresh token + publish the consent screen (chosen)

Pros:

  • Solves both the current problem (new token) and the root cause (publishing = token is permanent)
  • No changes to the storage architecture needed
  • Can be done in 10 minutes

Cons: requires access to Google Cloud Console and the hosting panel.

3. Move storage to a Shared Drive

Pros: can use a service account, tokens don't expire. Cons: need to migrate ~2,000 files, change all scripts (rclone, sync, bot), risk of data loss during migration.

Decision: deferred. Disproportionate amount of work for an urgent situation.


What we did

1. Diagnosis (5 minutes)

Chain of checks:

  • Diary sync → only 1 new file → suspicion
  • Google Drive check → only 5 files for the day (all before 04:00) → problem on the bot side
  • getWebhookInfo → 500 Internal Server Error, 224 pending updates → bot is crashing
  • Server logs → RefreshError: invalid_grant → OAuth token expired

2. Code fix: fallback to service account (2 minutes)

Wrapped OAuth authorization in try/except. On OAuth failure — the bot logs a warning and tries the service account. This is insurance for the future: if OAuth fails again, the bot won't crash with a 500 but will attempt an alternative path.

import logging
from google.auth.transport.requests import Request  # google-auth HTTP transport

# Inside the Drive-auth helper: try to refresh the OAuth credentials;
# on any failure, log a warning and return None so the caller falls back to the service account.
try:
    if not creds.valid:
        creds.refresh(Request())
except Exception:
    logging.warning("OAuth refresh failed, falling back to service account")
    return None

3. Discovered service account limitation (3 minutes)

After deploying the code with fallback — got a new error: storageQuotaExceeded. Service accounts cannot create files on a personal Google Drive. Option 1 doesn't work — OAuth is needed.

4. OAuth token regeneration (5 minutes)

Wrote a token regeneration script:

  • Starts a local HTTP server on port 8090
  • Opens a browser for Google authorization
  • Intercepts the authorization code via redirect
  • Exchanges the code for a new refresh token

Updated the environment variable on the hosting platform → Manual Deploy.
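
A minimal sketch of such a regeneration script using google-auth-oauthlib; the env var names follow the checklist further below, while the scope and client config details are assumptions:

import os
from google_auth_oauthlib.flow import InstalledAppFlow

# Client ID/secret come from the same env vars listed in the OAuth checklist;
# the Drive scope here is an assumption and may differ from the project's actual scope.
client_config = {
    "installed": {
        "client_id": os.environ["GOOGLE_OAUTH_CLIENT_ID"],
        "client_secret": os.environ["GOOGLE_OAUTH_CLIENT_SECRET"],
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "redirect_uris": ["http://localhost"],
    }
}

flow = InstalledAppFlow.from_client_config(
    client_config, scopes=["https://www.googleapis.com/auth/drive"]
)
# Starts a local server on port 8090, opens the browser for Google authorization,
# intercepts the redirect, and exchanges the code for tokens in a single call.
creds = flow.run_local_server(port=8090, prompt="consent")
print("New refresh token:", creds.refresh_token)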

5. Publishing the consent screen (1 minute)

In Google Cloud Console → OAuth consent screen → Audience → clicked "Publish App". Status changed from "Testing" to "In production". The refresh token is now permanent.

6. Result

Metric | Value
Time from detection to fix | ~20 minutes
Messages in queue | 224
Messages lost | 0
Queue processing speed | ~30 messages/minute
Queue cleared in | ~7 minutes

Why this matters

  1. For the project lead — understanding a typical bot failure scenario and a rapid diagnosis algorithm.
  2. For the engineer — the code fix with fallback remains as insurance; the token regeneration script is ready for reuse.
  3. For future projects — a checklist for setting up Google OAuth: always publish the consent screen before launching fieldwork.
  4. For the team — data for the entire day was recovered without losses, respondents noticed nothing.

How it works

1. Detection: sync returns almost no new files
        ↓
2. Drive check: no files → problem on the bot side
        ↓
3. Webhook check: getWebhookInfo → 500, N pending
        ↓
4. Server logs: error identified
        ↓
5. Fix: token regeneration + deploy
        ↓
6. Telegram resends the queue → files on Drive

Checklist "OAuth for Google Drive"

  1. When setting up the project: Google Cloud Console → OAuth consent screen → publish (Publish App). Don't leave it in Testing.
  2. Environment variables: GOOGLE_OAUTH_CLIENT_ID, GOOGLE_OAUTH_CLIENT_SECRET, GOOGLE_OAUTH_REFRESH_TOKEN — all three are needed on the server.
  3. If the token expires: run the token regeneration script → copy the new token → update on hosting → Manual Deploy.
  4. Monitoring: getWebhookInfo — if pending_update_count > 0 and last_error_message contains 500 — the bot is down.
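
A small monitoring sketch for item 4 of the checklist (the bot token env var name is an assumption):

import os
import requests

# Poll Telegram's getWebhookInfo and flag the "bot is down" pattern from item 4.
token = os.environ["TELEGRAM_BOT_TOKEN"]
resp = requests.get(f"https://api.telegram.org/bot{token}/getWebhookInfo", timeout=10)
info = resp.json()["result"]

pending = info.get("pending_update_count", 0)
last_error = info.get("last_error_message", "")
if pending > 0 and "500" in last_error:
    print(f"ALERT: bot looks down. {pending} pending updates, last error: {last_error}")
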
Interview guide v2.0
Before 19.5h
After 2h

Author: project lead + Claude Code (Opus 4.6)


Problem

Deadline: approval of the final interview guide with the client. First interviews are the next day. Available: a draft guide for 70 minutes, 7 blocks, written at the start of the project. But over the week of fieldwork, much had changed:

  • Data from 30 diaries (200+ trips) accumulated, changing the picture
  • The client hadn't provided a product feature list, and it was unclear whether they would
  • The brief contained only a couple of hypotheses, while diaries generated dozens
  • The draft contained unresolved TODOs: "insert feature list," "add hypotheses"
  • No mechanism for prioritizing development directions (Project Task 4) — and this was the key expected deliverable

What was needed: over the weekend, turn the draft into a working tool that guarantees data collection for ALL research tasks, and test it on real material.


Options considered

1. Cosmetic editing of the draft

Pros: fast, the structure already exists. Cons: critical gaps will remain. Without features from the client, Block 4 (navigation value) is non-functional. Without prioritization, Task 4 is not covered. Questions are not linked to diary data. Decision: rejected.

2. Completely rewrite the guide

Pros: can incorporate everything we know from diaries. Cons: risk of "researcher perfectionism" — endless refinement. The draft structure works — no reason to break it. No time. Decision: rejected.

3. Systematic review + targeted refinement + testing on data (chosen)

Pros:

  • Review across 8 blocks identifies exactly the gaps that are critical
  • Targeted edits preserve the working structure
  • Diary data is built into the guide as hypotheses and patterns
  • Testing on a real diary verifies whether the guide works before the first interview
  • Fits within the weekend

Cons: labor-intensive — need to extract data from 30 tabs, conduct the review, refine, and test. But the result is a verified tool.


What we did

Step 1: Systematic review across 8 blocks

Used a review skill — evaluation across 8 blocks (introduction, screening, link to tasks, phrasing, structure, tools, timing, pilot). Built a coverage matrix: research tasks x guide questions.

Overall assessment: "Needs revision."

Critical gaps:

  • Block 4 (navigation value) is built "top-down": show the respondent a feature list → ask "do you know about this?" Without a feature list from the client, the block is non-functional
  • No prioritization mechanism
  • No pedestrian/driver split in questions
  • No priorities within blocks (what is mandatory, what if time permits)
  • Hypotheses are not formulated — TODO in the text

Strengths (preserved):

  • Tied to specific trips from the diary
  • 7-block structure with a logical funnel
  • Standard questions about patterns

Step 2: Solving the "no feature list" problem

This was the key challenge. The client hadn't provided a feature list, and it was unknown whether they would. The first version of Block 4 was dead without it.

Solution — flip the logic: instead of "top-down" (features → respondent), go "bottom-up" (respondent → their mental model):

  1. Open-ended questions about navigation value
  2. Tied to specific trips from the diary
  3. Discussion of the exercise results from the last diary day
  4. Segment-specific deep dives (IF TIME)

This way we capture the respondent's mental model without prompting — and then compare it with actual product capabilities during analysis.

Step 3: Moving prioritization to the last diary day

Prioritizing directions is important but consumes 15-20 minutes of interview time. With a 70-minute limit, this is critical.

Solution: the prioritization exercise is completed on the last diary day, NOT during the interview:

  • The respondent receives cards with barriers and values
  • Selects top 3 in each category, ranks them
  • During the interview — only a brief discussion of results (5 min instead of 20)

Cards were created from real diary data, separately for drivers and pedestrians. Phrasing uses respondents' language, not research jargon.

Step 4: Extracting data from 30 diaries

To formulate hypotheses and cards, we needed to consolidate data from all 30 respondents. We wrote a script that extracts the analytical layer from the Google Sheets tracking table — patterns, barriers, drivers, feature requests, interview questions.

Result: a structured JSON with data across 8 sections x 30 respondents. From this data:

  • Formulated 6 hypotheses for Block 5
  • Identified 2 new patterns for Block 3
  • Compiled barrier and value cards for the exercise
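
A sketch of the extraction script from Step 4, assuming gspread with a service account that has read access to the tracking spreadsheet; the tab layout, column names, and section labels are assumptions:

import json
import gspread

# One tab per respondent; each row is tagged with the analytical section it belongs to.
gc = gspread.service_account(filename="service_account.json")
sheet = gc.open_by_key("TRACKING_SPREADSHEET_ID")

data = {}
for ws in sheet.worksheets():
    rows = ws.get_all_records()          # list of dicts keyed by the header row
    data[ws.title] = {
        "patterns": [r for r in rows if r.get("section") == "pattern"],
        "barriers": [r for r in rows if r.get("section") == "barrier"],
        "drivers": [r for r in rows if r.get("section") == "driver"],
        "feature_requests": [r for r in rows if r.get("section") == "feature_request"],
    }

with open("analytical_layer.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)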

Step 5: Guide refinement (v2.0)

Targeted edits based on review results:

Block | What changed
Block 1 (warm-up) | Added recording request, time check
Block 3 | MUST/IF TIME markers, removed leading phrasing, added 2 patterns from data
Block 4 (value) | Complete redesign: bottom-up, 4 steps, linked to exercise
Block 5 (hypotheses) | 6 hypotheses from diaries instead of empty TODO. Three MUST, three IF TIME
Block 7 (closing) | Added a key question
Timing | Priorities for each block: what is mandatory, what can be skipped

Step 6: Virtual interview — testing on real data

The most unconventional decision of the session. Instead of waiting for the first interview to test the guide, we conducted a "virtual interview" — walked through the guide using data from one real diary.

For each block: what question we ask → what we can predict from the diary → what's missing and what only a live interview will provide.

What the test showed:

  • The guide works — it generates data for all blocks
  • Block 3: all 4 key trips yield rich material for discussion
  • Block 4 (flipped): open-ended questions are applicable, trip references are there
  • Block 5: 5 out of 6 hypotheses are relevant to this respondent
  • Block 7: Q4 ("by default") is a key question; there is no answer to it in the diary, which is exactly what the interview will provide

What the guide doesn't cover: testing also revealed behavioral patterns unique to this particular respondent; those go into the researcher's individual preparation before the interview rather than into the shared guide.

Main takeaway: the diary provides "what," the interview will provide "why" and "what if." The guide is designed precisely for this transition.

Step 7: Publishing and team communication

  • Uploaded guide v2.0 and the virtual interview to Google Drive
  • Created a page on the project site for convenient reading
  • Sent 2 posts to the team chat:
    1. Guide + virtual interview + links + request for researcher feedback
    2. Prioritization exercise: barrier and value cards + request for feedback

Why this matters

  1. For researchers — a working tool with priorities (MUST/IF TIME), linked to diary data, with ready-made hypotheses. No need to improvise
  2. For the lead — confidence that the guide covers ALL research tasks. The coverage matrix is proof
  3. For the client — the guide is based on data (30 diaries, 200+ trips), not assumptions. Hypotheses come from real user behavior
  4. For the project — prioritization is solved without losing interview time. Data is collected in the diary, discussed in the interview
  5. For methodology — the virtual interview as a method for testing a guide. Can be done for each respondent before the live interview
Analysis page
Counted in interview guide v2.0

Analysis Page: feature requests + hypotheses for interviews

Author: project lead + Claude Code (Opus 4.6)


Problem

By day 7 of the project, a large volume of analytical data had accumulated from diaries of 32 respondents (~400+ trips). Two key artifacts — the feature request map and hypotheses for final interviews — existed but in different places:

  • Feature requests were on the site but outdated — written from early days' data, 26 respondents. Over the remaining days, new clusters appeared and existing ones were significantly expanded.
  • Hypotheses were embedded inside the interview guide — accessible only to those who read the entire guide. There was no separate document with the evidence base.
  • The team (researchers) was preparing for pilot interviews and needed quick access to both artifacts.

What was needed: a single access point for feature requests and hypotheses, current across all 32 respondents, with an evidence base.


Options considered

1. Update feature requests in the existing file, leave hypotheses in the guide

Pros: fast, minimal changes. Cons: the team won't see the hypotheses without reading the 10-page guide. Quick access is needed before interviews. Decision: rejected.

2. Create two separate documents on the site (chosen)

Pros:

  • The site page already supports auto-tabs from .md files — just add a second file
  • Two tabs: feature requests and hypotheses — logically connected but read independently
  • Researchers get a single link for interview preparation

Cons: labor-intensive — need to review data from all 32 respondents and 87 work sessions.


What we did

Step 1. Updated the feature request map.

Source data: diaries of 32 respondents over 7 days, monitoring cards, analytical layer of the tracking table, cross-respondent patterns.

Result:

  • Before: 15 clusters, 26 respondents, first days, ~200+ trips
  • After: 20 clusters, 32 respondents, all 7 days, ~400+ trips

Step 2. Created the hypotheses file.

Sources: interview guide v2.0, tracking table, monitoring cards, quantitative layer.

10 hypotheses with an evidence base from the full diary stage. Each hypothesis: summary, evidence table, explanation of "why it matters," interview question formulation.

Step 3. Deploy and notification.

  • Commit + push → auto-deploy on Render
  • Post in team chat: what was updated, where to view it, access code

Result

Analysis page on the project site — two tabs:

  1. Feature request map — 20 clusters with quotes, summary matrix, and conclusions
  2. Hypotheses from diary data — 10 hypotheses with evidence base and interview questions

A single link for the team to the analysis page on the project site.


What this delivered

  • For researchers: interview preparation — quick access to feature requests and hypotheses linked to specific respondents
  • For the lead: an up-to-date picture of user needs across all 32 respondents
  • For the project: the methodological chain "diary → tracking table → feature requests → hypotheses → interview" closed into a unified pipeline
Day 8

Day 8: Respondent portrait

Diaries ended — interviews begin. Researchers can't read hundreds of files per respondent in one evening. Narrative analytics system: respondent portrait, dialogues, interview zones — three files instead of hours of reading. Pipeline from diary to business article, reusable across projects.

36h saved
3 people
Respondent portrait
Before 48h
After 12h

Author: project manager + Claude Code (Opus 4.6)


Problem

Diaries are ending — interviews begin. Researchers need to prepare: immerse themselves in each respondent's data, understand their behavior, know what to clarify during the interview.

Input: 7 days of diary entries — dozens of text messages, voice notes, photos, screenshots, dialogues with researchers. No researcher can read it all in one evening before the interview.

The monitoring card (already on the site) is a good tool for daily tracking, but it's a structured summary: patterns, barriers, drivers, chronology. Interview preparation requires something different — understanding the person: how they think, why they behave this way, what's behind the numbers. Specific quotes, specific situations, specific questions.

After the interview, something is needed for the client. Not just "the respondent uses navigation in 33% of trips," but an answer to the question: how does this person's behavior relate to business goals? What works, what doesn't, and why.

What's needed: a system for creating analytical articles about respondents that works at all stages of research and is reusable across projects.


Options Considered

1. Extend the monitoring card

Add a narrative block, quotes, and interview questions to the existing format. Pros: one file, one place. Cons: the card would become enormous, two formats would mix (structured summary and narrative), the card is updated daily — the narrative would interfere. Decision: rejected.

2. Create separate analytical articles per respondent (chosen)

Three separate files in each respondent's personal folder: narrative analytics, dialogues, interview zones. After the interview — enrichment with transcript + business article. Pros: each artifact is a separate file with a clear purpose, doesn't pollute monitoring, scales well. Cons: more files. Decision: chosen.

3. Prepare for each interview manually

The researcher reads the diary themselves, extracts quotes, formulates questions. Pros: deep immersion. Cons: 24 respondents x hours of reading = unrealistic under tight deadlines. Starting from scratch every time. Decision: rejected.


What We Did

1. Defined the genre — "Narrative Analytics"

Not a day-by-day chronology (respondents didn't do anything unusual — they lived ordinary lives). Not a dry summary (that already exists in monitoring). But analytics through narrative: the skeleton is mechanisms, causes, behavioral structure; the tone is specific situations, quotes, context.

Focus:

  • How the person moves overall (A→B patterns)
  • Recurring actions and what changes
  • One-time / unique actions

Length — exactly enough to reveal behavior and its causes. Not 3 paragraphs and not 30 — as much as needed.

2. Created folder structure

32 personal folders — one per respondent. Each contains:

  • Narrative analytics
  • Researcher↔respondent dialogues
  • Interview dig zones
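
A tiny sketch of creating this structure (the root path and file names are assumptions):

from pathlib import Path

# One personal folder per respondent with placeholders for the three artifacts.
root = Path("respondents")
for code in (f"P-{i:02d}" for i in range(1, 33)):   # 32 personal folders
    folder = root / code
    folder.mkdir(parents=True, exist_ok=True)
    for name in ("narrative.md", "dialogues.md", "interview_zones.md"):
        (folder / name).touch()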

3. Wrote a pilot article

Sources: 177 diary files (7 days), monitoring card, analytical layer from tracking spreadsheet.

Result — three files:

  • Narrative analytics — respondent portrait
  • Dialogues — all significant researcher↔respondent exchanges, organized by day and topic. Original quotes preserved.
  • Interview zones — 8 specific directions

4. Designed a full lifecycle pipeline

Defined three stages applicable to any research project:

Stage | When | Output | Input
Narrative | Before or after interview | Narrative analytics | Diary and/or transcript
Preparation | Before interview | Dialogues + dig zones | Any pre-data
Business | After narrative | Business article | Narrative + business goals

Key decision: the skill adapts to available data, not to the research method. One command covers:

  • Diary method (narrative from diary → enrichment with transcript)
  • In-depth interview without pre-data (narrative from transcript)
  • Any hybrid (whatever's available — that's what we work with)

The researcher doesn't choose "diary mode" or "interview mode." They say "write narrative" — and the skill determines what data is available.
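
A sketch of that dispatch logic, assuming hypothetical folder layout and file names:

from pathlib import Path

# The same entry point inspects the respondent folder and works with whatever is there.
def available_sources(folder: Path) -> dict:
    sources = {}
    diary_files = sorted(folder.glob("diary/*"))
    if diary_files:
        sources["diary"] = diary_files
    transcript = folder / "transcript.md"
    if transcript.exists():
        sources["transcript"] = transcript
    return sources

def write_narrative(folder: Path) -> dict:
    sources = available_sources(folder)
    if not sources:
        raise ValueError("nothing to work with: no diary and no transcript")
    # Diary only, transcript only, or hybrid: the same command handles all three cases.
    return sources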


What This Changes

For researchers:

  • Interview prep: instead of reading 177 files — one narrative document + ready-made dig zones
  • After interview: narrative gets enriched, not rewritten
  • For reporting: business article maps respondent behavior to business goals

For the project:

  • 24 respondents are going to interviews. The same package can be prepared for each
  • Folders already created. Transcripts will go there too
  • Business articles for all 24 respondents → assembled into an overall vision for the report

For future projects:

  • Pipeline is not tied to the diary method. Works for any qual research with interviews
  • "Narrative analytics" genre is reusable: analytics through story, not dry summaries
  • The skill (in development) will live in the team's shared skill library and can be plugged into any project
Total

Total in 8 days

1,212 hours saved
17 AI solutions
6 team members
20 → 0 researchers who would otherwise have had to be hired