
How We Rebuilt the Content Creation Process for the School Platform

I’m building an educational platform for my daughter. The Serbian school curriculum, grades one through eight — math, language, nature and society, everything that makes up primary school. I started with fourth-grade math. The stack is Next.js 15, Supabase, Tailwind, KaTeX for formulas. Claude is the content co-author: theory sections, exercises, quizzes, hints, Russian translations.

Over three days in early March, we created sixteen skills across three topics: equations, multiplication, division. Each skill includes theory sections, several quizzes, and more than twenty exercises. All of it lives in Supabase, loaded through SQL seed files. It looked like it was working, until I started looking at the results more carefully.

What went wrong

The problems fell into two categories: technical and pedagogical.

The technical ones were repetitive. Seed files used E-strings with literal \n instead of chr(10) — it worked, but violated the format we had already established as the reference standard. media fields contained NULL instead of an empty JSON array '[]'. For numbers above a thousand, we hadn’t defined answer variants using the dot separator (in Serbian notation, 1.000 instead of 1000) — a child could enter the correct answer and get an error.
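For concreteness, here is a minimal before-and-after sketch of those three fixes as they would appear in a seed file. The table and column names are hypothetical, not the project's actual schema.

```sql
-- Hypothetical seed fragment; table and column names are illustrative only.

-- Before: E-string escapes (they work, but break the reference standard),
-- NULL media, and a single answer variant that rejects Serbian notation.
INSERT INTO exercises (skill_id, prompt, media, answers)
VALUES (12, E'Израчунај 8 · 125.\nЗапиши резултат.', NULL, '["1000"]');

-- After: chr(10) concatenation per the standard, an empty JSON array
-- instead of NULL, and both thousand-separator variants accepted.
INSERT INTO exercises (skill_id, prompt, media, answers)
VALUES (
  12,
  'Израчунај 8 · 125.' || chr(10) || 'Запиши резултат.',
  '[]',
  '["1000", "1.000"]'
);
```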

The pedagogical problems were more serious. The theory for each skill opened with an abstract definition: «Једначина је…» (“An equation is…”). A fourth-grader doesn’t start with definitions — they start with a situation: there were books on the shelf, some were taken away, how many are left? Hints for exercises were generic — «Размисли поново» (“Think again”) instead of pointing to the specific mistake. And exercises weren’t connected to the theory: the theory used one set of numbers and examples, the exercises used completely different ones.

Each of the sixteen skills contained some version of the same combination of problems. It became clear that these weren’t random mistakes — they were the predictable result of having no process.

Why the model isn’t the issue

Claude is perfectly capable of generating quality educational content in Serbian. But with each new skill, it was starting from scratch: no memory of the terminology, no sense of which phrasing feels natural in a Serbian textbook, no access to the source materials. Every time, I had to explain the context again, and every time the result depended on how thoroughly I’d managed to convey it in that particular session.

And ahead of us are seven more grades and several subjects. If the process doesn’t hold up across five math topics, it won’t survive the scale of an entire curriculum.

The question wasn’t about finding a better model — it was about giving the existing model a proper working environment: reference materials, sources, a defined algorithm, and quality control.

What we rebuilt

The rebuild took one day and touched three things.

A source library. We designed a two-file system. The first file — reference.md — contains terminology (around eighty math terms in Serbian with Russian translations), rules and properties from the textbook in precise Serbian phrasing, and a phraseology block: templates for the kinds of sentences that appear in problem statements, instructions, theory explanations, and hints. The second file — index.md — is a map of all textbook topics, linked to their source files with coverage status.
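To make the shape concrete, here is a hypothetical excerpt of reference.md; the real file's headings and entries differ:

```markdown
# reference.md: математика, 4. разред

## Терминологија (Serbian / Russian)
- једначина / уравнение (equation)
- количник / частное (quotient)
- остатак / остаток (remainder)

## Правила из уџбеника
- Остатак је увек мањи од делиоца.
  (The remainder is always smaller than the divisor.)

## Фразеологија
- Problem statements: «Израчунај…», «Подели…»
- Hints name the exact operation, never a generic «Размисли поново».
```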

All source materials (textbook PDFs, problem documents) get converted to Markdown and stored in a converted/ folder. Claude reads them directly — no intermediate steps, no wasted context unpacking a PDF each time.

The pattern is designed to transfer to other subjects: each one will get its own reference.md with terminology and its own index.md with sources.

We worked this out in a brainstorming session. I asked questions one after another: who is the primary consumer of the file, you or me? One file or several? What do we need to record about each source? Claude offered options, I asked it to check each one, and it found the weak points in its own suggestions. For example, it initially assumed that page numbers in the answer key would match the textbook, but when we opened the PDF, the numbering turned out to be completely independent.

A dedicated skill. Instead of generating content on the fly, we created school-create-math — a formalized five-step algorithm. Step zero is context loading: the terminology reference, the SQL format standard, the current state of the platform. Step one is finding sources for the topic and analyzing the concepts. Step two is writing the theory following a five-stage structure: a real-life hook, a worked example, the rule, a counterexample, a comprehension check. Step three is generating the SQL — separately from the text, so the two don’t bleed into each other. Step four is a self-check against a checklist.

Hard blockers sit between the steps: you can’t move to SQL until the theory text has passed a check for AI patterns. You can’t mix writing text and generating code in a single step. These constraints came directly from experience: when Claude writes theory and SQL at the same time, the quality of the text drops.
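As an illustration, the algorithm and its blockers might be laid out in the skill file roughly like this (a hypothetical sketch, not the actual file):

```markdown
# school-create-math: five steps with hard blockers (hypothetical outline)

## Step 0: Load context
Read reference.md, the SQL format standard, and the current platform state.

## Step 1: Sources
Find the topic in index.md; read its converted/ sources; list the concepts.

## Step 2: Theory text
Write the five stages: hook, worked example, rule, counterexample, check.
BLOCKER: do not continue until the text passes the AI-pattern check.

## Step 3: SQL generation
Turn the approved text into seed SQL, and nothing else in this step.
BLOCKER: never write prose and SQL in the same step.

## Step 4: Self-check
Run the checklist and fix anything it flags before commit.
```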

A quality control chain. Two additional skills: school-reviewer for regression checks and capturing recurring patterns, and school-qa — a mandatory checklist before every commit. No skill reaches the database without passing both.

The result

The seventeenth skill, “Division with remainder”, was built using the new process. The theory opens with a situation: children dividing apples, with one left over. Every exercise connects back to the theory through matching examples. Hints are specific: «Подели 47 на 8» (“Divide 47 by 8”), followed by the concrete steps: how many whole eights fit, multiply back, find the remainder. Terminology is consistent because Claude reads the reference file at the start of each session rather than pulling from memory.
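In the data, that specificity looks roughly like this. Again a hypothetical row; the real schema and wording differ.

```sql
-- Hypothetical new-process exercise row; schema is illustrative only.
-- The hint names the exact operation instead of a generic nudge.
INSERT INTO exercises (skill_id, prompt, hint, media, answers)
VALUES (
  17,
  'Подели 47 на 8. Колики је остатак?',
  'Колико целих осмица стаје у 47? Помножи и нађи остатак.',
  '[]',
  '["7"]'
);
```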

The difference between the old skills and the new one is visible without a detailed side-by-side comparison. We’re now systematically rewriting all sixteen previous skills — same process, two or three per session.

What this transfers to

Three things that apply to any project where AI is generating content.

First, the scaling problem is in the process, not the model. One good generation proves nothing — ten generations without a reference file and a checklist will inevitably accumulate technical debt. When hundreds of skills across multiple subjects and grades are ahead of you, that debt becomes a dead end.

Second, a model needs a working context, not just a prompt. Terminology, reference examples, formatting rules — all of it needs to live in files that load at the start of every session. A prompt is gone by the next session; a file loads every time and produces a consistent result.

Third, quality control needs to be built into the process, not added at the end. If checking only happens during review of a finished result, the errors are already systemic, and starting over is faster than fixing them one by one.

That’s exactly what we decided to do — rewrite everything. My daughter isn’t using the platform yet, so the risk is zero, and partial fixes would have created twice the work. Sometimes the most efficient move is to accept that starting fresh with the right process is faster than repairing the output of the wrong one.