10 years of messages between 3 friends

Built with Claude Code over several iterations. The pipeline today:

Parse the raw WhatsApp .txt exports into a per-message data blob, replacing sender names with aliases.
Claude Sonnet reads the chat in 100-message windows to find conversation boundaries, which yields about 1,860 conversations across ten years.
Draw a stratified sample of 200 conversations, have Sonnet draft a topic taxonomy from that sample, hand-edit the draft once, then lock it (12 categories × 75 subtopics).
Tag each conversation against the locked taxonomy via roughly 3,000 prompt-cached Sonnet calls.
Short reactions ("lol", "yeah", "haha") inherit their conversation's topic instead of being tagged in isolation.
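
As a rough illustration of the first step, here is a minimal parsing sketch in Python. It is not the project's actual code. The line layout ("31/12/2021, 21:15 - Name: text") is an assumption, since WhatsApp export formats differ by phone OS and locale, and the ALIASES table, the Message class and parse_export are hypothetical names made up for the example.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Assumed export layout: "31/12/2021, 21:15 - Alice Example: message text".
# Real exports vary by OS and locale, so this pattern would need adjusting.
PREFIX_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2}) - (.*)$")

# Hypothetical alias table: real sender names mapped to short aliases.
ALIASES = {"Alice Example": "A", "Bob Example": "B"}


@dataclass
class Message:
    ts: datetime
    sender: str
    text: str


def parse_export(lines):
    """Turn raw export lines into a list of Message records."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = PREFIX_RE.match(line)
        if match is None:
            # No timestamp prefix: treat as a continuation of the previous message.
            if messages:
                messages[-1].text += "\n" + line
            continue
        date, time, rest = match.groups()
        sender, sep, text = rest.partition(": ")
        if not sep:
            # Timestamped line without "sender: text" (e.g. a group notice): skip it.
            continue
        ts = datetime.strptime(f"{date} {time}", "%d/%m/%Y %H:%M")
        messages.append(Message(ts, ALIASES.get(sender, sender), text))
    return messages
```

Multi-line messages are folded into the record they belong to, so the later windowing and tagging steps see one record per message rather than one per line.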

message volume, words written and average length per person

person	messages	words	w/day	avg/msg	media	questions	median reply

who dominates the conversation, weighted by words

first message of a new calendar day counts as a fresh start
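
One possible reading of this rule, sketched in Python under assumptions: messages arrive as (timestamp, sender) pairs already sorted by time, and only the day-change rule is modelled. The function name count_day_starters is hypothetical, and any other rule the chart may use for fresh starts is not covered.

```python
from collections import Counter
from datetime import datetime


def count_day_starters(messages):
    """Count, per sender, how often they sent the first message of a calendar day.

    `messages` is an iterable of (timestamp, sender) pairs sorted by timestamp.
    """
    starters = Counter()
    previous_day = None
    for ts, sender in messages:
        day = ts.date()
        if day != previous_day:
            # The calendar day changed, so this message opens a fresh start.
            starters[sender] += 1
            previous_day = day
    return starters
```

Note that a chat running past midnight is split at the day boundary under this reading, so the first message after 00:00 counts as a fresh start even if the exchange never paused.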

share per person (bars, left axis) · total messages (dotted line, right axis), monthly

when this chat is most alive

mon → sun

LLM-classified categories · word share per person · messages may carry multiple tags