10 years of messages between 3 friends

how this was made

Built with Claude Code over several iterations. The pipeline today:

  1. Parse raw WhatsApp .txt exports into a per-message data blob; sender names aliased.
  2. Claude Sonnet reads the chat in 100-message windows to find conversation boundaries — about 1,860 conversations across ten years.
  3. Draw a stratified sample of 200 conversations, have Sonnet draft a topic taxonomy from the sample, hand-edit it once, then lock it (12 categories × 75 subtopics).
  4. Tag each conversation against the locked taxonomy via roughly 3,000 prompt-cached Sonnet calls.
  5. Short reactions ("lol", "yeah", "haha") inherit their conversation's topic instead of being tagged in isolation.
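
A minimal sketch of steps 1 and 2, not the code actually used here. It assumes one common export line shape ("31/12/2020, 21:15 - Name: text"), which varies by phone and locale. The alias map, the function names and the non-overlapping windows are all illustrative assumptions.

```python
import re
from datetime import datetime

# Assumed line shape: "31/12/2020, 21:15 - Alice Example: Happy new year".
# Real exports differ by platform and locale, so adjust the pattern as needed.
STAMP_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2}) - (.*)$")
SENDER_RE = re.compile(r"^([^:]+): (.*)$")

# Hypothetical alias map: real display name -> alias shown on the page.
ALIASES = {"Alice Example": "A", "Bob Example": "B"}


def parse_export(text, aliases=ALIASES):
    """Turn raw export text into a list of per-message dicts."""
    messages = []
    for line in text.splitlines():
        stamp = STAMP_RE.match(line)
        if stamp is None:
            # No timestamp: treat as a continuation of a multi-line message.
            if messages:
                messages[-1]["text"] += "\n" + line
            continue
        date, time, rest = stamp.groups()
        sender = SENDER_RE.match(rest)
        if sender is None:
            # Timestamped line without "Name:" (system notice): skip it.
            continue
        name, body = sender.groups()
        messages.append({
            "ts": datetime.strptime(f"{date} {time}", "%d/%m/%Y %H:%M"),
            "sender": aliases.get(name, name),
            "text": body,
        })
    return messages


def windows(messages, size=100):
    """Split messages into consecutive, non-overlapping windows of `size`."""
    return [messages[i:i + size] for i in range(0, len(messages), size)]
```

Each window would then go to the model with a prompt asking where conversations begin and end. That prompt, and any overlap between windows, is not described above and is left out here.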

per-person activity

message volume, words written and average length per person

person | messages | words | w/day | avg/msg | media | questions | median reply

word share

who dominates the conversation, weighted by words
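
One way word share could be computed, as a sketch. It assumes each message is a dict with "sender" and "text" keys and that a word is any whitespace-separated token. The page's real tokenization is not stated.

```python
from collections import Counter


def word_share(messages):
    """Return each sender's fraction of all words written."""
    words = Counter()
    for msg in messages:
        # Assumption: a "word" is any whitespace-separated token.
        words[msg["sender"]] += len(msg["text"].split())
    total = sum(words.values())
    if total == 0:
        return {}
    return {person: count / total for person, count in words.items()}
```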

who starts chats

first message of a new calendar day counts as a fresh start
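
The rule above can be sketched as follows. It assumes each message is a dict with a "ts" datetime and a "sender", and that timestamps are naive local times, so "calendar day" means the day as recorded in the export. The function name is illustrative.

```python
from collections import Counter


def day_starters(messages):
    """Count, per sender, how often they sent the first message of a day."""
    starts = Counter()
    last_day = None
    for msg in sorted(messages, key=lambda m: m["ts"]):
        day = msg["ts"].date()
        if day != last_day:
            # First message seen on this calendar day: a fresh start.
            starts[msg["sender"]] += 1
            last_day = day
    return starts
```

A late-night exchange that crosses midnight counts as a new start under this rule, which is a known quirk of using calendar days rather than silence gaps.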

activity over time

share per person (bars, left axis) · total messages (dotted line, right axis), monthly

hour of day

when this chat is most alive

day of week

mon → sun

topics

LLM-classified categories — word-share per person. messages may carry multiple tags.
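
A sketch of per-category word share when messages carry several tags. It assumes each message is a dict with "sender", "text" and a "tags" list, and that a multi-tagged message counts its full word total toward every tag. How the page actually splits words across tags is not stated, so that choice is an assumption.

```python
from collections import defaultdict


def topic_word_share(messages):
    """Return {category: {sender: share of that category's words}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for msg in messages:
        n_words = len(msg["text"].split())
        for tag in msg["tags"]:
            # Assumption: every tag receives the message's full word count.
            counts[tag][msg["sender"]] += n_words
    shares = {}
    for tag, per_person in counts.items():
        total = sum(per_person.values())
        if total:
            shares[tag] = {p: n / total for p, n in per_person.items()}
    return shares
```

Because of the full-count assumption, shares add up to 1 within each category, but category totals added together can exceed the chat's total word count.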