a builder's codex

Sample 100+ traces, write one free-form note per trace, let an LLM cluster the notes — humans first, machines second

By Hamel Husain · Independent ML consultant and Berkeley PhD researcher · 2026-04-28 · podcast · Evals as error analysis, the benevolent dictator, LLM judges

Tier A · TL;DR
Sample 100+ traces, write one free-form note per trace, let an LLM cluster the notes — humans first, machines second

Claim

Run trace review as a two-stage pipeline: a human samples 100+ traces and writes a free-form note on the first thing wrong with each (open coding); then an LLM groups those notes into failure-mode buckets (axial coding). An LLM cannot do the open-coding pass for you because it lacks product context; humans cannot scale the categorisation pass.

Mechanism

Open coding captures domain-specific failures that an LLM judge would miss, because the LLM has no privileged access to product reality. For example, a reply that offers a virtual tour is a hallucination if "we don't actually offer virtual tours", but that is invisible without product context. Axial coding is pure clustering, which LLMs do reliably. The split assigns each task to the actor that can do it, and counting the notes in each failure-mode bucket yields a pivot table that converts qualitative review into quantitative priority.
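The counting step at the end of the pipeline can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' tooling: the `CodedTrace` record, the `failure_mode_pivot` helper, and the sample rows are all hypothetical, and the `failure_mode` labels stand in for whatever the LLM clustering pass returns.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CodedTrace:
    trace_id: str
    note: str          # free-form human note (open coding)
    failure_mode: str  # bucket assigned by the clustering pass (axial coding)


def failure_mode_pivot(coded):
    """Return (failure_mode, count, share) rows, most frequent first."""
    counts = Counter(t.failure_mode for t in coded)
    total = sum(counts.values())
    return [(mode, n, n / total) for mode, n in counts.most_common()]


# Made-up example data; the labels are assumed output of the LLM clustering step.
coded = [
    CodedTrace("t1", "offered a virtual tour we don't provide", "hallucinated feature"),
    CodedTrace("t2", "promised a discount that doesn't exist", "hallucinated feature"),
    CodedTrace("t3", "never handed off to a human agent", "missed handoff"),
    CodedTrace("t4", "three paragraphs for a yes/no question", "verbosity"),
]

for mode, n, share in failure_mode_pivot(coded):
    print(f"{mode:<22} {n:>3} {share:6.0%}")
```

Sorting by count is what turns the notes into a priority list: the bucket at the top is the failure mode to fix first.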

Conditions

Holds when:

Fails when:

Evidence

"When you're doing this open coding... appoint one person whose taste that you trust."

"I would bet money... if I put that into ChatGPT and asked, 'Is there an error?' it would say, 'No, did a great job.'"

Stopping rule: theoretical saturation, not a fixed count. Once 15–60 traces stop yielding new categories, stop reviewing.

— Hamel Husain & Shreya Shankar on Lenny's Podcast, 2026-04-28
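One possible reading of the saturation stopping rule is "stop after a run of consecutive traces that surface no new category". The sketch below follows that reading only. The `saturation_point` function and its `patience` parameter are hypothetical names, and it assumes the reviewer attaches a tentative category label to each trace as they go, which the source does not spell out.

```python
def saturation_point(labels, patience=20):
    """Return how many traces were reviewed when saturation was reached.

    labels   -- tentative category per trace, in review order (assumed input)
    patience -- consecutive traces without a new category before stopping;
                an assumed value, chosen from within the 15-60 range above
    Returns None if the run of familiar categories never reaches `patience`.
    """
    seen = set()
    since_new = 0
    for reviewed, label in enumerate(labels, start=1):
        if label in seen:
            since_new += 1
        else:
            seen.add(label)
            since_new = 0
        if since_new >= patience:
            return reviewed
    return None


# Made-up review order: the last new category appears at trace 4.
labels = ["hallucination", "tone", "hallucination", "handoff"] + ["tone"] * 5
print(saturation_point(labels, patience=3))
```

If the function returns `None`, new categories are still appearing and more traces need review.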

Signals

Counter-evidence

Coding-agent teams (Claude Code, Codex) operate with much lighter eval discipline because the developer is also the user; the dogfood loop closes inside one head. That pattern does not generalise to products where the buyer is not the builder.

Cross-references
