a builder's codex

Hamel Husain

Bio

Independent machine-learning consultant; former engineer at GitHub and Airbnb. Co-teaches Maven's highest-grossing course, on LLM evaluation as a practical discipline, taken by ~2,000 PMs and engineers across 500 companies, including OpenAI and Anthropic. A public reference voice for "evals as error analysis" and the open-coding → axial-coding → LLM-as-judge pipeline.

Operating themes

Insights · 4

Tier B · ai-native · leadership
Appoint one trusted-taste expert as the eval benevolent dictator — committees stall the loop
Tier A · ai-native · engineering
Evals are systematic data analysis on your LLM application — start with error analysis, not tests
Tier A · ai-native · engineering
Build LLM-as-judge as a binary true/false check, one judge per failure mode — and validate against human labels
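A minimal sketch of that card, under stated assumptions: `ask_judge` below is a hypothetical stand-in for a real LLM call targeting one failure mode, and validation is plain agreement plus per-class rates against human true/false labels. Not Husain's implementation, just the shape of the pattern.

```python
# Sketch: one binary judge per failure mode, validated against human labels.
# `ask_judge` is a placeholder for an LLM call; the heuristic is illustrative.

def ask_judge(trace: str) -> bool:
    """Hypothetical judge for ONE failure mode (e.g. fabricated citations).
    A real version would prompt an LLM and parse a strict true/false answer."""
    return "fake citation" in trace  # placeholder heuristic, not a real judge

def validate(judge, labeled):
    """Compare judge verdicts to human true/false labels on the same traces."""
    tp = fp = tn = fn = 0
    for trace, human_says_failure in labeled:
        pred = judge(trace)
        if pred and human_says_failure:
            tp += 1
        elif pred and not human_says_failure:
            fp += 1
        elif not pred and not human_says_failure:
            tn += 1
        else:
            fn += 1
    return {
        "agreement": (tp + tn) / len(labeled),
        "true_positive_rate": tp / max(tp + fn, 1),
        "true_negative_rate": tn / max(tn + fp, 1),
    }

# Invented example labels, for illustration only.
labeled = [
    ("answer cites fake citation [Smith 2021]", True),
    ("answer grounded in the retrieved document", False),
    ("contains fake citation to a nonexistent paper", True),
    ("clean, sourced answer", False),
]
print(validate(ask_judge, labeled))
```

Reporting true-positive and true-negative rates separately, rather than a single score, shows whether the judge misses failures or flags good traces.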
Tier A · ai-native · research-discovery
Sample 100+ traces, write one free-form note per trace, let an LLM cluster the notes — humans first, machines second
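The workflow above can be sketched as three steps. Everything here is an assumed illustration: `cluster_notes` stands in for the LLM clustering step (a crude keyword grouping is used so the sketch runs offline), and the notes are invented examples of the free-form, one-per-trace annotations a human would write first.

```python
import random

def sample_traces(traces, n=100, seed=0):
    """Step 1 (human): draw a random sample of traces to read end to end."""
    rng = random.Random(seed)
    return rng.sample(traces, min(n, len(traces)))

def cluster_notes(notes):
    """Step 3 (machine): group free-form notes into failure-mode clusters.
    Stand-in for an LLM call; a real version would prompt for semantic themes."""
    clusters = {}
    for note in notes:
        key = note.split()[0].lower()  # crude proxy for a semantic theme
        clusters.setdefault(key, []).append(note)
    return clusters

traces = [f"trace-{i}" for i in range(500)]
sampled = sample_traces(traces, n=100)

# Step 2 (human): one free-form note per trace, written while reading.
notes = [
    "hallucinated a price that is not in the catalog",
    "hallucinated store hours",
    "ignored the user's stated budget",
    "ignored the follow-up question entirely",
]
print(len(sampled), sorted(cluster_notes(notes)))
```

The ordering is the point of the card: humans write the notes before any model touches them, and the machine only compresses what the humans already observed.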