a builder's codex
codex · operators · Benjamin Mann · ins_economic-turing-test-for-ai

The Economic Turing Test — would you hire the agent if you didn't know it was a machine?

By Benjamin Mann · Co-founder, Anthropic; tech lead product engineering; ex-OpenAI GPT-3 architect · 2026-04-28 · podcast · The Economic Turing Test, mission over money, transformative AI

Tier A · TL;DR
The Economic Turing Test — would you hire the agent if you didn't know it was a machine?

Claim

The right benchmark for "is this AI system transformative?" is not a generic capability test but the Economic Turing Test: contract an agent for one to three months on a specific job; if you would hire it back, having believed it was a person, it has passed for that role. Aggregate to a money-weighted basket of jobs and call 50% the threshold for transformative AI.

Mechanism

Generic AGI debates are unfalsifiable. The Economic Turing Test grounds the question in revealed-preference: would a real buyer pay a real wage for the agent's output, blind to its origin? The answer is per-role and per-workstream, which makes it actionable for product teams. It also re-orders the conversation from "is the model smart enough?" to "for which specific workstream does this agent already pass?" — a question with concrete answers and concrete economic stakes.

Conditions

Holds when:

Fails when:

Evidence

"If you contract an agent for a month or three months on a particular job, if you decide to hire that agent and it turns out to be a machine rather than a person, then it's passed the Economic Turing Test for that role."

Reference points Mann anchors against: Fin/Intercom resolves 82% of customer-service tickets fully automated; the remaining 18% is genuinely harder. Anthropic's own engineering reports 95% of Claude Code's code is written by Claude with 10–20x output multiplier.

— Benjamin Mann on Lenny's Podcast, 2026-04-28

Signals

Counter-evidence

The test isolates economic substitution but ignores trust, legal exposure, and failure-mode novelty. An agent might pass the test for ninety days and then produce a catastrophic compounding error a human would not. Aggregation to a 50% threshold for "transformative AI" is provocative but arbitrary; reasonable operators can disagree about the right denominator and weighting.

Cross-references

Open the interactive view → View original source → Markdown source →