a builder's codex

The economic moat in AI is post-training on proprietary data, not pre-training a base model

By Asha Sharma · CVP, Microsoft AI Platform · 2026-04-28 · Podcast: "Product as organism, post-training, agentic society," Lenny's Podcast

Tier A · TL;DR
The economic moat in AI is post-training on proprietary data, not pre-training a base model

Claim

Beyond ~30B parameters, the capex of pre-training your own base model no longer makes economic sense. The defensible asset shifts to post-training: fine-tuning, RAG, reward design, and proprietary feedback loops on data you uniquely capture. Cursor is the canonical proof: its $300M ARR moat is the data on which suggestions users accept and which they reject, not its IDE chrome.

Mechanism

Pre-training capex scales with model size; revenue from a self-trained base model rarely justifies the spend once a frontier provider exists. Post-training is the cheaper, defensible layer: your fine-tuned model's behavior is a function of your accumulated user interactions. As long as you keep collecting interaction data, the gap between your model and your competitors' compounds. Pre-training is a one-time spend in a public market; post-training is a moat that compounds in your private one.
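To make the loop concrete, here is a minimal sketch (not from the podcast) of how accumulated interactions become post-training data. The `Interaction` record, `to_preference_pairs`, and `export_for_finetuning` names are hypothetical; the point is that every accept/reject event you capture becomes another preference pair for the next fine-tuning run.

```python
# Hypothetical sketch of the compounding post-training loop: log interactions,
# turn accept/reject signals into preference pairs, export for fine-tuning.
import json
from dataclasses import dataclass


@dataclass
class Interaction:
    prompt: str          # context the model saw (e.g., surrounding code)
    suggestion: str      # what the model proposed
    accepted: bool       # the proprietary signal: did the user keep it?


def to_preference_pairs(log: list[Interaction]) -> list[dict]:
    """Group interactions by prompt and pair each accepted suggestion
    with a rejected one, forming DPO-style preference data."""
    by_prompt: dict[str, dict[str, list[str]]] = {}
    for item in log:
        bucket = by_prompt.setdefault(item.prompt, {"chosen": [], "rejected": []})
        bucket["chosen" if item.accepted else "rejected"].append(item.suggestion)

    pairs = []
    for prompt, bucket in by_prompt.items():
        for good in bucket["chosen"]:
            for bad in bucket["rejected"]:
                pairs.append({"prompt": prompt, "chosen": good, "rejected": bad})
    return pairs


def export_for_finetuning(pairs: list[dict], path: str) -> None:
    """Write JSONL that a preference-tuning job (DPO, reward modeling, etc.)
    could consume; the file grows with every interaction you log."""
    with open(path, "w") as f:
        for pair in pairs:
            f.write(json.dumps(pair) + "\n")


if __name__ == "__main__":
    log = [
        Interaction("def add(a, b):", "    return a + b", accepted=True),
        Interaction("def add(a, b):", "    return a - b", accepted=False),
    ]
    export_for_finetuning(to_preference_pairs(log), "preferences.jsonl")
```

The compounding effect lives in the data volume: rerunning this export on a larger interaction log yields a strictly richer training set than any competitor without the same usage can assemble.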

Conditions

Holds when: you uniquely capture interaction data that competitors cannot see, a frontier base model exists to build on, and the feedback loop keeps running.

Fails when: the next base-model generation absorbs your fine-tuned behavior out of the box, wiping out the accumulated customization.

Evidence

Asha cites Nathan Lambert's research: "Once a model hits 30B parameters, the CapEx to pre-train doesn't make economic sense; you should fine-tune instead."

"50% of developers are now fine-tuning. When you go through the full loop — synthetic data generation, rewards design, A/B testing rigorously, extract job-to-be-done — you get better results faster."

— Asha Sharma on Lenny's Podcast, 2026-04-28

Cursor's moat is named explicitly: the data on which suggestions users accept and reject, with the model retrained continuously on that signal.
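The "A/B testing rigorously" step from the quote can be as simple as a promotion gate on acceptance rate. A sketch under stated assumptions: the `should_promote` helper, the traffic split, and the alpha threshold are illustrative, not anything described in the talk.

```python
# Hypothetical promotion gate: only ship the new fine-tune if its
# suggestion-acceptance rate beats the incumbent's with statistical
# significance (two-proportion z-test, one-sided).
from math import sqrt, erf


def should_promote(accepted_a: int, shown_a: int,
                   accepted_b: int, shown_b: int,
                   alpha: float = 0.05) -> bool:
    """A = incumbent model, B = new fine-tune. Returns True if B's
    acceptance rate is higher and the one-sided p-value is below alpha."""
    p_a = accepted_a / shown_a
    p_b = accepted_b / shown_b
    pooled = (accepted_a + accepted_b) / (shown_a + shown_b)
    se = sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    if se == 0:
        return False
    z = (p_b - p_a) / se
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # one-sided, from normal CDF
    return p_b > p_a and p_value < alpha


# Example: incumbent accepted 4,100 of 10,000 suggestions; new fine-tune 4,350 of 10,000.
print(should_promote(4100, 10_000, 4350, 10_000))  # True at alpha=0.05
```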

Signals

Counter-evidence

Benjamin Mann at Anthropic argues the foundation-model layer is still where the most compounding happens: "the model will eat your scaffolding for breakfast." Sherwin Wu echoes the point: customer fine-tunes can be obsoleted by the next base model. The post-training moat holds within the current model generation; a pre-training disruption can collapse it overnight.

Cross-references
