Claim
Beyond ~30B parameters, the capex of pre-training your own base model no longer makes economic sense. The defensible asset shifts to post-training: fine-tuning, RAG, reward design, and proprietary feedback loops on data you uniquely capture. Cursor is the canonical proof: the moat behind its $300M ARR is its record of which suggestions users accept and reject, not its IDE chrome.
Mechanism
Pre-training capex scales with model size, and revenue from a self-trained base model rarely justifies the spend once a frontier provider exists. Post-training is the cheaper, defensible layer: your fine-tuned model's behavior is a function of your accumulated user interactions. As long as you keep collecting interaction data, your lead over competitors compounds. Pre-training is a one-time spend in a public market; post-training is a moat that compounds in your private one.
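The feedback-to-fine-tune loop can be sketched minimally: each prompt's accepted and rejected suggestions become preference pairs, the standard input for preference-based post-training (e.g. DPO). The schema and field names here are hypothetical, not Cursor's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    """One user interaction captured by the product surface (hypothetical schema)."""
    prompt: str
    suggestion: str
    accepted: bool

def to_preference_pairs(interactions):
    """Group interactions by prompt and emit {prompt, chosen, rejected} records
    suitable for preference-based post-training such as DPO."""
    by_prompt = {}
    for it in interactions:
        bucket = by_prompt.setdefault(it.prompt, {"accepted": [], "rejected": []})
        bucket["accepted" if it.accepted else "rejected"].append(it.suggestion)
    pairs = []
    for prompt, buckets in by_prompt.items():
        # Every accepted suggestion is preferred over every rejected one
        # for the same prompt.
        for chosen in buckets["accepted"]:
            for rejected in buckets["rejected"]:
                pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

The compounding claim lives in this function: competitors without the product surface cannot reconstruct these pairs, and each release cycle adds more of them.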
Conditions
Holds when:
- You have a UX surface that produces ranking-worthy interaction signals (accepts/rejects, ratings, follow-ups).
- You can run the post-training loop frequently — synthetic data generation, eval design, RAG updates.
Fails when:
- The product surface produces sparse or noisy signals. Garbage data degrades fine-tunes.
- The base model improves so quickly that your fine-tune becomes redundant. Watch for capability waves that obsolete your customizations.
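The sparse-or-noisy-signal failure mode can be guarded with a simple gate before each fine-tune run: only prompts with enough interaction volume and a clear accept/reject split contribute training data. This is a minimal sketch; the thresholds are illustrative, not tuned values.

```python
def filter_trainable_prompts(stats, min_total=20, min_margin=0.2):
    """stats: dict mapping prompt -> (accepted_count, rejected_count).
    Returns prompts with enough volume and a clear preference margin.
    min_total and min_margin are hypothetical thresholds."""
    keep = []
    for prompt, (acc, rej) in stats.items():
        total = acc + rej
        if total < min_total:
            continue  # sparse signal: too few interactions to trust
        margin = abs(acc - rej) / total
        if margin < min_margin:
            continue  # noisy signal: users split near 50/50, no clear preference
        keep.append(prompt)
    return keep
```

Running this gate on every cycle is one concrete way "garbage data degrades fine-tunes" becomes an operational check rather than a slogan.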
Evidence
Asha cites Nathan Lambert's research: "Once a model hits 30B parameters, the CapEx to pre-train doesn't make economic sense; you should fine-tune instead."
"50% of developers are now fine-tuning. When you go through the full loop — synthetic data generation, rewards design, A/B testing rigorously, extract job-to-be-done — you get better results faster."
— Asha Sharma on Lenny's Podcast, 2026-04-28
Cursor's moat is named explicitly: the data from accepted vs. rejected suggestions, with the model retrained on it continuously.
Signals
- The team's data flywheel produces monthly model updates with measurable improvement.
- Fine-tuning is on a normal release cadence, not a research project.
- The product gets better the more it's used, in measurable axes (suggestion acceptance rate, time-to-task, etc.).
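The last signal is directly measurable: track suggestion acceptance rate over time and check that it trends up as usage accumulates. A minimal sketch, assuming events are logged as (month, accepted) pairs; the event shape is an assumption, not a known schema.

```python
from collections import defaultdict

def acceptance_rate_by_month(events):
    """events: iterable of (month_str, accepted_bool) pairs.
    Returns {month: acceptance_rate}, ordered by month."""
    counts = defaultdict(lambda: [0, 0])  # month -> [accepted, total]
    for month, accepted in events:
        counts[month][1] += 1
        if accepted:
            counts[month][0] += 1
    return {m: acc / total for m, (acc, total) in sorted(counts.items())}
```

A flat or declining curve here is an early warning that the flywheel is a release ritual, not a moat.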
Counter-evidence
Benjamin Mann at Anthropic argues the foundation-model layer is still where the most compounding happens: "the model will eat your scaffolding for breakfast." Sherwin Wu echoes this: customer fine-tunes can be obsoleted by the next base model. The post-training moat holds within the current model generation; a pre-training breakthrough can collapse it overnight.
Cross-references
- Treat the product as a living organism with a metabolism, not a shipped artifact — the broader framing this moat lives inside
- Plan in seasons keyed to secular changes, not 6-month roadmaps — the cadence implication