Why Engineering Around AI Models Matters More Than the Models Themselves
Here's an uncomfortable truth about building AI products: the model is often the least important part.
The Conventional Wisdom
Most teams approach AI product development the same way:
- Find the latest, smartest model
- Integrate it into their product
- Hope for the best
This approach keeps teams constantly chasing the next model release. It also tends to produce mediocre results.
What Actually Works
Experienced AI builders have noticed something counterintuitive: companies shipping reliable AI products that actually work aren't necessarily using the smartest models.
They're using the ones with the best engineering around them.
That engineering comes in several forms:
Evals (evaluations) are the foundation. Think of them as the scientific method applied to non-deterministic software. You form hypotheses about how your system should behave, test those assumptions against real inputs, and measure the results.
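That hypothesis-test-measure loop can be sketched in a few lines. This is a minimal illustration, not a real framework; the function names, the keyword-matching scoring rule, and the stand-in model are all assumptions for the sake of the example.

```python
# Minimal eval harness sketch (hypothetical names, keyword-based scoring).
# Hypothesis: the model's answer should contain an expected keyword.

def run_eval(model_fn, cases):
    """Run each test case through the model and score pass/fail."""
    results = []
    for case in cases:
        output = model_fn(case["input"])
        passed = case["expected_keyword"].lower() in output.lower()
        results.append({"input": case["input"], "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return pass_rate, results

# Stand-in for a real model call, so the sketch runs as-is.
def fake_model(prompt):
    return "Paris is the capital of France."

cases = [
    {"input": "Capital of France?", "expected_keyword": "Paris"},
    {"input": "Capital of Spain?", "expected_keyword": "Madrid"},
]

rate, _ = run_eval(fake_model, cases)
print(f"pass rate: {rate:.0%}")  # pass rate: 50%
```

The point of even a toy harness like this: once behavior is measured rather than eyeballed, you can tell whether a prompt change, or a new model, actually made things better.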
Feedback loops connect production back to development. What happens when users interact with your AI? That data should flow back into your testing and improvement process.
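One concrete version of that loop is mining production logs for failures and turning them into regression cases. A hedged sketch, where the log schema and field names (`user_feedback`, `thumbs_down`) are invented for illustration:

```python
# Sketch: converting logged production interactions into new eval cases.
# The log schema and field names here are assumptions, not a real API.

def log_to_eval_case(log_entry):
    """Turn a negatively-rated interaction into a regression test case."""
    if log_entry.get("user_feedback") == "thumbs_down":
        return {
            "input": log_entry["prompt"],
            "bad_output": log_entry["response"],  # the output to avoid repeating
        }
    return None

logs = [
    {"prompt": "Summarize this doc", "response": "...", "user_feedback": "thumbs_down"},
    {"prompt": "Translate to French", "response": "...", "user_feedback": "thumbs_up"},
]

new_cases = [c for log in logs if (c := log_to_eval_case(log))]
print(len(new_cases))  # 1
```

Each failing interaction becomes a test that future model or prompt changes are checked against.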
Production harnesses determine whether your AI actually works in the real world, not just in development.
The Token vs. Dollar Puzzle
Here's a striking data point: Chinese and open-source models generate massive token volumes. In terms of raw usage, they're huge.
But in dollar-weighted spend? They're tiny.
Why? Because delivery matters. Rate limits, unstable APIs, integration friction, and support gaps make cheaper models more expensive in practice.
Teams trade lower token costs for the predictability and uptime of commercial providers.
The Interface Debate
One of the most practical debates in AI engineering right now involves agent interfaces:
The bash approach gives agents an unconstrained Unix environment—files, curl, general computing access.
The structured approach constrains agents to typed APIs, SQL, and specific interfaces.
In benchmarks of agent tasks, structured access tends to win: SQL is more accurate, more token-efficient, and faster than bash for most production workloads.
The lesson: constraining your AI with computer science fundamentals often outperforms giving it unlimited freedom.
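To make the contrast concrete, here is a sketch of the structured approach: instead of handing the agent a shell, it gets a single typed entry point that only accepts read-only SQL against a known schema. The function names and the toy schema are illustrative assumptions.

```python
# Sketch of a constrained agent interface (illustrative names and schema).
import sqlite3

def run_readonly_query(db, sql):
    """Single typed entry point: the agent may only issue SELECT statements."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return db.execute(sql).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# The agent's "action" is a constrained query, not arbitrary shell access.
rows = run_readonly_query(db, "SELECT COUNT(*) FROM orders WHERE total > 10")
print(rows)  # [(1,)]
```

A bash-style agent could reach the same answer with `sqlite3` from a shell, but it could also do anything else; the structured version narrows the action space to exactly what the task needs.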
The Bottom Line
As long as frontier labs can raise billions, they'll keep throwing compute at model improvements. For product builders, that means chasing models is a losing game.
The winning strategy: invest in engineering around whatever model you're using. Build robust evals. Create feedback loops. Engineer your integration properly.
When the next model comes out—and it will—you'll be able to swap it in quickly because your engineering is solid.
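Swapping models quickly is easiest when the provider sits behind a thin abstraction. A minimal sketch, where the `ModelFn` type and all function names are hypothetical: the application depends only on a prompt-in, text-out signature, so changing providers is one line.

```python
# Sketch of a thin provider abstraction (all names are hypothetical).
from typing import Callable

# A provider is just a function from prompt to text; evals, logging,
# and retries depend only on this signature, not on any vendor SDK.
ModelFn = Callable[[str], str]

def make_app(model: ModelFn):
    def answer(question: str) -> str:
        # Prompting and guardrails live here, outside the provider.
        return model(f"Answer concisely: {question}")
    return answer

# Stand-ins for two different model providers.
def provider_a(prompt: str) -> str:
    return "response from provider A"

def provider_b(prompt: str) -> str:
    return "response from provider B"

app = make_app(provider_a)
print(app("What is an eval?"))  # response from provider A

# Swapping the model is one line; everything downstream is unchanged.
app = make_app(provider_b)
print(app("What is an eval?"))  # response from provider B
```

With your evals in place, the swap is then verified the same way any other change is: run the suite, compare pass rates.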
Stay ahead of AI trends. tldl summarizes podcasts from builders and investors in the AI space.