Engineering

Faster, Smaller, Cheaper... Better?

Examining the rapid evolution of AI models and questioning whether faster, smaller, and cheaper necessarily means better for developers and society.

Every week, there’s another headline: “New AI Model 50% Faster,” “Open-Source LLM Cuts Costs to Almost Nothing,” “Tiny Model Beats GPT-4 at Your Specific Task.” The hype machine is relentless, and the industry metrics tell a compelling story of exponential progress. But I keep asking myself: are we actually getting better, or just… different?

Faster, smaller & cheaper! But better?

The improvements over the past eighteen months have been genuinely impressive. Quantization techniques squeeze models down from 70B parameters to 7B without catastrophic accuracy loss. Inference speeds have dropped from seconds to milliseconds. Costs per million tokens have collapsed from dollars to cents. On paper, this is the kind of Moore’s Law-on-steroids narrative Silicon Valley loves.

But here’s the thing: shipping the fastest model doesn’t mean shipping the best product. A developer using a 7B model today faces different tradeoffs than one using GPT-4 last year. Latency matters less if you’re batch-processing overnight. Cost matters less if you’re running a single query per user per month. Speed matters less if accuracy matters more. We’ve optimized for the wrong metrics and pretended that “faster, smaller, cheaper” automatically equals “better.”

Size? Cost? Accuracy? What’s it matter?

The allure of self-hosting is understandable. Run your model locally, keep your data private, avoid vendor lock-in. But self-hosting doesn’t solve hallucinations—it just means you own the hallucinations now. A smaller model isn’t more trustworthy because it runs on your laptop; it’s just smaller. And cheaper isn’t an unqualified win when you’re paying the cost in robustness, consistency, and reliability.

The real kicker? Most teams using these models today have zero processes around validation. No systematic testing. No rollback procedures. No feedback loops that let them know when the model is lying. They’re shipping smaller, cheaper models with the same confidence (and lack of guardrails) that worked for well-tested software ten years ago. Spoiler alert: it didn’t work then, either.

We’re seeing the consumption of AI models as a replacement for building real systems—throwing cheaper inference at problems that actually need better architecture, better data pipelines, and better human oversight. And somewhere, some service is degrading gracefully because the model hallucinates at 3 AM and nobody’s monitoring it.

Where to from here?

The uncomfortable truth: AI results aren’t better just because they’re faster, smaller, or cheaper. Better means reliable. Better means predictable. Better means you have the harnesses, the processes, and the infrastructure to catch failures before they hurt people.

That’s the work nobody wants to fund right now. Building observability into model pipelines. Implementing structured validation and feedback loops. Creating circuit breakers for when accuracy degrades. Actually measuring whether the AI is solving the problem, not just whether it’s producing output.

The models will continue to improve. The costs will continue to fall. But until the industry stops treating AI as fire-and-forget code and starts treating it as a system requiring serious engineering discipline, “better” will remain a marketing category, not a technical one.

The real innovation isn’t in the next 3x speedup. It’s in the unglamorous work of building the guardrails that make rapid iteration safe and meaningful. That’s where the actual competitive advantage lies.