
Model size is a vanity metric


Ran Kokoro-82M on my MacBook today. 82 million parameters. 337MB on disk. Real-time text-to-speech that sounds broadcast-quality. Latency under 50ms. Cost: $0.
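For context, here's roughly what that looks like in practice. A minimal sketch using the kokoro PyPI package that wraps Kokoro-82M; the KPipeline interface, the lang_code, and the af_heart voice follow the package's published examples, but treat the specifics as assumptions:

```python
# pip install kokoro soundfile
from kokoro import KPipeline
import soundfile as sf

pipeline = KPipeline(lang_code="a")  # "a" selects American English

# The pipeline yields (graphemes, phonemes, audio) chunks as it synthesizes.
for i, (graphemes, phonemes, audio) in enumerate(
    pipeline("Model size is a vanity metric.", voice="af_heart")
):
    sf.write(f"chunk_{i}.wav", audio, 24000)  # Kokoro outputs 24 kHz audio
```

No server, no API key. The model downloads once and every call after that is local.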

Meanwhile, companies pay $15 per million characters for cloud TTS APIs that sound roughly the same. At that rate, a product synthesizing a couple hundred million characters a month is spending thousands of dollars on a commodity capability.

The frontier models get all the press. But the real story is at the bottom of the parameter count: small, specialized models running locally with no network latency, zero marginal cost, and zero vendor lock-in. No API keys. No rate limits. No privacy concerns.

Whisper-tiny (39M params) transcribes English at 95%+ accuracy. DistilBERT (66M) handles sentiment analysis nearly as well as BERT-base, a model almost twice its size. The pattern keeps repeating: distill, specialize, deploy locally.
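Both are a few lines to run locally. A minimal sketch, assuming the openai-whisper and transformers packages are installed and using a placeholder audio file name:

```python
# pip install openai-whisper transformers torch
import whisper
from transformers import pipeline

# Whisper tiny: ~39M parameters, downloaded once, then fully offline.
stt = whisper.load_model("tiny")
print(stt.transcribe("meeting.mp3")["text"])  # "meeting.mp3" is a placeholder path

# DistilBERT fine-tuned on SST-2: ~66M parameters, local sentiment analysis.
clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(clf("Local inference is underrated."))  # e.g. [{'label': 'POSITIVE', ...}]
```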

The best model for the job is almost always the smallest one that works.

