Model size is a vanity metric
Ran Kokoro-82M on my MacBook today. 82 million parameters. 337MB on disk. Real-time text-to-speech that sounds broadcast-quality. Latency under 50ms. Cost: $0.
Meanwhile, companies are paying $15 per million characters for cloud TTS APIs that sound roughly the same. At scale, that's thousands of dollars per month for a commodity capability.
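The back-of-envelope math is worth making explicit. A quick sketch, using the $15/million-character rate above; the 200M characters/month volume is a hypothetical example workload, not a figure from any real product:

```python
# Cost comparison: cloud TTS vs. a local model like Kokoro-82M.
# The $15/1M-character rate comes from the post; the monthly volume
# below is a hypothetical example workload.
CLOUD_RATE_PER_MILLION = 15.00  # USD per 1M characters (cloud TTS)
LOCAL_RATE_PER_MILLION = 0.00   # local inference: no per-character fee

def monthly_cost(chars_per_month: int, rate_per_million: float) -> float:
    """USD cost to synthesize chars_per_month characters at a per-1M rate."""
    return chars_per_month / 1_000_000 * rate_per_million

volume = 200_000_000  # hypothetical: 200M characters/month
print(f"cloud: ${monthly_cost(volume, CLOUD_RATE_PER_MILLION):,.2f}/mo")  # cloud: $3,000.00/mo
print(f"local: ${monthly_cost(volume, LOCAL_RATE_PER_MILLION):,.2f}/mo")  # local: $0.00/mo
```

That ignores the local hardware you already own, which is the point: the marginal cost per character is zero.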
The frontier models get all the press, but the real story is at the other end of the parameter count: small, specialized models running locally with zero latency, zero marginal cost, and zero vendor lock-in. No API keys. No rate limits. No privacy concerns.
Whisper-tiny (39M params) transcribes English at 95%+ accuracy. DistilBERT (66M) handles sentiment analysis as well as models 5x its size. The pattern keeps repeating: distill, specialize, deploy locally.
The best model for the job is almost always the smallest one that works.