The Context Window Trap
1M-token context windows don't mean you should fill them. When to chunk, summarize, or restructure instead of stuffing everything in.
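To make the "chunk instead of stuffing" option concrete, here is a minimal sketch of overlapping chunking under a token budget. The function name `chunk_text` and the word-count token approximation are illustrative assumptions, not a specific library's API; a production system would measure length with the target model's actual tokenizer.

```python
def chunk_text(text: str, max_tokens: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks that each fit a token budget.

    Assumption: one word ~ one token. Real token counts differ (English
    averages roughly 0.75 words per token), so swap in the model's
    tokenizer before relying on the budget.
    """
    words = text.split()
    step = max_tokens - overlap  # advance by budget minus overlap each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which matters for retrieval later; the trade-off is redundant tokens across chunks.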