Metrics That Matter Early Stage
Vanity metrics vs. real signal. What I track for AssessAI and why most startup dashboards are performance art.
Long-form articles and short thoughts on AI, product, and engineering.
1M token context windows don't mean you should use them. When to chunk, summarize, or restructure instead of stuffing everything in.
Designing a cron-at-scale system — priority queues, exactly-once execution, retry with dead letter queues, and the monitoring that keeps it honest.
Working across Japan, USA, and India taught me that async communication is everything.
Trunk-based dev, feature branches, conventional commits. What works when you're the only person pushing code.
Building LLM-as-a-Judge for AssessAI taught me why automated evaluation needs human calibration — and where it breaks down.
Most companies still use LeetCode to hire senior engineers. In 2026. When AI does 70% of the coding. Here's what I built instead.
What it takes to deploy conversational Voice AI that handles real phone calls. Latency budgets, failure modes, and lessons from Avoca.
How I turned Claude Code into a complete engineering team in a terminal.
Next.js + Supabase + Vercel + Claude Code. The stack that lets one person ship like five.
Designing S3-like object storage — chunking, deduplication, CDN integration, and the metadata layer that ties it all together.
OpenAI ada-002 vs text-embedding-3 vs Cohere vs local models. Real benchmarks from production retrieval systems.
Ran Kokoro-82M on my MacBook for TTS. 337MB. Broadcast-quality speech. Zero cost.
Auth, database, edge functions, realtime, storage. One service replacing five. Here's how I use Supabase as my entire backend.
What I look for when reviewing code: correctness, edge cases, naming, testing. Lessons from leading a team at Blinq.
It'll replace engineers who can't use AI. Different thing.
Claude Code vs Cursor vs Copilot vs Windsurf. Ranked by someone who uses them all for real work, not demos.
Consistent hashing, eviction policies, cache stampede prevention, and the Redis vs Memcached decision you'll actually face in production.
Kokoro-82M: 337MB model, broadcast-quality speech, runs on a MacBook, costs nothing. Here's how to set it up.
Dev containers, multi-stage builds, compose for local dev. The Docker knowledge that actually matters when you're writing code, not managing infrastructure.
A decision framework for when to fine-tune, when to prompt engineer, and when RAG is the right call. Most teams pick fine-tuning too early.
Idempotency, double-entry bookkeeping, webhook handling, and PCI compliance — the system where bugs cost real money.
Why the best engineers think about users first. The skill gap companies don't test for.
An honest comparison from daily use. Where each model wins, where each fails, and what I actually reach for.
Before writing a function, check if a library does it. Before a library, check if you need it at all.
Inverted indexes, query parsing, BM25 ranking, and autocomplete — designing an Elasticsearch-like search system from scratch.
Log correlation, structured traces, and the debugging stories that taught me how to find bugs in production without adding console.log.
Claude Code hooks: auto-format, auto-test, auto-commit, auto-sync. My full setup for turning manual chores into background processes.
What I learned building a 23-agent system. When to split agents, when to merge them, and why most multi-agent architectures are over-engineered.
Tactical breakdown of building AssessAI from nothing to 200+ source files, 630+ tests, and a working product. Architecture decisions and what I'd change.
Fan-out on write vs read, ranking algorithms, and the caching strategy behind Facebook and Twitter's news feeds.
When you're the only engineer, you can't test everything. Here's where to invest your testing time for maximum confidence.
Grounding, tool use, and verification loops. What it took to ship agents that users actually trust.
Designing a WhatsApp/Slack-scale chat system — WebSockets, message ordering, presence indicators, and read receipts.
Postgres + Vercel + a single Next.js app covers 90% of use cases.
Caching, revalidation, loading states, parallel routes. The things that bit me after 200+ source files on Next.js 15.
Model Context Protocol: what it is, how I use 9 servers daily, and a practical guide to connecting AI tools to everything.
Structured outputs, function calling, and system-level constraints killed the artisanal prompt. Good riddance.
Push, email, SMS — designing a multi-channel notification system with priority queues, delivery guarantees, and template engines.
Server components, the use() hook, Actions — what actually changed in practice after shipping a full product on React 19.
Running Llama 3, Mistral, and Phi-3 on a MacBook. When cloud APIs are overkill and local inference makes more sense.
Claude Code as a primary development environment. 68 commands, tmux sessions, and why I rarely open VS Code anymore.
Token bucket, sliding window, and the Redis scripts that make distributed rate limiting actually work at scale.
The moment I decided, the problem I couldn't ignore, and what the first week looked like.
LeetCode interviewing in the AI era is cargo cult hiring.
Vector search is the easy part. Here's what matters after you've embedded your documents.
Discriminated unions, branded types, const assertions, and satisfies. The patterns that actually make TypeScript worth it.
Designing a bit.ly-scale URL shortener — Base62 encoding, redirect latency, analytics, and the caching layer that makes it fast.