
Shipping Zero to One in 8 Weeks


AssessAI went from a one-page spec to a working product in 8 weeks. Solo. Here's the tactical breakdown — not the motivational fluff, but the actual decisions, tradeoffs, and mistakes.

week 1-2: architecture that scales with one person

The stack: Next.js 15 (App Router), React 19, TypeScript, Tailwind, Supabase, OpenAI, Vercel AI SDK.

The principle behind every choice: minimize the number of services I operate. Supabase handles auth, database, real-time subscriptions, edge functions, and storage. That's five services that I don't need to set up, connect, or maintain separately. The cost is vendor lock-in. The benefit is shipping speed. At this stage, shipping speed wins.

Key architecture decisions:

Per-question workspace routing. Each assessment question gets its own workspace state. The candidate can work on question 3, jump back to question 1, and their state is preserved independently. This required a routing layer (workspace-client.tsx) that manages state per question, not per page.
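
The real routing layer lives in workspace-client.tsx; the hook below is only a minimal sketch, with hypothetical names, of what keying state by question id (rather than by page) looks like:

```tsx
// Hypothetical sketch of per-question workspace state. Each question id maps
// to its own draft, so switching questions never clobbers in-progress work.
'use client';

import { useCallback, useState } from 'react';

type QuestionState = { draft: string; lastSavedAt?: number };

export function useWorkspaceState() {
  // One entry per question id, created lazily on first edit.
  const [byQuestion, setByQuestion] = useState<Record<string, QuestionState>>({});

  const updateDraft = useCallback((questionId: string, draft: string) => {
    setByQuestion((prev) => ({
      ...prev,
      [questionId]: { ...prev[questionId], draft },
    }));
  }, []);

  return { byQuestion, updateDraft };
}
```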

Upsert everything. I learned this the hard way. Auto-save was initially delete-then-insert. Two saves 200ms apart caused data loss. Switched to ON CONFLICT ... DO UPDATE for every write. Zero data loss from race conditions since.
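
With supabase-js, that change is a one-call swap. A minimal sketch, assuming an illustrative answers table with a unique constraint on (assessment_id, question_id) rather than the real schema:

```ts
// Sketch of the race-safe auto-save write: a single upsert keyed on the
// unique constraint, instead of delete-then-insert. Table and column names
// are illustrative.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

export async function saveAnswer(assessmentId: string, questionId: string, content: string) {
  const { error } = await supabase.from('answers').upsert(
    { assessment_id: assessmentId, question_id: questionId, content },
    { onConflict: 'assessment_id,question_id' } // becomes ON CONFLICT ... DO UPDATE
  );
  if (error) throw error;
}
```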

OpenAI for eval, not for UX. The AI evaluation pipeline uses gpt-4o for final scoring and gpt-4o-mini for everything else (prompt quality checks, extraction, intermediate analysis). This cut API costs by ~8x. The key insight: evaluation doesn't need to be real-time. Candidates submit, then scoring happens asynchronously. Latency doesn't matter for background jobs.
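
A sketch of that split using the Vercel AI SDK from the stack above; the prompts, schemas, and function names are placeholders, not the real pipeline:

```ts
// Sketch of the cheap/expensive model split. Intermediate passes run on
// gpt-4o-mini; only the final score pays for gpt-4o, as a background job.
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Extraction and other intermediate analysis: cheap model.
export async function extractDeliverable(transcript: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: z.object({ deliverable: z.string() }),
    prompt: `Extract the candidate's final deliverable:\n\n${transcript}`,
  });
  return object.deliverable;
}

// Final scoring: larger model, run asynchronously after submission.
export async function scoreSubmission(deliverable: string, rubric: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o'),
    schema: z.object({ score: z.number(), rationale: z.string() }),
    prompt: `Score this deliverable against the rubric.\n\nRubric:\n${rubric}\n\nDeliverable:\n${deliverable}`,
  });
  return object;
}
```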

week 3-4: the testing strategy that made speed possible

630+ unit tests. Written first (TDD). This sounds slow. It's the opposite.

The test suite caught 30+ bugs in a single marathon coding session. Without tests, each of those bugs would have been discovered manually — probably days or weeks later, probably after shipping to users, probably requiring more complex fixes because other code was built on top of the broken behavior.

My testing priority:

  1. Data transformations (normalizeMarkdown, score calculation, schema validation)
  2. State transitions (assessment flow, auth states)
  3. API contracts (request/response shapes)
  4. Happy path integration (create assessment -> submit -> score)

What I didn't test: UI rendering, third-party library wrappers, config constants.
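
For a sense of what the priority-1 tests look like, here's a sketch against normalizeMarkdown, assuming Vitest and an illustrative import path; the asserted behavior (line-ending normalization, idempotence) is an assumption, not the function's documented contract:

```ts
// Sketch of a priority-1 data-transformation test. Import path and expected
// behavior are assumptions for illustration.
import { describe, expect, it } from 'vitest';
import { normalizeMarkdown } from '@/lib/markdown';

describe('normalizeMarkdown', () => {
  it('normalizes CRLF line endings', () => {
    expect(normalizeMarkdown('# Title\r\n\r\nBody')).toBe('# Title\n\nBody');
  });

  it('is idempotent', () => {
    const once = normalizeMarkdown('  # Title  ');
    expect(normalizeMarkdown(once)).toBe(once);
  });
});
```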

week 5-6: the flagship feature

LLM Interaction Mode. Candidates get an AI collaborator during the assessment. We measure how they use it. Do they ask precise questions? Do they verify the output? Do they iterate on the design?

This was the hardest feature because it touches everything: real-time chat, message persistence, prompt counting, response extraction, and scoring integration.

Three bugs that burned me:

The ghostwriting bug. The extraction service pulled deliverables from all chat messages — including the AI's responses. So the AI was effectively writing the candidate's answer. Fix: filter to candidate messages only during extraction.
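
A sketch of the fix, with an assumed message shape and role names:

```ts
// Only candidate-authored messages feed the extraction step; AI responses
// are dropped first. Message shape is an assumption.
type ChatMessage = { role: 'candidate' | 'assistant' | 'system'; content: string };

export function candidateTranscript(messages: ChatMessage[]): string {
  return messages
    .filter((m) => m.role === 'candidate') // drop AI output before extraction
    .map((m) => m.content)
    .join('\n\n');
}
```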

The question routing bug. Mixed-mode assessments (some questions with AI, some without) always showed the LLM chat interface. The question_mode field wasn't propagating from the database to the workspace router. Fix: per-question mode loading from the session state.

The prompt counting bug. usePrompt() was counting prompts before the API confirmed the message was sent. If the API failed, the count was already incremented. Fix: count after server confirmation, not before.
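
A sketch of the corrected ordering; the hook name, endpoint, and payload are illustrative, not the real usePrompt() implementation:

```ts
// Increment the prompt count only after the server confirms the send, so a
// failed request never consumes a prompt. Endpoint and shapes are illustrative.
import { useCallback, useState } from 'react';

export function usePromptCount(sessionId: string) {
  const [count, setCount] = useState(0);

  const sendPrompt = useCallback(async (text: string) => {
    const res = await fetch(`/api/sessions/${sessionId}/messages`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    if (!res.ok) return false; // failed send: count is untouched
    setCount((c) => c + 1);    // count only after confirmation
    return true;
  }, [sessionId]);

  return { count, sendPrompt };
}
```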

week 7-8: polish and the 80/20 trap

The last two weeks were the hardest. Not technically — the core product worked. But the gap between "works" and "feels good" is enormous.

Anti-cheating system: 3-tier escalation (gentle warning, firm warning, auto-flag). Had to use useRef for the counter because the useState value captured in event-handler closures went stale under React 19's concurrent rendering.
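
A sketch of the pattern, with illustrative thresholds; the ref is the source of truth inside the handler, so an old closure can't read a stale count:

```tsx
// Anti-cheating escalation counter sketch. The ref always holds the current
// count inside event handlers, where a captured useState value could be stale.
'use client';

import { useRef, useState } from 'react';

export function useCheatEscalation() {
  const violations = useRef(0); // source of truth for event handlers
  const [tier, setTier] = useState<0 | 1 | 2 | 3>(0); // drives the UI

  const onViolation = () => {
    violations.current += 1;
    if (violations.current >= 3) setTier(3);       // auto-flag
    else if (violations.current === 2) setTier(2); // firm warning
    else setTier(1);                               // gentle warning
  };

  return { tier, onViolation };
}
```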

Loading states and skeletons for every data-fetching page. Error boundaries on candidate-facing routes. Accessibility improvements (keyboard nav, ARIA labels, focus management). Markdown rendering in chat with proper code block support.

Each of these took 2-4 hours. None of them are features. All of them matter for whether someone trusts your product.

what I'd change

Start with the scoring pipeline, not the UI. I built the candidate experience first and scoring last. Should have been the opposite. The scoring pipeline is the core value — it should have been validated (with real evaluations against real responses) before I built the chrome around it.

Use a simpler schema for OpenAI structured output. I burned two days debugging schema validation errors before learning that deeply nested Zod schemas with enums inside arrays break OpenAI's structured output. Flat schemas with optional fields work reliably. I would have saved those two days by starting simple.
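
A sketch of the two shapes, with illustrative field names; the nested form is the kind that triggered the validation errors, the flat form is the one that worked:

```ts
// Illustrative Zod schemas for OpenAI structured output. Field names are
// placeholders, not the real scoring schema.
import { z } from 'zod';

// Problematic: enums nested inside arrays of objects, several levels deep.
export const nestedScore = z.object({
  criteria: z.array(
    z.object({
      name: z.string(),
      rating: z.enum(['poor', 'fair', 'good', 'excellent']),
      evidence: z.array(z.object({ quote: z.string(), weight: z.number() })),
    })
  ),
});

// Reliable: a flat object with optional fields and plain strings.
export const flatScore = z.object({
  overall: z.number(),
  clarityRating: z.string().optional(),
  correctnessRating: z.string().optional(),
  rationale: z.string(),
});
```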

Ship a landing page in week 1. I spent 8 weeks building before putting anything in front of potential customers. Should have shipped a landing page with the value proposition in the first week and collected interest while building.

the numbers

  • 200+ source files
  • 630+ unit tests
  • 275 questions across 20 categories
  • 11 database migrations
  • 291 test scenarios
  • 0 TypeScript errors at build time
  • 1 engineer

The 8-week timeline is real, but it's not replicable without two things: a clear spec from day one, and an AI-augmented development workflow that handles the mechanical work (testing boilerplate, database migrations, repetitive CRUD) so I can focus on architecture and product decisions.

One person can ship a real product in weeks. But only if they spend their time on decisions, not on typing.

