Prompt Engineering Is Dead
In 2023, "prompt engineer" was a real job title at real companies. People made six figures writing "you are an expert in..." prefixes and hand-tuning few-shot examples. That era is over.
Not because prompts don't matter. They do. But the craft of writing them has been replaced by something better: structured APIs that constrain the model's behavior at the system level.
what replaced it
Three capabilities killed traditional prompt engineering:
Structured outputs. Instead of praying the model returns valid JSON, you hand it a schema and it must conform. OpenAI's response_format, Anthropic's tool use with schema definitions — these guarantee the output shape. No more regex parsing. No more "please return a JSON object with the following fields."
```js
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Analyze this resume" }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "resume_analysis",
      schema: {
        type: "object",
        properties: {
          years_experience: { type: "number" },
          top_skills: { type: "array", items: { type: "string" } },
          fit_score: { type: "number", minimum: 0, maximum: 100 },
          reasoning: { type: "string" },
        },
        required: ["years_experience", "top_skills", "fit_score", "reasoning"],
      },
    },
  },
});
```

That schema is worth more than any prompt template. The model can't return a string where you need a number. It can't skip required fields. The structure is the prompt.
Function calling. Instead of telling the model "if the user wants to book a flight, output their intent in this format," you define tools and the model calls them. The prompt becomes the function signature.
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_flights",
            "description": "Search for available flights",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string", "description": "IATA airport code"},
                    "destination": {"type": "string", "description": "IATA airport code"},
                    "date": {"type": "string", "format": "date"},
                },
                "required": ["origin", "destination", "date"],
            },
        },
    }
]
```

The model doesn't need a five-paragraph essay explaining when to search for flights. The tool definition makes it obvious. Intent recognition, parameter extraction, and output formatting all happen from the schema alone.
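To make that concrete, here's roughly what the consuming side looks like with the OpenAI Python SDK. The `search_flights` stub and the example query are mine, not part of the API:

```python
import json
from openai import OpenAI

client = OpenAI()

def search_flights(origin: str, destination: str, date: str) -> list:
    """Stub standing in for a real flight-search backend."""
    return []

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Find flights from SFO to JFK on 2025-03-14"}],
    tools=tools,  # the definition above
)

# The arguments arrive as a JSON string that already matches the parameter schema.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "search_flights":
        results = search_flights(**json.loads(call.function.arguments))
```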
System-level constraints. Model providers now offer guardrails at the API level — content filters, token limits, temperature control, stop sequences. The things we used to hack into prompts with "IMPORTANT: never discuss X" are now just config parameters.
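Something like this, where every constraint is a keyword argument instead of a line of prompt text (the values here are illustrative):

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the incident report"}],
    temperature=0,   # no sampling randomness for deterministic tasks
    max_tokens=300,  # hard cap on output length
    stop=["##"],     # generation halts if this sequence appears
)
```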
what still matters
I'm not saying prompts are irrelevant. The system prompt still sets context, tone, and domain knowledge. But the high-value work shifted:
Before: Crafting elaborate multi-paragraph prompts with examples, edge case handling, output format instructions, and role-playing directives.
Now: Designing good schemas, choosing the right tool definitions, and writing a concise system prompt that establishes context. The engineering moved from text to structure.
The skills that matter now:
- Schema design — what fields, what types, what constraints
- Tool decomposition — breaking capabilities into well-defined functions
- Evaluation — measuring whether your system works, not whether your prompt sounds good
- Pipeline architecture — chaining multiple calls with routing logic (see the sketch after this list)
None of these are "prompt engineering." They're software engineering applied to LLM systems.
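For a flavor of what routing looks like, here's a sketch. The triage rule and model names are placeholders, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()

def needs_big_model(query: str) -> bool:
    # Hypothetical triage rule: long or open-ended queries get the larger model.
    return len(query) > 500 or any(w in query.lower() for w in ("why", "compare", "plan"))

def answer(query: str) -> str:
    model = "gpt-4o" if needs_big_model(query) else "gpt-4o-mini"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content
```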
the few-shot trap
Few-shot examples used to be essential. You'd spend hours curating 3-5 examples that covered your edge cases. Now they're often counterproductive.
With structured outputs, the schema already constrains the format. Adding examples sometimes hurts because the model over-indexes on the example patterns instead of generalizing. I've seen classification accuracy drop 5-10% when adding few-shot examples to a system that already has a well-defined schema.
Where examples still help: tone calibration ("write like this, not like that"), domain-specific terminology, and subjective judgment calls where the schema can't capture the nuance.
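In practice that usually means a paired user/assistant turn in the message history rather than prose instructions. A sketch, with a made-up release-notes scenario:

```python
messages = [
    {"role": "system", "content": "You write release notes for a developer tool."},
    # One paired turn calibrates tone; the output schema still handles format.
    {"role": "user", "content": "Change: fixed memory leak in job queue"},
    {"role": "assistant", "content": "Fixed a memory leak that could slow long-running queue workers."},
    {"role": "user", "content": "Change: added retry logic to webhook delivery"},
]
```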
the real skill now
The job title shouldn't be "prompt engineer." It should be "LLM systems engineer." The work is:
- Designing evaluation pipelines that catch regressions
- Building routing logic that sends queries to the right model
- Defining schemas that constrain without over-constraining
- Setting up fallback chains when the primary model fails (sketched after this list)
- Monitoring output quality in production
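A fallback chain can be as simple as a loop over models. A minimal sketch; the model list and error handling are illustrative:

```python
import openai
from openai import OpenAI

client = OpenAI()
MODELS = ["gpt-4o", "gpt-4o-mini"]  # primary first, then fallback

def complete_with_fallback(messages: list) -> str:
    last_error = None
    for model in MODELS:
        try:
            response = client.chat.completions.create(
                model=model, messages=messages, timeout=30
            )
            return response.choices[0].message.content
        except openai.APIError as err:  # rate limits, timeouts, outages
            last_error = err
    raise last_error
```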
This is just software engineering. Which is the point. The weird interlude where we treated LLM interaction as a craft separate from engineering is ending. Good. The faster we normalize LLM APIs as just another system dependency — with schemas, tests, monitoring, and error handling — the better our systems get.
If you're still spending hours tuning prompt wording, step back. Define a schema. Write an eval. Measure the output. That's where the real gains live.
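A minimal eval can be a golden set and an accuracy number, something like this (the dataset and labels are made up):

```python
GOLDEN = [
    ("Staff engineer, 10 years of distributed systems", "strong_fit"),
    ("No engineering experience listed", "weak_fit"),
]

def run_eval(classify) -> float:
    """classify() is whatever schema-constrained LLM call you're measuring."""
    correct = sum(classify(text) == label for text, label in GOLDEN)
    return correct / len(GOLDEN)
```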