Validation and Recovery: Handling Output Failures Gracefully
Even with a careful prompt and structured output, the model will occasionally produce something broken: a missing field, an invalid value, malformed JSON, or a confident answer that is simply wrong. A robust workflow does not assume the output is perfect. It validates every output and has a recovery plan for when validation fails. This lesson is about building that safety net so a single bad output does not silently corrupt everything downstream.
What You'll Learn
- Why you should validate every output, not just trust it
- The layers of validation, from syntax to schema to semantics
- Recovery strategies when validation fails
- How to design prompts that fail loudly instead of silently
- A practical validate-and-retry pattern you can apply anywhere
Trust Nothing, Validate Everything
The golden rule of structured output: never feed an unvalidated model output into a downstream step. A program that assumes the JSON is well-formed will crash or, worse, will quietly store garbage. A validation step is a cheap insurance policy that catches problems at the boundary, where they are easy to handle, instead of deep in your process where they are expensive.
Validation answers a simple question before you use an output: is this good enough to proceed? If yes, continue. If no, recover.
The Layers of Validation
Validate from cheapest and most mechanical to most expensive and most semantic. Stop as soon as a layer fails.
- Syntax. Does it parse at all? For JSON, is it valid JSON? This is the cheapest check and catches the most common failure.
- Schema. Are all required fields present, with the right types, and do categorical fields use only allowed values? A priority of "super-urgent" fails here even though the JSON is valid.
- Constraints. Do the values obey your business rules? For example: a date in the past that should be in the future, a total that does not equal the sum of its line items, or a summary over the word limit.
- Semantics. Is the content actually correct and grounded in the input? This is the hardest layer and often needs a human or an LLM-as-judge check, because syntactically perfect output can still be a hallucination.
Most automations check the first three with simple rules and reserve the fourth for high-stakes outputs. The point is that "it parsed" is the lowest bar, not a guarantee of correctness.
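As an illustration, the first three layers can be written as plain rules. This is a minimal sketch that assumes a hypothetical ticket-extraction schema (fields title, priority, due_date and summary; allowed priorities low, medium and high; a 50-word summary limit). None of these names come from a real library; they stand in for your own schema. The function returns the first layer that failed, so later layers never run on data an earlier layer rejected. Within a layer it collects every error, which is useful later for retry-with-error.

```python
import json
from datetime import date
from typing import Optional

# Hypothetical schema for a ticket-extraction task (illustrative, not a real API).
REQUIRED_FIELDS = {"title": str, "priority": str, "due_date": str, "summary": str}
ALLOWED_PRIORITIES = {"low", "medium", "high"}
MAX_SUMMARY_WORDS = 50


def validate(raw: str, today: date) -> tuple[Optional[str], list[str]]:
    """Return (failed_layer, errors); (None, []) means the mechanical layers passed."""
    # Layer 1: syntax. Does the output parse as JSON at all?
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return "syntax", [f"invalid JSON: {exc.msg} at position {exc.pos}"]
    if not isinstance(data, dict):
        return "syntax", ["top-level value must be a JSON object"]

    # Layer 2: schema. Required fields, correct types, allowed categorical values.
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing required field '{field}'")
        elif not isinstance(data[field], expected_type):
            errors.append(f"field '{field}' must be of type {expected_type.__name__}")
    priority = data.get("priority")
    if isinstance(priority, str) and priority not in ALLOWED_PRIORITIES:
        errors.append(f"priority '{priority}' is not one of {sorted(ALLOWED_PRIORITIES)}")
    if errors:
        return "schema", errors

    # Layer 3: constraints. Business rules that the schema alone cannot express.
    try:
        if date.fromisoformat(data["due_date"]) <= today:
            errors.append("due_date must be in the future")
    except ValueError:
        errors.append("due_date must be an ISO date (YYYY-MM-DD)")
    if len(data["summary"].split()) > MAX_SUMMARY_WORDS:
        errors.append(f"summary exceeds {MAX_SUMMARY_WORDS} words")
    if errors:
        return "constraints", errors

    # Layer 4 (semantics) is out of scope here: it needs a human or an LLM-as-judge.
    return None, []
```

Passing today in as a parameter keeps the date constraint testable instead of depending on the clock.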
Recovery Strategies
When validation fails, you have several moves. Choose based on stakes and cost.
- Retry as-is. Models are non-deterministic, so simply running the same prompt again often succeeds. Cheap and surprisingly effective for transient glitches. Cap the number of retries so you do not loop forever.
- Retry with the error. Feed the failed output and the validation error back to the model and ask it to fix it. This is the most powerful recovery for structured-output failures, because the model can see exactly what was wrong.
- Fall back to a default. For low-stakes fields, substitute a safe default (such as "Other" or null) rather than failing the whole record.
- Escalate to a human. For high-stakes cases that keep failing, route to a person. A good system knows what it cannot handle and asks for help instead of guessing.
- Reject and log. Drop the record, log it, and add it to your eval set so you can fix the prompt for next time.
A common, robust pattern combines these: try once, retry-with-error up to two times, then fall back or escalate.
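That combined pattern can be sketched as a small loop. The sketch assumes three functions you would supply yourself, all hypothetical here: call_model sends a prompt and returns raw text, validate returns a list of error strings (empty when the output is valid), and repair_prompt builds the retry-with-error message. The fallback argument stands in for a safe default. Leaving it as None means a record that keeps failing is escalated to a human instead.

```python
from typing import Callable, Optional


def run_with_recovery(
    prompt: str,
    call_model: Callable[[str], str],         # hypothetical: prompt -> raw model text
    validate: Callable[[str], list],          # hypothetical: raw text -> list of errors
    repair_prompt: Callable[[str, str, list], str],  # hypothetical retry-with-error builder
    max_repairs: int = 2,
    fallback: Optional[str] = None,
) -> dict:
    """Try once, retry-with-error up to max_repairs times, then fall back or escalate."""
    raw = call_model(prompt)
    errors = validate(raw)
    attempts = 1
    # Each repair shows the model its previous output and exactly what failed.
    while errors and attempts <= max_repairs:
        raw = call_model(repair_prompt(prompt, raw, errors))
        errors = validate(raw)
        attempts += 1
    if not errors:
        return {"status": "ok", "output": raw, "attempts": attempts}
    if fallback is not None:
        # Low-stakes: substitute a safe default instead of failing the record.
        return {"status": "fallback", "output": fallback, "attempts": attempts, "errors": errors}
    # High-stakes: hand the last output and its errors to a person.
    return {"status": "escalate", "output": raw, "attempts": attempts, "errors": errors}
```

Returning a status instead of raising keeps the routing decision (accept, default, or escalate) explicit for the next step, and the retry cap guarantees at most 1 + max_repairs model calls per record.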
Retry With the Error: The Pattern
This is the single most useful recovery technique for structured output. When the output fails validation, you tell the model exactly what was wrong and ask for a corrected version.
Because the model can see the precise errors, the corrected output is often right on the first retry. This is typically more reliable than rerunning the original prompt and hoping.
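Here is one way to assemble the retry-with-error message, as a sketch. It includes the original task, the failed output verbatim, and each validation error as its own bullet, then asks only for the corrected version. The function name and the exact wording are illustrative assumptions, not a required format.

```python
def build_repair_prompt(original_prompt: str, failed_output: str, errors: list[str]) -> str:
    """Assemble a retry-with-error message: the task, the bad output, and what failed."""
    # One bullet per validation error so the model can address each one.
    error_lines = "\n".join(f"- {e}" for e in errors)
    return (
        f"{original_prompt}\n\n"
        "Your previous response failed validation.\n\n"
        f"Previous response:\n{failed_output}\n\n"
        f"Validation errors:\n{error_lines}\n\n"
        "Return a corrected version that fixes every error above. "
        "Output only the JSON object, with no extra text."
    )
```

Repeating the original prompt keeps the full task and schema in front of the model while it corrects itself. Taking the errors as a plain list lets any validation layer feed it directly.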
Design Prompts That Fail Loudly
The worst failure is a silent one: the model invents a plausible value when it should have signaled uncertainty, and your validation passes because the value looks fine. You can design against this.
- Require an explicit unknown. Instruct the model to use null or "unknown" when it cannot determine a value from the input, rather than guessing. Then your validation can flag records with too many unknowns.
- Ask for a confidence or grounding field. Have the model include a field such as "confidence": "high" | "low", or a "source_quote" field that quotes the text it based its answer on. Low confidence or a missing quote becomes a routable signal.
- Separate extraction from inference. Tell the model to only extract what is explicitly stated, and to mark anything it had to infer. Inferred values are where hallucinations hide.
These turn invisible errors into visible ones your validation layer can catch.
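This sketch shows how validation can act on those signals. It assumes a hypothetical output contract where each extracted field is an object with value, confidence and source_quote keys, and value is null (None after parsing) or "unknown" when the model could not determine it. The 30% unknown threshold is an arbitrary illustration. The grounding check is deliberately simple: a quote must appear verbatim in the input.

```python
from typing import Any

# Hypothetical output contract (an assumption for this sketch): each field is
#   {"value": ..., "confidence": "high" | "low", "source_quote": "..."}
# with value set to null/None or "unknown" when the input does not determine it.
MAX_UNKNOWN_FRACTION = 0.3  # illustrative threshold; tune for your data


def is_unknown(value: Any) -> bool:
    return value is None or value == "unknown"


def loud_failure_flags(fields: dict, source_text: str) -> list[str]:
    """Turn unknowns, low confidence, and ungrounded quotes into routable flags."""
    flags = []
    unknown = sorted(name for name, f in fields.items() if is_unknown(f.get("value")))
    if fields and len(unknown) / len(fields) > MAX_UNKNOWN_FRACTION:
        flags.append(f"too many unknowns: {unknown}")
    for name, f in fields.items():
        if is_unknown(f.get("value")):
            continue  # an explicit unknown is an honest signal, so skip the other checks
        if f.get("confidence") == "low":
            flags.append(f"{name}: low confidence")
        quote = f.get("source_quote")
        if not quote:
            flags.append(f"{name}: missing source_quote")
        elif quote not in source_text:
            # The quote should appear verbatim in the input; if not, the value is ungrounded.
            flags.append(f"{name}: source_quote not found in input")
    return flags
```

A verbatim substring check is strict: it will also flag quotes the model lightly paraphrased, which may be what you want for high-stakes fields.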
Putting It Together
A production-grade structured-output workflow looks like this:
- Run the prompt with a precise schema.
- Validate: syntax, then schema, then constraints, then (if high-stakes) semantics.
- On failure, retry-with-error up to a small cap.
- Still failing? Fall back to a default for low-stakes fields, or escalate to a human for high-stakes ones.
- Log every failure and feed recurring ones back into your eval set and prompt.
This loop is what makes AI output dependable enough to build on. The model does not have to be perfect; your validation and recovery make the system reliable.
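The last step of the loop, logging failures and spotting recurring ones, can be as simple as appending JSON lines to a file. This sketch assumes a hypothetical log path and record shape. A real system might write to a database or a tracing tool instead.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log location; one JSON object per line (JSONL).
FAILURE_LOG = Path("validation_failures.jsonl")


def log_failure(input_text: str, output: str, errors: list[str], log_path: Path = FAILURE_LOG) -> None:
    """Append one failed record so it can later be reviewed and added to the eval set."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": input_text,
        "output": output,
        "errors": errors,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def recurring_errors(log_path: Path = FAILURE_LOG, min_count: int = 3) -> list[tuple[str, int]]:
    """Count error messages across the log; frequent ones point at prompt fixes."""
    if not log_path.exists():
        return []
    counts = Counter()
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            counts.update(json.loads(line)["errors"])
    # most_common() sorts by frequency, highest first.
    return [(err, n) for err, n in counts.most_common() if n >= min_count]
```

Counting works best when error messages are stable strings. Messages that embed the bad value, such as the specific invalid priority, will split into separate buckets unless you normalize them first.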
Key Takeaways
- Never feed an unvalidated output into a downstream step; validate at the boundary.
- Validate in layers: syntax, schema, constraints, then semantics, stopping at the first failure.
- Recover with retry, retry-with-error, safe defaults, or human escalation, and cap your retries.
- Retry-with-error, where you show the model exactly what failed, is the most reliable fix for structured-output problems.
- Design prompts to fail loudly: require explicit unknowns, confidence signals, and source quotes so hallucinations become catchable.

