Your production LLM agent just returned this JSON to your order processing service:
{
  "action": "refund",
  "amount": "fifty dollars",
  "order_id": null,
  "confidence": "pretty high"
}
Your downstream service crashes. The retry hits the same model. Same broken output. The refund never fires — but the user got a confirmation email.
You need your agent to return valid, typed, structured output — every time. What do you do?
A) Prompt-engineer harder — add "Always return valid JSON with these exact fields" to your system prompt and document the schema inline.
B) Use structured outputs / function calling (OpenAI, Bedrock tool use, Gemini response schema) — constrain the model at the API level to return a typed schema.
C) Post-process with a validation layer — parse the output, run JSON Schema or Pydantic validation, retry with corrective context if it fails (max 2 retries).
D) Add a second LLM as a judge — pass the first model's output to a smaller, faster model that scores and flags invalid responses before they reach your service.
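If option C is hard to picture, here is a rough sketch of a validate-and-retry loop, not a definitive implementation. It uses only the Python standard library in place of JSON Schema or Pydantic, and every name in it (RefundAction, validate, call_with_validation, call_model) is hypothetical. The field rules are assumptions too: amount is a positive number, order_id is a non-empty string, and confidence is a number between 0 and 1.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

MAX_RETRIES = 2  # the "max 2 retries" from option C


@dataclass(frozen=True)
class RefundAction:
    # Hypothetical typed shape the downstream service expects.
    action: str
    amount: float
    order_id: str
    confidence: float


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(raw: str) -> RefundAction:
    """Parse raw model output; raise ValueError listing every problem found."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")

    errors = []
    if data.get("action") != "refund":
        errors.append("action must be the string 'refund'")
    amount = data.get("amount")
    if not _is_number(amount) or amount <= 0:
        errors.append("amount must be a positive number")
    order_id = data.get("order_id")
    if not isinstance(order_id, str) or not order_id:
        errors.append("order_id must be a non-empty string")
    confidence = data.get("confidence")
    if not _is_number(confidence) or not 0 <= confidence <= 1:
        errors.append("confidence must be a number between 0 and 1")
    if errors:
        raise ValueError("; ".join(errors))
    return RefundAction("refund", float(amount), order_id, float(confidence))


def call_with_validation(call_model: Callable[[str], str], prompt: str) -> RefundAction:
    """Call the model, validate the reply, and retry with the errors as context."""
    attempt_prompt = prompt
    last_error: Optional[ValueError] = None
    for _ in range(MAX_RETRIES + 1):  # one initial call plus the retries
        raw = call_model(attempt_prompt)
        try:
            return validate(raw)
        except ValueError as exc:
            last_error = exc
            # Corrective context: tell the model exactly what was rejected.
            attempt_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {exc}\n"
                "Return only a corrected JSON object."
            )
    # Fail loudly so nothing downstream acts on unvalidated output.
    raise RuntimeError(f"gave up after {MAX_RETRIES} retries: {last_error}")
```

Here call_model stands in for whatever client function sends a prompt and returns the model's raw text. The broken payload above fails three of the four checks, and the retry prompt carries all three error messages back to the model.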
Three of these are patterns used in production AI systems. One of them is wishful thinking.
Pick one — A, B, C, or D — and tell me why. I'll drop the full breakdown in the comments, including the pattern that looks defensive but actually makes hallucinations worse under load.
Drop your answer








