6-Dimensional Contract Validation: Why Your LLM API Needs More Than Status Code Checks
Your API returns 200 OK. Your monitoring dashboard is green. Everything looks fine.
Except the response is JSON with a completely wrong schema. Or the latency just tripled. Or the model silently switched from GPT-4 to GPT-3.5-turbo and nobody noticed.
Status code monitoring is not reliability monitoring. It's the bare minimum that tells you the server answered, not that it answered correctly.
The 6 Dimensions That Actually Matter
After running 20,000+ real LLM API calls through our reliability engine at Correctover, we identified six independent dimensions where things can go wrong, and each requires its own validation strategy.
1. Structure Validation
Does the response parse as valid JSON? Does it have the expected top-level keys?
```python
from correctover import Contract

contract = Contract(
    structure={
        "type": "object",
        "required": ["choices", "usage"],
        "properties": {
            "choices": {"type": "array", "minItems": 1},
            "usage": {"type": "object"},
        },
    }
)
```
This catches truncated responses, encoding errors, and format regressions. In our test data, 2.3% of "successful" responses had structural issues.
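The `Contract(structure=...)` above expresses these rules declaratively. To show what such a check does mechanically, here is a minimal stdlib-only sketch that hand-codes the same rules (valid JSON, an object at the top level, `choices` as a non-empty array, `usage` as an object). It is not Correctover's implementation; `check_structure` and `REQUIRED_KEYS` are hypothetical names chosen for illustration.

```python
import json

# Hypothetical: top-level keys expected in a chat-completion-style response.
REQUIRED_KEYS = ("choices", "usage")


def check_structure(raw_body: str) -> list[str]:
    """Return a list of structural violations; an empty list means the body passed."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        # Truncated bodies and encoding errors usually fail right here.
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    violations = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    if "choices" in data:
        choices = data["choices"]
        if not isinstance(choices, list) or len(choices) < 1:
            violations.append("choices must be a non-empty array")
    if "usage" in data and not isinstance(data["usage"], dict):
        violations.append("usage must be an object")
    return violations


ok_body = '{"choices": [{"message": {"content": "Hi"}}], "usage": {"total_tokens": 5}}'
truncated_body = '{"choices": [{"message": {"content": "Hi"'
print(check_structure(ok_body))         # []
print(check_structure(truncated_body))  # a single "invalid JSON: ..." violation
```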
2. Schema Validation
Even if the structure is correct, does the data conform to your expected schema?
- Is each choices[i].message.content a string rather than null?
- Is usage.total_tokens a positive integer?
- Are the model identifiers valid?
Schema validation catches the "technically valid JSON but semantically wrong" class of errors: the most insidious kind.
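The three questions above translate into field-level checks. The sketch below is a hypothetical, stdlib-only illustration (not Correctover's API) and assumes the response already passed a structural check, so `choices` and `usage` exist. "Valid model identifier" is modelled here as membership in a caller-supplied allowlist, which is an assumption; your own definition may differ.

```python
def check_schema(data: dict, allowed_models: set[str]) -> list[str]:
    """Field-level checks; assumes the structural check (object with choices/usage) passed."""
    violations = []
    for i, choice in enumerate(data["choices"]):
        message = choice.get("message") or {}
        content = message.get("content")
        if not isinstance(content, str):
            violations.append(f"choices[{i}].message.content is {type(content).__name__}, expected str")
    total = data["usage"].get("total_tokens")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(total, int) or isinstance(total, bool) or total <= 0:
        violations.append(f"usage.total_tokens must be a positive integer, got {total!r}")
    model = data.get("model")
    if model not in allowed_models:
        violations.append(f"unexpected model identifier: {model!r}")
    return violations


response = {
    "model": "gpt-4-0613",
    "choices": [{"message": {"role": "assistant", "content": None}}],
    "usage": {"total_tokens": 0},
}
print(check_schema(response, allowed_models={"gpt-4-0613"}))
# ['choices[0].message.content is NoneType, expected str',
#  'usage.total_tokens must be a positive integer, got 0']
```

Note that this response would sail through both a status-code check and the structural check, which is exactly the "valid JSON, wrong data" case.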
3. Latency Validation
A response that takes 30 seconds when your SLA is 2 seconds is a failed response, regardless of HTTP status.
Our data shows latency spikes are often the first warning sign of provider degradation, appearing before errors do.
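Latency validation therefore has two parts: a hard SLA check, and an early-warning check that compares each call against recent history. Below is a minimal sketch of both, assuming a rolling median as the baseline and a hypothetical 3x spike factor (echoing "the latency just tripled"). `LatencyMonitor` and its parameters are illustrative names, not part of Correctover.

```python
import statistics
import time
from collections import deque
from typing import Any, Callable


class LatencyMonitor:
    """Flags calls that breach a hard SLA or spike relative to recent history."""

    def __init__(self, sla_seconds: float, spike_factor: float = 3.0,
                 window: int = 50, min_history: int = 5):
        self.sla_seconds = sla_seconds
        self.spike_factor = spike_factor      # hypothetical: 3x the recent median
        self.min_history = min_history        # samples needed before spike checks start
        self.history = deque(maxlen=window)   # rolling window of recent latencies (seconds)

    def record(self, elapsed: float) -> list[str]:
        """Check one latency sample, then add it to the rolling window."""
        violations = []
        if elapsed > self.sla_seconds:
            violations.append(f"SLA breach: {elapsed:.3f}s > {self.sla_seconds:.3f}s")
        if len(self.history) >= self.min_history:
            baseline = statistics.median(self.history)
            if elapsed > self.spike_factor * baseline:
                violations.append(f"latency spike: {elapsed:.3f}s vs median {baseline:.3f}s")
        self.history.append(elapsed)
        return violations

    def timed_call(self, call: Callable[[], Any]) -> tuple[Any, list[str]]:
        """Time a zero-argument callable and validate its latency."""
        start = time.perf_counter()
        result = call()
        return result, self.record(time.perf_counter() - start)


monitor = LatencyMonitor(sla_seconds=2.0)
for _ in range(10):
    monitor.record(0.5)           # healthy baseline
print(monitor.record(1.6))        # spike: still under the SLA, but more than 3x the median
print(monitor.record(2.5))        # both an SLA breach and a spike
```

The spike check is what surfaces degradation early: a 1.6-second call passes a 2-second SLA but is already a clear outlier against a 0.5-second baseline.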
4. Cost Validation
Did this response cost what you expected? Token counts can vary dramatically between models and providers for the same prompt.
- Token count anomalies indicate model drift
- Unexpected cost spikes hurt your budget
- Token counting discrepancies between providers are real
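A cost check can be computed directly from the `usage` block of each response. The sketch below assumes OpenAI-style `prompt_tokens` / `completion_tokens` / `total_tokens` fields; the per-1K-token prices, the per-call budget, the expected completion length and the 2x tolerance are all hypothetical values you would replace with your own. It flags the three issues listed above: budget overruns, token-count anomalies, and inconsistent token accounting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per 1,000 tokens; substitute your provider's real rates."""
    prompt_per_1k: float
    completion_per_1k: float


def check_cost(usage: dict, pricing: Pricing, max_cost_usd: float,
               expected_completion_tokens: int, tolerance: float = 2.0) -> tuple[float, list[str]]:
    """Estimate the cost of one call from its usage block and flag anomalies."""
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    cost = (prompt / 1000) * pricing.prompt_per_1k + (completion / 1000) * pricing.completion_per_1k
    violations = []
    if cost > max_cost_usd:
        violations.append(f"cost ${cost:.4f} exceeds per-call budget ${max_cost_usd:.4f}")
    # A completion far longer than usual for this prompt can signal model drift.
    if completion > tolerance * expected_completion_tokens:
        violations.append(
            f"completion_tokens={completion} exceeds {tolerance}x the expected {expected_completion_tokens}"
        )
    # Internally inconsistent token accounting is its own red flag.
    if usage.get("total_tokens") != prompt + completion:
        violations.append("total_tokens != prompt_tokens + completion_tokens")
    return cost, violations


usage = {"prompt_tokens": 100, "completion_tokens": 900, "total_tokens": 1000}
cost, problems = check_cost(usage, Pricing(prompt_per_1k=0.03, completion_per_1k=0.06),
                            max_cost_usd=0.05, expected_completion_tokens=300)
print(f"${cost:.4f}", problems)   # $0.0570 plus two violations (budget and token count)
```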
5. Identity Validation
This is the most critical dimension, and the one almost nobody checks.
Is the model you called the model that responded? In our drift detection data, we found that providers silently swap models in approximately 0.7% of production calls. This means:
- You pay for GPT-4 but get GPT-3.5 responses
- Your carefully tuned prompts produce different outputs
- Your quality assurance is undermined silently
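The simplest identity check compares the model you requested with the `model` field the response reports. The sketch below is a hypothetical first line of defense: it trusts the provider's self-reported field, so it is not full drift detection. It also assumes that an alias may legitimately resolve to a dated snapshot (for example `gpt-4` to `gpt-4-0613`); the exact snapshot naming pattern is an assumption you should adapt to your providers.

```python
import re


def check_identity(requested_model: str, response: dict) -> list[str]:
    """Compare the model the caller asked for with the model the response reports."""
    served = response.get("model")
    if not isinstance(served, str) or not served:
        return ["response does not report a model identifier"]
    # Assumption: an alias may resolve to a dated snapshot, e.g.
    # "gpt-4" -> "gpt-4-0613" or "gpt-4o" -> "gpt-4o-2024-08-06".
    # Anything else, including a different model such as "gpt-4-turbo", is a mismatch.
    snapshot = re.escape(requested_model) + r"(-\d{4}(-\d{2}-\d{2})?)?"
    if re.fullmatch(snapshot, served):
        return []
    return [f"model mismatch: requested {requested_model!r}, served {served!r}"]


print(check_identity("gpt-4", {"model": "gpt-4-0613"}))     # []
print(check_identity("gpt-4", {"model": "gpt-3.5-turbo"}))  # one "model mismatch" violation
```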
6. Integrity Validation
Is the response internally consistent? Does it contain contradictions, hallucinations within the same response, or logical inconsistencies?
While full semantic validation is an open research problem, protocol-level integrity checks can catch:
- Empty or placeholder content in structured outputs
- Contradictory metadata and content
- Response length anomalies suggesting truncation or padding
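Each of the three protocol-level checks above can be expressed without any semantic analysis. The sketch below is an illustration under stated assumptions: the placeholder list and the 1 to 10 characters-per-token bounds are hypothetical heuristics, and it relies on the OpenAI-style convention that `finish_reason == "length"` means generation hit the token limit. `check_integrity` is a hypothetical name, not Correctover's API.

```python
# Hypothetical list of strings treated as "no real content".
PLACEHOLDERS = {"", "n/a", "todo", "tbd", "...", "lorem ipsum"}


def check_integrity(response: dict, min_chars_per_token: float = 1.0,
                    max_chars_per_token: float = 10.0) -> list[str]:
    """Protocol-level consistency checks on the first choice of a response."""
    violations = []
    choice = response["choices"][0]
    content = (choice.get("message") or {}).get("content") or ""
    if content.strip().lower() in PLACEHOLDERS:
        violations.append("empty or placeholder content")
    # In OpenAI-style responses, "length" means generation stopped at the token limit.
    if choice.get("finish_reason") == "length":
        violations.append("finish_reason is 'length': output was likely truncated")
    completion_tokens = (response.get("usage") or {}).get("completion_tokens", 0)
    if completion_tokens == 0 and content:
        # Metadata contradicts content: text exists but no tokens were billed.
        violations.append("usage reports 0 completion tokens but content is non-empty")
    elif completion_tokens > 0 and content:
        # Hypothetical bounds: English text usually lands at a few characters per token.
        ratio = len(content) / completion_tokens
        if not min_chars_per_token <= ratio <= max_chars_per_token:
            violations.append(f"{ratio:.1f} chars per completion token is outside the expected range")
    return violations


padded = {
    "choices": [{"message": {"content": "x" * 500}, "finish_reason": "length"}],
    "usage": {"completion_tokens": 10},
}
print(check_integrity(padded))
# ["finish_reason is 'length': output was likely truncated",
#  '50.0 chars per completion token is outside the expected range']
```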
Why All Six Dimensions Matter Together
Each dimension catches a different class of failure:
| Dimension | Catches | Missed if Omitted (% of 200 OK responses) |
|---|---|---|
| Structure | Malformed responses | 2.3% |
| Schema | Semantically invalid data | 3.1% |
| Latency | Degraded performance | 4.7% |
| Cost | Token anomalies, drift | 1.8% |
| Identity | Silent model swaps | 0.7% |
| Integrity | Internal inconsistencies | 1.9% |
If you only check HTTP status codes, every one of these slips through: added together, the six rows come to 14.5% of responses that returned 200 OK.
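The 14.5% figure is the straight sum of the table's last column, which is exact when no response fails more than one dimension. As a quick sanity check on that arithmetic, the sketch below computes the sum and, for comparison, the share you would get if failures overlapped independently (a modelling assumption, not something the dataset states). By the union bound, any overlap can only make the true share smaller than the sum.

```python
import math

# Miss rates from the table, as percentages of 200 OK responses.
miss_rates = {"structure": 2.3, "schema": 3.1, "latency": 4.7,
              "cost": 1.8, "identity": 0.7, "integrity": 1.9}

# If no response fails more than one dimension, the misses simply add up.
disjoint_total = math.fsum(miss_rates.values())

# Assumption for comparison only: failures occur independently, so some responses
# fail several dimensions and the simple sum double-counts them.
independent_union = 100 * (1 - math.prod(1 - r / 100 for r in miss_rates.values()))

print(f"disjoint: {disjoint_total:.1f}%")          # disjoint: 14.5%
print(f"independent: {independent_union:.1f}%")
```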
The Validation Performance Question
"But won't six-dimensional validation slow down my API calls?"
No. With Correctover's MAPE-K decision engine:
- P50 validation overhead: 22 microseconds
- P99 validation overhead: 99 microseconds
- Total overhead: less than 0.01% of request time
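Overhead figures like these are percentiles over many timed runs, and the "share of request time" is simply the overhead divided by the end-to-end request latency (for example, 99 µs against an assumed 1-second request is 0.0099%). The sketch below shows how you could measure P50/P99 for your own validator with the standard library. `validate` here is a hypothetical stand-in, not Correctover's MAPE-K engine, so your numbers will differ from the ones quoted above.

```python
import json
import statistics
import time
from typing import Callable

BODY = '{"choices": [{"message": {"content": "Hello"}}], "usage": {"total_tokens": 12}}'


def validate(body: str) -> bool:
    """Hypothetical stand-in validator: parse JSON and check two top-level keys."""
    data = json.loads(body)
    return bool(data.get("choices")) and isinstance(data.get("usage"), dict)


def measure_overhead_us(fn: Callable[[str], bool], arg: str, runs: int = 10_000) -> tuple[float, float]:
    """Return (P50, P99) wall-clock cost of fn(arg) in microseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn(arg)
        samples.append((time.perf_counter_ns() - start) / 1_000)  # ns -> us
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points
    return cuts[49], cuts[98]                    # 50th and 99th percentiles


p50, p99 = measure_overhead_us(validate, BODY)
request_us = 1_000_000  # assumption: a 1-second LLM request
print(f"P50 {p50:.1f} us, P99 {p99:.1f} us, P99 share of request {p99 / request_us:.4%}")
```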
This is not a trade-off. You get reliability without performance sacrifice.
Implementation: 3 Lines to 6-Dimensional Reliability
```python
from correctover import Correctover, Contract

client = Correctover(
    api_key="your-openai-key",            # BYOK: your key, direct connection
    contract=Contract.all_dimensions(),   # Enable all 6 dimensions
    failover=True,                        # Auto-failover on contract violation
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
# If any dimension fails, automatic failover kicks in
# You always get a validated response
```
What Failover Actually Means
Here is the key insight: Failover is not Correctover.
A simple failover switches to another provider when one fails. But it does not verify that the new provider's response is any better. You might fail over from one broken response to another.
Correctover validates the response before accepting it, and only fails over when a contract violation is confirmed. The new provider's response is also validated across all six dimensions.
```
Provider A → Validate (6D) → Contract violated → Failover → Provider B → Validate (6D) → Contract met → Return
```

Not:

```
Provider A → Timeout → Failover → Provider B → Return (unchecked)
```
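The validated flow can be sketched as a loop that only returns a response after it passes validation, and moves to the next provider on either an exception or a contract violation. This is a minimal illustration, not Correctover's implementation: providers are modelled as plain callables, and `provider_a`, `provider_b`, `provider_down` and the two-rule `validate` function are hypothetical stand-ins for real clients and a full six-dimension contract.

```python
from typing import Callable

Response = dict
Provider = Callable[[list[dict]], Response]
Validator = Callable[[Response], list[str]]


class AllProvidersFailed(Exception):
    """Raised with a {provider_name: [reasons]} dict when no provider meets the contract."""


def call_with_validated_failover(providers: list[tuple[str, Provider]], messages: list[dict],
                                 validate: Validator) -> tuple[str, Response]:
    failures = {}
    for name, provider in providers:
        try:
            response = provider(messages)
        except Exception as exc:            # timeouts, connection errors, ...
            failures[name] = [f"call failed: {exc}"]
            continue
        violations = validate(response)     # every response is checked, including fallbacks
        if not violations:
            return name, response
        failures[name] = violations
    raise AllProvidersFailed(failures)


def provider_down(messages):
    raise TimeoutError("simulated timeout")


def provider_a(messages):
    # Answers promptly, but with a silently swapped model.
    return {"model": "gpt-3.5-turbo", "choices": [{"message": {"content": "Hi"}}]}


def provider_b(messages):
    return {"model": "gpt-4", "choices": [{"message": {"content": "Hello!"}}]}


def validate(response):
    violations = []
    if response.get("model") != "gpt-4":
        violations.append(f"model mismatch: {response.get('model')!r}")
    if not response["choices"][0]["message"]["content"]:
        violations.append("empty content")
    return violations


name, response = call_with_validated_failover(
    [("provider_down", provider_down), ("provider_a", provider_a), ("provider_b", provider_b)],
    [{"role": "user", "content": "Hello"}],
    validate,
)
print(name)  # provider_b
```

A naive failover would have accepted `provider_a`'s answer as soon as the timeout was handled; validating every candidate is what stops the swap from reaching the caller.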
The Data Behind It
Our 20,000+ call reliability dataset revealed:
- 303 unique failure types classified across 6 dimensions
- 87 built-in self-healing rules covering common failure patterns
- L3 Failover end-to-end: 949ms (including validation)
- Zero false positives at the contract validation layer
These are not theoretical numbers. They come from real production API calls across multiple LLM providers.
Start Using It Today
```bash
pip install correctover
```
This is the fifth article in the LLM Reliability series. Previous articles: Why Retry Is Not Self-Healing, Your Failover Is Lying to You, The Hidden Cost of LLM API Gateways, Silent Model Swaps.