If you’ve tried to build anything serious on top of LLMs recently, you’ve probably run into this:
“Thinking” is supposed to make models better.
In practice, it makes your infrastructure worse.
Let’s break down where it actually hurts.
The illusion of “just turn on reasoning”
At a high level, you’d expect something simple:
- Turn reasoning on → better answers
- Turn reasoning off → cheaper, faster
Reality is messier.
- Models sometimes don’t think when you ask them to
- Models sometimes overthink trivial prompts, burning tokens for no gain
- There’s no consistent behavior across providers
So now you’re not just building a product.
You’re debugging model psychology.
The fragmentation problem nobody talks about
Every provider decided to implement “thinking” differently.
- OpenAI → effort levels (low, medium, high)
- Anthropic → token budgets (explicit caps)
- Google → both… depending on the model version
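To make the mismatch concrete, here is a minimal sketch of translating one unified setting into per-provider request fragments. The `Thinking` type, the function names, and the effort thresholds are hypothetical. The field names (`reasoning_effort`, `thinking.budget_tokens`, `thinkingConfig.thinkingBudget`) follow the shapes these providers have documented, but treat them as assumptions and check current docs before relying on them.

```python
from dataclasses import dataclass


# Hypothetical unified setting; not any provider's (or Backboard's) real schema.
@dataclass(frozen=True)
class Thinking:
    enabled: bool
    budget_tokens: int = 0  # requested cap on reasoning tokens


def effort_from_budget(budget_tokens: int) -> str:
    """Bucket a token budget into an effort level (thresholds are arbitrary)."""
    if budget_tokens <= 2_000:
        return "low"
    if budget_tokens <= 8_000:
        return "medium"
    return "high"


def to_provider_params(provider: str, thinking: Thinking) -> dict:
    """Translate the unified setting into a provider-shaped request fragment."""
    if not thinking.enabled:
        # Simplification: omitting the field does not guarantee that a
        # model skips reasoning entirely.
        return {}
    if provider == "openai":
        # Effort-level style (assumed field name).
        return {"reasoning_effort": effort_from_budget(thinking.budget_tokens)}
    if provider == "anthropic":
        # Explicit token budget style (assumed field names).
        return {"thinking": {"type": "enabled",
                             "budget_tokens": thinking.budget_tokens}}
    if provider == "google":
        # Budget style; newer model versions may expect a level instead.
        return {"thinkingConfig": {"thinkingBudget": thinking.budget_tokens}}
    raise ValueError(f"unknown provider: {provider}")
```

Note that the budget-to-effort mapping is lossy: a 3,000-token and a 7,000-token budget both collapse to "medium", which is exactly the kind of inconsistency a router has to live with.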
That’s just inputs.
Outputs are worse:
- Some return dedicated thinking blocks
- Others return reasoning summaries
- Some mix reasoning into standard content structures
There is no standard. No shared schema. No predictable behavior.
So if you’re routing across models, you now need:
- Input normalization
- Output parsing per provider
- Logic to reconcile different reasoning formats
This is where “simple API routing” stops being simple.
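On the output side, the reconciliation logic might look something like the sketch below. The three response shapes are simplified stand-ins for the three styles listed above (dedicated blocks, summaries, reasoning mixed into content). The field names `reasoning_summary` and `output_text`, the `<think>` tag convention, and the `NormalizedReply` type are all assumptions for illustration, not any provider's actual schema.

```python
import re
from dataclasses import dataclass


@dataclass
class NormalizedReply:
    answer: str
    reasoning: str       # empty string when nothing was exposed
    reasoning_kind: str  # "full", "summary", or "none"


_THINK_TAG = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def normalize(raw: dict) -> NormalizedReply:
    """Map one of three assumed response shapes onto a single structure."""
    content = raw.get("content")

    # Shape A: a list of typed blocks with dedicated "thinking" blocks.
    if isinstance(content, list):
        reasoning = "".join(
            b.get("thinking", "") for b in content if b.get("type") == "thinking"
        )
        answer = "".join(
            b.get("text", "") for b in content if b.get("type") == "text"
        )
        return NormalizedReply(answer, reasoning, "full" if reasoning else "none")

    # Shape B: a separate summary field next to the final text.
    if "reasoning_summary" in raw:
        return NormalizedReply(
            raw.get("output_text", ""), raw["reasoning_summary"], "summary"
        )

    # Shape C: reasoning mixed into the content string inside tags.
    if isinstance(content, str):
        reasoning = "\n".join(m.strip() for m in _THINK_TAG.findall(content))
        answer = _THINK_TAG.sub("", content).strip()
        return NormalizedReply(answer, reasoning, "full" if reasoning else "none")

    raise ValueError("unrecognized response shape")
```

Keeping `reasoning_kind` in the normalized result matters: a summary and a full trace are not interchangeable, so downstream code should know which one it got.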
Billing is inconsistent too
Even cost modeling breaks.
- Some providers expose reasoning tokens explicitly
- Some hide them inside total usage
- Some introduce provider-specific fields (looking at you, xAI)
Now you’re not just optimizing performance.
You’re building a cost translation layer.
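A cost translation layer can start as small as the sketch below. The raw usage keys (`input_tokens`, `output_tokens`, `reasoning_tokens`), the `reasoning_counted_in_output` flag, and the assumption that reasoning tokens are billed at the output rate are all illustrative. Real providers differ on each of these points, which is the whole problem.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    input_tokens: int
    billable_output_tokens: int      # visible output plus any reasoning
    reasoning_tokens: Optional[int]  # None when the provider hides the split


def normalize_usage(raw: dict, reasoning_counted_in_output: bool = True) -> Usage:
    """Fold an assumed raw usage dict into one comparable structure."""
    reasoning = raw.get("reasoning_tokens")  # absent when not exposed
    output = raw["output_tokens"]
    # Some APIs report reasoning separately from output; add it back so
    # the billable total means the same thing for every provider.
    if reasoning is not None and not reasoning_counted_in_output:
        output += reasoning
    return Usage(raw["input_tokens"], output, reasoning)


def cost_usd(usage: Usage, input_per_mtok: float, output_per_mtok: float) -> float:
    """Estimate cost, assuming reasoning is billed at the output rate."""
    return (
        usage.input_tokens * input_per_mtok
        + usage.billable_output_tokens * output_per_mtok
    ) / 1_000_000
```

Using `None` rather than `0` for hidden reasoning tokens keeps "the provider did not tell us" distinct from "the model did not think".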
Model switching makes everything worse
Switching models mid-thread sounds great… until you try it.
Even within the same provider:
- Different endpoints behave differently (yes, even inside OpenAI)
- Input formats change
- Output structures change
- Reasoning formats change
Now add state:
- What context do you carry over?
- How do you preserve reasoning continuity?
- How do you avoid exploding token usage?
This is where most teams either:
- Give up on portability, or
- Build a fragile pile of adapters that break every few weeks
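One simple (and lossy) answer to the state questions above is sketched here: when switching models, drop provider-specific reasoning, keep the system prompt, and carry over only the most recent turns that fit a token budget. The message shape, the `reasoning` key, and the 4-characters-per-token estimate are assumptions, and a real implementation would use the target model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token); swap in a real tokenizer.
    return max(1, len(text) // 4)


def carry_over(messages: list[dict], max_tokens: int) -> list[dict]:
    """Build the history to send to the new model after a switch."""
    # Keep only role and visible content; provider-specific reasoning
    # payloads are dropped because the next model may not accept them.
    portable = [{"role": m["role"], "content": m["content"]} for m in messages]

    system = [m for m in portable if m["role"] == "system"]
    turns = [m for m in portable if m["role"] != "system"]

    # System messages are always kept; the rest share what remains.
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)

    kept = []
    for m in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(m["content"])
        if cost > budget:
            break  # stop at the first turn that no longer fits
        kept.append(m)
        budget -= cost

    return system + kept[::-1]  # restore chronological order
```

This sidesteps reasoning continuity rather than solving it. Carrying a normalized summary of earlier reasoning forward is another option, at the price of more tokens.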
What we realized building Backboard
The real problem isn’t reasoning.
It’s lack of abstraction.
Developers shouldn’t have to:
- Learn 5 different “thinking” systems
- Normalize 5 different response formats
- Track 5 different billing models
- Rebuild state every time they switch models
So we made a call:
Unify it.
What “unified thinking” actually means
Instead of exposing provider quirks, we abstract them into one model:
- A single thinking parameter
- Direct control over reasoning budget
- Consistent behavior across models
- Normalized input and output structures
So you can:
- Tune reasoning without caring about provider differences
- Switch models without rewriting logic
- Keep state intact across everything
And most importantly:
Stop thinking about thinking.
The uncomfortable truth
If you’re building on multiple LLMs and you haven’t hit these issues yet, you will.
The complexity is not obvious at the start.
It compounds as soon as you:
- Add a second provider
- Introduce reasoning
- Try to optimize cost
- Or maintain state across sessions
At that point, you’re not building your product anymore.
You’re building infrastructure.
Where this goes long term
Short term:
- Abstractions like this save teams weeks or months of engineering time
- They reduce cost volatility and debugging overhead
Long term:
- The winning platforms won’t be the ones with the best models
- They’ll be the ones that make models interchangeable and stateful
That’s the real unlock.
If you’re currently stitching together multiple providers, do a quick audit:
- How many reasoning formats are you handling?
- How portable is your state layer?
- How confident are you in your cost predictability?
If the answers aren’t clean, you’re already paying the tax.
This is what we're working on at Backboard.io :)








