I Think Therefore I Am… A Big Pain in the A$

If you’ve tried to build anything serious on top of LLMs recently, you’ve probably run into this:

“Thinking” is supposed to make models better.
In practice, it makes your infrastructure worse.

Let’s break down where it actually hurts.

The illusion of “just turn on reasoning”

At a high level, you’d expect something simple:

Turn reasoning on → better answers
Turn reasoning off → cheaper, faster

Reality is messier.

Models sometimes don’t think when you ask them to
Models sometimes overthink trivial prompts, burning tokens for no gain
There’s no consistent behavior across providers

So now you’re not just building a product.
You’re debugging model psychology.

The fragmentation problem nobody talks about

Every provider decided to implement “thinking” differently.

OpenAI → effort levels (low, medium, high)
Anthropic → token budgets (explicit caps)
Google → both… depending on the model version

That’s just inputs.

Outputs are worse:

Some return dedicated thinking blocks
Others return reasoning summaries
Some mix reasoning into standard content structures

There is no standard. No shared schema. No predictable behavior.

So if you’re routing across models, you now need:

Input normalization
Output parsing per provider
Logic to reconcile different reasoning formats

This is where “simple API routing” stops being simple.

Billing is inconsistent too

Even cost modeling breaks.

Some providers expose reasoning tokens explicitly
Some hide it inside total usage
Some introduce provider-specific fields (looking at you, xAI)

Now you’re not just optimizing performance.
You’re building a cost translation layer.

Model switching makes everything worse

Switching models mid-thread sounds great… until you try it.

Even within the same provider:

Different endpoints behave differently (yes, even inside OpenAI)
Input formats change
Output structures change
Reasoning formats change

Now add state:

What context do you carry over?
How do you preserve reasoning continuity?
How do you avoid exploding token usage?

This is where most teams either:

Give up on portability, or
Build a fragile pile of adapters that break every few weeks

What we realized building Backboard

The real problem isn’t reasoning.

It’s lack of abstraction.

Developers shouldn’t have to:

Learn 5 different “thinking” systems
Normalize 5 different response formats
Track 5 different billing models
Rebuild state every time they switch models

So we made a call:

Unify it.

What “unified thinking” actually means

Instead of exposing provider quirks, we abstract them into one model:

A single thinking parameter
Direct control over reasoning budget
Consistent behavior across models
Normalized input and output structures

So you can:

Tune reasoning without caring about provider differences
Switch models without rewriting logic
Keep state intact across everything

And most importantly:

Stop thinking about thinking.

The uncomfortable truth

If you’re building on multiple LLMs and you haven’t hit these issues yet, you will.

The complexity is not obvious at the start.
It compounds as soon as you:

Add a second provider
Introduce reasoning
Try to optimize cost
Or maintain state across sessions

At that point, you’re not building your product anymore.
You’re building infrastructure.

Where this goes long term

Short term:

Abstractions like this save teams weeks or months of engineering time
They reduce cost volatility and debugging overhead

Long term:

The winning platforms won’t be the best models
They’ll be the ones that make models interchangeable and stateful

That’s the real unlock.

If you’re currently stitching together multiple providers, do a quick audit:

How many reasoning formats are you handling?
How portable is your state layer?
How confident are you in your cost predictability?

If the answer isn’t clean, you’re already paying the tax.

This is what we're working on at Backboard.io :)

I Think Therefore I Am… A Big Pain in the A$$

The illusion of “just turn on reasoning”

The fragmentation problem nobody talks about

Billing is inconsistent too

Model switching makes everything worse

What we realized building Backboard

What “unified thinking” actually means

The uncomfortable truth

Where this goes long term

Tags

Author

Stats

Published

You Might Also Like

Join the OpenClaw Challenge: $1,200 Prize Pool!

Congrats to the Notion MCP Challenge Winners!

AI Doesn't Fix Weak Engineering. It Just Speeds It Up.

Your Job Isn't Going Away. But Someone's Fundraise Depends on You Thinking It Is.

Defluffer - reduce token usage 📉 by 45% using this one simple trick! [Earthday challenge]

Build a voice-enabled Telegram Bot with the Gemini Interactions API