This article was originally published on aicoderscope.com
TL;DR: As of June 23, Claude Fable 5 is no longer free on Pro, Max, or Team; every call now bills against usage credits at the full API rate of $10/$50 per million tokens. A single light agentic task runs about $0.90, a heavy multi-turn session about $7. Opus 4.8 does the same work for half that, and GLM-5.2 for roughly a tenth. Keep Fable 5 for the hard 20%; route everything else cheaper.
| | Claude Fable 5 | Claude Opus 4.8 | GLM-5.2 | OpenCode + Ollama |
|---|---|---|---|---|
| Price (in / out per M) | $10 / $50 | $5 / $25 | $1.40 / $4.40 | $0 (local) |
| Light task (~50K in / 8K out) | ~$0.90 | ~$0.45 | ~$0.11 | $0 |
| Heavy session (~400K in / 60K out) | ~$7.00 | ~$3.50 | ~$0.82 | $0 |
| The catch | Now metered; 2× Opus pricing | Slightly behind on the gnarliest tasks | Self-host or trust Z.ai routing | Your GPU, your tok/s |
Honest take: The free window was the time to fall in love with Fable 5; the bill is the time to be disciplined. Run Opus 4.8 (or GLM-5.2 for cost-sensitive work) as your default backend and reach for Fable 5 only when a task has already defeated the cheaper model. Letting an agent loop on Fable 5 all day is the fastest way to a four-figure month.
What actually changed on June 23
From June 9 through June 22, 2026, anyone on a paid Claude plan (Pro, Max, Team, or seat-based Enterprise) could call Claude Fable 5 at no extra cost. That promotional window is over. As of June 23, Anthropic removed Fable 5 from those plan allowances. It still shows up in the model picker, but using it now draws down usage credits, and those credits are billed at the standard API rate: $10 per million input tokens and $50 per million output tokens.
The important nuance: credits are not some softer consumer rate. They meter at the exact per-token API price. So whether you hit Fable 5 through the raw API, through Claude Code, or through a usage-credit balance attached to your Pro subscription, the math is identical. The subscription buys you the cheaper models in-plan; Fable 5 is now incremental spend on top.
If you spent the last two weeks letting Fable 5 drive your agent and it felt free, that feeling ends today. The same workflow now has a meter on it.
The per-session math, with no hand-waving
"$10 per million tokens" means nothing until you turn it into the cost of one task you actually run. So here are two concrete scenarios, priced across every backend a developer would realistically consider in June 2026. Token rates below are the current public API prices for each model (verified June 23, 2026; sources at the end).
A light task is a focused request: fix a bug, write a function, add a test. In Cursor or Cline agent mode this realistically moves ~50,000 input tokens (the agent reads a few files and the conversation) and ~8,000 output tokens (the diff plus reasoning).
A heavy session is a multi-file refactor or a feature that takes the agent several turns: re-reading files, running tools, re-reading again. That cumulative traffic lands around ~400,000 input and ~60,000 output tokens once you count the full back-and-forth.
LIGHT TASK (50,000 input + 8,000 output)
Fable 5 50K × $10/M + 8K × $50/M = $0.50 + $0.40 = $0.90
Opus 4.8 50K × $5/M + 8K × $25/M = $0.25 + $0.20 = $0.45
GPT-5.5 50K × $5/M + 8K × $30/M = $0.25 + $0.24 = $0.49
GLM-5.2 50K × $1.40/M + 8K × $4.40/M = $0.07 + $0.04 = $0.11
OpenCode+Ollama $0.00
HEAVY SESSION (400,000 input + 60,000 output)
Fable 5 400K × $10/M + 60K × $50/M = $4.00 + $3.00 = $7.00
Opus 4.8 400K × $5/M + 60K × $25/M = $2.00 + $1.50 = $3.50
GPT-5.5 400K × $5/M + 60K × $30/M = $2.00 + $1.80 = $3.80
GLM-5.2 400K × $1.40/M + 60K × $4.40/M = $0.56 + $0.26 = $0.82
OpenCode+Ollama $0.00
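If you want to rerun this arithmetic with your own token counts, here is a minimal Python sketch. The prices are the ones quoted in this article, and the two token profiles are the light and heavy scenarios defined above. `PRICES`, `SCENARIOS`, and `task_cost` are illustrative names, not any provider's API, and the sketch ignores caching and batch discounts.

```python
# Per-million-token API prices (input, output) in USD, as quoted in this article.
PRICES = {
    "Claude Fable 5": (10.00, 50.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "GLM-5.2": (1.40, 4.40),
    "OpenCode + Ollama": (0.00, 0.00),  # local model: no per-token charge
}

# Token profiles (input, output) for the two scenarios described above.
SCENARIOS = {
    "light task": (50_000, 8_000),
    "heavy session": (400_000, 60_000),
}


def task_cost(input_tokens: int, output_tokens: int, in_price: float, out_price: float) -> float:
    """Uncached cost in USD of one task, given per-million-token prices."""
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price


if __name__ == "__main__":
    for scenario, (tokens_in, tokens_out) in SCENARIOS.items():
        print(scenario.upper())
        for model, (price_in, price_out) in PRICES.items():
            print(f"  {model:<18} ${task_cost(tokens_in, tokens_out, price_in, price_out):.2f}")
```

Swap in the token counts from your own agent logs to price your real workload instead of these two reference profiles.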
Two things jump out. Fable 5 is the most expensive option in every row, and output is where it stings: at $50/M, output costs 5× the input rate, so the 8K output tokens of a light task cost $0.40, nearly as much as the $0.50 for 50K input tokens. And the gap to Opus 4.8 is exactly 2×, because Opus sits at half Fable's rate on both input and output. GLM-5.2 is in a different league on price: roughly an eighth of Fable on a light task, and under a dollar on the heavy session.
What that becomes per month
Per-task numbers are abstract until you multiply by a real workday. Say you run ten heavy sessions a day across twenty working days, or 200 sessions a month. That is a believable load for someone who leans on an agent for most non-trivial changes.
| Backend | Per heavy session | 200 sessions / month |
|---|---|---|
| Claude Fable 5 | $7.00 | $1,400 |
| GPT-5.5 | $3.80 | $760 |
| Claude Opus 4.8 | $3.50 | $700 |
| GLM-5.2 (Z.ai API) | $0.82 | $164 |
| OpenCode + Ollama | $0.00 | $0 (+ electricity) |
That $1,400 figure is the one that matters now that the free window is gone. Before June 23, a power user could run Fable 5 flat-out inside a $20 Pro plan. Today the same behavior is a $1,400 line item. Even a moderate user doing three heavy sessions a day lands near $420/month on Fable 5, well past the $20 Cursor Pro flat rate and the $100 GitHub Copilot Max tier.
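The monthly projection is just per-session cost × sessions per day × working days. Here is a small sketch, assuming 20 working days and the uncached per-session figures from the table above; `monthly_cost` is a hypothetical helper, so plug in your own session counts.

```python
# Uncached cost per heavy session (USD), from the table above.
HEAVY_SESSION_COST = {
    "Claude Fable 5": 7.00,
    "GPT-5.5": 3.80,
    "Claude Opus 4.8": 3.50,
    "GLM-5.2 (Z.ai API)": 0.82,
    "OpenCode + Ollama": 0.00,  # excludes electricity and hardware
}


def monthly_cost(per_session: float, sessions_per_day: int, working_days: int = 20) -> float:
    """Projected monthly spend: per-session cost x sessions/day x working days."""
    return per_session * sessions_per_day * working_days


if __name__ == "__main__":
    for backend, cost in HEAVY_SESSION_COST.items():
        power = monthly_cost(cost, sessions_per_day=10)    # the 200-session scenario
        moderate = monthly_cost(cost, sessions_per_day=3)  # the "moderate user"
        print(f"{backend:<20} power: ${power:,.0f}/mo  moderate: ${moderate:,.0f}/mo")
```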
This is the same trap the GitHub Copilot token-billing change created earlier in June: the moment metered, output-priced agent usage replaces a flat subscription, heavy users see bills jump by an order of magnitude. Fable 5 going credit-only is that story repeated for Anthropic's top model.
Where prompt caching changes the picture
The numbers above are raw, no caching. In real agentic loops, a large fraction of your input is the same context re-sent every turn: the system prompt, your rules file, the files already in scope. Anthropic, OpenAI, and Z.ai all discount cached input by about 90%.
For Fable 5, cached reads drop from $10/M to $1/M. On the heavy session above, if 300K of the 400K input tokens are cache hits, the input cost falls from $4.00 to roughly $1.30 (100K fresh at $10/M + 300K cached at $1/M), pulling the session from $7.00 down to about $4.30. Output is never cached, so the $3.00 output cost is unmovable, and that is the real reason Fable 5 stays expensive. You can cache your way out of input cost, never out of $50/M output.
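Here is the caching arithmetic as a sketch. It assumes cached input is billed at a flat 90% discount (the approximation used above) and ignores any separate cache-write surcharge a provider may apply; the function name and its parameters are illustrative.

```python
def session_cost_with_cache(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int,
    in_price: float,
    out_price: float,
    cache_discount: float = 0.90,
) -> float:
    """USD cost of one session when some input tokens are cache hits.

    Assumption: cached reads cost (1 - cache_discount) x the input rate;
    cache-write surcharges are ignored. Output tokens are never discounted.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cached_price = in_price * (1 - cache_discount)
    input_cost = (fresh_tokens * in_price + cached_tokens * cached_price) / 1_000_000
    output_cost = output_tokens * out_price / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Heavy Fable 5 session: 400K input (300K of them cache hits), 60K output.
    print(f"${session_cost_with_cache(400_000, 60_000, 300_000, 10.00, 50.00):.2f}")
```

Setting `cached_tokens` equal to `input_tokens` shows the floor: even with every input token a cache hit, the heavy session still costs about $3.40, of which $3.00 is output.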
Batch mode is the other lever: non-urgent jobs (bulk refactors, codemod-style passes, overnight test generation) run at $5/$25 per million on Fable 5, half the interactive rate. It is useless for a live agent, because batch results come back later rather than interactively, but for fire-and-forget work it halves the bill.
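To see what batch routing does to a mixed workload, here is a sketch that prices each job at Fable 5's interactive or batch rate depending on whether it can wait. The `Job` type and its `urgent` flag are hypothetical; in practice you decide per job whether the batch delay is acceptable.

```python
from dataclasses import dataclass

# Fable 5 rates quoted in this article, USD per million tokens (input, output).
INTERACTIVE_RATE = (10.00, 50.00)
BATCH_RATE = (5.00, 25.00)  # half price, but results are not returned interactively


@dataclass
class Job:
    """A hypothetical unit of agent work; field names are illustrative."""
    name: str
    input_tokens: int
    output_tokens: int
    urgent: bool  # True for live agent turns that cannot wait on a batch


def job_cost(job: Job) -> float:
    """Price a job at the batch rate unless it is marked urgent."""
    in_price, out_price = INTERACTIVE_RATE if job.urgent else BATCH_RATE
    return (job.input_tokens * in_price + job.output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    jobs = [
        Job("live multi-file refactor", 400_000, 60_000, urgent=True),
        Job("overnight test generation", 400_000, 60_000, urgent=False),
        Job("codemod-style pass", 400_000, 60_000, urgent=False),
    ]
    for job in jobs:
        print(f"{job.name:<28} ${job_cost(job):.2f}")
    print(f"{'total':<28} ${sum(job_cost(j) for j in jobs):.2f}")
```

With three heavy-sized jobs, only the live one pays the full $7.00; the two deferrable jobs cost $3.50 each, so the total is $14.00 instead of $21.00.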
The decision framework after June 22
Solo developer, cost-sensitive. Make Opus 4.8 your default in Cursor or Cline and you cut the bill in half versus Fable 5, on work where most people cannot tell the two apart (a typical 40-line change, say). If you want to go further, point your editor at GLM-5.2 through an OpenAI-compatible endpoint: at $1.40/$4.40 it costs roughly a tenth of Fable, and on long-horizon coding benchmarks it trades blows with GPT-5.5. The setup is covered in GLM 5.2 as your Cursor and Cline backend.
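The "keep Fable 5 for the hard 20%" advice can be priced two ways, sketched below with the heavy-session figures. If you know up front which tasks are hard, you send only those to Fable 5. If you instead escalate only after the cheaper model has failed, every task pays for the cheap attempt first. The 20% rate is illustrative, and the escalation model pessimistically assumes a failed attempt costs a full session and the Fable 5 re-run starts from scratch.

```python
# Uncached heavy-session costs from this article (USD).
FABLE_5 = 7.00
OPUS_4_8 = 3.50
GLM_5_2 = 0.82


def cost_presorted(cheap: float, premium: float, hard_fraction: float) -> float:
    """Expected cost per task when hard tasks go straight to the premium model."""
    return (1 - hard_fraction) * cheap + hard_fraction * premium


def cost_escalate(cheap: float, premium: float, failure_rate: float) -> float:
    """Expected cost per task when every task tries the cheap model first and
    failures are re-run on the premium model.

    Pessimistic assumption: a failed cheap attempt costs as much as a
    successful one, and the premium re-run gets no discount from it.
    """
    return cheap + failure_rate * premium


if __name__ == "__main__":
    rate = 0.20  # illustrative "hard 20%"
    print(f"Fable 5 only:         ${FABLE_5:.2f}")
    print(f"Opus 4.8, presorted:  ${cost_presorted(OPUS_4_8, FABLE_5, rate):.2f}")
    print(f"Opus 4.8, escalate:   ${cost_escalate(OPUS_4_8, FABLE_5, rate):.2f}")
    print(f"GLM-5.2, escalate:    ${cost_escalate(GLM_5_2, FABLE_5, rate):.2f}")
```

Under these assumptions, even the pessimistic escalation path with Opus 4.8 as the default ($4.90 per heavy session) undercuts running everything on Fable 5 ($7.00), and defaulting to GLM-5.2 brings it to about $2.22.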
Privacy-first or zero-marginal-cost. Run a local model behind OpenCode + Ollama. The per-session cost is genuinely $0 once the hardware is paid for; what you trade is tokens-per-second and peak quality. For the hardware reality of running a capable coding model locally, see runaihome's best local AI models by VRAM.
Team lead picking a standard. A flat $20 Cursor Pro seat or a $100 Copilot Max seat is now dramatically cheaper than metered Fable 5 for anyone running agents all day (see the Copilot Max breakdown and Cursor vs Claude Code for where the flat plans win). Reserve Fable 5 (via credits) for the senior engineers tackling the genuinely hard refactors, and let the rest of the team run a flat-rate t












