Claude Fable 5 Is Now Credit-Only: What a Real Coding Session Costs After June 22

This article was originally published on aicoderscope.com

TL;DR: As of June 23, Claude Fable 5 is no longer free on Pro, Max, or Team. Every call now bills against usage credits at the full API rate of $10/$50 per million input/output tokens. A single light agentic task runs about $0.90, a heavy multi-turn session about $7. Opus 4.8 does the same work for half the price, and GLM-5.2 for roughly an eighth. Keep Fable 5 for the hard 20%; route everything else cheaper.

	Claude Fable 5	Claude Opus 4.8	GLM-5.2	OpenCode + Ollama
Price (in / out per M)	$10 / $50	$5 / $25	$1.40 / $4.40	$0 (local)
Light task (~50K in / 8K out)	~$0.90	~$0.45	~$0.11	$0
Heavy session (~400K in / 60K out)	~$7.00	~$3.50	~$0.82	$0
The catch	Now metered; double Opus	Slightly behind on gnarliest tasks	Self-host or trust Z.ai routing	Your GPU, your tok/s

Honest take: The free window was the time to fall in love with Fable 5; the bill is the time to be disciplined. Run Opus 4.8 (or GLM-5.2 for cost-sensitive work) as your default backend and reach for Fable 5 only when a task has already defeated the cheaper model. Letting an agent loop on Fable 5 all day is the fastest way to a four-figure month.

What actually changed on June 23

From June 9 through June 22, 2026, anyone on a paid Claude plan — Pro, Max, Team, or seat-based Enterprise — could call Claude Fable 5 at no extra cost. That promotional window is over. As of June 23, Anthropic removed Fable 5 from those plan allowances. It still shows up in the model picker, but using it now draws down usage credits, and those credits are billed at the standard API rate: $10 per million input tokens and $50 per million output tokens.

The important nuance: credits are not some softer consumer rate. They meter at the exact per-token API price. So whether you hit Fable 5 through the raw API, through Claude Code, or through a usage-credit balance attached to your Pro subscription, the math is identical. The subscription buys you the cheaper models in-plan; Fable 5 is now incremental spend on top.

If you spent the last two weeks letting Fable 5 drive your agent and it felt free, that feeling ends today. The same workflow now has a meter on it.

The per-session math, with no hand-waving

"$10 per million tokens" means nothing until you turn it into the cost of one task you actually run. So here are two concrete scenarios, priced across every backend a developer would realistically consider in June 2026. Token rates below are the current public API prices for each model (verified June 23, 2026; sources at the end).

A light task is a focused request: fix a bug, write a function, add a test. In Cursor or Cline agent mode this realistically moves ~50,000 input tokens (the agent reads a few files and the conversation) and ~8,000 output tokens (the diff plus reasoning).

A heavy session is a multi-file refactor or a feature that takes the agent several turns — re-reading files, running tools, re-reading again. That cumulative traffic lands around ~400,000 input and ~60,000 output tokens once you count the full back-and-forth.

LIGHT TASK  (50,000 input + 8,000 output)
  Fable 5    50K×$10/M + 8K×$50/M = $0.50 + $0.40 = $0.90
  Opus 4.8   50K×$5/M  + 8K×$25/M = $0.25 + $0.20 = $0.45
  GPT-5.5    50K×$5/M  + 8K×$30/M = $0.25 + $0.24 = $0.49
  GLM-5.2    50K×$1.40/M + 8K×$4.40/M = $0.07 + $0.04 = $0.11
  OpenCode+Ollama                                   $0.00

HEAVY SESSION  (400,000 input + 60,000 output)
  Fable 5    400K×$10/M + 60K×$50/M = $4.00 + $3.00 = $7.00
  Opus 4.8   400K×$5/M  + 60K×$25/M = $2.00 + $1.50 = $3.50
  GPT-5.5    400K×$5/M  + 60K×$30/M = $2.00 + $1.80 = $3.80
  GLM-5.2    400K×$1.40/M + 60K×$4.40/M = $0.56 + $0.26 = $0.82
  OpenCode+Ollama                                     $0.00
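For anyone who wants to rerun this arithmetic with their own token counts, here is a minimal Python sketch. The prices are the per-million rates quoted in this article; the `PRICES` table and the `session_cost` helper are illustrative names, not part of any vendor SDK.

```python
# Per-million-token prices in USD (input, output), as quoted in this article.
PRICES = {
    "Fable 5": (10.00, 50.00),
    "Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "GLM-5.2": (1.40, 4.40),
    "OpenCode+Ollama": (0.00, 0.00),  # local model: no per-token charge
}


def session_cost(model, input_tokens, output_tokens):
    """Return the uncached cost in USD of one session on the given backend."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# The two scenarios used in this article: (input tokens, output tokens).
SCENARIOS = {"light": (50_000, 8_000), "heavy": (400_000, 60_000)}

for label, (inp, out) in SCENARIOS.items():
    for model in PRICES:
        print(f"{label:5s} {model:16s} ${session_cost(model, inp, out):.2f}")
```

Swap in the token counts from your own agent logs to see where your typical task lands.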

Two things jump out. Fable 5 is the most expensive option in every row, and output is where it stings: at $50/M, output tokens make up only about 13-14% of the traffic in these scenarios but roughly 43-44% of the bill. And the gap to Opus 4.8 is exactly 2×, because Opus sits at half Fable's rate on both input and output. GLM-5.2 is in a different league on price: roughly an eighth of Fable on a light task, and under a dollar on the heavy session.

What that becomes per month

Per-task numbers are abstract until you multiply by a real workday. Say you run ten heavy sessions a day across twenty working days — 200 sessions a month. That is a believable load for someone who leans on an agent for most non-trivial changes.

Backend	Per heavy session	200 sessions / month
Claude Fable 5	$7.00	$1,400
GPT-5.5	$3.80	$760
Claude Opus 4.8	$3.50	$700
GLM-5.2 (Z.ai API)	$0.82	$164
OpenCode + Ollama	$0.00	$0 (+ electricity)

That $1,400 figure is the one that matters now that the free window is gone. Before June 23, a power user could run Fable 5 flat-out inside a $20 Pro plan. Today the same behavior is a $1,400 line item. Even a moderate user doing three heavy sessions a day lands near $420/month on Fable 5 — well past the $20 Cursor Pro flat rate and the $100 GitHub Copilot Max tier.
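The monthly projection is simple enough to script. The sketch below assumes the heavy-session costs from this article and a 20-day working month. The helper names are hypothetical, and the 80/20 split at the end is a what-if based on the "hard 20%" rule of thumb from the TL;DR, not a measured workload.

```python
# Heavy-session costs in USD, taken from the table above.
HEAVY_SESSION_USD = {
    "Fable 5": 7.00,
    "GPT-5.5": 3.80,
    "Opus 4.8": 3.50,
    "GLM-5.2": 0.82,
}


def monthly_cost(per_session_usd, sessions_per_day, working_days=20):
    """Project a monthly bill from a per-session cost and a daily session count."""
    return per_session_usd * sessions_per_day * working_days


def blended_monthly(default_usd, premium_usd, premium_share, sessions):
    """Monthly bill when only a fraction of sessions go to the premium model."""
    per_session = (1 - premium_share) * default_usd + premium_share * premium_usd
    return sessions * per_session


for model, cost in HEAVY_SESSION_USD.items():
    heavy_user = monthly_cost(cost, sessions_per_day=10)
    moderate_user = monthly_cost(cost, sessions_per_day=3)
    print(f"{model:10s} 10/day: ${heavy_user:,.0f}   3/day: ${moderate_user:,.0f}")

# Assumed split: Opus 4.8 for 80% of 200 sessions, Fable 5 for the other 20%.
mixed = blended_monthly(3.50, 7.00, premium_share=0.20, sessions=200)
print(f"80/20 Opus/Fable mix: ${mixed:,.0f}")
```

Under that assumed split the 200-session month comes to about $840, compared with $1,400 on Fable 5 alone.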

This is the same trap the GitHub Copilot token-billing change created earlier in June: the moment metered, output-priced agent usage replaces a flat subscription, heavy users see bills jump by an order of magnitude. Fable 5 going credit-only is that story repeated for Anthropic's top model.

Where prompt caching changes the picture

The numbers above are raw, no caching. In real agentic loops, a large fraction of your input is the same context re-sent every turn — the system prompt, your rules file, the files already in scope. Anthropic, OpenAI, and Z.ai all discount cached input by about 90%.

For Fable 5, cached reads drop from $10/M to $1/M. On the heavy session above, if 300K of the 400K input tokens are cache hits, the input cost falls from $4.00 to roughly $1.30 (100K fresh at $10/M + 300K cached at $1/M), pulling the session from $7.00 down to about $4.30. Output is never cached, so the $3.00 output cost is unmovable — and that is the real reason Fable 5 stays expensive. You can cache your way out of input cost, never out of $50/M output.
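The caching arithmetic above can be checked with a few lines of Python. This is a sketch under the article's stated rates ($10/M fresh input, $1/M cached reads, $50/M output). It ignores any cache-write surcharge a provider may apply, and `cached_session_cost` is an illustrative helper rather than a real API.

```python
def cached_session_cost(input_tokens, output_tokens, cached_tokens,
                        in_rate=10.00, cached_rate=1.00, out_rate=50.00):
    """Return (input_cost, output_cost) in USD; rates are per million tokens.

    cached_tokens is the part of input_tokens served from the prompt cache.
    Assumption: cached reads are billed at cached_rate and nothing else changes.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    input_cost = (fresh_tokens * in_rate + cached_tokens * cached_rate) / 1_000_000
    output_cost = output_tokens * out_rate / 1_000_000  # output is never cached
    return input_cost, output_cost


# Heavy session from this article: 400K input (300K of it cache hits), 60K output.
input_cost, output_cost = cached_session_cost(400_000, 60_000, 300_000)
print(f"input ${input_cost:.2f} + output ${output_cost:.2f} "
      f"= ${input_cost + output_cost:.2f}")
```

Even at a 100% cache hit rate the input side only falls to $0.40, so the session never drops below about $3.40 at interactive rates.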

Batch mode is the other lever: non-urgent jobs (bulk refactors, codemod-style passes, overnight test generation) run at $5/$25 per million on Fable 5, half the interactive rate. It is useless for a live agent — there is latency — but for fire-and-forget work it halves the bill.

The decision framework after June 22

Solo developer, cost-sensitive. Make Opus 4.8 your default in Cursor or Cline and you cut the bill in half versus Fable 5, for output most people cannot tell apart on a 40-line change. If you want to go further, point your editor at GLM-5.2 through an OpenAI-compatible endpoint: at $1.40/$4.40 it costs roughly an eighth of what Fable does per session in the scenarios above, and on long-horizon coding benchmarks it trades blows with GPT-5.5. The setup is covered in GLM 5.2 as your Cursor and Cline backend.

Privacy-first or zero-marginal-cost. Run a local model behind OpenCode + Ollama. The per-session cost is genuinely $0 once the hardware is paid for; what you trade is tokens-per-second and peak quality. For the hardware reality of running a capable coding model locally, see runaihome's best local AI models by VRAM.

Team lead picking a standard. A flat $20 Cursor Pro seat or a $100 Copilot Max seat is now dramatically cheaper than metered Fable 5 for anyone running agents all day — see the Copilot Max breakdown and Cursor vs Claude Code for where the flat plans win. Reserve Fable 5 (via credits) for the senior engineers tackling the genuinely hard refactors, and let the rest of the team run a flat-rate t