What is it?
Hiii!
cost-xray is a local tool for inspecting AI coding-agent requests.
It currently supports:
- Claude Code
- Codex
After installing it, you keep using your agent normally. No new workflow. No special prompt dance. No "please debug yourself in JSON" ritual.
You run your agent, do some coding, then open:
cx
And suddenly the black box starts looking a lot less black.
The home view groups captured sessions by agent, project, and session. You can quickly see turns, tokens, cache hit rate, and estimated cost.
The Problem It Solves
Most usage tools can tell you:
This session cost $X.
That is useful, but it is also a little like checking your restaurant bill and only seeing:
Food: $84
Cool. But what food?
cost-xray goes one level deeper. It asks:
- What filled the context window?
- Which parts were fresh input?
- Which parts were cache reads?
- Which parts were cache writes?
- Which tool outputs came back into the next request?
- Which MCP schemas were sitting there doing cardio in the prompt prefix?
Why This Is Different From Log-Based Usage Tools
Many usage tools read local transcripts or logs. That is useful, but it sees the conversation after the agent has already run.
The expensive and interesting part is often assembled right before the model call:
- system prompts
- injected tool schemas
- MCP server schemas
- prior messages
- previous tool results
- cached prefix reads and writes
- thinking or reasoning blocks
Those pieces may never appear clearly in a normal chat transcript.
cost-xray looks at the actual request traffic instead. That means it can show not only the total bill, but the sources inside the request that contributed to it.
Quick Install
The repo provides a one-line installer:
curl -fsSL https://raw.githubusercontent.com/tigerless-labs/cost-xray/master/install.sh | bash
During install, it asks which agent you want to capture:
- Claude Code
- Codex
- both
After that, open a new terminal and use your agent normally:
codex
or:
claude
Then open the dashboard:
cx
You can also check or control the background capture service:
cx status
cx stop
cx start
cx restart
Demo Flow
For a small demo, I would run a normal coding-agent task:
codex
Then ask it something realistic, like:
Inspect this repo and add a small CLI flag with tests.
After a few turns, open:
cx
From there, drill down:
agent -> project -> session -> category -> MCP server -> tool -> per-turn call
The useful part is that the dashboard does not stop at "this session used N tokens." It can show whether the spend came from fresh input, cache reads, cache writes, or output.
The detail view shows context-window occupancy and cumulative per-source cost. This is where the tool starts to feel less like a billing dashboard and more like an x-ray of the agent runtime.
Things You Can Discover
Here are a few patterns cost-xray is designed to expose.
1. Tool Schemas That Are Always Present
Coding agents often inject tool definitions into every request.
That is useful when the tool is needed. But if a large schema is always present and almost never called, it still takes up context-window space.
cost-xray can help identify that unused overhead.
2. MCP Servers That Add Prefix Weight
MCP is powerful, but every configured server can bring tool schema into the prompt prefix.
If a server contributes thousands of tokens and is never actually used, cost-xray makes that visible.
3. Huge Tool Results
Sometimes the expensive part is not the prompt. It is a giant tool output that got pulled back into the next request.
For example:
Read -> large file output -> repeated in later context
When that happens, cost-xray lets you trace cost down to the tool call and its output, not just the overall session.
4. Cache Behavior
Prompt caching can make stable prefixes cheaper, but cached tokens still occupy context-window space.
cost-xray separates fresh input, cache read, cache write, and output cost. That makes it easier to understand when caching is helping the bill but not helping the window.
How It Works
At a high level, cost-xray has three stages:
coding agent -> local capture proxy -> raw store -> materializer -> TUI
The capture layer uses mitmproxy as a local hop. It records redacted request and response data under ~/.cost-xray/.
Then a separate materializer process tokenizes, prices, and classifies the captured traffic. This matters because the proxy itself stays lightweight. The analysis happens off the request path, so it does not need to slow down the agent call.
The repo describes the architecture as:
capture raw, derive at read time
That means old sessions can be re-analyzed as the attribution logic improves.
Supported Agents
At the time of this demo, the repo lists support for:
| Agent | Capture approach |
|---|---|
| Claude Code | reverse proxy mode |
| Codex | forward proxy plus scoped local CA |
The interesting design choice is that agent-specific behavior lives behind adapters in cost_xray/adapters/.
So the shared logic can focus on:
- canonical events
- token classification
- pricing
- cache boundaries
- drill-down views
Adding another agent becomes mostly a matter of writing a new adapter for that agent's wire format.
What I Like About This Project
The best part is that cost-xray treats AI coding agents like systems worth observing.
That feels right.
As these tools get more capable, the invisible runtime around them gets more complex. A single turn can include policies, tool schemas, prior state, command outputs, MCP definitions, cache behavior, and generated reasoning.
If we want to optimize agent workflows, we need to inspect more than the final chat transcript.
cost-xray gives developers a way to see the hidden budget:
- what fills the window
- what costs money
- what is cached
- what is unused
- what tool outputs keep coming back
What I Would Try Next
I would use cost-xray for a week of normal agent work, then look for repeated patterns:
- Which MCP servers are configured but rarely called?
- Which tools return too much output?
- Which projects have unusually large prefixes?
- Which sessions spend heavily on cache writes?
- Which prompts or workflows cause context-window bloat?
That could turn into a practical cleanup pass for an AI development setup.
Code
The project is open source here:
tigerless-labs
/
cost-xray
See what Claude Code and Codex actually send to the API — and what each part costs.
cost-xray
See what your AI coding agent actually sends to the API — and what each part costs.
Most usage tools read local logs. That shows the total cost of a call or session, but it misses the request-time context assembled before the model is invoked: system prompts, tool schemas, MCP blocks, tool results, cache reads/writes, and previous thinking blocks.
cost-xray captures the actual local API traffic for Claude Code and Codex, then attributes tokens and dollars back to the sources inside the request. It shows not just how much a turn cost, but why it cost that much.
Requirements
- A supported coding agent — Claude Code or Codex
- macOS or Linux
- No API keys, no account, no config…
You can also browse it directly:





















