Note: Information in this article is current as of May 2026.
I've been using both Claude Code and OpenAI Codex for personal development for the past two months. I wanted a clearer picture of my actual usage, so I tracked token consumption properly. The total from March through early May (through May 7) exceeded 4 billion tokens, with April alone hitting 3.77 billion.
How I measured
Claude Code
Claude Code stores conversation logs as JSONL files under ~/.claude/projects/. Each record includes model name, timestamp, and token counts for input/output/cache_creation/cache_read. I parsed these and aggregated by month and model.
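For reference, here's a minimal sketch of that aggregation. The field names (timestamp, message.model, and the four counters under message.usage) are what my own logs contained, not a documented schema, so they may differ across Claude Code versions:

```python
import json
from collections import defaultdict
from pathlib import Path

# (month, model) -> {token_type: count}
totals: dict = defaultdict(lambda: defaultdict(int))

for path in Path.home().glob(".claude/projects/**/*.jsonl"):
    for line in path.read_text().splitlines():
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        msg = rec.get("message")
        if not isinstance(msg, dict) or "usage" not in msg:
            continue  # user turns and meta records carry no usage data
        month = rec.get("timestamp", "")[:7]  # e.g. "2026-04"
        model = msg.get("model", "unknown")
        for key in ("input_tokens", "output_tokens",
                    "cache_creation_input_tokens", "cache_read_input_tokens"):
            totals[(month, model)][key] += msg["usage"].get(key, 0)

for (month, model), usage in sorted(totals.items()):
    print(month, model, f"{sum(usage.values()):,}")
```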
Codex
Codex stores per-session JSONL logs under ~/.codex/sessions/. Each log contains multiple token_count events whose counters are cumulative, so I took the last total_token_usage in each file as that session's usage.
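The same idea as a sketch. The event shape below (a payload of type token_count carrying info.total_token_usage) is what I found in my own session files, not a guaranteed format:

```python
import json
from pathlib import Path

session_usage = {}  # session file -> last cumulative total_token_usage

for path in Path.home().glob(".codex/sessions/**/*.jsonl"):
    last = None
    for line in path.read_text().splitlines():
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        payload = rec.get("payload") or {}
        if payload.get("type") == "token_count":
            info = payload.get("info") or {}
            # Counters are cumulative, so the last event wins.
            last = info.get("total_token_usage") or last
    if last:
        session_usage[str(path)] = last

grand_total = sum(u.get("total_tokens", 0) for u in session_usage.values())
print(f"{len(session_usage)} sessions, {grand_total:,} tokens")
```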
Monthly Trends
Total
| Month | Total Tokens | Total Cost |
|---|---|---|
| March 2026 | 265M | ~$115 |
| April 2026 | 3,770M | ~$2,100 |
| May 2026 ※ | 217M | ~$155 |
Breakdown
| Month | Claude Tokens | Claude Cost | Codex Tokens | Codex Cost |
|---|---|---|---|---|
| March 2026 | 263M | $115 | 2M | ~$0.3 |
| April 2026 | 3,570M | $1,950 | 196M | ~$147 |
| May 2026 ※ | 157M | $112 | 60M | ~$43 |
※ Both Claude and Codex data through 2026-05-07.
Claude costs are API-rate equivalents. Codex costs are estimated from published model rates (gpt-5.3-codex: $1.75/$14 per M, gpt-5.4: $2.50/$15 per M, gpt-5.5: $5.00/$30 per M, cached input at 10% of input rate).
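For the Codex side, the estimate is just a rate-card lookup over those numbers. A sketch using the rates above; the example call at the end uses made-up round numbers purely to show the shape:

```python
# (input, output) in USD per 1M tokens, from the published rates above.
RATES = {
    "gpt-5.3-codex": (1.75, 14.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5": (5.00, 30.00),
}
CACHED_INPUT_DISCOUNT = 0.10  # cached input billed at 10% of the input rate

def estimate_cost(model: str, input_tokens: int,
                  cached_input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost; cached_input_tokens is the cached subset of input."""
    in_rate, out_rate = RATES[model]
    uncached = input_tokens - cached_input_tokens
    return (uncached * in_rate
            + cached_input_tokens * in_rate * CACHED_INPUT_DISCOUNT
            + output_tokens * out_rate) / 1_000_000

# Hypothetical numbers, just to show the call shape:
print(f"${estimate_cost('gpt-5.5', 10_000_000, 9_600_000, 500_000):,.2f}")
```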
March was when I started using Claude Code — I kept hitting rate limits and upgraded my plan repeatedly. In April I pushed harder, and around mid-April I introduced Codex to handle overflow when Claude hit its limits. April combined reached 3.77 billion tokens, about 14x March.
What happened in April
Breaking it down by day, April 13 alone consumed 220 million tokens.
| Date | Claude Code Tokens |
|---|---|
| Apr 9 | 11M |
| Apr 10 | 14.8M |
| Apr 13 | 219.8M ← |
| Apr 14 | 4.09M |
| Apr 16 | 6.29M |
| Apr 20 | 23.37M |
| Apr 23 | 11.9M |
That single week (the week of Apr 13) accounted for 68% of the month. Apr 13 stands out; it matches the day I was running Agents/Subagents heavily in parallel.
Model breakdown
Claude Code
| Model | Period | Tokens | Est. Cost |
|---|---|---|---|
| claude-haiku-4-5 | Mar–May | 410M | $81 |
| claude-sonnet-4-6 | Mar–May | 2,520M | $1,220 |
| claude-opus-4-6 | Mar–Apr | 270M | $284 |
| claude-opus-4-7 | Apr–May | 780M | $593 |
Sonnet was the main workhorse. Opus-4-7 launched in April and I adopted it immediately.
Codex
| Model | Period | Tokens | Est. Cost |
|---|---|---|---|
| gpt-5.3-codex | March | 2M | ~$0.3 |
| gpt-5.4 | April | 69M | ~$43 |
| gpt-5.5 | Late Apr–May | 189M | ~$148 |
Model evolution was rapid — three generations in three months.
Cache
Both tools are heavily cache-dependent, but in different ways.
Claude Code
| Type | Mar–Apr Total |
|---|---|
| raw input | 342K |
| output | 17.5M |
| cache creation | 144M |
| cache read | 3,670M |
| total | 3,830M |
96% of all tokens were cache reads. Claude Code re-injects its large standing context (CLAUDE.md, the codebase, conversation history) through the cache on every request, which keeps raw input tiny.
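That composition falls out of the earlier aggregation sketch with a small follow-up (this snippet reuses the totals dict built there):

```python
# Continues the Claude Code aggregation sketch: fold the per-(month, model)
# counters down to per-token-type totals and print each type's share.
from collections import defaultdict

by_type: dict = defaultdict(int)
for usage in totals.values():
    for key, n in usage.items():
        by_type[key] += n

overall = sum(by_type.values())
for key, n in sorted(by_type.items(), key=lambda kv: -kv[1]):
    print(f"{key:32} {n:>14,} ({n / overall:.1%})")
```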
Codex
| Model | Cached input ratio |
|---|---|
| gpt-5.3-codex | 80% |
| gpt-5.4 | 91% |
| gpt-5.5 | 96% |
94% of Codex input is cached. The ratio increases with newer models. gpt-5.5 has a higher per-token price, but in my logs the high cached-input ratio kept actual costs in check. Looking at raw input/output alone misrepresents what's really happening in both tools.
Division of labor
These two tools aren't competing — they cover different roles. Claude Code handles Biz tasks (docs, research, design, organizing), while Codex is dedicated to Dev (implementation).
That said, the division shifted when gpt-5.5 arrived. Up to gpt-5.4, Codex felt like talking to a senior engineer who only cared about technical correctness — no real dialogue. So I only used it as a backend called from Claude. With gpt-5.5, it finally felt like a real conversation partner, and I started giving it implementation instructions directly. Now the flow is: Claude creates tickets, Codex handles implementation.
The token volume difference reflects the nature of the tasks. Biz work generates large contexts and long conversations. On top of that, Codex only ran at full capacity from mid-April, so the periods aren't even comparable. A raw quantity comparison doesn't mean much.
Takeaways
- The April explosion coincides with when I seriously started testing Multi-agent parallel execution. Running 10 Subagents simultaneously sends token counts through the roof fast
- High cache dependency means both tools are designed for large, persistent contexts. The more you invest in CLAUDE.md and documentation, the more the cache works for you
- Codex model iteration is fast — three generations in three months. The high cached-input ratio on gpt-5.5 is cost-efficient in practice
- Token composition matters. Claude Code is dominated by output and cache_read; Codex is dominated by cached input. Raw input/output alone gives a misleading picture
Measuring this made my "just using it" habits much clearer. Cache ratios and the cost impact of parallel agent execution are things you simply don't see unless you look.
Feel free to leave a comment if you'd like to discuss how to approach AI adoption in your organization.




