Token usage dropped. Accuracy improved. And I built a 200-line Python proxy to prove it.
The Problem Nobody Talks About
MCP (Model Context Protocol) was supposed to be the universal remote for AI agents. Connect once, and your agent can interact with GitHub, Jira, Slack, filesystems, databasesβyou name it.
But here's what nobody tells you: connect four MCP servers, and your agent burns 60,000 tokens before you even say "hello."
Redis ran the numbers. A typical setup with Redis, GitHub, Jira, and Grafanaβfour servers, 167 toolsβconsumes ~60,000 tokens upfront just loading tool descriptions. In production, it's often 150,000+ tokens.
Atlassian found their own MCP server alone consumes ~10,000 tokens for Jira and Confluence. GitHub's official server exposes 94 tools and chews through ~17,600 tokens per request. Combine several, and you hit 30,000+ tokens of pure metadataβbefore your agent solves anything.
Every extra tool is a chance to pick the wrong one. Redis measured 42% tool selection accuracy without filtering. The model gets lost in the noise, grabs the wrong tool, overwrites data, or sends requests into the void.
We gave agents unlimited power. And they became slower, dumber, and more expensive.
The Solutions (and Why They're Not Enough)
The industry noticed. Multiple solutions emerged:
| Approach | Example | Core Problem |
|---|---|---|
| Regex-based filtering |
mcpwrapped, Tool Filter MCP
|
You must manually configure which tools to hide. 167 tools? Good luck. |
| Schema compression | Atlassian mcp-compressor (97% reduction) |
Strips descriptions to save tokens, but accuracy dropsβmodels can't tell create_jira_issue from create_confluence_page. |
| Tool Search (Anthropic) | Claude Code built-in | 85% token reduction, but only 34% selection accuracy in independent testing. |
| Vector search (Redis) | Redis Tool Filtering | 98% token reduction, 8x faster, 2x accuracyβbut requires Redis infrastructure. |
| Hybrid search (Stacklok) | MCP Optimizer | 94% accuracy on 2,792 tools, but closed-source commercial product. |
All of them fall into one of two traps:
- Manual configuration: You have to know in advance which tools to hide.
- Heavy infrastructure: You need Redis, a cloud service, or a commercial license.
What I wanted was simple: zero-config, 100% local, and smart enough to figure out what tools I actually need.
So I built it.
Introducing shutup-mcp
shutup is an MCP proxy that shows your agent only the tools it actually needsβzero config, 100% local, no API keys.
shutup --config ~/claude_desktop_config.json --intent "read and write files"
That's it. Behind the scenes, shutup:
- Reads your MCP config and discovers all connected serversβfilesystem, GitHub, Jira, whatever.
-
Fetches all tool definitions and builds a local embedding index using
all-MiniLM-L6-v2(~80MB, runs entirely offline). -
Watches for changesβadd a new MCP server,
shutuprebuilds the index automatically. -
Filters tools by intentβwhen your agent requests tools,
shutupintercepts and returns only the top-K most relevant ones.
Your agent never knows the other 79,997 tools exist.
Why This Approach Wins
1. Zero Config, Actually
No regex. No YAML. No manual whitelists. You already have a claude_desktop_config.json. shutup reads it directly.
{
"mcpServers": {
"filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"] },
"github": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"] },
"fetch": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-fetch"] }
}
}
shutup connects to all three, aggregates their tools, and filters them intelligently. No extra configuration files needed.
2. Intent-Based Filtering
Most proxies hide tools based on names or regex patterns. shutup hides tools based on what you're actually trying to do.
Say "read and write files"βshutup returns filesystem tools, hiding GitHub and fetch tools.
Say "create a GitHub issue"βshutup surfaces GitHub tools while hiding filesystem operations.
It treats tool selection as a retrieval problem, not a reasoning oneβthe same insight that drove Redis to 98% token reduction.
3. Multi-Server Aggregation
This is where shutup differs from most open-source alternatives. It doesn't just filter one MCP serverβit aggregates all of them.
When Stacklok analyzed 2,792 tools, they found 94% selection accuracy using hybrid search. But their Optimizer is a commercial product. shutup brings the same patternβsemantic retrieval across multiple serversβto an open-source, zero-dependency tool.
4. Privacy-First, 100% Local
Two embedding backends:
-
sentence-transformers(default): Downloadsall-MiniLM-L6-v2once (~80MB), runs entirely offline. -
ollama: Usenomic-embed-textor any Ollama embedding model. Completely air-gapped.
No API keys. No telemetry. No cloud dependencies.
Benchmark Context (Why This Matters)
Let's put numbers to the problem.
| Scenario | Tools Loaded | Token Overhead (Est.) | Selection Accuracy |
|---|---|---|---|
| Single MCP server (GitHub) | 94 | ~17,600 | 79-88% (Opus 4.5) |
| Four servers (Redis+GitHub+Jira+Grafana) | 167 | ~60,000 | ~42% (without filtering) |
| Enterprise setup (10+ servers) | 500+ | 150,000+ | < 30% |
Sources: Atlassian, Redis, Stacklok, Anthropic
Now look at what filtering achieves:
| Solution | Token Reduction | Selection Accuracy | Infrastructure Required |
|---|---|---|---|
| Anthropic Tool Search | 85% | 34% (2,792 tools) | Built into Claude |
| Atlassian mcp-compressor | 70-97% | Drops at high compression | Proxy only |
| Redis Tool Filtering | 98% | 85% | Redis + vector DB |
| Stacklok MCP Optimizer | 60-85% | 94% | Commercial platform |
| shutup-mcp | ~98% (projected) | TBD (benchmarking) | Zero |
shutup uses the same architectural pattern as Redis (vector embeddings + semantic search) but without the Redis dependency. It's the "Redis approach" in a single pip install.
How It Works (Under the Hood)
Architecture
Agent (Claude Code / Cursor / Windsurf)
β
shutup-mcp (stdio proxy)
β
βββββββββββββββββββββββββββ
β ServerManager β
β - Parses mcp.json β
β - Manages connections β
β - Watches for changes β
βββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββ
β ToolEmbedder β
β - Builds local index β
β - Cosine similarity β
β - Returns top-K tools β
βββββββββββββββββββββββββββ
β
Upstream MCP Servers (filesystem, github, fetch, β¦)
Core Loop
-
Startup: Parse
claude_desktop_config.json, connect to each MCP server, fetch tool definitions. -
Embed: For each tool, create text
"{name}: {description}"and embed using chosen backend. -
Request: User provides intent (e.g.,
--intent "read and write files"). - Filter: Compute cosine similarity, return top-K tools (default K=5).
-
Proxy: Forward
tools/listandtools/callrequests transparently.
Example
$ shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
--intent "create a GitHub issue about the API outage" \
--top-k 3
[shutup] Loading config: claude_desktop_config.json
[shutup] Connected to 3 MCP servers (filesystem, github, fetch)
[shutup] Fetched 47 total tools
[shutup] Intent: "create a GitHub issue about the API outage"
[shutup] Returning 3/47 tools:
- github__create_issue
- github__list_issues
- github__get_repo
The agent only sees 3 tools. Token overhead drops from ~25,000 to ~300.
Getting Started
Install
pip install shutup-mcp
Run
# Default: sentence-transformers (auto-downloads model)
shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
--intent "your task description"
# Privacy mode: use Ollama
shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
--intent "read and write files" \
--embedder ollama
Integrate with Claude Code
In your claude_desktop_config.json, replace direct MCP server entries with shutup as a proxy, or run shutup as a standalone gateway. Full integration docs are on the GitHub repo.
What's Next?
This is v0.1.0βa minimal, functional proxy that proves the pattern works. I'm actively working on:
- Benchmarking: Head-to-head comparison with Anthropic Tool Search, mcp-compressor, and Stacklok Optimizer (public dataset, reproducible).
- Hybrid search: BM25 + embeddings for better exact-match performance.
- Rust rewrite: Move embedding and similarity computation to Rust for sub-millisecond latency at scale.
- Tool usage analytics: Show which tools your agent actually uses vs. what gets filtered out.
Why I Built This
I was tired of watching my agent burn tokens on tools it would never use. Tired of "pick the wrong tool" errors. Tired of configuring regex filters every time I added a new MCP server.
The Redis team proved the pattern: treat tool selection as retrieval. 98% token reduction. 8x faster. Double the accuracy.
But their solution required Redis. Stacklok's required a commercial platform. Anthropic's couldn't reliably find the right tools.
I wanted something that worked out of the box, completely local, with zero configuration.
So I built it. In 200 lines of Python.
Try It Yourself
- GitHub: github.com/hjs-spec/shutup-mcp
-
PyPI:
pip install shutup-mcp
Star the repo if this solves a problem for you. PRs welcomeβespecially if you want to help with benchmarking or the Rust rewrite.
Your agent doesn't need 167 tools. It needs 3. Tell it to shutup.









![Defluffer - reduce token usage π by 45% using this one simple trick! [Earthday challenge]](https://media2.dev.to/dynamic/image/width=1000,height=420,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiekbgepcutl4jse0sfs0.png)


