My AI Agent Could See 167 Tools. Then I Told It to shutup.

Token usage dropped. Accuracy improved. And I built a 200-line Python proxy to prove it.

The Problem Nobody Talks About

MCP (Model Context Protocol) was supposed to be the universal remote for AI agents. Connect once, and your agent can interact with GitHub, Jira, Slack, filesystems, databases—you name it.

But here's what nobody tells you: connect four MCP servers, and your agent burns 60,000 tokens before you even say "hello."

Redis ran the numbers. A typical setup with Redis, GitHub, Jira, and Grafana—four servers, 167 tools—consumes ~60,000 tokens upfront just loading tool descriptions. In production, it's often 150,000+ tokens.

Atlassian found their own MCP server alone consumes ~10,000 tokens for Jira and Confluence. GitHub's official server exposes 94 tools and chews through ~17,600 tokens per request. Combine several, and you hit 30,000+ tokens of pure metadata—before your agent solves anything.

Every extra tool is a chance to pick the wrong one. Redis measured 42% tool selection accuracy without filtering. The model gets lost in the noise, grabs the wrong tool, overwrites data, or sends requests into the void.

We gave agents unlimited power. And they became slower, dumber, and more expensive.

The Solutions (and Why They're Not Enough)

The industry noticed. Multiple solutions emerged:

Approach	Example	Core Problem
Regex-based filtering	`mcpwrapped`, `Tool Filter MCP`	You must manually configure which tools to hide. 167 tools? Good luck.
Schema compression	Atlassian `mcp-compressor` (97% reduction)	Strips descriptions to save tokens, but accuracy drops—models can't tell `create_jira_issue` from `create_confluence_page`.
Tool Search (Anthropic)	Claude Code built-in	85% token reduction, but only 34% selection accuracy in independent testing.
Vector search (Redis)	Redis Tool Filtering	98% token reduction, 8x faster, 2x accuracy—but requires Redis infrastructure.
Hybrid search (Stacklok)	MCP Optimizer	94% accuracy on 2,792 tools, but closed-source commercial product.

All of them fall into one of two traps:

Manual configuration: You have to know in advance which tools to hide.
Heavy infrastructure: You need Redis, a cloud service, or a commercial license.

What I wanted was simple: zero-config, 100% local, and smart enough to figure out what tools I actually need.

So I built it.

Introducing `shutup-mcp`

shutup is an MCP proxy that shows your agent only the tools it actually needs—zero config, 100% local, no API keys.

shutup --config ~/claude_desktop_config.json --intent "read and write files"

That's it. Behind the scenes, shutup:

Reads your MCP config and discovers all connected servers—filesystem, GitHub, Jira, whatever.
Fetches all tool definitions and builds a local embedding index using all-MiniLM-L6-v2 (~80MB, runs entirely offline).
Watches for changes—add a new MCP server, shutup rebuilds the index automatically.
Filters tools by intent—when your agent requests tools, shutup intercepts and returns only the top-K most relevant ones.

Your agent never knows the other 79,997 tools exist.

Why This Approach Wins

1. Zero Config, Actually

No regex. No YAML. No manual whitelists. You already have a claude_desktop_config.json. shutup reads it directly.

{
  "mcpServers": {
    "filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"] },
    "github": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"] },
    "fetch": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-fetch"] }
  }
}

shutup connects to all three, aggregates their tools, and filters them intelligently. No extra configuration files needed.

2. Intent-Based Filtering

Most proxies hide tools based on names or regex patterns. shutup hides tools based on what you're actually trying to do.

Say "read and write files"—shutup returns filesystem tools, hiding GitHub and fetch tools.

Say "create a GitHub issue"—shutup surfaces GitHub tools while hiding filesystem operations.

It treats tool selection as a retrieval problem, not a reasoning one—the same insight that drove Redis to 98% token reduction.

3. Multi-Server Aggregation

This is where shutup differs from most open-source alternatives. It doesn't just filter one MCP server—it aggregates all of them.

When Stacklok analyzed 2,792 tools, they found 94% selection accuracy using hybrid search. But their Optimizer is a commercial product. shutup brings the same pattern—semantic retrieval across multiple servers—to an open-source, zero-dependency tool.

4. Privacy-First, 100% Local

Two embedding backends:

sentence-transformers (default): Downloads all-MiniLM-L6-v2 once (~80MB), runs entirely offline.
ollama: Use nomic-embed-text or any Ollama embedding model. Completely air-gapped.

No API keys. No telemetry. No cloud dependencies.

Benchmark Context (Why This Matters)

Let's put numbers to the problem.

Scenario	Tools Loaded	Token Overhead (Est.)	Selection Accuracy
Single MCP server (GitHub)	94	~17,600	79-88% (Opus 4.5)
Four servers (Redis+GitHub+Jira+Grafana)	167	~60,000	~42% (without filtering)
Enterprise setup (10+ servers)	500+	150,000+	< 30%

Sources: Atlassian, Redis, Stacklok, Anthropic

Now look at what filtering achieves:

Solution	Token Reduction	Selection Accuracy	Infrastructure Required
Anthropic Tool Search	85%	34% (2,792 tools)	Built into Claude
Atlassian mcp-compressor	70-97%	Drops at high compression	Proxy only
Redis Tool Filtering	98%	85%	Redis + vector DB
Stacklok MCP Optimizer	60-85%	94%	Commercial platform
shutup-mcp	~98% (projected)	TBD (benchmarking)	Zero

shutup uses the same architectural pattern as Redis (vector embeddings + semantic search) but without the Redis dependency. It's the "Redis approach" in a single pip install.

How It Works (Under the Hood)

Architecture

Agent (Claude Code / Cursor / Windsurf)
    ↓
shutup-mcp (stdio proxy)
    ↓
┌─────────────────────────┐
│ ServerManager           │
│ - Parses mcp.json       │
│ - Manages connections   │
│ - Watches for changes   │
└─────────────────────────┘
    ↓
┌─────────────────────────┐
│ ToolEmbedder            │
│ - Builds local index    │
│ - Cosine similarity     │
│ - Returns top-K tools   │
└─────────────────────────┘
    ↓
Upstream MCP Servers (filesystem, github, fetch, …)

Core Loop

Startup: Parse claude_desktop_config.json, connect to each MCP server, fetch tool definitions.
Embed: For each tool, create text "{name}: {description}" and embed using chosen backend.
Request: User provides intent (e.g., --intent "read and write files").
Filter: Compute cosine similarity, return top-K tools (default K=5).
Proxy: Forward tools/list and tools/call requests transparently.

Example

$ shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
         --intent "create a GitHub issue about the API outage" \
         --top-k 3

[shutup] Loading config: claude_desktop_config.json
[shutup] Connected to 3 MCP servers (filesystem, github, fetch)
[shutup] Fetched 47 total tools
[shutup] Intent: "create a GitHub issue about the API outage"
[shutup] Returning 3/47 tools:
  - github__create_issue
  - github__list_issues
  - github__get_repo

The agent only sees 3 tools. Token overhead drops from ~25,000 to ~300.

Getting Started

Install

pip install shutup-mcp

Run

# Default: sentence-transformers (auto-downloads model)
shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
       --intent "your task description"

# Privacy mode: use Ollama
shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
       --intent "read and write files" \
       --embedder ollama

Integrate with Claude Code

In your claude_desktop_config.json, replace direct MCP server entries with shutup as a proxy, or run shutup as a standalone gateway. Full integration docs are on the GitHub repo.

What's Next?

This is v0.1.0—a minimal, functional proxy that proves the pattern works. I'm actively working on:

Benchmarking: Head-to-head comparison with Anthropic Tool Search, mcp-compressor, and Stacklok Optimizer (public dataset, reproducible).
Hybrid search: BM25 + embeddings for better exact-match performance.
Rust rewrite: Move embedding and similarity computation to Rust for sub-millisecond latency at scale.
Tool usage analytics: Show which tools your agent actually uses vs. what gets filtered out.

Why I Built This

I was tired of watching my agent burn tokens on tools it would never use. Tired of "pick the wrong tool" errors. Tired of configuring regex filters every time I added a new MCP server.

The Redis team proved the pattern: treat tool selection as retrieval. 98% token reduction. 8x faster. Double the accuracy.

But their solution required Redis. Stacklok's required a commercial platform. Anthropic's couldn't reliably find the right tools.

I wanted something that worked out of the box, completely local, with zero configuration.

So I built it. In 200 lines of Python.

Try It Yourself

GitHub: github.com/hjs-spec/shutup-mcp
PyPI: pip install shutup-mcp

Star the repo if this solves a problem for you. PRs welcome—especially if you want to help with benchmarking or the Rust rewrite.

Your agent doesn't need 167 tools. It needs 3. Tell it to shutup.

My AI Agent Could See 167 Tools. Then I Told It to shutup.

The Problem Nobody Talks About

The Solutions (and Why They're Not Enough)

Introducing `shutup-mcp`

Why This Approach Wins

1. Zero Config, Actually

2. Intent-Based Filtering

3. Multi-Server Aggregation

4. Privacy-First, 100% Local

Benchmark Context (Why This Matters)

How It Works (Under the Hood)

Architecture

Core Loop

Example

Getting Started

Install

Run

Integrate with Claude Code

What's Next?

Why I Built This

Try It Yourself

Tags

Author

Stats

Published

You Might Also Like

Join the OpenClaw Challenge: $1,200 Prize Pool!

Congrats to the Notion MCP Challenge Winners!

AI Doesn't Fix Weak Engineering. It Just Speeds It Up.

Your Job Isn't Going Away. But Someone's Fundraise Depends on You Thinking It Is.

Defluffer - reduce token usage 📉 by 45% using this one simple trick! [Earthday challenge]

Build a voice-enabled Telegram Bot with the Gemini Interactions API

My AI Agent Could See 167 Tools. Then I Told It to shutup.

The Problem Nobody Talks About

The Solutions (and Why They're Not Enough)

Introducing shutup-mcp

Why This Approach Wins

1. Zero Config, Actually

2. Intent-Based Filtering

3. Multi-Server Aggregation

4. Privacy-First, 100% Local

Benchmark Context (Why This Matters)

How It Works (Under the Hood)

Architecture

Core Loop

Example

Getting Started

Install

Run

Integrate with Claude Code

What's Next?

Why I Built This

Try It Yourself

Tags

Author

Stats

Published

You Might Also Like

Join the OpenClaw Challenge: $1,200 Prize Pool!

Congrats to the Notion MCP Challenge Winners!

AI Doesn't Fix Weak Engineering. It Just Speeds It Up.

Your Job Isn't Going Away. But Someone's Fundraise Depends on You Thinking It Is.

Defluffer - reduce token usage 📉 by 45% using this one simple trick! [Earthday challenge]

Build a voice-enabled Telegram Bot with the Gemini Interactions API

Introducing `shutup-mcp`