The MCP Token Problem (and Three Ways to Solve It)
If you’ve built an MCP server, you’ve probably noticed something uncomfortable: your LLM burns through tokens before the user even types a question.
Here’s why, and what we built to fix it.
The Problem
Most MCP servers register every tool upfront. When the client calls tools/list, it receives the full catalog (every tool name, description, and JSON Schema input definition), and the host injects all of it into the LLM’s context window.
A hand-built Stripe MCP with 300 endpoints? That’s roughly 100-150K tokens in the system prompt, before the conversation even starts.
This is why some developers are ditching MCP servers for raw CLI wrappers and direct API calls. But CLI wrappers give up what enterprises actually need: credential vaulting, access control, audit trails, and rate limiting.
The trade-off shouldn’t be “token-efficient OR secure.” You should get both.
mcp-gen: Three Generation Modes
mcp-gen is a CLI that takes any OpenAPI specification and generates a production-ready TypeScript MCP server. The key insight is that you don’t need one tool per endpoint — you need the right abstraction for your use case.
Mode 1: Generic — 3 Tools, Any API Size
This is the interesting one. Regardless of whether the API has 10 or 1,000 endpoints, Generic mode registers exactly three tools:
| Tool | What it does |
|---|---|
| {name}_list_schemas | Browse endpoints by category or search |
| {name}_get_schema | Inspect one endpoint’s parameters |
| {name}_api_call | Call any endpoint |
The LLM discovers progressively: browse → inspect → call. All endpoint metadata lives server-side in an endpoint_map.json — it’s never sent to the LLM until specifically requested.
~800 tokens in the tools/list response. Always. Whether it’s Petstore or the Stripe API.
The trade-off is extra round-trips: the LLM has to call list_schemas and then get_schema before it can call the API. For most conversational use cases this overhead is barely noticeable, because the LLM naturally asks “what can I do?” before doing anything.
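Here is a minimal sketch of the browse and inspect stages. It assumes a hypothetical endpoint_map.json entry shape (operationId, method, path, category, summary), because mcp-gen’s actual file format and handlers are not shown in this post. The point is that tools/list advertises three fixed definitions. The endpoint data stays in a server-side lookup that is only read when a tool is invoked.

```typescript
// Hypothetical shape of one endpoint_map.json entry; mcp-gen's real format may differ.
interface EndpointEntry {
  operationId: string;
  method: string;
  path: string;
  category: string;
  summary: string;
}

type EndpointMap = Record<string, EndpointEntry>;

// What tools/list advertises: three fixed definitions, whatever the size of the map.
function genericTools(name: string): { name: string; description: string }[] {
  return [
    { name: `${name}_list_schemas`, description: "Browse endpoints by category or search" },
    { name: `${name}_get_schema`, description: "Inspect one endpoint's parameters" },
    { name: `${name}_api_call`, description: "Call any endpoint" },
  ];
}

// Browse: return only IDs and one-line summaries, optionally filtered.
function listSchemas(map: EndpointMap, filter: { category?: string; search?: string } = {}): string[] {
  const term = filter.search?.toLowerCase();
  return Object.values(map)
    .filter((e) => filter.category === undefined || e.category === filter.category)
    .filter(
      (e) =>
        term === undefined ||
        e.operationId.toLowerCase().includes(term) ||
        e.summary.toLowerCase().includes(term),
    )
    .map((e) => `${e.operationId}: ${e.summary}`);
}

// Inspect: return one endpoint's full entry, only when the LLM asks for it.
function getSchema(map: EndpointMap, operationId: string): EndpointEntry {
  const entry: EndpointEntry | undefined = map[operationId];
  if (entry === undefined) throw new Error(`Unknown operationId: ${operationId}`);
  return entry;
}

// A tiny stand-in map; a real one would be generated from the OpenAPI spec.
const petstore: EndpointMap = {
  getPetById: { operationId: "getPetById", method: "GET", path: "/pet/{petId}", category: "pet", summary: "Find pet by ID" },
  findPetsByStatus: { operationId: "findPetsByStatus", method: "GET", path: "/pet/findByStatus", category: "pet", summary: "Find pets by status" },
  placeOrder: { operationId: "placeOrder", method: "POST", path: "/store/order", category: "store", summary: "Place an order for a pet" },
};

console.log(genericTools("petstore").length); // 3, however many endpoints the map holds
console.log(listSchemas(petstore, { category: "pet" }));
console.log(getSchema(petstore, "getPetById"));
```

Growing the map from three entries to three thousand changes what list_schemas can return. It does not change what genericTools advertises.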
Use Generic mode when:
- The API has 50+ endpoints
- Token cost matters (it usually does)
- The LLM doesn’t need instant access to every operation
When NOT to use Generic mode:
- You need single-shot tool calling with no discovery step
- The API has 5-10 endpoints and tokens aren’t a concern
Mode 2: Per-Endpoint — 1:1 Mapping
One tool per API endpoint. Each tool gets a typed input schema derived from the OpenAPI spec with path, query, and header parameters flattened into a single object.
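The flattening step can be sketched as a function that turns a list of OpenAPI 3 Parameter Objects into one flat JSON Schema input object. Several details are assumptions of this sketch, not documented mcp-gen behavior:

- the function name
- skipping cookie parameters
- throwing a hard error on name collisions

Request bodies are left out for brevity.

```typescript
// Minimal subset of an OpenAPI 3 Parameter Object.
interface OpenApiParameter {
  name: string;
  in: "path" | "query" | "header" | "cookie";
  required?: boolean;
  description?: string;
  schema?: { type?: string };
}

// The single flat input schema a per-endpoint tool would advertise.
interface FlatInputSchema {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required: string[];
}

function flattenParameters(parameters: OpenApiParameter[]): FlatInputSchema {
  const schema: FlatInputSchema = { type: "object", properties: {}, required: [] };
  for (const p of parameters) {
    // Cookie parameters are skipped in this sketch (an assumption, not mcp-gen behavior).
    if (p.in === "cookie") continue;
    // A path "id" and a query "id" would collide once flattened; fail loudly instead of guessing.
    if (Object.prototype.hasOwnProperty.call(schema.properties, p.name)) {
      throw new Error(`Parameter name collision after flattening: ${p.name}`);
    }
    const prop: { type: string; description?: string } = { type: p.schema?.type ?? "string" };
    if (p.description !== undefined) prop.description = p.description;
    schema.properties[p.name] = prop;
    // OpenAPI mandates required: true for path parameters, so treat them as required regardless.
    if (p.in === "path" || p.required === true) schema.required.push(p.name);
  }
  return schema;
}

const getPetByIdSchema = flattenParameters([
  { name: "petId", in: "path", required: true, schema: { type: "integer" } },
  { name: "X-Request-Id", in: "header", description: "Optional trace ID" }, // hypothetical header
]);
console.log(JSON.stringify(getPetByIdSchema));
```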
~200-400 tokens per tool. A 20-endpoint API runs ~5K tokens. A 100-endpoint API hits ~30K+.
This is the pattern most hand-built MCP servers use. It’s fine for small, focused APIs where every endpoint matters and you want the LLM to see the full surface area immediately.
Use Per-Endpoint when:
- Small API (under 30 endpoints)
- Every operation is equally important
- You want zero discovery latency
When NOT to use Per-Endpoint:
- API has 50+ endpoints (token costs explode)
- Most operations are rarely used
Mode 3: Curated — Hand-Picked + Fallback
You choose 5-15 high-value endpoints as direct tools. Everything else stays accessible through a fallback api_call tool and an explorer for discovery.
curated:
  tools:
    - operation_id: createCustomer
    - operation_id: getInvoice
    - operation_id: createPaymentIntent
  fallback:
    enabled: true
  explorer:
    enabled: true
~2-4K tokens for 10 curated tools + fallback + explorer. The sweet spot for large APIs where you know which operations matter most.
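A generator might turn the curated config into a tools/list catalog along the lines of the sketch below. It assumes the YAML has been parsed into a plain object with the same keys. The names given to the direct, fallback, and explorer tools are hypothetical. The point is that the catalog size depends on the number of picks, not on the size of the spec.

```typescript
// Hypothetical in-memory form of the curated YAML config; key names mirror that snippet.
interface CuratedConfig {
  tools: { operation_id: string }[];
  fallback: { enabled: boolean };
  explorer: { enabled: boolean };
}

interface ToolDef {
  name: string;
  kind: "direct" | "fallback" | "explorer";
}

// Direct tools for the curated picks, plus the optional fallback and explorer.
// Tool naming here is an assumption of this sketch.
function buildCuratedTools(name: string, config: CuratedConfig, specOperationIds: Set<string>): ToolDef[] {
  const tools: ToolDef[] = [];
  for (const { operation_id } of config.tools) {
    // Catch typos at generation time rather than shipping a tool that can never succeed.
    if (!specOperationIds.has(operation_id)) {
      throw new Error(`Curated operation not found in spec: ${operation_id}`);
    }
    tools.push({ name: operation_id, kind: "direct" });
  }
  if (config.fallback.enabled) tools.push({ name: `${name}_api_call`, kind: "fallback" });
  if (config.explorer.enabled) tools.push({ name: `${name}_list_schemas`, kind: "explorer" });
  return tools;
}

// A stand-in spec with 300 operations, three of which are curated.
const specIds = new Set<string>(["createCustomer", "getInvoice", "createPaymentIntent"]);
for (let i = 0; i < 297; i++) specIds.add(`otherOperation${i}`);

const curated: CuratedConfig = {
  tools: [{ operation_id: "createCustomer" }, { operation_id: "getInvoice" }, { operation_id: "createPaymentIntent" }],
  fallback: { enabled: true },
  explorer: { enabled: true },
};

const catalog = buildCuratedTools("stripe", curated, specIds);
console.log(catalog.map((t) => t.name));
```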
Use Curated when:
- You know the top 10-15 operations your users need
- You want immediate access to common operations
- You still want the LLM to discover edge-case endpoints
When NOT to use Curated:
- You don’t know which operations matter yet (start with Generic, then curate)
- All operations are equally important
The Numbers
Here’s the approximate token cost of the tools/list response for each mode at different API sizes:
| API Size | Hand-Built (typical) | Per-Endpoint | Curated (10 picks) | Generic |
|---|---|---|---|---|
| 20 endpoints | ~8-12K | ~5-8K | ~3-4K | ~800 |
| 50 endpoints | ~20-30K | ~15-20K | ~3-4K | ~800 |
| 100 endpoints | ~40-60K | ~30-40K | ~3-4K | ~800 |
| 300 endpoints | ~100-150K | ~80-120K | ~3-4K | ~800 |
These are estimates based on typical tool description and JSON Schema sizes. “Hand-Built” assumes one tool per endpoint with verbose descriptions, which is the common pattern in open-source MCP servers.
The pattern is clear: Generic mode’s token cost is constant regardless of API size, and Curated stays flat at the cost of its hand-picked tools. Hand-Built and Per-Endpoint scale linearly with endpoint count.
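The table’s shape can be reproduced with a back-of-envelope model. This sketch reuses the article’s own rough figures: about 300 tokens per generated tool (the midpoint of the 200-400 range quoted for Per-Endpoint) and about 800 tokens for Generic. It adds one labeled assumption: Curated’s fallback and explorer together cost about as much as Generic’s catalog. It is an estimator, not a tokenizer.

```typescript
type Mode = "per-endpoint" | "curated" | "generic";

// Rough figures taken from this article, not measurements.
const GENERIC_TOKENS = 800; // three fixed tools
const TOKENS_PER_TOOL = 300; // midpoint of the ~200-400 tokens quoted per generated tool
// Assumption: fallback + explorer cost about as much as Generic mode's three-tool catalog.
const DISCOVERY_OVERHEAD = GENERIC_TOKENS;

function estimateTokens(mode: Mode, endpoints: number, curatedPicks = 10): number {
  if (mode === "generic") return GENERIC_TOKENS; // constant in API size
  if (mode === "curated") {
    // Constant once the API has at least as many endpoints as picks.
    return Math.min(curatedPicks, endpoints) * TOKENS_PER_TOOL + DISCOVERY_OVERHEAD;
  }
  return endpoints * TOKENS_PER_TOOL; // per-endpoint: linear in API size
}

for (const n of [20, 50, 100, 300]) {
  console.log(
    `${n} endpoints: per-endpoint ~${estimateTokens("per-endpoint", n)}, ` +
      `curated ~${estimateTokens("curated", n)}, generic ~${estimateTokens("generic", n)}`,
  );
}
```

With these inputs the per-endpoint estimates are 6K, 15K, 30K, and 90K tokens, all inside the Per-Endpoint ranges in the table. Curated stays at 3.8K and Generic at 800 for every size.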
Unified Parameter Schema
There’s a second problem beyond token count: most MCP servers force the LLM to understand HTTP routing.
// What most MCPs expect:
{
  path: { petId: "123" },
  query: { status: "available" },
  headers: { "X-API-Key": "sk-..." }
}
That’s transport-level detail the LLM shouldn’t care about. Every token spent on “put this in the path vs the query string” is a token wasted.
mcp-gen flattens everything into a single params object:
// What mcp-gen expects:
{ params: { petId: "123", status: "available" } }
// Routing handled server-side. Credentials injected from vault.
The server handles routing. Credentials are pulled from the vault at call time — the LLM never sees auth headers at all.
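The server-side routing can be sketched under two assumptions. First, each endpoint’s parameter locations are available from the endpoint map. Second, a hypothetical CredentialVault interface returns auth headers per customer. The function and interface names are illustrative only. The sketch splits one flat params object back into path, query, and header parts, then adds credentials after the LLM’s input has been processed.

```typescript
// Where each flattened parameter goes; in practice this would come from the endpoint map.
interface ParamLocation {
  name: string;
  in: "path" | "query" | "header";
}

interface RouteSpec {
  method: string;
  path: string; // OpenAPI path template, e.g. "/pet/{petId}"
  params: ParamLocation[];
}

// Hypothetical vault interface: resolves auth headers for a customer at call time.
interface CredentialVault {
  authHeaders(customerId: string): Record<string, string>;
}

interface HttpRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
}

function buildRequest(
  baseUrl: string,
  route: RouteSpec,
  params: Record<string, string | undefined>,
  vault: CredentialVault,
  customerId: string,
): HttpRequest {
  let path = route.path;
  const queryParts: string[] = [];
  const headers: Record<string, string> = {};
  for (const loc of route.params) {
    const value = params[loc.name];
    if (value === undefined) continue; // optional parameter not supplied
    if (loc.in === "path") {
      path = path.replace(`{${loc.name}}`, encodeURIComponent(value));
    } else if (loc.in === "query") {
      queryParts.push(`${encodeURIComponent(loc.name)}=${encodeURIComponent(value)}`);
    } else {
      headers[loc.name] = value;
    }
  }
  if (/\{[^}]*\}/.test(path)) throw new Error(`Missing path parameter for ${route.path}`);
  // Credentials are merged in last, server-side; the LLM never supplies or sees them.
  Object.assign(headers, vault.authHeaders(customerId));
  const query = queryParts.length > 0 ? `?${queryParts.join("&")}` : "";
  return { method: route.method, url: `${baseUrl}${path}${query}`, headers };
}

// Demo vault returning a placeholder key; a real vault would fetch a stored secret.
const vault: CredentialVault = { authHeaders: (id) => ({ "X-API-Key": `demo-key-for-${id}` }) };
const request = buildRequest(
  "https://petstore.example.com",
  { method: "GET", path: "/pet/{petId}", params: [{ name: "petId", in: "path" }, { name: "status", in: "query" }] },
  { petId: "123", status: "available" }, // the LLM's flat params object
  vault,
  "cust_42",
);
console.log(request.url); // https://petstore.example.com/pet/123?status=available
```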
What Else You Get
Token efficiency is the headline, but every generated server also includes:
- Credential vaulting — secrets pulled from the vault at runtime, never exposed to the LLM
- Audit logging — every tool invocation logged with customer context
- Multi-tenant support — customer ID from JWT, isolated credential scopes
- Type-safe inputs — Zod validation on every parameter before the API call
- Deployable to Statio — one pilum deploy to Cloud Run with gateway routing, rate limiting, and usage billing
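Audit logging like this is often implemented as a higher-order wrapper around each tool handler. The sketch below is not the code mcp-gen generates. It uses a hypothetical AuditRecord shape and an in-memory sink, and assumes the customer ID has already been extracted from a verified JWT. Tool inputs are deliberately left out of the record so parameter values are not logged.

```typescript
// Per-call context; the customer ID would come from a verified JWT upstream (not shown here).
interface CallContext {
  customerId: string;
}

// Hypothetical audit record shape; real fields and log sink are deployment-specific.
interface AuditRecord {
  timestamp: string;
  customerId: string;
  tool: string;
  ok: boolean;
  durationMs: number;
}

type ToolHandler<I, O> = (ctx: CallContext, input: I) => Promise<O>;

// Wrap a tool handler so every invocation, successful or not, emits one audit record.
function withAudit<I, O>(
  tool: string,
  handler: ToolHandler<I, O>,
  sink: (record: AuditRecord) => void,
): ToolHandler<I, O> {
  return async (ctx: CallContext, input: I): Promise<O> => {
    const start = Date.now();
    let ok = false;
    try {
      const result = await handler(ctx, input);
      ok = true;
      return result;
    } finally {
      // Runs on success and on failure; the original error still propagates.
      sink({
        timestamp: new Date(start).toISOString(),
        customerId: ctx.customerId,
        tool,
        ok,
        durationMs: Date.now() - start,
      });
    }
  };
}

const auditLog: AuditRecord[] = [];
const getPet = withAudit(
  "petstore_api_call",
  async (_ctx: CallContext, input: { petId: string }) => ({ id: input.petId, name: "Rex" }),
  (record) => auditLog.push(record),
);

getPet({ customerId: "cust_42" }, { petId: "123" }).then((pet) => {
  console.log(pet.name, auditLog[0].customerId, auditLog[0].ok);
});
```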
The Positioning
Other MCP servers make you choose: token-efficient OR feature-complete.
Generic mode is ~800 tokens for any API — smaller than most hand-built single-purpose MCPs — while including credential management, audit trails, and enterprise access control that raw CLI wrappers can’t provide.
The full technical breakdown with configuration examples is on the Statio mcp-gen page.
Resources
- mcp-gen technical deep-dive — full configuration reference and mode comparison
- Statio — managed MCP security platform
- MCP specification — the Model Context Protocol
- Romans.dev — SID Technologies product portfolio