By Dan Flanagan • 2026-06-26 • 5 min

The MCP Token Problem (and Three Ways to Solve It)

If you’ve built an MCP server, you’ve probably noticed something uncomfortable: your LLM burns through tokens before the user even types a question.

Here’s why, and what we built to fix it.

The Problem

Most MCP servers register every tool upfront. When an MCP client calls tools/list, it receives the full catalog (every tool name, description, and JSON Schema input definition), and the host injects that catalog into the LLM's context window.

A hand-built Stripe MCP with 300 endpoints? By the estimates in the comparison table below, that’s roughly 100-150K tokens of context consumed before the conversation even starts.
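To get a feel for how large a catalog is, one rough approach is to serialize the tool definitions and apply a characters-per-token heuristic. The sketch below assumes about 4 characters per token, which is a rule of thumb and not a real tokenizer. The `ToolDef` shape mirrors the name, description, and inputSchema fields that tools/list returns; the sample tool and the `estimateCatalogTokens` helper are made up for illustration.

```typescript
// Rough estimate of how many tokens a tools/list catalog costs.
// Assumption: ~4 characters per token (a heuristic, not a real tokenizer).
interface ToolDef {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

const CHARS_PER_TOKEN = 4;

function estimateCatalogTokens(tools: ToolDef[]): number {
  const serialized = JSON.stringify(tools);
  return Math.ceil(serialized.length / CHARS_PER_TOKEN);
}

// Hypothetical tool definition, for illustration only.
const sampleTool: ToolDef = {
  name: "create_customer",
  description: "Create a customer record.",
  inputSchema: {
    type: "object",
    properties: { email: { type: "string" }, name: { type: "string" } },
    required: ["email"],
  },
};

// 300 near-copies stand in for a 300-endpoint catalog.
const catalog: ToolDef[] = Array.from({ length: 300 }, (_, i) => ({
  ...sampleTool,
  name: `${sampleTool.name}_${i}`,
}));

console.log(estimateCatalogTokens([sampleTool]), estimateCatalogTokens(catalog));
```

Real tool definitions tend to carry longer descriptions and deeper schemas than this sample, so the estimate here is a lower bound on what a production catalog would cost.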

This is why developers are ditching MCPs for raw CLI wrappers and direct API calls. But CLIs lose credential vaulting, access control, audit trails, and rate limiting — everything enterprises actually need.

The trade-off shouldn’t be “token-efficient OR secure.” You should get both.

mcp-gen: Three Generation Modes

mcp-gen is a CLI that takes any OpenAPI specification and generates a production-ready TypeScript MCP server. The key insight is that you don’t need one tool per endpoint — you need the right abstraction for your use case.

Mode 1: Generic — 3 Tools, Any API Size

This is the interesting one. Regardless of whether the API has 10 or 1,000 endpoints, Generic mode registers exactly three tools:

Tool	What it does
`{name}_list_schemas`	Browse endpoints by category or search
`{name}_get_schema`	Inspect one endpoint’s parameters
`{name}_api_call`	Call any endpoint

The LLM discovers progressively: browse → inspect → call. All endpoint metadata lives server-side in an endpoint_map.json — it’s never sent to the LLM until specifically requested.

~800 tokens in the tools/list response. Always. Whether it’s Petstore or the Stripe API.

The trade-off is extra round-trips. The LLM needs to call list_schemas and get_schema before it can call the API, so the first use of an endpoint takes three tool calls instead of one. For most conversational use cases this is invisible, because the LLM naturally asks “what can I do?” before doing anything.
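The browse, inspect, call flow can be sketched as three small handlers over a server-side endpoint map. This is a minimal illustration, not mcp-gen's generated code: the `EndpointEntry` shape, the in-memory `endpointMap` (standing in for endpoint_map.json), and the handler names are all assumptions, and `apiCall` returns a description of the request instead of sending it.

```typescript
// Sketch of Generic mode's three handlers. All names and shapes are assumptions.
interface EndpointEntry {
  operationId: string;
  method: string;
  path: string;
  category: string;
  summary: string;
  params: Record<string, { in: "path" | "query" | "header"; type: string; required: boolean }>;
}

// Stand-in for the contents of endpoint_map.json.
const endpointMap: EndpointEntry[] = [
  { operationId: "listPets", method: "GET", path: "/pets", category: "pets",
    summary: "List pets", params: { status: { in: "query", type: "string", required: false } } },
  { operationId: "getPet", method: "GET", path: "/pets/{petId}", category: "pets",
    summary: "Get one pet", params: { petId: { in: "path", type: "string", required: true } } },
  { operationId: "getInventory", method: "GET", path: "/store/inventory", category: "store",
    summary: "Get inventory counts", params: {} },
];

// Step 1, browse: return only ids and summaries, optionally filtered.
function listSchemas(filter?: { category?: string; search?: string }) {
  const search = filter?.search?.toLowerCase();
  return endpointMap
    .filter((e) => !filter?.category || e.category === filter.category)
    .filter((e) => !search || `${e.operationId} ${e.summary}`.toLowerCase().includes(search))
    .map((e) => ({ operationId: e.operationId, summary: e.summary }));
}

// Step 2, inspect: return the full parameter detail for one endpoint.
function getSchema(operationId: string): EndpointEntry {
  const entry = endpointMap.find((e) => e.operationId === operationId);
  if (!entry) throw new Error(`Unknown operation: ${operationId}`);
  return entry;
}

// Step 3, call: check required params, then describe the request.
// A real server would send the HTTP request at this point.
function apiCall(operationId: string, params: Record<string, string>) {
  const entry = getSchema(operationId);
  for (const [paramName, spec] of Object.entries(entry.params)) {
    if (spec.required && !(paramName in params)) {
      throw new Error(`Missing required param: ${paramName}`);
    }
  }
  return { method: entry.method, path: entry.path, params };
}

console.log(listSchemas({ category: "pets" }));
console.log(getSchema("getPet").params);
console.log(apiCall("getPet", { petId: "123" }));
```

Only the three handler signatures need to appear in tools/list. The endpoint entries stay on the server until a handler returns them, which is why the catalog size does not grow with the API.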

Use Generic mode when:

The API has 50+ endpoints
Token cost matters (it usually does)
The LLM doesn’t need instant access to every operation

When NOT to use Generic mode:

You need single-shot tool calling with no discovery step
The API has 5-10 endpoints and tokens aren’t a concern

Mode 2: Per-Endpoint — 1:1 Mapping

One tool per API endpoint. Each tool gets a typed input schema derived from the OpenAPI spec with path, query, and header parameters flattened into a single object.

~200-400 tokens per tool. A 20-endpoint API runs ~5K tokens. A 100-endpoint API hits ~30K+.
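The flattening step described above can be sketched as follows. This is a minimal illustration that assumes a simplified parameter shape (`OpenApiParam`) and a hypothetical `flattenParams` helper. It is not mcp-gen's actual generator, and it ignores request bodies and name collisions between locations.

```typescript
// Sketch: flatten OpenAPI path/query/header parameters into one input schema.
// The types are a simplified subset of OpenAPI, chosen for illustration.
interface OpenApiParam {
  name: string;
  in: "path" | "query" | "header";
  required?: boolean;
  schema: { type: string };
}

interface FlatInputSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required: string[];
}

function flattenParams(params: OpenApiParam[]): FlatInputSchema {
  const properties: Record<string, { type: string }> = {};
  const required: string[] = [];
  for (const p of params) {
    // Assumption: names are unique across locations.
    // A real generator would need a rule for collisions.
    properties[p.name] = { type: p.schema.type };
    // OpenAPI requires path parameters to be marked required.
    if (p.required || p.in === "path") required.push(p.name);
  }
  return { type: "object", properties, required };
}

const flatSchema = flattenParams([
  { name: "petId", in: "path", required: true, schema: { type: "string" } },
  { name: "status", in: "query", schema: { type: "string" } },
  { name: "X-Request-Id", in: "header", schema: { type: "string" } },
]);
console.log(JSON.stringify(flatSchema, null, 2));
```

Each generated tool carries one such schema, which is where most of the per-tool token cost comes from.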

This is the pattern most hand-built MCP servers use. It’s fine for small, focused APIs where every endpoint matters and you want the LLM to see the full surface area immediately.

Use Per-Endpoint when:

Small API (under 30 endpoints)
Every operation is equally important
You want zero discovery latency

When NOT to use Per-Endpoint:

API has 50+ endpoints (token costs explode)
Most operations are rarely used

Mode 3: Curated — Hand-Picked + Fallback

You choose 5-15 high-value endpoints as direct tools. Everything else stays accessible through a fallback api_call tool and an explorer for discovery.

curated:
  tools:
    - operation_id: createCustomer
    - operation_id: getInvoice
    - operation_id: createPaymentIntent
  fallback:
    enabled: true
  explorer:
    enabled: true

~2-4K tokens for 10 curated tools + fallback + explorer. The sweet spot for large APIs where you know which operations matter most.
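One way to picture what the curated config does is to split the spec's operations into direct tools and fallback-only operations. The sketch below mirrors the config shape shown in this section, but the `planCuratedTools` helper and its selection logic are assumptions. In particular, it assumes the fallback and the explorer each register a single tool, which this post does not specify.

```typescript
// Sketch: split an API's operations into direct tools and fallback-only ones.
// The selection logic is an assumption, not mcp-gen's implementation.
interface CuratedConfig {
  tools: { operation_id: string }[];
  fallback: { enabled: boolean };
  explorer: { enabled: boolean };
}

function planCuratedTools(allOperationIds: string[], config: CuratedConfig) {
  const picked = new Set(config.tools.map((t) => t.operation_id));

  // Fail early if the config names an operation the spec does not contain.
  const unknown = Array.from(picked).filter((id) => !allOperationIds.includes(id));
  if (unknown.length > 0) throw new Error(`Not in spec: ${unknown.join(", ")}`);

  const direct = allOperationIds.filter((id) => picked.has(id));
  const viaFallback = config.fallback.enabled
    ? allOperationIds.filter((id) => !picked.has(id))
    : [];

  // Assumption: fallback and explorer each add exactly one registered tool.
  const registered =
    direct.length + (config.fallback.enabled ? 1 : 0) + (config.explorer.enabled ? 1 : 0);
  return { direct, viaFallback, registered };
}

const plan = planCuratedTools(
  ["createCustomer", "getInvoice", "createPaymentIntent", "listRefunds", "getBalance"],
  {
    tools: [
      { operation_id: "createCustomer" },
      { operation_id: "getInvoice" },
      { operation_id: "createPaymentIntent" },
    ],
    fallback: { enabled: true },
    explorer: { enabled: true },
  },
);
console.log(plan);
```

The registered tool count depends only on the number of picks, not on the size of the spec, which is why the Curated token cost stays flat as the API grows.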

Use Curated when:

You know the top 10-15 operations your users need
You want immediate access to common operations
You still want the LLM to discover edge-case endpoints

When NOT to use Curated:

You don’t know which operations matter yet (start with Generic, then curate)
All operations are equally important

The Numbers

Here’s how the modes compare, in estimated tools/list tokens, at different API sizes:

API Size	Hand-Built (typical)	Per-Endpoint	Curated (10 picks)	Generic
20 endpoints	~8-12K	~5-8K	~3-4K	~800
50 endpoints	~20-30K	~15-20K	~3-4K	~800
100 endpoints	~40-60K	~30-40K	~3-4K	~800
300 endpoints	~100-150K	~80-120K	~3-4K	~800

These are estimates based on typical tool description and JSON Schema sizes. “Hand-Built” assumes one tool per endpoint with verbose descriptions, which is the common pattern in open-source MCP servers.

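The pattern is clear: Generic mode’s token cost is constant regardless of API size, and Curated stays flat as long as the number of picks is fixed. Hand-built and Per-Endpoint servers scale roughly linearly with endpoint count.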
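The scaling behaviour can be reproduced with a back-of-envelope model. The sketch below uses only figures quoted in this post (200-400 tokens per Per-Endpoint tool, about 800 tokens for Generic); the function names are made up, and the output is an estimate, not a measurement.

```typescript
// Back-of-envelope token model. Constants are the estimates quoted in this post.
const TOKENS_PER_TOOL = { low: 200, high: 400 }; // Per-Endpoint, per tool
const GENERIC_TOKENS = 800; // fixed, any API size

// Per-Endpoint cost grows linearly with the number of endpoints.
function perEndpointTokens(endpoints: number): { low: number; high: number } {
  return {
    low: endpoints * TOKENS_PER_TOOL.low,
    high: endpoints * TOKENS_PER_TOOL.high,
  };
}

// Generic cost ignores the endpoint count entirely.
function genericTokens(_endpoints: number): number {
  return GENERIC_TOKENS;
}

for (const n of [20, 50, 100, 300]) {
  const pe = perEndpointTokens(n);
  console.log(`${n} endpoints: per-endpoint ${pe.low}-${pe.high}, generic ${genericTokens(n)}`);
}
```

The per-tool range produces a somewhat wider band than the table (for example 20-40K at 100 endpoints versus the table's ~30-40K), which is expected from a simple multiplication of rough per-tool figures.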

Unified Parameter Schema

There’s a second problem beyond token count: most MCP servers force the LLM to understand HTTP routing.

// What most MCPs expect:
path: { petId: "123" }
query: { status: "available" }
headers: { "X-API-Key": "sk-..." }

That’s transport-level detail the LLM shouldn’t care about. Every token spent on “put this in the path vs the query string” is a token wasted.

mcp-gen flattens everything into a single params object:

// What mcp-gen expects:
params: { petId: "123", status: "available" }
// Routing handled server-side. Credentials injected from vault.

The server handles routing. Credentials are pulled from the vault at call time — the LLM never sees auth headers at all.
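Server-side routing of a flat params object can be sketched like this. The `Route` shape, `buildRequest`, and `getCredential` are hypothetical stand-ins and not mcp-gen's API. The vault lookup is faked with a constant, and the `X-API-Key` header name is borrowed from the example above.

```typescript
// Sketch: route a flat params object to path, query, and headers on the server.
// Route, buildRequest, and getCredential are hypothetical names.
type ParamLocation = "path" | "query" | "header";

interface Route {
  method: string;
  pathTemplate: string; // e.g. "/pets/{petId}"
  locations: Record<string, ParamLocation>; // where each param belongs
}

// Stand-in for a vault lookup; a real server would fetch this at call time.
function getCredential(): string {
  return "secret-from-vault";
}

function buildRequest(route: Route, params: Record<string, string>) {
  let path = route.pathTemplate;
  const query: string[] = [];
  // The credential is attached here, so it never appears in the LLM's input.
  const headers: Record<string, string> = { "X-API-Key": getCredential() };

  for (const [paramName, value] of Object.entries(params)) {
    const location = route.locations[paramName];
    if (location === "path") {
      path = path.replace(`{${paramName}}`, encodeURIComponent(value));
    } else if (location === "query") {
      query.push(`${encodeURIComponent(paramName)}=${encodeURIComponent(value)}`);
    } else if (location === "header") {
      headers[paramName] = value;
    } else {
      throw new Error(`Unknown parameter: ${paramName}`);
    }
  }

  const url = query.length > 0 ? `${path}?${query.join("&")}` : path;
  return { method: route.method, url, headers };
}

const builtRequest = buildRequest(
  { method: "GET", pathTemplate: "/pets/{petId}", locations: { petId: "path", status: "query" } },
  { petId: "123", status: "available" },
);
console.log(builtRequest.url);
```

The LLM supplies only `petId` and `status`. The location of each value is looked up from server-side metadata, so the tool schema never has to describe paths, query strings, or auth headers.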

What Else You Get

Token efficiency is the headline, but every generated server also includes:

Credential vaulting — secrets pulled from the vault at runtime, never exposed to the LLM
Audit logging — every tool invocation logged with customer context
Multi-tenant support — customer ID from JWT, isolated credential scopes
Type-safe inputs — Zod validation on every parameter before the API call
Deployable to Statio — one pilum deploy to Cloud Run with gateway routing, rate limiting, and usage billing

The Positioning

Other MCP servers make you choose: token-efficient OR feature-complete.

Generic mode is ~800 tokens for any API — smaller than most hand-built single-purpose MCPs — while including credential management, audit trails, and enterprise access control that raw CLI wrappers can’t provide.

The full technical breakdown with configuration examples is on the Statio mcp-gen page.

Resources

mcp-gen technical deep-dive — full configuration reference and mode comparison
Statio — managed MCP security platform
MCP specification — the Model Context Protocol
Romans.dev — SID Technologies product portfolio