
MCP is powerful, but it can quietly break your AI system
The Model Context Protocol makes it easy to plug tools into LLMs. But the default integration pattern floods context windows, burns tokens, and quietly turns systems fragile and expensive.
The Model Context Protocol (MCP) has exploded in popularity. Everyone is wiring MCP servers into their agents. Everyone is excited about the standardized way to "plug-and-play" tools. And yes, MCP genuinely makes it easier to connect external systems to LLMs.
But if you are not careful, MCP can create a new problem. It can bloat context, burn tokens, and make systems fragile and expensive. Anthropic's own "Code Execution with MCP" article basically confirms this.
The core problem: MCP floods the context window
In most implementations, MCP tools are exposed to the model as direct tools. This means:
- All tool definitions are loaded into the model's context.
- The LLM calls tools directly.
- Every intermediate result flows back through the model.
That looks roughly like this:
// What the model sees in its prompt (simplified)
{
  "tools": [
    {
      "name": "gdrive.getDocument",
      "description": "Retrieves a document from Google Drive",
      "parameters": {
        "type": "object",
        "properties": {
          "documentId": { "type": "string" },
          "fields": { "type": "string" }
        },
        "required": ["documentId"]
      }
    },
    {
      "name": "salesforce.updateRecord",
      "description": "Updates a Salesforce record",
      "parameters": {
        "type": "object",
        "properties": {
          "objectType": { "type": "string" },
          "recordId": { "type": "string" },
          "data": { "type": "object" }
        },
        "required": ["objectType", "recordId", "data"]
      }
    }
    // ...now imagine this times 200+ tools
  ]
}
Each tool gets its own JSON schema, description, and parameter spec. One or two tools is fine. Now imagine hundreds of them.
Before the model even answers the user, it is already chewing through tens or hundreds of thousands of tokens of tool metadata.
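The sketch below estimates how many prompt tokens tool definitions take up, which makes the scaling concrete. It assumes roughly 4 characters per token. That is a common rule of thumb, not a real tokenizer. The 200-tool list is hypothetical, and real tool descriptions are usually much longer than this two-line example, so the estimate is likely on the low side.

```typescript
// Rough estimate of prompt tokens spent on tool definitions.
// Assumption: ~4 characters per token (rule of thumb, not a real tokenizer).
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

const CHARS_PER_TOKEN = 4;

function estimateTokens(value: unknown): number {
  return Math.ceil(JSON.stringify(value).length / CHARS_PER_TOKEN);
}

// The getDocument definition from the prompt above.
const getDocumentTool: ToolDefinition = {
  name: "gdrive.getDocument",
  description: "Retrieves a document from Google Drive",
  parameters: {
    type: "object",
    properties: {
      documentId: { type: "string" },
      fields: { type: "string" },
    },
    required: ["documentId"],
  },
};

// Hypothetical: 200 tools, each roughly the size of getDocument.
const manyTools: ToolDefinition[] = Array.from({ length: 200 }, (_, i) => ({
  ...getDocumentTool,
  name: `server${i}.tool${i}`,
}));

const perTool = estimateTokens(getDocumentTool);
const allTools = estimateTokens(manyTools);
console.log(`1 tool: ~${perTool} tokens, 200 tools: ~${allTools} tokens`);
```

Whatever the exact figure, the cost grows linearly with the number of tools. It is typically paid on every request, before the model has read a word of the user's question.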
Direct tool calling bloats context
Now imagine a simple user request:
Download my meeting transcript from Google Drive and attach it to the Salesforce lead.
With direct tool calling, the interaction often looks like this:
// MODEL → TOOL: gdrive.getDocument
{
  "documentId": "abc123"
}
// TOOL → MODEL
{
  "title": "Q4 Planning Call",
  "content": "Discussed Q4 goals...\n[full 50k-token transcript text]",
  "metadata": {}
}
The entire transcript is now inside the model's context. Next, the model calls Salesforce:
// MODEL → TOOL: salesforce.updateRecord
{
  "objectType": "SalesMeeting",
  "recordId": "00Q5f000001abcXYZ",
  "data": {
    "Notes": "Discussed Q4 goals...\n[full 50k-token transcript text re-written]"
  }
}
Two problems here:
- The transcript flows through the model twice: once as tool output and once as tool input.
- The model has to copy the entire payload verbatim, which invites truncation, formatting errors, invalid JSON, and so on.
These are just two tools. Now scale this up to:
- 10, 20, or more tools
- multiple documents
- long chains of operations
Your context window explodes before your system becomes useful.
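Here is a back-of-the-envelope model of the same effect. The token counts are made-up illustrative values: 10k tokens of tool definitions and a 50-token Salesforce response. The model assumes that any payload the LLM must forward appears in context twice, once when it is read as output and once when it is re-written as input.

```typescript
// Back-of-the-envelope model of context growth under direct tool calling.
// Assumption: a result the model must pass to the next tool is counted twice.
interface ToolStep {
  tool: string;
  resultTokens: number;     // size of the tool's result
  forwardedToNext: boolean; // does the model copy this result into the next call?
}

function directCallingContext(toolDefinitionTokens: number, steps: ToolStep[]): number {
  let total = toolDefinitionTokens;
  for (const step of steps) {
    total += step.resultTokens; // result lands in the context window
    if (step.forwardedToNext) {
      total += step.resultTokens; // model re-emits the payload as the next call's arguments
    }
  }
  return total;
}

// The Drive-to-Salesforce example: one 50k-token transcript, copied once.
const singleTranscript: ToolStep[] = [
  { tool: "gdrive.getDocument", resultTokens: 50_000, forwardedToNext: true },
  { tool: "salesforce.updateRecord", resultTokens: 50, forwardedToNext: false },
];

// Hypothetical: the same flow repeated for three transcripts in one conversation.
const threeTranscripts: ToolStep[] = [...singleTranscript, ...singleTranscript, ...singleTranscript];

console.log(directCallingContext(10_000, singleTranscript)); // 110050
console.log(directCallingContext(10_000, threeTranscripts)); // 310150
```

Under these assumptions, one transcript already occupies about 110k tokens of context. Three transcripts push past 300k, which is more than the context window of many current models.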
Why this is fundamentally flawed
With naive MCP integration:
- Tool definitions flood the context.
- Tool results flood the context.
- The LLM does clerical work (copying data) instead of reasoning.
- Costs and latency skyrocket.
- Privacy is weak. Everything passes through the model.
Anthropic's own solution is basically an admission that this pattern does not hold under real-world scale.
Code execution: treating MCP tools like an API, not a prompt
Another way to expose MCP tools is to present them as code APIs. The model writes code that calls the tools, and that code runs in a sandboxed execution environment, instead of the model invoking each tool directly.
A common pattern is to generate a file tree that mirrors your MCP servers:
servers/
  google-drive/
    getDocument.ts
    getSheet.ts
    index.ts
  salesforce/
    updateRecord.ts
    query.ts
    index.ts
Each file wraps an MCP tool:
// servers/google-drive/getDocument.ts
import { callMCPTool } from "../../client";
interface GetDocumentInput {
  documentId: string;
}
interface GetDocumentResponse {
  title: string;
  content: string;
}
export async function getDocument(
  input: GetDocumentInput,
): Promise<GetDocumentResponse> {
  return callMCPTool<GetDocumentResponse>("google_drive__get_document", input);
}
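Writing wrappers like this by hand stops being practical once there are hundreds of tools, so the tree is usually generated. The sketch below shows one hypothetical way to do it. It assumes each tool comes with a name and description. In MCP, a server's tools/list response carries these fields along with an inputSchema. It also assumes tool names follow the server__tool_name convention seen in google_drive__get_document. generateWrapper and toCamelCase are illustrative helpers, not part of any SDK. This version also types the input loosely; a fuller generator would turn inputSchema into interfaces like GetDocumentInput.

```typescript
// Hypothetical generator that turns a listed MCP tool into a wrapper file
// like servers/google-drive/getDocument.ts.
interface ListedTool {
  name: string;         // e.g. "google_drive__get_document"
  description?: string; // human-readable description from the server
}

// "get_document" -> "getDocument"
function toCamelCase(snake: string): string {
  return snake.replace(/_([a-z])/g, (_match, letter: string) => letter.toUpperCase());
}

function generateWrapper(server: string, tool: ListedTool): { path: string; source: string } {
  // Assumption: names follow "<server>__<tool_name>"; keep the part after "__".
  const shortName = tool.name.split("__").pop() ?? tool.name;
  const fnName = toCamelCase(shortName);
  const source = [
    `import { callMCPTool } from "../../client";`,
    `/** ${tool.description ?? tool.name} */`,
    `export async function ${fnName}(input: Record<string, unknown>): Promise<unknown> {`,
    `  return callMCPTool<unknown>("${tool.name}", input);`,
    `}`,
    ``,
  ].join("\n");
  return { path: `servers/${server}/${fnName}.ts`, source };
}

const wrapper = generateWrapper("salesforce", {
  name: "salesforce__update_record",
  description: "Updates a Salesforce record",
});
console.log(wrapper.path); // servers/salesforce/updateRecord.ts
```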
Then your "Google Drive to Salesforce" flow becomes normal code:
// scripts/syncTranscriptToSalesforce.ts
import * as gdrive from "../servers/google-drive";
import * as salesforce from "../servers/salesforce";
async function syncTranscript() {
  const transcript = await gdrive.getDocument({ documentId: "abc123" });
  await salesforce.updateRecord({
    objectType: "SalesMeeting",
    recordId: "00Q5f000001abcXYZ",
    data: {
      Notes: transcript.content,
      Title: transcript.title,
    },
  });
  console.log("Transcript synced to Salesforce.");
}
syncTranscript().catch(console.error);
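The wrapper imports callMCPTool from ../../client, which is not shown. A minimal in-process stand-in might look like the sketch below. The registry, registerTool, and the fake Google Drive handler are all hypothetical. A real client would forward the call to a connected MCP server, for example through the official TypeScript SDK's Client.callTool. It would then unwrap the content that comes back, rather than looking up a local function.

```typescript
// client.ts (hypothetical sketch): an in-process stand-in for callMCPTool.
type ToolHandler = (input: unknown) => Promise<unknown>;

// Hypothetical registry standing in for connected MCP servers.
const handlers = new Map<string, ToolHandler>();

export function registerTool(name: string, handler: ToolHandler): void {
  handlers.set(name, handler);
}

export async function callMCPTool<T>(name: string, input: unknown): Promise<T> {
  const handler = handlers.get(name);
  if (!handler) {
    throw new Error(`Unknown MCP tool: ${name}`);
  }
  // The cast mirrors the wrapper's typed signature; nothing validates the shape here.
  return (await handler(input)) as T;
}

// Fake Google Drive tool so the getDocument wrapper has something to call.
registerTool("google_drive__get_document", async (input) => {
  const { documentId } = input as { documentId: string };
  return { title: "Q4 Planning Call", content: `Transcript of document ${documentId}` };
});

callMCPTool<{ title: string; content: string }>("google_drive__get_document", { documentId: "abc123" })
  .then((doc) => console.log(doc.title)); // Q4 Planning Call
```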
What changes?
- The sandbox sees the full document.
- The LLM only sees the code, not the raw 50k-token transcript.
- Tool definitions are read from the filesystem on demand, not injected into the prompt up front.
In Anthropic's example, token usage drops from about 150k to about 2k: (150k - 2k) / 150k ≈ 98.7%, the reduction Anthropic reports.
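The key mechanism is that only what the script chooses to print travels back to the model. The sketch below assumes a hypothetical runner, runAndCollectOutput. It hands the script a log function and returns the collected lines, capped at a character budget. A real sandbox would more likely capture the stdout of a separate process. The fake 200,000-character transcript stands in for the roughly 50k-token document, assuming about 4 characters per token.

```typescript
// Hypothetical sandbox runner: the script gets a log function, and only the
// logged lines (capped at maxChars) are returned to the model.
type Log = (line: string) => void;

async function runAndCollectOutput(
  script: (log: Log) => Promise<void>,
  maxChars = 2_000, // assumption: budget for text sent back to the model
): Promise<string> {
  const lines: string[] = [];
  await script((line) => {
    lines.push(line);
  });
  const output = lines.join("\n");
  return output.length > maxChars ? `${output.slice(0, maxChars)}\n[truncated]` : output;
}

// Stand-in for the sync script: the large transcript never leaves the sandbox.
async function syncScript(log: Log): Promise<void> {
  const transcript = "x".repeat(200_000); // fake ~50k-token document (~4 chars/token)
  // ...the Salesforce update would happen here...
  log(`Transcript synced (${transcript.length} chars).`);
}

runAndCollectOutput(syncScript).then((seenByModel) => {
  console.log(seenByModel); // Transcript synced (200000 chars).
});
```

Passing log in explicitly, rather than patching console.log, keeps concurrent runs from capturing each other's output.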
Progressive disclosure in practice
Instead of dumping every tool into context, the model can:
- List directories like ./servers/.
- Choose a server (google-drive, salesforce, slack, and so on).
- Open specific files (getDocument.ts, updateRecord.ts).
- Read just enough to understand the interface.
The model does not need all tool schemas in the prompt. It can discover, open, read, and use.
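Those discovery steps can be sketched with Node's fs module, assuming the servers/ layout shown earlier. listServers, listTools, and readTool are hypothetical helpers that the agent's environment might expose. The demo builds a throwaway tree in a temporary directory so the code runs on its own.

```typescript
import { mkdtempSync, mkdirSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Step 1: list available servers (directory names only, a few tokens each).
function listServers(root: string): string[] {
  return readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name)
    .sort();
}

// Step 2: list the tools one server exposes, without reading their contents.
function listTools(root: string, server: string): string[] {
  return readdirSync(join(root, server))
    .filter((file) => file.endsWith(".ts") && file !== "index.ts")
    .map((file) => file.replace(/\.ts$/, ""))
    .sort();
}

// Step 3: read a single tool file only when the task needs it.
function readTool(root: string, server: string, tool: string): string {
  return readFileSync(join(root, server, `${tool}.ts`), "utf8");
}

// Demo: build a throwaway servers/ tree mirroring the layout above.
const root = mkdtempSync(join(tmpdir(), "servers-"));
const layout: Record<string, string[]> = {
  "google-drive": ["getDocument", "getSheet"],
  salesforce: ["updateRecord", "query"],
};
for (const [server, tools] of Object.entries(layout)) {
  mkdirSync(join(root, server));
  writeFileSync(join(root, server, "index.ts"), "");
  for (const tool of tools) {
    writeFileSync(join(root, server, `${tool}.ts`), `export async function ${tool}() {}\n`);
  }
}

console.log(listServers(root)); // [ 'google-drive', 'salesforce' ]
console.log(listTools(root, "salesforce")); // [ 'query', 'updateRecord' ]
console.log(readTool(root, "google-drive", "getDocument"));
```

Each step returns only names until the model asks for a specific file. The cost of discovery therefore scales with what the task touches, not with the total number of tools.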
Privacy-preserving operations
The same pattern helps with sensitive data (emails, phone numbers, internal IDs).
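Instead of hoping prompts are "responsible," you keep the real data inside the sandbox and only surface tokenized or masked values to the model. When a later tool call needs the real value, the sandbox swaps it back in, so the model can route the data without ever reading it.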
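One way this could work is sketched below under loud assumptions. A tokenizer running in the sandbox or MCP client, never in the model, replaces emails and phone numbers with placeholders. It records the mapping in a vault and reverses it when the model's code passes a placeholder to a tool. The regular expressions are deliberately naive; the phone pattern, for instance, would also match some dates. tokenize, detokenize, and Vault are hypothetical names.

```typescript
// Hypothetical PII tokenizer that runs in the sandbox / MCP client, never in the model.
interface Vault {
  toReal: Map<string, string>;  // placeholder -> real value
  toToken: Map<string, string>; // real value -> placeholder
}

// Deliberately naive patterns for illustration only.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

function tokenize(text: string, vault: Vault): string {
  // Reuse the same placeholder when the same value appears again.
  const swap = (kind: string) => (value: string): string => {
    let token = vault.toToken.get(value);
    if (token === undefined) {
      token = `[${kind}_${vault.toReal.size + 1}]`;
      vault.toToken.set(value, token);
      vault.toReal.set(token, value);
    }
    return token;
  };
  return text.replace(EMAIL, swap("EMAIL")).replace(PHONE, swap("PHONE"));
}

function detokenize(text: string, vault: Vault): string {
  return text.replace(/\[(?:EMAIL|PHONE)_\d+\]/g, (token) => vault.toReal.get(token) ?? token);
}

const vault: Vault = { toReal: new Map(), toToken: new Map() };

// What the model sees: placeholders only.
const masked = tokenize("Jane Doe, jane.doe@example.com, +1 415 555 0100", vault);
console.log(masked); // Jane Doe, [EMAIL_1], [PHONE_2]

// Model-written code refers to the placeholder; the sandbox restores it before the tool call.
const toolArgument = detokenize("[EMAIL_1]", vault);
console.log(toolArgument); // jane.doe@example.com
```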
Conclusion
MCP is not "bad."
But the default way we have been using it is deeply flawed at scale. With naive MCP integration:
- Context windows get flooded with tool definitions.
- Intermediate results blow up token usage.
- Errors grow with tool-chain complexity.
- Privacy becomes harder to guarantee.
- Costs and latency quietly spiral out of control.
Anthropic's own article is basically a warning label.
If we do not design carefully, we will end up with "MCP-enabled" systems that are slower, more expensive, and less private than the systems we were trying to improve.