Anthropic’s Code Execution with MCP: Cut Token Costs & Speed Agents

  • 09 November, 2025 / by Fosbite

What is the problem with MCP-powered agents?

Quick version: the Model Context Protocol (MCP) makes it easy for language models to reach out to external systems (databases, filesystems, APIs), but the classic wiring breaks down at scale. I've seen teams happily pile dozens of tool schemas and huge intermediate payloads into the model context, then wonder why their agent grinds to a halt.

The typical pattern is straightforward but wasteful: each MCP tool’s schema and metadata are dumped into the model context, and every intermediate response — even massive blobs like meeting transcripts or spreadsheets — gets streamed back through the model so it can decide the next action. Two practical headaches follow:

  • Token bloat and cost: Large tool catalogs plus large payloads balloon token consumption. The bill climbs fast.
  • Latency and context limits: Shuttling big data through context increases delay and eventually hits model context size caps, constraining what the agent can do.

Picture an agent that fetches a 50k-word sales meeting transcript from Google Drive and then needs to push it to Salesforce. In the classic flow that transcript is streamed back into the model and then re-sent to Salesforce — duplicating tens of thousands of tokens purely for orchestration. That’s expensive and unnecessary.
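
To see where the duplication comes from, here's a schematic of that classic flow. Note that callTool is an illustrative stand-in for whatever tool-calling plumbing your agent framework provides, not a real SDK:

```typescript
// Schematic of the classic MCP flow: the transcript enters model
// context twice. `callTool` is an illustrative stand-in, not a real API.
declare function callTool(name: string, args: unknown): Promise<{ content: string }>;

// 1) The tool result is streamed back into the model's context:
const doc = await callTool("google_drive.getDocument", { documentId: "abc123" });
// -> ~50k words of transcript now sit in the context window.

// 2) To forward it, the model must re-emit the same text as an argument:
await callTool("salesforce.updateRecord", {
  objectType: "SalesMeeting",
  data: { transcript: doc.content }, // full transcript serialized a second time
});
```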

What is Anthropic’s “Code Execution with MCP” pattern?

Anthropic’s insight is elegantly simple: treat MCP servers as code APIs and have the model write and run code that calls thin wrappers instead of dumping everything into the context. In practice they put MCP inside a sandboxed code execution loop (TypeScript in their examples) and expose each MCP server as a small filesystem of modules.

Instead of stuffing all tool definitions and data into the model context, the agent generates a directory — e.g., servers/ — where each MCP tool is represented by a tiny wrapper like servers/google-drive/getDocument.ts. The model then composes those wrappers by producing TypeScript scripts that run in a restricted runtime. Heavy lifting (parsing, summarizing, transformation) happens in code, not inside the model context.

Three-step pattern

  1. Generate a servers/ filesystem mirroring available MCP servers and tools.
  2. Create thin, typed wrapper functions for each MCP tool, one file per tool (see the sketch after this list).
  3. Ask the model to write TypeScript that imports those wrappers, performs control flow, and handles data locally in the sandboxed runtime.
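
In practice, a generated wrapper might look roughly like the sketch below. The callMCPTool helper and the type names are assumptions about the client's plumbing, not a published API:

```typescript
// servers/google-drive/getDocument.ts -- one thin, typed wrapper per MCP tool.
// `callMCPTool` is a hypothetical helper the MCP client exposes to proxy
// calls back to the real server.
import { callMCPTool } from "../../client.ts";

export interface GetDocumentInput {
  documentId: string;
}

export interface GetDocumentResult {
  content: string;
}

export async function getDocument(
  input: GetDocumentInput
): Promise<GetDocumentResult> {
  return callMCPTool<GetDocumentResult>("google_drive__get_document", input);
}
```

Keeping each wrapper in its own file matters: the model only pays context for the wrappers it actually reads or imports.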

Why it matters — practical benefits for agent builders

This “code execution with MCP” pattern isn’t just clever — it changes the tradeoffs you live with:

  • Progressive tool discovery: The agent can inspect the generated filesystem and import only the modules it needs. No more loading the whole tool catalog into context up front (a short sketch follows this list).
  • Context-efficient data handling: Huge payloads (transcripts, sheets, logs) are read and processed inside the runtime; the model only sees summaries, samples, or outcomes.
  • Privacy-preserving orchestration: Sensitive fields can be tokenized or masked in the sandbox. The model works against placeholders while the MCP client retains real values for downstream calls.
  • Reusable skills and state: Scripts and helper functions live in a skills/ folder and can be audited and reused across runs — think of these as small, executable capabilities you trust (similar to Claude Skills in practice).
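
As a sketch of what progressive discovery can look like inside the sandbox, the generated script can simply walk the filesystem on demand (paths here are illustrative):

```typescript
import { readdir } from "node:fs/promises";

// Discover servers and tools on demand instead of preloading every schema.
const servers = await readdir("./servers");                 // e.g. ["google-drive", "salesforce"]
const driveTools = await readdir("./servers/google-drive"); // e.g. ["getDocument.ts"]
console.log({ servers, driveTools });
```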

In short: treat servers as code APIs and the model becomes a code author that composes trusted primitives. Cleaner, cheaper, and more controllable.

Quantitative impact — dramatic token reduction

Anthropic shares a vivid example: a workflow that consumed roughly 150,000 tokens when tool calls and intermediate data were streamed through the model dropped to about 2,000 tokens after being rewritten as code that runs against filesystem-based MCP wrappers. That’s ≈98.7% reduction — real money saved and much faster turnaround.

Similar approaches: Cloudflare Code Mode

Cloudflare’s “Code Mode” on Workers echoes this idea: MCP-like tools are surfaced as TypeScript APIs and model-generated code executes inside an isolate. So if you’re asking, “is Cloudflare Code Mode the same?” — they’re cousins. The pattern is converging: push orchestration and heavy data handling into sandboxed code environments to tame token costs and latency.

Security and engineering tradeoffs

Switching to model-generated code running in a sandbox doesn’t eliminate responsibility — it shifts it. You still need to:

  • Harden the sandbox and runtime: limit network and file access, enforce CPU/memory quotas, and monitor resource usage (see the sketch after this list).
  • Vet generated code and wrappers for injection or permission risks — code vetting matters.
  • Manage secrets and tokenization so sensitive data never leaks into context or logs unintentionally.
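
As one possible starting point, here's a minimal sketch that runs a generated script under Deno's deny-by-default permission flags with a hard timeout. Real deployments typically layer on container or microVM isolation, memory quotas, and an allowlisted channel back to the MCP client:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Execute a model-generated script with read access limited to the
// generated wrappers and skills, no network access, and a 30s cap.
export async function runGeneratedScript(path: string): Promise<string> {
  const { stdout } = await run(
    "deno",
    ["run", "--allow-read=servers,skills", "--no-prompt", path],
    { timeout: 30_000 } // kill the process if it runs long
  );
  return stdout;
}
```

Deno is convenient here because permissions are deny-by-default, but Node with V8 isolates or container-based runtimes work too.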

From experience: these tradeoffs are manageable. Building a well-scoped, hardened runtime is engineering work, yes — but it’s usually worth it given token cost savings and improved observability of agent workflows.

Example: A lightweight transcript-to-Salesforce flow

Small, concrete walk-through — because examples stick.

  1. The MCP client generates servers/google-drive/getDocument.ts and servers/salesforce/updateRecord.ts.
  2. The model writes a short TypeScript file that imports getDocument, loads the transcript into local memory, extracts highlights and a summarized status, then calls updateRecord with the summary rather than the full transcript (sketched below).
  3. Only the summary and success/failure status are returned to the model. The transcript never flows through the model context.
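
The script in step 2 might look something like this (wrapper names match the earlier sketch; the summarization here is a deliberately trivial placeholder):

```typescript
import { getDocument } from "./servers/google-drive/getDocument.ts";
import { updateRecord } from "./servers/salesforce/updateRecord.ts";

// The full transcript lives only in sandbox memory, never in model context.
const doc = await getDocument({ documentId: "abc123" });

// Placeholder "summarization": a real script might extract action items
// or import a vetted helper from skills/.
const summary = doc.content.split("\n").slice(0, 20).join("\n");

await updateRecord({
  objectType: "SalesMeeting",
  recordId: "EXAMPLE_RECORD_ID", // illustrative placeholder
  data: { notes: summary },
});

// Only this small status line flows back to the model.
console.log(`Updated record with a ${summary.length}-character summary.`);
```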

Result: compact context, limited data exposure, and a huge drop in tokens used for orchestration. It's also the answer if you've been wondering how to summarize large transcripts without resending them to the model. Learn more in our guide to agentic workflows.

One original insight

Beyond obvious cost and latency wins, the pattern fosters a hybrid developer-model workflow: engineers curate small, audited utility scripts in the skills/ folder. Agents can import these audited utilities instead of generating ad-hoc code every run. That middle ground — automation with guardrails — accelerates safe, repeatable production automation in ways teams actually trust.
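
A hypothetical skills/ entry can be as small as the sketch below; the value comes from a human having reviewed it once, so every future run can trust it:

```typescript
// skills/summarizeTranscript.ts -- a small, human-audited utility the agent
// imports instead of regenerating summarization logic on every run.
export function summarizeTranscript(text: string, maxLines = 20): string {
  // Deliberately simple and reviewable: keep lines that open a speaker turn.
  return text
    .split("\n")
    .filter((line) => /^[A-Z][\w .'-]*:/.test(line)) // e.g. "Dana: ..."
    .slice(0, maxLines)
    .join("\n");
}
```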

References and further reading

For deeper detail, see Anthropic’s engineering post: Code Execution with MCP. Also check Cloudflare Workers’ Code Mode docs for a similar serverless-isolate angle. If you’re researching comparisons: look up “Cloudflare Code Mode vs Anthropic code execution pattern” and you’ll find practical side-by-sides.

Key takeaways

  • Problem: Direct MCP calls stream tool definitions and large intermediate results through model context, causing token bloat and latency.
  • Solution: Represent MCP servers as filesystem-based code APIs, let the model write sandboxed TypeScript that composes thin wrappers, and run that code in an isolated runtime.
  • Benefits: Up to ≈99% token reduction in some workflows, improved privacy handling, progressive tool discovery, and reusable skills.
  • Tradeoffs: Requires sandbox hardening, code vetting, and careful secret management — but the ROI is often strong.

Anthropic’s “code execution with MCP” is a pragmatic, high-impact pattern for building scalable, token-efficient agents. Treat MCP servers as code APIs and you’re no longer paying a tax on orchestration. How much money can you save? Depends on your workloads, but the 150k → 2k tokens example gives you the scale of possibility.

In short: if you’re building MCP-powered agents, try the TypeScript sandbox approach — your token bill (and your on-call rotations) will thank you. I’ve seen teams adopt it and rarely look back.