Add a "code mode" tool to an existing MCP server so LLMs can write small processing scripts that run against large API responses in a sandboxed runtime — only the script's compact output enters the LLM context window. Use this skill whenever someone wants to add code mode, context reduction, script execution, sandbox execution, or LLM-generated-code processing to an MCP server. Also trigger when users mention reducing token usage, shrinking API responses, running user-provided code safely, or adding a code execution tool to their MCP server — in any language (TypeScript, Python, Go, Rust, etc.).
When an MCP tool returns a large API response (e.g. listing 500 Kubernetes pods, 200 SCIM users, or thousands of GitHub issues), that entire payload enters the LLM's context window — consuming tokens and degrading performance.
Code mode flips the approach: instead of dumping raw data into context, the LLM writes a small processing script. The MCP server runs the script in a sandboxed runtime against the raw data, and only the script's stdout enters context.
This works especially well with well-known APIs (SCIM, Kubernetes, GitHub, Stripe, Slack, AWS, etc.) because the LLM already knows the response schema from training data — it can write the extraction script in one shot without inspecting the data.
Typical results: 65–99% context reduction.
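For instance, against a large `kubectl get pods -A -o json` response, the LLM might write a script like the following (an illustrative sketch; `DATA` is the raw response the server injects as a string, per the convention described later):

```js
// DATA is the raw kubectl JSON response, injected by the server as a string.
const pods = JSON.parse(DATA).items;
const failing = pods
  .filter((p) => p.status.phase !== "Running" && p.status.phase !== "Succeeded")
  .map((p) => `${p.metadata.namespace}/${p.metadata.name}: ${p.status.phase}`);
// Only this compact summary enters the LLM context, not the full pod list.
failing.join("\n");
```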
This is an interactive planning skill. Work with the user step-by-step. Do not jump ahead; confirm each step with the user before proceeding.

Start by asking the user (or discovering by reading their codebase) which language the MCP server is written in and which tools return large responses. Summarize your understanding back to the user and confirm before moving on.
The sandbox must be isolated from the host filesystem and network, and secure by default. Present the user with options that match their server language, using the reference in references/sandbox-options.md.
If the server is TypeScript/JavaScript:
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| quickjs-emscripten | JavaScript | WASM (no fs/net) | ~1MB | Lightweight, actively maintained, best default |
| pyodide | Python | WASM (no fs/net) | ~20MB | Full CPython in WASM, heavier |
| isolated-vm | JavaScript | V8 isolate (no fs/net) | ~5MB native | Fast, separate V8 heap, not WASM |
If the server is Python:
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| RestrictedPython | Python | AST-restricted compile | Tiny | Compiles to restricted bytecode, no I/O by default |
| pyodide (in-process WASM) | Python | WASM | ~20MB | Heavier but stronger isolation than RestrictedPython |
| quickjs (via quickjs PyPI) | JavaScript | WASM/native | Small | Run JS from Python |
If the server is Go:
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| goja | JavaScript | Pure Go interpreter | Zero CGO | No fs/net, widely used (used by Grafana) |
| wazero | WASM guest (JS/Python compiled to WASM) | WASM runtime, pure Go | Zero CGO | Strongest isolation, runs any WASM module |
| starlark-go | Starlark (Python dialect) | Pure Go interpreter | Zero CGO | Deterministic, no I/O, used by Bazel |
If the server is Rust:
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| boa_engine | JavaScript | Pure Rust interpreter | No unsafe deps | ES2024 support, embeddable |
| wasmtime / wasmer | WASM guest | WASM runtime | Strong | Run any WASM module, strongest isolation |
| deno_core | JavaScript/TypeScript | V8-based | Larger | Full V8, powerful but heavier |
| rustpython | Python | Pure Rust interpreter | Moderate | Less mature but functional |
Read references/sandbox-options.md for detailed tradeoffs on each option.
Present 2–3 options to the user (filtered to their server language), explain the tradeoffs briefly, and let them choose. If they're unsure, recommend the lightest WASM-based option for their language.
Once the sandbox is selected, create a concrete plan with the user. The plan should cover these components:
- A new MCP tool (e.g. `code_mode` or `<domain>_code_mode`) that accepts:
  - `command` or `args`: the underlying API call / query to execute (e.g. kubectl args, SCIM endpoint + params, GraphQL query)
  - `code`: the processing script the LLM writes
  - `language` (optional): script language, defaults to `javascript`
- The tool handler, which executes the underlying call, injects the response as a `DATA` variable into the sandbox, runs the script, and returns the script's output with a footer like `[code-mode: 18.0KB -> 6.2KB (65.5% reduction)]` (see the sketch after this list)
- A utility module that creates the sandbox and injects `DATA` (the raw API response as a string) into it
- A benchmark script that compares tool output size vs. code-mode output size across realistic scenarios. See references/benchmark-pattern.md for the template.
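A minimal sketch of the tool and handler for a TypeScript server, assuming the official MCP SDK's `McpServer.tool` registration; `runUnderlyingCall` and `runCodeMode` are hypothetical helpers (the latter is sketched in the implementation step below):

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import { runCodeMode } from "./executor.js"; // hypothetical utility module, sketched below
import { runUnderlyingCall } from "./api.js"; // hypothetical: executes the underlying API call

const server = new McpServer({ name: "my-server", version: "1.0.0" });

server.tool(
  "code_mode",
  "Run a processing script against a large API response; only the script's output enters context.",
  {
    args: z.array(z.string()).describe("Underlying API call, e.g. kubectl args"),
    code: z.string().describe("Processing script; DATA holds the raw response as a JSON string"),
    language: z.string().optional().describe("Script language, defaults to javascript"),
  },
  async ({ args, code }) => {
    const raw = await runUnderlyingCall(args);
    const output = await runCodeMode(raw, code);
    const before = Buffer.byteLength(raw) / 1024;
    const after = Buffer.byteLength(output) / 1024;
    const pct = ((1 - after / before) * 100).toFixed(1);
    return {
      content: [
        {
          type: "text" as const,
          text: `${output}\n[code-mode: ${before.toFixed(1)}KB -> ${after.toFixed(1)}KB (${pct}% reduction)]`,
        },
      ],
    };
  },
);
```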
Present the plan to the user and confirm before implementing.
Follow the confirmed plan. Implement in this order: the sandbox dependency first (e.g. `npm i quickjs-emscripten`), then the executor utility, the MCP tool, and finally the benchmark. Keep the implementation minimal; don't over-abstract. The executor and tool can each be a single file.
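A minimal executor sketch using quickjs-emscripten, assuming the `DATA`-as-string convention described below. Here the script's final expression (rather than stdout) becomes the output, which is the simplest QuickJS pattern; the function name `runCodeMode` is illustrative:

```ts
import { getQuickJS } from "quickjs-emscripten";

// Runs an LLM-written script against a raw API response inside a QuickJS
// (WASM) sandbox with no filesystem or network access. Returns the value
// of the script's final expression as a string.
export async function runCodeMode(data: string, code: string): Promise<string> {
  const QuickJS = await getQuickJS();
  const vm = QuickJS.newContext();
  try {
    // Inject the raw response as a string; the script parses it itself.
    const dataHandle = vm.newString(data);
    vm.setProp(vm.global, "DATA", dataHandle);
    dataHandle.dispose();

    const result = vm.evalCode(code);
    const handle = vm.unwrapResult(result); // throws if the script errored
    const output = vm.dump(handle); // copy the value out of the VM
    handle.dispose();
    return typeof output === "string" ? output : JSON.stringify(output);
  } finally {
    vm.dispose();
  }
}
```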
The `DATA` variable should always be a string (JSON-serialized). The script is responsible for parsing it if needed (`JSON.parse(DATA)` in JS, `json.loads(DATA)` in Python). Every code-mode result gets a footer in the format `[code-mode: {before}KB -> {after}KB ({pct}% reduction)]`.

After implementation, run the benchmark to verify code mode actually reduces context size. Read references/benchmark-pattern.md for the full template.
The benchmark should run each scenario's raw tool output through the executor with a realistic processing script, then report before/after sizes and the reduction percentage. Present the benchmark results to the user; typical expectations are in the 65–99% reduction range noted above.
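A sketch of that benchmark loop, reusing the footer format; the `Scenario` shape and the `runCodeMode` import are assumptions carried over from the earlier sketches:

```ts
import { runCodeMode } from "./executor.js"; // executor sketched in the implementation step

// Hypothetical shape: a captured raw tool output plus the script an LLM would write for it.
interface Scenario {
  name: string;
  raw: string; // raw API response, as the tool would return it today
  script: string; // LLM-style processing script
}

async function benchmark(scenarios: Scenario[]): Promise<void> {
  for (const s of scenarios) {
    const output = await runCodeMode(s.raw, s.script);
    const before = Buffer.byteLength(s.raw) / 1024;
    const after = Buffer.byteLength(output) / 1024;
    const pct = ((1 - after / before) * 100).toFixed(1);
    console.log(`${s.name}: ${before.toFixed(1)}KB -> ${after.toFixed(1)}KB (${pct}% reduction)`);
  }
}
```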
- references/sandbox-options.md — Detailed comparison of all sandbox options by server language, with security analysis and setup instructions
- references/benchmark-pattern.md — Benchmark script template and methodology