feat: add MCP server example for sandboxed JavaScript execution #35
simongdavies wants to merge 2 commits into main
Add an MCP (Model Context Protocol) server that exposes an execute_javascript tool, allowing AI agents to run arbitrary JavaScript inside an isolated Hyperlight micro-VM sandbox with strict CPU time limits and automatic snapshot/restore recovery after timeouts.

Includes server implementation, demo scripts (PowerShell and Bash), vitest test suite, and documentation.

Signed-off-by: Simon Davies <simongdavies@users.noreply.github.com>
8e32b4b to 62d98d0
Pull request overview
Adds a new example MCP (Model Context Protocol) server under src/js-host-api/examples/mcp-server that lets MCP clients execute JavaScript inside a Hyperlight sandbox with configurable resource limits, plus demo scripts, documentation, and a Vitest-based integration test suite.
Changes:
- Introduces an MCP stdio server (execute_javascript) that compiles/runs JS inside a reusable Hyperlight sandbox with CPU + wall-clock timeouts, snapshot/restore recovery, and optional timing/code logs.
- Adds Vitest config + multiple integration-style test suites covering tool behavior, timeouts/recovery, env-var configurability, and timing log output.
- Adds end-to-end demo scripts (bash + PowerShell) and a README describing setup and client configuration.
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/js-host-api/examples/mcp-server/server.js | MCP server implementation; sandbox lifecycle, limits, logging, and tool registration. |
| src/js-host-api/examples/mcp-server/package.json | Example package definition with MCP SDK, Zod, and Vitest. |
| src/js-host-api/examples/mcp-server/vitest.config.js | Vitest configuration for the example’s tests and timeouts. |
| src/js-host-api/examples/mcp-server/tests/mcp-server.test.js | End-to-end MCP protocol/tool integration tests via stdio NDJSON. |
| src/js-host-api/examples/mcp-server/tests/config.test.js | Tests for env-configurable limits, defaults, and stderr warnings. |
| src/js-host-api/examples/mcp-server/tests/timing.test.js | Tests for HYPERLIGHT_TIMING_LOG JSONL output and timing fields. |
| src/js-host-api/examples/mcp-server/tests/prompt-examples.test.js | Large suite validating outputs for “README prompt” examples. |
| src/js-host-api/examples/mcp-server/demo-copilot-cli.sh | Bash demo script to run prompts via Copilot CLI with MCP config. |
| src/js-host-api/examples/mcp-server/demo-copilot-cli.ps1 | PowerShell demo script to run prompts via Copilot CLI with MCP config. |
| src/js-host-api/examples/mcp-server/README.md | End-user documentation for the example server and demos. |
| src/js-host-api/eslint.config.mjs | Adds performance as an allowed global (used by the new server). |
    function waitForResponse(proc) {
        return new Promise((resolve, reject) => {
            let buffer = '';

            const onData = (chunk) => {
                buffer += chunk.toString();

                // Look for a complete line (NDJSON delimiter)
                const newlineIdx = buffer.indexOf('\n');
                if (newlineIdx === -1) return; // need more data

                const line = buffer.slice(0, newlineIdx).replace(/\r$/, '');
                buffer = buffer.slice(newlineIdx + 1);

                proc.stdout.off('data', onData);

                if (line.length === 0) return; // skip empty lines

                try {
                    resolve(JSON.parse(line));
                } catch (_err) {
                    reject(new Error(`Invalid JSON from server: ${line}`));
                }
            };

            proc.stdout.on('data', onData);
        });

waitForResponse keeps its buffer local to a single call and removes the data handler as soon as it parses the first newline. If multiple NDJSON messages arrive in one stdout chunk, any extra lines are dropped and the next waitForResponse call can hang waiting for a message that was already read. Consider implementing a per-process line queue/reader that preserves leftover buffered data across calls.
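One way to preserve leftover data across calls is sketched below. This is a minimal illustration rather than the PR's code: `getLineReader` and the shape of the reader object are hypothetical, and it assumes the listener is attached (via the first call) before the server writes anything. A single `data` listener per process splits stdout into lines and either hands each line to a pending caller or queues it for the next one.

```javascript
// Hypothetical helper: one persistent reader per child process.
const readers = new WeakMap();

function getLineReader(proc) {
    let reader = readers.get(proc);
    if (reader) return reader;

    // buffer: partial trailing line; lines: complete lines nobody has
    // asked for yet; waiters: callers still waiting for a line.
    reader = { buffer: '', lines: [], waiters: [] };
    proc.stdout.on('data', (chunk) => {
        reader.buffer += chunk.toString();
        let idx;
        // Drain every complete line in the chunk, not just the first.
        while ((idx = reader.buffer.indexOf('\n')) !== -1) {
            const line = reader.buffer.slice(0, idx).replace(/\r$/, '');
            reader.buffer = reader.buffer.slice(idx + 1);
            if (line.length === 0) continue; // skip empty lines
            const waiter = reader.waiters.shift();
            if (waiter) waiter(line);
            else reader.lines.push(line);
        }
    });
    readers.set(proc, reader);
    return reader;
}

function waitForResponse(proc) {
    const reader = getLineReader(proc);
    return new Promise((resolve, reject) => {
        const deliver = (line) => {
            try {
                resolve(JSON.parse(line));
            } catch (_err) {
                reject(new Error(`Invalid JSON from server: ${line}`));
            }
        };
        const queued = reader.lines.shift();
        if (queued !== undefined) deliver(queued);
        else reader.waiters.push(deliver);
    });
}
```

Because the listener is never detached, a chunk containing two messages resolves the pending call with the first and leaves the second queued for the next `waitForResponse(proc)`. Empty lines are skipped without consuming a waiter, so they can no longer leave a promise unresolved.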
src/js-host-api/examples/mcp-server/tests/prompt-examples.test.js (outdated, resolved)
    if (installMode) {
        // Permanent install: fixed well-known paths so Copilot can
        // spawn the server with predictable log locations.
        // \"Roads? Where we're going, we don't need roads.\" — Back to the Future (1985)
        env.HYPERLIGHT_TIMING_LOG = '/tmp/hyperlight-timing.jsonl';
        env.HYPERLIGHT_CODE_LOG = '/tmp/hyperlight-code.js';
    } else {
        if (process.env.HYPERLIGHT_TIMING_LOG) env.HYPERLIGHT_TIMING_LOG = process.env.HYPERLIGHT_TIMING_LOG;
        if (process.env.HYPERLIGHT_CODE_LOG) env.HYPERLIGHT_CODE_LOG = process.env.HYPERLIGHT_CODE_LOG;
    }

In install mode, the script writes a permanent Copilot MCP config that always sets HYPERLIGHT_CODE_LOG to a fixed path. That means every future Copilot session using this config will persist all executed JS to disk by default, which is a privacy/security footgun and can grow unbounded over time. Consider making code logging opt-in (only set HYPERLIGHT_CODE_LOG when --show-code is requested, or gate it behind a separate --enable-code-log flag) and/or add rotation/truncation guidance.
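The opt-in variant could be sketched roughly as follows. The function `buildServerEnv` and its `enableCodeLog` option are hypothetical names used only for illustration (the latter standing in for a flag such as `--enable-code-log`); the log paths are the ones from the hunk above.

```javascript
// Hypothetical helper: builds the env block written into the MCP config.
// Code logging is only enabled when explicitly requested.
function buildServerEnv({ installMode, enableCodeLog = false, baseEnv = {} }) {
    const env = {};
    if (installMode) {
        // Timing data is low-risk, so it keeps its fixed well-known path.
        env.HYPERLIGHT_TIMING_LOG = '/tmp/hyperlight-timing.jsonl';
        // Executed source is sensitive: persist it only on explicit opt-in.
        if (enableCodeLog) {
            env.HYPERLIGHT_CODE_LOG = '/tmp/hyperlight-code.js';
        }
    } else {
        // Non-install mode: pass through whatever the caller already set.
        if (baseEnv.HYPERLIGHT_TIMING_LOG) env.HYPERLIGHT_TIMING_LOG = baseEnv.HYPERLIGHT_TIMING_LOG;
        if (baseEnv.HYPERLIGHT_CODE_LOG) env.HYPERLIGHT_CODE_LOG = baseEnv.HYPERLIGHT_CODE_LOG;
    }
    return env;
}

// Default permanent install: no code log is configured.
console.log(buildServerEnv({ installMode: true }));
```

With this shape, a permanent install without the flag never writes executed JavaScript to disk, and unbounded growth of the code log is only possible for users who asked for it.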
    /** Prompt: "Calculate π to 50 decimal places using the Bailey–Borwein–Plouffe formula" */
    const PI_50_DIGITS_CODE = `
    // Machin's formula: π/4 = 4·arctan(1/5) - arctan(1/239)
    // (BBP naturally produces hex digits; Machin is better for decimal output)
    // Using BigInt for arbitrary-precision fixed-point arithmetic.
    const DIGITS = 50;
    const SCALE = 10n ** BigInt(DIGITS + 10); // extra precision buffer

    function arccot(x) {
        const bx = BigInt(x);
        const x2 = bx * bx;
        let power = SCALE / bx; // 1/x at our scale
        let sum = power;
        for (let n = 1; n < 120; n++) {
            power = -power / x2;
            const term = power / BigInt(2 * n + 1);
            if (term === 0n) break;
            sum += term;
        }
        return sum;
    }

    // π = 4 × (4·arccot(5) - arccot(239))
    const pi = 4n * (4n * arccot(5) - arccot(239));
    const s = pi.toString();
    const formatted = s[0] + '.' + s.slice(1, DIGITS + 1);
    return { pi: formatted, digits: DIGITS, method: 'Machin formula with BigInt' };
    `;

This file claims each constant implements the corresponding README prompt, but several prompt labels don't match the code (e.g., this π example says “Bailey–Borwein–Plouffe formula” while the implementation and method field are Machin). To keep the tests/documentation trustworthy, either update the prompt text to match what’s implemented or update the code to actually follow the prompt.
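If the code is changed to follow the prompt instead, a BBP-based version might look something like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: `piBbp`, `GUARD`, and the `method` string are made-up names, and it sums the BBP series in decimal fixed-point arithmetic rather than using BBP's hexadecimal digit-extraction property.

```javascript
// BBP series: π = Σ_{k≥0} 16^-k · (4/(8k+1) − 2/(8k+4) − 1/(8k+5) − 1/(8k+6))
// Evaluated with BigInt fixed-point arithmetic at a decimal scale.
const DIGITS = 50;
const GUARD = 10; // extra digits that absorb integer-division truncation
const SCALE = 10n ** BigInt(DIGITS + GUARD);

function piBbp() {
    let sum = 0n;
    let weight = SCALE; // floor(SCALE / 16^k), updated every iteration
    for (let k = 0n; weight > 0n; k++) {
        const k8 = 8n * k;
        sum += (4n * weight) / (k8 + 1n)
             - (2n * weight) / (k8 + 4n)
             - weight / (k8 + 5n)
             - weight / (k8 + 6n);
        weight /= 16n; // later terms fall below the scale and end the loop
    }
    return sum;
}

const s = piBbp().toString();
const result = {
    pi: s[0] + '.' + s.slice(1, DIGITS + 1),
    digits: DIGITS,
    method: 'BBP series with BigInt',
};
console.log(result.pi);
```

Each term shrinks by a factor of 16, so about 50 iterations are enough at this scale. Keeping the Machin implementation and relabelling the prompt remains the smaller change.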
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 11 out of 13 changed files in this pull request and generated 6 comments.
    const builder = new SandboxBuilder();
    builder.setHeapSize(HEAP_SIZE_BYTES);
    builder.setStackSize(STACK_SIZE_BYTES);

SandboxBuilder in js-host-api exposes setScratchSize() (scratch includes the stack), but there is no setStackSize() method. Calling builder.setStackSize(...) will throw at runtime and prevent the server/tests from starting. Use setScratchSize(...) instead (and consider renaming the env var/description from “stack” to “scratch/stack”).
    // timeout or unrecoverable error, jsSandbox is set to null and
    // rebuilt on the next call.

    /** @type {import('../../index.d.ts').JSSandbox | null} */

This JSDoc type reference points at ../../index.d.ts, but there is no src/js-host-api/index.d.ts in the repo. This makes the annotation misleading/broken for editors. Point it at an existing type source (or remove the import-based type).
Suggested change:

    - /** @type {import('../../index.d.ts').JSSandbox | null} */
    + /** @type {any | null} */

    # Clean up temp files
    Remove-Item $mcpTmp -ErrorAction SilentlyContinue
    Remove-Item $timingLog -ErrorAction SilentlyContinue
    Remove-Item $promptFile -ErrorAction SilentlyContinue

$promptFile is never defined in this function, but it’s referenced under Set-StrictMode -Version Latest. That will throw (“variable cannot be retrieved because it has not been set”) and break the demo script. Remove this cleanup line or initialize $promptFile = $null (and only remove when it was created).
Suggested change:

    - Remove-Item $promptFile -ErrorAction SilentlyContinue

    function waitForResponse(proc) {
        return new Promise((resolve, reject) => {
            let buffer = '';

            const onData = (chunk) => {
                buffer += chunk.toString();

                // Look for a complete line (NDJSON delimiter)
                const newlineIdx = buffer.indexOf('\n');
                if (newlineIdx === -1) return; // need more data

                const line = buffer.slice(0, newlineIdx).replace(/\r$/, '');
                buffer = buffer.slice(newlineIdx + 1);

                proc.stdout.off('data', onData);

waitForResponse() only reads up to the first newline in the current stdout chunk and then detaches the listener, dropping any additional NDJSON messages that arrived in the same chunk. This can make the integration tests flaky. Use a persistent per-process line buffer (like the WeakMap approach used in the other test files here) so extra lines aren’t lost across calls.
    > **"Calculate π to 50 decimal places using the Bailey–Borwein–Plouffe formula"**
    >
    > Tests: BigInt arithmetic, series computation, precision handling

    > **"Find all prime numbers below 10,000 using the Sieve of Eratosthenes and return the count and the last 10 primes"**

This example prompt says “Bailey–Borwein–Plouffe formula”, but the rest of this example/test suite uses Machin’s formula for decimal digits. Either update the prompt text to Machin’s formula (recommended for decimal output) or update the implementation examples to actually use BBP (and clarify the digit/base differences).
    /** Prompt: "Calculate π to 50 decimal places using the Bailey–Borwein–Plouffe formula" */
    const PI_50_DIGITS_CODE = `
    // Machin's formula: π/4 = 4·arctan(1/5) - arctan(1/239)
    // (BBP naturally produces hex digits; Machin is better for decimal output)
    // Using BigInt for arbitrary-precision fixed-point arithmetic.
    const DIGITS = 50;
    const SCALE = 10n ** BigInt(DIGITS + 10); // extra precision buffer

This prompt comment says “BBP formula”, but the implementation below is Machin’s formula (and the returned method field also says Machin). Align the comment/prompt with the actual algorithm to avoid confusion when maintaining these examples.