Skip to content

Surface sandbox execution errors descriptively instead of an opaque internal error#1156

Draft
RhysSullivan wants to merge 2 commits into
mainfrom
syntax-error-passthrough
Draft

Surface sandbox execution errors descriptively instead of an opaque internal error#1156
RhysSullivan wants to merge 2 commits into
mainfrom
syntax-error-passthrough

Conversation

@RhysSullivan

@RhysSullivan RhysSullivan commented Jun 27, 2026

Copy link
Copy Markdown
Owner

Problem

Code sent to execute could fail in the sandbox and the model would receive an opaque Internal tool error [id] with no actionable reason. Production telemetry on the executor.runtime.* spans showed this was masking several distinct, user-actionable categories, all in the dynamic-worker runtime:

  • Syntax / compile errors (Unexpected token ..., Invalid or unexpected token, Uncaught SyntaxError) - the user's own parse error, from a const with no binding, smart quotes pasted from a doc, an unbalanced brace, or TS that slips past the strip step and only fails at workerd module compile.
  • Non-serializable return values (DataCloneError, Could not serialize object of type "X". This type does not support serialization.) - the code returned something that can't cross the sandbox boundary.
  • Isolate resource limits (Worker exceeded CPU time limit., Worker exceeded memory limit.) - the code was too heavy.
  • Capacity (Too many concurrent dynamic workers) - a transient, retryable condition.

Each is either the user's own mistake or a transient condition the model can act on, yet all were collapsed to the same opaque string.

Fix

The dynamic-worker runtime now classifies a sandbox rejection (at both the worker-startup and the evaluate-RPC catch) into one of:

  • compilation -> CodeCompilationError
  • runtime (serialization, resource limits, capacity) -> new SandboxRuntimeError
  • internal (anything unrecognized) -> DynamicWorkerExecutionError

Compilation and runtime conditions carry their verbatim message through the descriptive ExecuteResult.error channel so the model sees the real reason. Unrecognized defects stay on DynamicWorkerExecutionError and remain opaque, so the host's failure-channel boundary (and the secret-leak protections around it) is unchanged. Tool-invocation failures are unaffected: those already report through the sandbox's own result envelope and never reach this classifier.

The other runtimes (quickjs, deno-subprocess) already surface errors through ExecuteResult.error, so the masking was specific to dynamic-worker.

Verification

  • Runtime, real WorkerLoader (@cloudflare/vitest-pool-workers): live tests prove syntax errors and a non-serializable return (return Symbol(...)) come back descriptive, not as Internal tool error.
  • Classifier table: every production message category (CPU/memory limits and capacity, which can't be triggered deterministically inside a unit-test isolate) is pinned to its expected classification, plus name-based detection of SyntaxError / DataCloneError and an opaque-by-default case for unrecognized defects.
  • Host (real MCP client over in-memory transport): a syntax error surfaces descriptively while a genuine execution failure stays opaque.
  • e2e (real MCP surface, headless OAuth + real sandbox): execute with const = 5; return 1; returns a result whose text contains the parser reason and not Internal tool error.

Coverage notes

  • The end-to-end e2e runs against self-host, which uses the quickjs runtime (already descriptive). The dynamic-worker path is the one this PR changes and is the runtime apps/cloud uses; it is covered by the real-WorkerLoader runtime tests and the classifier table rather than by an e2e. Exercising it end-to-end would require the heavier cloud e2e target (workerd + emulators); that is the natural follow-up if a live cloud scenario is wanted.
  • Resource-limit and capacity conditions are classified and unit-tested but are impractical to trigger live in CI.

…l error

A genuine parse error in code sent to execute (for example a const with
no binding, or pasted smart quotes) was collapsed to an opaque
"Internal tool error [id]", giving the model nothing to self-correct
with. Two masking paths were involved: sucrase strip-time parse
failures, and plain-JS syntax that slips past sucrase and fails at
workerd module compile.

Add a CodeCompilationError tagged error and classify both paths into the
descriptive ExecuteResult.error channel, carrying the real parser
message. Genuine Effect failures and infra defects still collapse to the
opaque mask, so the MCP host security boundary is unchanged.

Covered red-to-green at the runtime layer (real WorkerLoader), the host
layer (real MCP client), and an e2e scenario over the real MCP surface.
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 27, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
executor-marketing fe201b6 Commit Preview URL

Branch Preview URL
Jun 27 2026, 04:43 AM

@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Cloudflare preview

Console https://executor-preview-pr-1156.executor-e2e.workers.dev
MCP https://executor-preview-pr-1156.executor-e2e.workers.dev/mcp
Deployed commit fe201b6

Sign-in is Cloudflare Access (one-time PIN to an allowed email). The preview has its own database and encryption key; it is destroyed when this PR closes.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 27, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
executor-cloud fe201b6 Jun 27 2026, 04:43 AM

@pkg-pr-new

pkg-pr-new Bot commented Jun 27, 2026

Copy link
Copy Markdown

Open in StackBlitz

@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@1156

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@1156

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@1156

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@1156

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@1156

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@1156

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@1156

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@1156

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@1156

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@1156

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@1156

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@1156

executor

npm i https://pkg.pr.new/executor@1156

commit: fe201b6

… errors

Production telemetry (executor.runtime.* spans) showed the dynamic-worker
runtime was collapsing several more error categories to an opaque
"Internal tool error" before the model could act on them: a
non-serializable return value (DataCloneError, "Could not serialize
object of type X"), the isolate's CPU or memory limit, and being
momentarily at worker capacity. The first pass only routed syntax errors
descriptively.

Replace the narrow syntax-only predicate with a classifier that sorts a
sandbox rejection into compilation, runtime, or internal, applied at both
the worker-startup and the evaluate-RPC catch. Compilation and runtime
conditions carry their verbatim message through the descriptive
ExecuteResult.error channel via the existing CodeCompilationError and a
new SandboxRuntimeError; unrecognized defects stay on
DynamicWorkerExecutionError and remain opaque, preserving the host's
failure-channel boundary.

The other runtimes (quickjs, deno-subprocess) already surface errors
through ExecuteResult.error, so the masking was specific to dynamic-worker.

Covered with a live WorkerLoader test for the serialization path and a
classifier table pinning every production message category, alongside the
existing host-layer descriptive-flow coverage.
@RhysSullivan RhysSullivan changed the title Surface code syntax errors descriptively instead of an opaque internal error Surface sandbox execution errors descriptively instead of an opaque internal error Jun 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant