Surface sandbox execution errors descriptively instead of an opaque internal error#1156
Draft
RhysSullivan wants to merge 2 commits into
Draft
Surface sandbox execution errors descriptively instead of an opaque internal error#1156RhysSullivan wants to merge 2 commits into
RhysSullivan wants to merge 2 commits into
Conversation
…l error A genuine parse error in code sent to execute (for example a const with no binding, or pasted smart quotes) was collapsed to an opaque "Internal tool error [id]", giving the model nothing to self-correct with. Two masking paths were involved: sucrase strip-time parse failures, and plain-JS syntax that slips past sucrase and fails at workerd module compile. Add a CodeCompilationError tagged error and classify both paths into the descriptive ExecuteResult.error channel, carrying the real parser message. Genuine Effect failures and infra defects still collapse to the opaque mask, so the MCP host security boundary is unchanged. Covered red-to-green at the runtime layer (real WorkerLoader), the host layer (real MCP client), and an e2e scenario over the real MCP surface.
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
executor-marketing | fe201b6 | Commit Preview URL Branch Preview URL |
Jun 27 2026, 04:43 AM |
Contributor
Cloudflare preview
Sign-in is Cloudflare Access (one-time PIN to an allowed email). The preview has its own database and encryption key; it is destroyed when this PR closes. |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
executor-cloud | fe201b6 | Jun 27 2026, 04:43 AM |
@executor-js/cli
@executor-js/config
@executor-js/execution
@executor-js/sdk
@executor-js/codemode-core
@executor-js/runtime-quickjs
@executor-js/plugin-file-secrets
@executor-js/plugin-graphql
@executor-js/plugin-keychain
@executor-js/plugin-mcp
@executor-js/plugin-onepassword
@executor-js/plugin-openapi
executor
commit: |
… errors Production telemetry (executor.runtime.* spans) showed the dynamic-worker runtime was collapsing several more error categories to an opaque "Internal tool error" before the model could act on them: a non-serializable return value (DataCloneError, "Could not serialize object of type X"), the isolate's CPU or memory limit, and being momentarily at worker capacity. The first pass only routed syntax errors descriptively. Replace the narrow syntax-only predicate with a classifier that sorts a sandbox rejection into compilation, runtime, or internal, applied at both the worker-startup and the evaluate-RPC catch. Compilation and runtime conditions carry their verbatim message through the descriptive ExecuteResult.error channel via the existing CodeCompilationError and a new SandboxRuntimeError; unrecognized defects stay on DynamicWorkerExecutionError and remain opaque, preserving the host's failure-channel boundary. The other runtimes (quickjs, deno-subprocess) already surface errors through ExecuteResult.error, so the masking was specific to dynamic-worker. Covered with a live WorkerLoader test for the serialization path and a classifier table pinning every production message category, alongside the existing host-layer descriptive-flow coverage.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Code sent to
executecould fail in the sandbox and the model would receive an opaqueInternal tool error [id]with no actionable reason. Production telemetry on theexecutor.runtime.*spans showed this was masking several distinct, user-actionable categories, all in the dynamic-worker runtime:Unexpected token ...,Invalid or unexpected token,Uncaught SyntaxError) - the user's own parse error, from aconstwith no binding, smart quotes pasted from a doc, an unbalanced brace, or TS that slips past the strip step and only fails at workerd module compile.DataCloneError,Could not serialize object of type "X". This type does not support serialization.) - the code returned something that can't cross the sandbox boundary.Worker exceeded CPU time limit.,Worker exceeded memory limit.) - the code was too heavy.Too many concurrent dynamic workers) - a transient, retryable condition.Each is either the user's own mistake or a transient condition the model can act on, yet all were collapsed to the same opaque string.
Fix
The dynamic-worker runtime now classifies a sandbox rejection (at both the worker-startup and the evaluate-RPC catch) into one of:
CodeCompilationErrorSandboxRuntimeErrorDynamicWorkerExecutionErrorCompilation and runtime conditions carry their verbatim message through the descriptive
ExecuteResult.errorchannel so the model sees the real reason. Unrecognized defects stay onDynamicWorkerExecutionErrorand remain opaque, so the host's failure-channel boundary (and the secret-leak protections around it) is unchanged. Tool-invocation failures are unaffected: those already report through the sandbox's own result envelope and never reach this classifier.The other runtimes (
quickjs,deno-subprocess) already surface errors throughExecuteResult.error, so the masking was specific to dynamic-worker.Verification
@cloudflare/vitest-pool-workers): live tests prove syntax errors and a non-serializable return (return Symbol(...)) come back descriptive, not asInternal tool error.SyntaxError/DataCloneErrorand an opaque-by-default case for unrecognized defects.executewithconst = 5; return 1;returns a result whose text contains the parser reason and notInternal tool error.Coverage notes
quickjsruntime (already descriptive). The dynamic-worker path is the one this PR changes and is the runtimeapps/clouduses; it is covered by the real-WorkerLoader runtime tests and the classifier table rather than by an e2e. Exercising it end-to-end would require the heaviercloude2e target (workerd + emulators); that is the natural follow-up if a live cloud scenario is wanted.