Add non-code MCP mode (?codemode=false): search + invoke #1127
Draft
RhysSullivan wants to merge 4 commits into
By default the MCP server runs in code mode: it advertises a single `execute` tool and the model writes TypeScript that calls `tools.search`, `tools.describe.tool`, and the connection tools. That keeps the tool list tiny, but clients that do lazy tool loading expect every tool enumerated directly so they can fetch schemas on demand.

This adds a `?codemode=false` query parameter that switches the session into transparent mode: instead of `execute`, the server lists every directly-callable tool (connection tools plus the static core and plugin tools) with its own input schema, and routes `tools/call` straight to a single-tool invoke. Code mode stays the default.

Threading:
- New engine seams `listTools` / `invokeTool` / `invokeToolWithPause` alongside the existing code-execution methods, carried through every host (cloud, cloudflare, local, self-host) and the usage decorator.
- The MCP host registers low-level `ListTools` / `CallTools` handlers in transparent mode and keeps the high-level `registerTool` path for code mode; the session reads the flag off the connection URL.

A normalizer stamps `type: "object"` onto any advertised input schema whose root lacks one (a union-root tool such as add-server otherwise compiles to `anyOf` with no top-level type, which makes the MCP client reject the whole tools/list response).

Covered by a cross-target e2e scenario that seeds an OpenAPI connection, opens a transparent session, asserts the tools are dumped directly, and makes a verifiable direct core-tool call. Green on self-host and on the workerd Durable Object path.
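The normalizer described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `JsonSchema` type, the `normalizeInputSchema` name, and the example union schema are all assumptions made for the sketch.

```typescript
// Hypothetical schema shape; the PR's real schema type may differ.
type JsonSchema = Record<string, unknown>;

// Returns a copy of the schema with `type: "object"` stamped on the root
// when the root declares no `type` (for example a bare `anyOf` union).
// Schemas that already declare a root type are returned unchanged.
function normalizeInputSchema(schema: JsonSchema): JsonSchema {
  if (schema.type !== undefined) return schema;
  return { ...schema, type: "object" };
}

// Assumed example of a union-root input schema with no top-level type.
const unionRoot: JsonSchema = {
  anyOf: [
    { type: "object", properties: { url: { type: "string" } }, required: ["url"] },
    { type: "object", properties: { command: { type: "string" } }, required: ["command"] },
  ],
};

const normalized = normalizeInputSchema(unionRoot);
console.log(normalized.type); // "object"
```

The sketch copies rather than mutates, so the schema stored for the tool keeps its original shape and only the advertised copy is adjusted.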
A direct tool call in transparent mode unwraps the tool's `ToolResult`
envelope (renders `data` natively, sets `isError` on failures). The
`resume` path is shared with code mode and formatted every resumed
completion with the code-mode `execute` envelope, so a transparent-mode
tool that paused for approval and then resumed came back wrapped in
`{ status, result, logs }` instead of the tool's own result, unlike the
same tool when it did not pause.
Pick the resume completion formatter by session mode: a paused execution
can only have originated from the tool this session registered (`execute`
in code mode, a direct single-tool invoke in transparent mode), so format
the resumed completion the same way that origin tool formats a non-paused
completion.
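A rough sketch of picking the formatter by session mode, under stated assumptions: the `Completion` and `McpResult` shapes, the function names, and the `status` strings are hypothetical stand-ins, and only the `{ status, result, logs }` envelope and the `isError` flag come from the description above.

```typescript
type SessionMode = "code" | "transparent";

// Hypothetical internal completion shape, for illustration only.
interface Completion {
  ok: boolean;
  data: unknown;
  logs: string[];
}

interface McpResult {
  structuredContent: unknown;
  isError?: boolean;
}

// Code mode: wrap the completion in the `execute` envelope.
// The status strings are placeholders, not the real values.
function formatExecute(c: Completion): McpResult {
  return {
    structuredContent: { status: c.ok ? "completed" : "failed", result: c.data, logs: c.logs },
  };
}

// Transparent mode: render the tool's own data and flag failures.
function formatDirect(c: Completion): McpResult {
  return c.ok ? { structuredContent: c.data } : { structuredContent: c.data, isError: true };
}

// One selector shared by the non-paused path and the resume path, so a
// resumed completion is formatted exactly like a non-paused one.
function formatterFor(mode: SessionMode): (c: Completion) => McpResult {
  return mode === "code" ? formatExecute : formatDirect;
}

const policy = { id: "policy_1", effect: "allow" };
const resumed: Completion = { ok: true, data: policy, logs: [] };
const direct = formatterFor("transparent")(resumed);
const wrapped = formatterFor("code")(resumed);
console.log(direct.structuredContent === policy); // true
```

Routing both paths through one selector means the two shapes cannot drift apart.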
Covered by a second case in the codemode-off scenario that drives the
approval-gated `policies.create` through pause, approve, and resume in a
transparent session and asserts the resumed structured content is the
policy itself, not the execute envelope. Green on self-host and workerd;
the assertion fails against the pre-fix formatter.
`?codemode=false` previously dumped every tool into one `tools/list`.
That does not scale: the full Microsoft Graph connection is ~16.5k tools
and ~640 MB of self-contained schema, which exceeds the runtime's memory
budget. The server builds it fast, but the payload itself is the wall: no
client can usefully receive a catalog that large in a single response
(Codex does not paginate `tools/list`; the spec's cursor pagination only
helps clients that do).
Switch non-code mode to a fixed two-tool surface instead:
- `search({ query, limit?, offset? })` ranks over the whole catalog and
returns only a bounded page, each hit carrying its own input schema so
it can be called directly.
- `invoke({ name, arguments? })` runs a tool by name, reusing the same
resolve/invoke/pause/resume path (the resumed result stays unwrapped).
This is the lazy-loading shape: the client pulls the handful of tools it
needs rather than the whole catalog, so it works for any client and any
catalog size. It is essentially code mode's own search/invoke primitives
exposed as flat MCP tools instead of behind the `execute` sandbox.
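The two-tool surface above can be sketched in miniature. Everything here is an assumption made for illustration: the `ToolEntry` shape, the `MAX_LIMIT` cap, the in-memory catalog, and the naive term-count ranking, which stands in for the existing discovery ranking and does not reproduce it.

```typescript
interface ToolEntry {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  run: (args: Record<string, unknown>) => unknown;
}

type SearchHit = Pick<ToolEntry, "name" | "description" | "inputSchema">;

const MAX_LIMIT = 50; // assumed upper bound on page size

// Ranks the whole catalog but returns only one bounded page of hits,
// each carrying its own input schema so it can be invoked directly.
function search(
  catalog: ToolEntry[],
  { query, limit = 10, offset = 0 }: { query: string; limit?: number; offset?: number },
): SearchHit[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const ranked = catalog
    .map((tool) => {
      const haystack = `${tool.name} ${tool.description}`.toLowerCase();
      return { tool, score: terms.filter((t) => haystack.includes(t)).length };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score);
  const start = Math.max(offset, 0);
  const size = Math.min(Math.max(limit, 1), MAX_LIMIT);
  return ranked.slice(start, start + size).map(({ tool }) => ({
    name: tool.name,
    description: tool.description,
    inputSchema: tool.inputSchema,
  }));
}

// Resolves a tool by name and runs it with the given arguments.
function invoke(catalog: ToolEntry[], call: { name: string; arguments?: Record<string, unknown> }): unknown {
  const tool = catalog.find((t) => t.name === call.name);
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);
  return tool.run(call.arguments ?? {});
}

const objectSchema = { type: "object" };
const catalog: ToolEntry[] = [
  { name: "mail.send", description: "Send an email message", inputSchema: objectSchema, run: (a) => ({ sent: a.to }) },
  { name: "mail.list", description: "List email messages", inputSchema: objectSchema, run: () => ({ messages: [] }) },
  { name: "calendar.create", description: "Create a calendar event", inputSchema: objectSchema, run: (a) => ({ created: a.title }) },
];

const hits = search(catalog, { query: "send email", limit: 1 });
const result = invoke(catalog, { name: hits[0].name, arguments: { to: "a@example.com" } });
console.log(hits[0].name, JSON.stringify(result));
```

The response size depends on `limit`, not on the catalog size, which is why the same shape holds for a 3-tool catalog and a 16.5k-tool one.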
Engine: add a bounded `searchTools` seam (reuses the existing discovery
ranking, enriches the page with schemas) and drop the now-unused
`listTools` seam that backed the dump.
Covered end to end:
- codemode-off: a non-code session advertises search/invoke (not execute,
not a dumped catalog); search finds a seeded connection's tools; invoke
runs one and returns its real result; the pause/resume shape guard
still holds. Green on self-host and the workerd DO.
- codemode-scale: the full 16.5k Graph catalog is searched (bounded page)
and invoked, with trace assertions that the catalog is never dumped,
each invoke dispatches once, and a single invoke neither searches nor
rebuilds the catalog. Green on cloud.
The non-code mode no longer dumps the catalog, so nothing enumerates tools with their schemas anymore. Revert `tools.list`'s `includeSchemas` branch (the bulk self-contained-schema enrichment) to the original projected-only listing, drop the `ToolListFilter.includeSchemas` field, and fold the now-single-use `ToolListing` type into `ToolSearchResult`. Also refresh the doc comments that still described the dump.
What

By default the MCP server runs in code mode: one `execute` tool the model writes TypeScript against, discovering connections through `tools.search()` / `tools.describe.tool()` and calling them inside a sandbox. Some clients can't drive a code sandbox and want to discover and call tools through plain MCP tool calls.

This adds `?codemode=false`, which switches a session into non-code mode: instead of `execute`, it exposes two meta-tools, `search` and `invoke`.

- `search({ query, limit?, offset? })` ranks over the whole catalog and returns a bounded page, each hit carrying its own input schema so it can be called directly.
- `invoke({ name, arguments? })` runs a tool by name, reusing the same resolve / invoke / pause / resume path as code mode.

Code mode stays the default, so existing clients are unaffected.

Why search + invoke, not a tool dump

The obvious reading of the flag (dump every tool as an individual MCP tool, like Cloudflare's `?codemode=false`) does not scale. The full Microsoft Graph connection alone is ~16,575 tools / ~640 MB of self-contained inlined schema. The server builds that in under a second, but the payload itself is the wall: no client can usefully receive a catalog that large in one `tools/list` response (Codex does not paginate `tools/list` at all; the spec's cursor pagination only helps clients that loop on `nextCursor`).

`search` + `invoke` is the lazy-loading shape: the client pulls the handful of tools it needs rather than the whole catalog, so it works for any client and any catalog size.

How

- Engine: a new `searchTools` seam (bounded, ranked, paginated; reuses the existing discovery ranking and enriches only the returned page with schemas). The dump-only `listTools` seam is removed.
- MCP host: non-code mode registers low-level `ListTools` / `CallTools` handlers advertising `search` / `invoke` (+ `resume` when an approval pauses); code mode keeps the high-level `execute` path.
- Pause/resume: a resumed `invoke` returns the tool's own result, unwrapped from the `ToolResult` envelope, the same shape it returns without pausing.

Evidence

- codemode-off (`e2e/scenarios/mcp-codemode-off.test.ts`): a non-code session advertises `search` / `invoke` (not `execute`, not a dumped catalog); `search` finds a seeded connection's tools; `invoke` runs one and returns its real result; a second case drives an approval-gated tool through invoke → pause → approve → resume and asserts the resumed result is the unwrapped tool result. Green on self-host and the workerd Durable Object.
- codemode-scale (`e2e/scenarios/mcp-codemode-scale.test.ts`): the full 16,575-tool Graph catalog is searched (bounded page) and invoked, with trace assertions that the catalog is never dumped, each invoke dispatches exactly once, and a single invoke neither searches nor rebuilds the catalog. Green on cloud.

Full gates pass: `format:check`, `lint` (0/0), `typecheck` (41/41), plus the touched packages' unit tests.