Commit 2108eb4

phuryn and claude committed
Google target: Claude-on-Vertex spike, mixed-model seam, gap workarounds, honest reframe
Spike (offline-verified, live blocked on Model Garden enablement):
- experiments/claude-on-vertex/: ADK 1.34.3 resolves Claude on Vertex and the mixed-model shape composes (web sub-agents stay Gemini); guarded live probe + RESULTS.md. No new dependency (1.34.3 is already our floor).

Code (implemented + tested, no user-facing passthrough until a live receipt exists):
- google_codegen: web_model() pins wrapped web tool-agents to Gemini by construction (Search grounding / URL Context are Gemini built-ins) -- robust to a future Claude parent.
- google_plan: refuse a Claude --google-model (google.deploy_model.claude_unsupported) rather than silently encode an unverified deploy path.

Docs (reframe gaps as non-equivalences / non-goals, not parity TODOs):
- deploy-google.md: document the two Google gaps + workarounds (sandbox -> URL MCP server, :ask -> gate client-side or keep on Anthropic).
- README/provider-matrix/limitations/CLAUDE: roadmap reframe, audit reason de-undersold, coverage-matrix footnote notes the web tools' separate live receipt.

Tests: pin the web_model invariant + the deploy-model guard. 115 passed (offline suite).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent ad48437 commit 2108eb4

12 files changed

Lines changed: 469 additions & 16 deletions

CLAUDE.md

Lines changed: 16 additions & 6 deletions
@@ -82,10 +82,10 @@ A bare ref resolves to the agent's **own** resource first, then `shared/`.
8282
| Skills | uploaded, shared by id (skill-bearing agents auto-get `read` — Managed Agents needs it to open `SKILL.md`) | ✅ embedded in source package, loaded via ADK `load_skill_from_dir` (update = redeploy) | export comment only |
8383
| Remote MCP | mapped | ✅ URL → ADK `McpToolset` + `tool_filter`; inline auth → Agent Engine `env_vars` (resolved at deploy, never inlined) | export comment only |
8484
| Built-in web tools (`web_search`/`web_fetch`) | mapped |`web_search`→Google Search grounding, `web_fetch`→URL Context, each a wrapped single-tool ADK sub-agent (`AgentTool`, `propagate_grounding_metadata=True`); always-wrap so they coexist with `transfer_to_agent`; pins `google-adk>=1.34.3` | `WebSearchTool` / self-host fetch |
85-
| Built-in sandbox tools (`bash/files/glob-grep`) | mapped | 🚧 skipped (sandbox is Python/JS only) | self-host runner |
86-
| `:ask` | permission policy | 🚧 unsupported on `VertexAiSessionService` | client-side |
85+
| Built-in sandbox tools (`bash/files/glob-grep`) | mapped | 🚧 skipped (sandbox is Python/JS only — in-engine emulation is a **non-goal**; expose equivalents via a URL MCP server) | self-host runner |
86+
| `:ask` | permission policy | 🚧 unsupported on `VertexAiSessionService` (gate client-side, or keep on Anthropic) | client-side |
8787
| Idempotency | lockfile + content hashes |`.agentlift-google.json` spec hash → create/update/skip | n/a |
88-
| Model | Claude (native) | 🔁 mapped to Gemini (`gemini-2.5-flash`) | 🔁 mapped to `gpt-*` |
88+
| Model | Claude (native) | 🔁 mapped to Gemini (`gemini-2.5-flash`); Claude-on-Vertex is an offline-verified **spike, not shipped** (`experiments/claude-on-vertex/`) — a Claude `--google-model` is refused (`google.deploy_model.claude_unsupported`) | 🔁 mapped to `gpt-*` |
8989

9090
**Live-verified (6/6 both):** one neutral fixture (`tests/live/fixtures/coverage-matrix`) was deployed
9191
+ queried on **both** Anthropic and Google; all six portability dimensions (agents · subagents ·
@@ -111,9 +111,19 @@ so the objective signal is the tool-call + its response content, not citation ch
111111
`deploy --target google` reports *agentlift's current implementation*. These now agree on
112112
skills, URL MCP, and the built-in **web** tools (all mapped). They still diverge on the
113113
built-in **sandbox** tools and `:ask` (`audit` rates them `degraded`/`unsupported` for
114-
Google; `deploy` skips a stdio MCP server / sandbox-tool-only folder). Pipeline for Google
115-
mirrors Anthropic's *plan-is-the-contract* discipline: `google_plan.py` is pure and
116-
offline-tested, only `google_target.py` touches the network.
114+
Google; `deploy` skips a stdio MCP server / sandbox-tool-only folder). Those two are framed
115+
as **non-goals with workarounds**, not parity TODOs (sandbox → expose via a URL MCP server;
116+
`:ask` → gate client-side or keep on Anthropic — see
117+
[docs/deploy-google.md](docs/deploy-google.md)). Pipeline for Google mirrors Anthropic's
118+
*plan-is-the-contract* discipline: `google_plan.py` is pure and offline-tested, only
119+
`google_target.py` touches the network.
120+
121+
**Claude-on-Vertex (spike, not shipped):** ADK 1.34.3 resolves Claude on Vertex and the
122+
mixed-model shape composes (web sub-agents must stay Gemini — Search/URL-Context are Gemini
123+
built-ins, encoded by `web_model()` in `google_codegen.py`). Offline-verified in
124+
`experiments/claude-on-vertex/`; no live receipt yet, so `build_google_plan` **refuses** a
125+
Claude `--google-model` (`google.deploy_model.claude_unsupported`) rather than silently
126+
shipping it (the *confirm-live-before-encoding* rule).
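The two invariants described above (refuse a Claude deploy model, pin web tool-agents to Gemini) could be sketched roughly as follows. This is an illustration only, not the code in `google_plan.py` or `google_codegen.py`: `PlanError`, `is_claude_model`, `check_deploy_model`, the substring test for Claude ids, and the rule that a Gemini parent keeps its own model are all assumptions. Only the diagnostic code, the `web_model` name, and the `gemini-2.5-flash` default come from the text.

```python
GEMINI_DEFAULT = "gemini-2.5-flash"  # mapped default named in the table above


class PlanError(ValueError):
    """Hypothetical plan-time error that carries a diagnostic code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def is_claude_model(model: str) -> bool:
    # Assumption: Claude model ids contain the substring "claude".
    return "claude" in model.lower()


def check_deploy_model(model: str) -> str:
    """Refuse a Claude deploy model instead of encoding an unverified path."""
    if is_claude_model(model):
        raise PlanError(
            "google.deploy_model.claude_unsupported",
            f"{model!r} has no live receipt on the Google target yet",
        )
    return model


def web_model(parent_model: str) -> str:
    """Model for wrapped web tool-agents: Gemini regardless of the parent."""
    # Assumption: a Gemini parent keeps its own model; anything else
    # (for example a future Claude parent) falls back to the Gemini default.
    if parent_model.lower().startswith("gemini"):
        return parent_model
    return GEMINI_DEFAULT
```

Keeping both checks as pure functions fits the plan-is-the-contract discipline mentioned above, since they can be pinned by offline tests without touching the network.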
117127

118128
## Commands
119129

README.md

Lines changed: 11 additions & 3 deletions
@@ -184,7 +184,7 @@ $ agentlift audit ./examples/team --targets anthropic,google,openai
184184
reason: web_search -> Google Search grounding, web_fetch -> URL Context (each a wrapped tool-agent)
185185
degraded:
186186
! Built-in tool sandbox (bash / files / glob-grep)
187-
reason: hosted sandbox is Python/JS only
187+
reason: hosted sandbox is Python/JS only - no bash, no persistent workspace filesystem
188188
unsupported:
189189
x Per-tool approval gate (:ask / human-in-the-loop)
190190
reason: not enforced with VertexAiSessionService on the deployed runtime
@@ -225,7 +225,7 @@ One neutral [coverage fixture](tests/live/fixtures/coverage-matrix/) — a coord
225225

226226
> `*` **OpenAI is an export target, not a live deploy** — there is no code-define-and-OpenAI-host path. agents + subagents are real (the `as_tool` composition is trace-verified in [`experiments/subagent-composition`](experiments/subagent-composition/)); MCP and skills compile to guided self-host scaffolding (`HostedMCPTool` / Skills-API call sites), since the orchestration loop runs in *your* app. `` = compiled to a self-host stub, not live-exercised.
227227
>
228-
> `EXERCISED` = an objective runtime event proved it. **Both Anthropic and Google exercised all six dimensions server-side (6/6)** on a real, billable deploy. Anthropic's subagents cell keys on the native delegation event (`session.thread_created` + `agent.thread_message_sent`) because coordinator delegation is async (the worker's reply lands after a one-shot query returns). The **wired** layer is pinned offline in [`tests/test_coverage_matrix_plan.py`](tests/test_coverage_matrix_plan.py) (CI); the EXERCISED column comes from committed live receipts under [`tests/live/receipts/`](tests/live/receipts/). Full evidence + reproduce steps: [`docs/tested-platforms.md`](docs/tested-platforms.md#live-coverage-matrix--receipt-evidence-not-a-capability-ranking) and [`tests/live/README.md`](tests/live/README.md).
228+
> `EXERCISED` = an objective runtime event proved it. **Both Anthropic and Google exercised all six dimensions server-side (6/6)** on a real, billable deploy. Anthropic's subagents cell keys on the native delegation event (`session.thread_created` + `agent.thread_message_sent`) because coordinator delegation is async (the worker's reply lands after a one-shot query returns). The **wired** layer is pinned offline in [`tests/test_coverage_matrix_plan.py`](tests/test_coverage_matrix_plan.py) (CI); the EXERCISED column comes from committed live receipts under [`tests/live/receipts/`](tests/live/receipts/). The built-in **web tools** (`web_search`/`web_fetch`) are not part of this six-dimension fixture — they were exercised on a **separate** Google live deploy (both tool-agents fired server-side), receipted independently. Full evidence + reproduce steps: [`docs/tested-platforms.md`](docs/tested-platforms.md#live-coverage-matrix--receipt-evidence-not-a-capability-ranking) and [`tests/live/README.md`](tests/live/README.md).
229229
230230
## Isolation: each agent gets only its folder
231231

@@ -392,7 +392,15 @@ Everything is here or one click away:
392392

393393
## Roadmap
394394

395-
- **Google deploy parity** — the live `deploy --target google` now ships prompts + coordinator/`sub_agents` + **skills + URL MCP (with inline-auth-via-env-vars) + built-in web tools (`web_search`→Google Search grounding, `web_fetch`→URL Context)** + model, idempotent via a spec hash. Remaining for full parity: the built-in **sandbox** tools (bash/files/glob-grep — Vertex's sandbox is Python/JS only), `:ask`/per-tool approval (not enforced on `VertexAiSessionService`), Claude-on-Vertex models (today Claude→Gemini), and per-agent IDs via A2A.
395+
The live `deploy --target google` ships prompts + coordinator/`sub_agents` + **skills + URL MCP (with inline-auth-via-env-vars) + built-in web tools (`web_search`→Google Search grounding, `web_fetch`→URL Context)** + model, idempotent via a spec hash. The remaining Google differences are **known gaps and non-equivalences, not all roadmap items** — two are deliberate non-goals:
396+
397+
- **Built-in sandbox tools (bash/files/glob-grep) — emulating them in-engine is a non-goal, not a TODO.** Vertex's hosted sandbox is Python/JS only with no persistent workspace filesystem; pretending it's a shell+FS *inside* Agent Engine would be exactly the silent degradation agentlift exists to surface. If a Google-hosted agent needs filesystem/shell/code-search, expose that environment deliberately through a URL MCP server, which *does* deploy — see [the workaround](docs/deploy-google.md#two-known-gaps-and-how-to-work-around-them).
398+
- **`:ask` / per-tool approval — gate it in your caller, don't fake it in the runtime.** Not enforced on `VertexAiSessionService`, so enforce approval client-side in the loop that calls the engine, or keep `:ask` agents on Anthropic where the gate is native. Surfaced as a diagnostic, never lost.
399+
- **Claude-on-Vertex models — offline-verified spike, not shipped.** Today Claude folder models map to Gemini. ADK 1.34.3 *can* resolve Claude on Vertex and the mixed-model shape composes (web sub-agents stay Gemini) — proven offline in [`experiments/claude-on-vertex/`](experiments/claude-on-vertex/) — but with no live receipt yet, a Claude `--google-model` is **refused**, not silently shipped.
400+
- **Per-agent IDs via A2A.** Google deploys the roster as one `reasoningEngine`; per-agent addressability would need the A2A protocol across deployments.
401+
402+
Genuinely on the roadmap:
403+
396404
- **`export openai-chatkit`** — wrap the `openai-agents` script in a self-hostable ChatKit server (the Agents SDK export already ships)
397405
- Authenticated remote MCP via the Vaults API
398406
- `agentlift diff --remote` deeper drift detection (full account reconciliation)

docs/deploy-google.md

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -98,6 +98,55 @@ into the generated agent code. At deploy time it:
9898
**"Env vars to populate"**) so you can confirm the referenced local variables are set
9999
before deploying. A referenced-but-unset variable is flagged, not silently skipped.
100100

101+
## Two known gaps, and how to work around them
102+
103+
Google deploy maps skills, URL MCP (with inline auth), and the built-in **web** tools. Two
104+
capabilities don't map — but both have a workaround that keeps the *same* neutral folder.
105+
106+
### Built-in sandbox tools (`bash`/`files`/`glob`/`grep`) → expose them as an MCP server
107+
108+
Agent Engine's hosted sandbox runs **Python/JS only** — there is no shell, and no workspace
109+
filesystem to glob/grep over. So `bash`/`edit`/`write`/`glob`/`grep`/`read` deploy *without*
110+
their built-in (a `google.builtin.degraded` warning, never a silent drop).
111+
112+
The escape hatch: anything the sandbox tools would have done, a **URL MCP server can do**,
113+
and URL MCP *is* mapped on Google. Host a small MCP server that exposes the capability you
114+
need (a filesystem server over a bucket/volume, a shell-exec server, a code-search server),
115+
put it behind HTTPS, and add it to the agent's `mcp.json`. The agent then calls e.g.
116+
`fs.read`/`shell.run` as MCP tools instead of the built-in `read`/`bash`. agentlift wires it
117+
as an ADK `McpToolset` with a `tool_filter` allowlist, and inline auth resolves into an
118+
Agent Engine `env_vars` entry (see above), so a private server stays private. This is the
119+
provider-neutral way to give a Google deploy real filesystem/shell reach: the *definition*
120+
is still one folder; only the runtime substrate differs (rented MCP server vs Anthropic's
121+
built-in sandbox).
122+
123+
> Reframe, not a TODO: emulating Anthropic's sandbox tools *inside* Agent Engine is an
124+
> explicit **non-goal** — the substrate is Python/JS, not a shell+FS, and pretending
125+
> otherwise would be the kind of silent degradation agentlift exists to surface. MCP is the
126+
> supported path.
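To make the allowlist and env-var auth ideas concrete, here is a minimal, dependency-free sketch. It does not use the ADK API and is not agentlift's generated code: `McpServerSpec`, `visible_tools`, `auth_headers`, the example URL, the `FS_MCP_TOKEN` variable name, and the `Bearer` header scheme are all hypothetical. It shows only the two properties described above: tools outside the allowlist are never exposed, and the secret is read from the environment at call time rather than inlined.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class McpServerSpec:
    """Hypothetical stand-in for one URL MCP server entry."""

    url: str
    allowed_tools: Tuple[str, ...]
    auth_env_var: Optional[str] = None  # name of the variable, never the secret


def visible_tools(spec: McpServerSpec, advertised: List[str]) -> List[str]:
    """Apply the allowlist: expose only tools the spec names explicitly."""
    allowed = set(spec.allowed_tools)
    return [name for name in advertised if name in allowed]


def auth_headers(
    spec: McpServerSpec, environ: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Resolve the token from the environment; fail loudly if it is unset."""
    if spec.auth_env_var is None:
        return {}
    token = environ.get(spec.auth_env_var)
    if not token:
        # Mirrors the "flagged, not silently skipped" behaviour described above.
        raise KeyError(f"environment variable {spec.auth_env_var} is not set")
    return {"Authorization": f"Bearer {token}"}  # assumed auth scheme


spec = McpServerSpec(
    url="https://mcp.example.com/fs",  # placeholder URL
    allowed_tools=("fs.read",),
    auth_env_var="FS_MCP_TOKEN",
)
print(visible_tools(spec, ["fs.read", "fs.write", "shell.run"]))
```

In this sketch a server that advertises `fs.write` or `shell.run` still cannot hand those tools to the agent unless the folder lists them.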
127+
128+
### `:ask` / per-tool approval → gate it in your runner (or keep the agent on Anthropic)
129+
130+
A `:ask` suffix is a human-in-the-loop **permission policy**. On Anthropic it deploys as a
131+
real per-tool gate. On Google it is **not enforced** — ADK tool-confirmation doesn't ride
132+
through `VertexAiSessionService` on the deployed engine today — so a `:ask`-gated tool stays
133+
callable without a prompt (a `google.tool_approval.unsupported` warning at plan time).
134+
135+
Two ways to keep the approval semantics:
136+
137+
1. **Gate client-side.** You already invoke a deployed engine *from your own code*
138+
(`remote.stream_query(...)`) — that loop is where approval belongs. Stream the events,
139+
pause when the model requests a `:ask`-marked tool, prompt your operator, and only then
140+
continue. The deployed engine is a callable; the human-in-the-loop lives in the caller,
141+
exactly as it would for any hosted API. (This is the same "where does the loop run"
142+
split as subagents: rented runtime, self-hosted control.)
143+
2. **Keep `:ask` agents on Anthropic.** If the gate must be enforced *by the runtime* rather
144+
than your caller, deploy that agent to Anthropic, where the policy is native. The folder
145+
is unchanged; only the target differs.
146+
147+
Either way the `:ask` in the folder is never lost — it surfaces as a diagnostic, and the
148+
policy is honored at the boundary you control.
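The client-side option can be sketched as a small wrapper around whatever event iterator the caller already consumes. Everything here is an assumption made for illustration: the event shape (a dict with `content.parts[].function_call.name`), the `gated_stream` and `requested_tools` helpers, and the `ApprovalDenied` exception. The stream is a plain list standing in for `remote.stream_query(...)`. Whether halting consumption also stops the tool from running server-side depends on the runtime and is not shown here; the sketch covers only the control flow in the caller.

```python
from typing import Callable, Dict, Iterable, Iterator, Set


class ApprovalDenied(RuntimeError):
    """Raised when the operator rejects a gated tool call."""


def requested_tools(event: Dict) -> Iterator[str]:
    # Assumed shape: {"content": {"parts": [{"function_call": {"name": ...}}]}}
    parts = (event.get("content") or {}).get("parts") or []
    for part in parts:
        call = part.get("function_call")
        if call:
            yield call["name"]


def gated_stream(
    events: Iterable[Dict],
    gated: Set[str],
    approve: Callable[[str, Dict], bool],
) -> Iterator[Dict]:
    """Yield events, asking `approve` before passing on a gated tool request."""
    for event in events:
        for name in requested_tools(event):
            if name in gated and not approve(name, event):
                raise ApprovalDenied(name)
        yield event


# Stand-in for the events a deployed engine might stream back.
fake_events = [
    {"content": {"parts": [{"text": "Looking into it."}]}},
    {"content": {"parts": [{"function_call": {"name": "shell.run", "args": {}}}]}},
    {"content": {"parts": [{"text": "Done."}]}},
]

for ev in gated_stream(fake_events, {"shell.run"}, lambda name, ev: True):
    print(ev)
```

In a real caller, `approve` would prompt an operator (CLI prompt, ticket, chat message) and the set of gated tool names would come from the `:ask` markers in the folder.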
149+
101150
## Cost
102151

103152
A deployed Agent Engine is billed compute (it provisions a managed `reasoningEngine`),

docs/limitations.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -60,7 +60,7 @@ same folder reaches every target. What differs is how far each runtime takes it:
6060
| Target | Status | Limits |
6161
|---|---|---|
6262
| Anthropic Managed Agents | Live deploy | Reference target; most complete mapping (skills, MCP, `:ask`, coordinator). |
63-
| Google Vertex AI Agent Engine | Live deploy, preview | Deployed as a real `reasoningEngine`; maps **skills** (embedded + ADK `load_skill_from_dir`), **URL MCP** (`McpToolset` + `tool_filter`, inline auth → Agent Engine `env_vars`), and the **built-in web tools** (`web_search` → Google Search grounding, `web_fetch` → URL Context, each a wrapped tool-agent), idempotent via a spec hash. **All six portability dimensions exercised live** (delegation, both MCP servers, both skills — see the [coverage matrix](tested-platforms.md#live-coverage-matrix--receipt-evidence-not-a-capability-ranking)); the web tools were separately exercised live (both tool-agents fired on a deployed engine). Not mapped: the built-in **sandbox** tools (`bash/files/glob-grep` — Vertex's sandbox is Python/JS only) and `:ask`/per-tool approval; stdio MCP refused; Claude models map to Gemini. |
63+
| Google Vertex AI Agent Engine | Live deploy, preview | Deployed as a real `reasoningEngine`; maps **skills** (embedded + ADK `load_skill_from_dir`), **URL MCP** (`McpToolset` + `tool_filter`, inline auth → Agent Engine `env_vars`), and the **built-in web tools** (`web_search` → Google Search grounding, `web_fetch` → URL Context, each a wrapped tool-agent), idempotent via a spec hash. **All six portability dimensions exercised live** (delegation, both MCP servers, both skills — see the [coverage matrix](tested-platforms.md#live-coverage-matrix--receipt-evidence-not-a-capability-ranking)); the web tools were separately exercised live (both tool-agents fired on a deployed engine). Not mapped, each with a workaround in [deploy-google.md](deploy-google.md#two-known-gaps-and-how-to-work-around-them): the built-in **sandbox** tools (`bash/files/glob-grep` — Vertex's sandbox is Python/JS only; expose equivalents via a URL MCP server, an explicit non-goal to emulate in-engine) and `:ask`/per-tool approval (gate it client-side); stdio MCP refused; Claude models map to Gemini (Claude-on-Vertex is an [offline-verified spike](../experiments/claude-on-vertex/), not shipped). |
6464
| OpenAI Agents SDK | Export / self-host | Subagents via agent-as-tool; the delegation loop runs in your app — no hosted-deploy target. |
6565

6666
## Cost numbers are estimates

docs/provider-matrix.md

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -53,11 +53,21 @@ never a silent drop) · ❌ refused / not applicable.
5353
**approximate**: URL Context decides what to fetch from the prompt rather than taking an
5454
explicit URL argument. Deploy pins `google-adk>=1.34.3` when any web tool is present.
5555
- **Built-in sandbox tools (Google).** Agent Engine's hosted sandbox is Python/JS only — no
56-
shell, no glob/grep over a workspace (there is no workspace). Supply equivalents via an
57-
MCP server. `bash/edit/write/glob/grep/read` deploy without the built-in, with a warning.
56+
shell, no glob/grep over a workspace (there is no workspace). `bash/edit/write/glob/grep/read`
57+
deploy without the built-in, with a warning. Emulating a shell+FS *inside* the engine is an
58+
explicit **non-goal** (it would be the silent degradation the tool exists to surface); the
59+
supported path is a URL MCP server, which does deploy — see
60+
[the workaround](deploy-google.md#two-known-gaps-and-how-to-work-around-them).
5861
- **`:ask` (Google).** ADK tool-confirmation is not enforced under the Agent Engine
59-
session service today, so a `:ask`-gated tool stays available without a gate. Keep
60-
`:ask` agents on the Anthropic target where the gate is real.
62+
session service today, so a `:ask`-gated tool stays available without a gate. Enforce
63+
approval **client-side** in the loop that calls the engine, or keep `:ask` agents on the
64+
Anthropic target where the gate is native — see
65+
[the workaround](deploy-google.md#two-known-gaps-and-how-to-work-around-them).
66+
- **Model (Google).** Claude folder models map to Gemini (`gemini-2.5-flash`). Keeping a
67+
Claude brain via **Claude-on-Vertex** is offline-verified but **not shipped** — ADK 1.34.3
68+
resolves Claude on Vertex and the mixed-model shape composes (web sub-agents stay Gemini),
69+
but with no live receipt a Claude `--google-model` is refused, not silently deployed. See
70+
[`experiments/claude-on-vertex/`](../experiments/claude-on-vertex/).
6171
- **Subagents (per-agent IDs).** Anthropic gives each agent its own addressable id;
6272
Google deploys the whole roster as **one** `reasoningEngine` with server-side
6373
delegation, so the roster is not individually addressable (the A2A protocol across
