phuryn
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 16 additions & 6 deletions
diff --git a/README.md b/README.md
Lines changed: 11 additions & 3 deletions
diff --git a/docs/deploy-google.md b/docs/deploy-google.md
Lines changed: 49 additions & 0 deletions
diff --git a/docs/limitations.md b/docs/limitations.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/provider-matrix.md b/docs/provider-matrix.md
Lines changed: 14 additions & 4 deletions
@@ -82,10 +82,10 @@ A bare ref resolves to the agent's **own** resource first, then `shared/`.
 | Skills | uploaded, shared by id (skill-bearing agents auto-get `read` — Managed Agents needs it to open `SKILL.md`) | ✅ embedded in source package, loaded via ADK `load_skill_from_dir` (update = redeploy) | export comment only |
 | Remote MCP | mapped | ✅ URL → ADK `McpToolset` + `tool_filter`; inline auth → Agent Engine `env_vars` (resolved at deploy, never inlined) | export comment only |
 | Built-in web tools (`web_search`/`web_fetch`) | mapped | ✅ `web_search`→Google Search grounding, `web_fetch`→URL Context, each a wrapped single-tool ADK sub-agent (`AgentTool`, `propagate_grounding_metadata=True`); always-wrap so they coexist with `transfer_to_agent`; pins `google-adk>=1.34.3` | `WebSearchTool` / self-host fetch |
-| Built-in sandbox tools (`bash/files/glob-grep`) | mapped | 🚧 skipped (sandbox is Python/JS only) | self-host runner |
-| `:ask` | permission policy | 🚧 unsupported on `VertexAiSessionService` | client-side |
+| Built-in sandbox tools (`bash/files/glob-grep`) | mapped | 🚧 skipped (sandbox is Python/JS only — in-engine emulation is a **non-goal**; expose equivalents via a URL MCP server) | self-host runner |
+| `:ask` | permission policy | 🚧 unsupported on `VertexAiSessionService` (gate client-side, or keep on Anthropic) | client-side |
 | Idempotency | lockfile + content hashes | ✅ `.agentlift-google.json` spec hash → create/update/skip | n/a |
-| Model | Claude (native) | 🔁 mapped to Gemini (`gemini-2.5-flash`) | 🔁 mapped to `gpt-*` |
+| Model | Claude (native) | 🔁 mapped to Gemini (`gemini-2.5-flash`); Claude-on-Vertex is an offline-verified **spike, not shipped** (`experiments/claude-on-vertex/`) — a Claude `--google-model` is refused (`google.deploy_model.claude_unsupported`) | 🔁 mapped to `gpt-*` |
 
 **Live-verified (6/6 both):** one neutral fixture (`tests/live/fixtures/coverage-matrix`) was deployed
 + queried on **both** Anthropic and Google; all six portability dimensions (agents · subagents ·
@@ -111,9 +111,19 @@ so the objective signal is the tool-call + its response content, not citation ch
 `deploy --target google` reports *agentlift's current implementation*. These now agree on
 skills, URL MCP, and the built-in **web** tools (all mapped). They still diverge on the
 built-in **sandbox** tools and `:ask` (`audit` rates them `degraded`/`unsupported` for
-Google; `deploy` skips a stdio MCP server / sandbox-tool-only folder). Pipeline for Google
-mirrors Anthropic's *plan-is-the-contract* discipline: `google_plan.py` is pure and
-offline-tested, only `google_target.py` touches the network.
+Google; `deploy` skips a stdio MCP server / sandbox-tool-only folder). Those two are framed
+as **non-goals with workarounds**, not parity TODOs (sandbox → expose via a URL MCP server;
+`:ask` → gate client-side or keep on Anthropic — see
+[docs/deploy-google.md](docs/deploy-google.md)). Pipeline for Google mirrors Anthropic's
+*plan-is-the-contract* discipline: `google_plan.py` is pure and offline-tested, only
+`google_target.py` touches the network.
+
+**Claude-on-Vertex (spike, not shipped):** ADK 1.34.3 resolves Claude on Vertex and the
+mixed-model shape composes (web sub-agents must stay Gemini — Search/URL-Context are Gemini
+built-ins, encoded by `web_model()` in `google_codegen.py`). Offline-verified in
+`experiments/claude-on-vertex/`; no live receipt yet, so `build_google_plan` **refuses** a
+Claude `--google-model` (`google.deploy_model.claude_unsupported`) rather than silently
+shipping it (the *confirm-live-before-encoding* rule).
 
 ## Commands
 
 
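The refusal rule described in the hunk above (a Claude `--google-model` is rejected with `google.deploy_model.claude_unsupported`, and web sub-agents stay on Gemini) can be sketched roughly as follows. This is a minimal illustration, not the code in `google_plan.py` or `google_codegen.py`: `PlanRefused`, `is_claude`, `resolve_deploy_model`, and `web_model_sketch` are hypothetical names, the example model names are placeholders, and detecting Claude by a name prefix is an assumption.

```python
from typing import Optional

DEFAULT_GOOGLE_MODEL = "gemini-2.5-flash"  # default mapping named in the diff
CLAUDE_UNSUPPORTED = "google.deploy_model.claude_unsupported"  # diagnostic code from the diff


class PlanRefused(ValueError):
    """Hypothetical error type that carries a diagnostic code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def is_claude(model: str) -> bool:
    # Assumption: Claude models are recognizable by a "claude" name prefix.
    return model.strip().lower().startswith("claude")


def resolve_deploy_model(google_model: Optional[str]) -> str:
    """Pick the coordinator model, refusing Claude until a live receipt exists."""
    if google_model is None:
        return DEFAULT_GOOGLE_MODEL
    if is_claude(google_model):
        raise PlanRefused(CLAUDE_UNSUPPORTED, "Claude-on-Vertex is a spike, not shipped")
    return google_model


def web_model_sketch(coordinator_model: str) -> str:
    """Keep web sub-agents on Gemini even if the coordinator were Claude."""
    return DEFAULT_GOOGLE_MODEL if is_claude(coordinator_model) else coordinator_model
```

Raising at plan time, rather than substituting a model quietly, matches the *confirm-live-before-encoding* rule: the caller sees a diagnostic code instead of an unexpected deployment.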
@@ -184,7 +184,7 @@ $ agentlift audit ./examples/team --targets anthropic,google,openai
         reason: web_search -> Google Search grounding, web_fetch -> URL Context (each a wrapped tool-agent)
   degraded:
     ! Built-in tool sandbox (bash / files / glob-grep)
-        reason: hosted sandbox is Python/JS only
+        reason: hosted sandbox is Python/JS only - no bash, no persistent workspace filesystem
   unsupported:
     x Per-tool approval gate (:ask / human-in-the-loop)
         reason: not enforced with VertexAiSessionService on the deployed runtime
@@ -225,7 +225,7 @@ One neutral [coverage fixture](tests/live/fixtures/coverage-matrix/) — a coord
 
 > `*` **OpenAI is an export target, not a live deploy** — there is no code-define-and-OpenAI-host path. agents + subagents are real (the `as_tool` composition is trace-verified in [`experiments/subagent-composition`](experiments/subagent-composition/)); MCP and skills compile to guided self-host scaffolding (`HostedMCPTool` / Skills-API call sites), since the orchestration loop runs in *your* app. `◐` = compiled to a self-host stub, not live-exercised.
 >
-> `EXERCISED` = an objective runtime event proved it. **Both Anthropic and Google exercised all six dimensions server-side (6/6)** on a real, billable deploy. Anthropic's subagents cell keys on the native delegation event (`session.thread_created` + `agent.thread_message_sent`) because coordinator delegation is async (the worker's reply lands after a one-shot query returns). The **wired** layer is pinned offline in [`tests/test_coverage_matrix_plan.py`](tests/test_coverage_matrix_plan.py) (CI); the EXERCISED column comes from committed live receipts under [`tests/live/receipts/`](tests/live/receipts/). Full evidence + reproduce steps: [`docs/tested-platforms.md`](docs/tested-platforms.md#live-coverage-matrix--receipt-evidence-not-a-capability-ranking) and [`tests/live/README.md`](tests/live/README.md).
+> `EXERCISED` = an objective runtime event proved it. **Both Anthropic and Google exercised all six dimensions server-side (6/6)** on a real, billable deploy. Anthropic's subagents cell keys on the native delegation event (`session.thread_created` + `agent.thread_message_sent`) because coordinator delegation is async (the worker's reply lands after a one-shot query returns). The **wired** layer is pinned offline in [`tests/test_coverage_matrix_plan.py`](tests/test_coverage_matrix_plan.py) (CI); the EXERCISED column comes from committed live receipts under [`tests/live/receipts/`](tests/live/receipts/). The built-in **web tools** (`web_search`/`web_fetch`) are not part of this six-dimension fixture — they were exercised on a **separate** Google live deploy (both tool-agents fired server-side), receipted independently. Full evidence + reproduce steps: [`docs/tested-platforms.md`](docs/tested-platforms.md#live-coverage-matrix--receipt-evidence-not-a-capability-ranking) and [`tests/live/README.md`](tests/live/README.md).
 
 ## Isolation: each agent gets only its folder
 
@@ -392,7 +392,15 @@ Everything is here or one click away:
 
 ## Roadmap
 
-- **Google deploy parity** — the live `deploy --target google` now ships prompts + coordinator/`sub_agents` + **skills + URL MCP (with inline-auth-via-env-vars) + built-in web tools (`web_search`→Google Search grounding, `web_fetch`→URL Context)** + model, idempotent via a spec hash. Remaining for full parity: the built-in **sandbox** tools (bash/files/glob-grep — Vertex's sandbox is Python/JS only), `:ask`/per-tool approval (not enforced on `VertexAiSessionService`), Claude-on-Vertex models (today Claude→Gemini), and per-agent IDs via A2A.
+The live `deploy --target google` ships prompts + coordinator/`sub_agents` + **skills + URL MCP (with inline-auth-via-env-vars) + built-in web tools (`web_search`→Google Search grounding, `web_fetch`→URL Context)** + model, idempotent via a spec hash. The remaining Google differences are **known gaps and non-equivalences, and not all of them are roadmap items**; two are deliberate non-goals:
+
+- **Built-in sandbox tools (bash/files/glob-grep) — emulating them in-engine is a non-goal, not a TODO.** Vertex's hosted sandbox is Python/JS only with no persistent workspace filesystem; pretending it's a shell+FS *inside* Agent Engine would be exactly the silent degradation agentlift exists to surface. If a Google-hosted agent needs filesystem/shell/code-search, expose that environment deliberately through a URL MCP server, which *does* deploy — see [the workaround](docs/deploy-google.md#two-known-gaps-and-how-to-work-around-them).
+- **`:ask` / per-tool approval — gate it in your caller, don't fake it in the runtime.** Not enforced on `VertexAiSessionService`, so enforce approval client-side in the loop that calls the engine, or keep `:ask` agents on Anthropic where the gate is native. Surfaced as a diagnostic, never lost.
+- **Claude-on-Vertex models — offline-verified spike, not shipped.** Today Claude folder models map to Gemini. ADK 1.34.3 *can* resolve Claude on Vertex and the mixed-model shape composes (web sub-agents stay Gemini) — proven offline in [`experiments/claude-on-vertex/`](experiments/claude-on-vertex/) — but with no live receipt yet, a Claude `--google-model` is **refused**, not silently shipped.
+- **Per-agent IDs via A2A.** Google deploys the roster as one `reasoningEngine`; per-agent addressability would need the A2A protocol across deployments.
+
+Genuinely on the roadmap:
+
 - **`export openai-chatkit`** — wrap the `openai-agents` script in a self-hostable ChatKit server (the Agents SDK export already ships)
 - Authenticated remote MCP via the Vaults API
 - `agentlift diff --remote` deeper drift detection (full account reconciliation)
 
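The "idempotent via a spec hash" behavior mentioned in the hunk above (create, update, or skip, keyed on `.agentlift-google.json`) can be illustrated with a small stdlib sketch. The lockfile name comes from the diff; the lockfile layout (a single `spec_hash` key), the canonical-JSON hashing, and the function names `spec_hash`, `decide`, and `record` are assumptions made for illustration, not agentlift's actual format.

```python
import hashlib
import json
from pathlib import Path

LOCKFILE = ".agentlift-google.json"  # lockfile name taken from the diff


def spec_hash(spec: dict) -> str:
    """Hash a deploy spec in a key-order-independent way."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def decide(spec: dict, lock_path: Path) -> str:
    """Return 'create', 'update', or 'skip' for this spec."""
    if not lock_path.exists():
        return "create"  # nothing deployed yet
    recorded = json.loads(lock_path.read_text(encoding="utf-8")).get("spec_hash")
    return "skip" if recorded == spec_hash(spec) else "update"


def record(spec: dict, lock_path: Path) -> None:
    """Persist the hash after a successful create or update."""
    lock_path.write_text(json.dumps({"spec_hash": spec_hash(spec)}), encoding="utf-8")
```

Sorting keys before hashing means a reordered but otherwise identical spec still resolves to `skip`, which is the property that makes repeated deploys cheap.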
@@ -98,6 +98,55 @@ into the generated agent code. At deploy time it:
 **"Env vars to populate"**) so you can confirm the referenced local variables are set
 before deploying. A referenced-but-unset variable is flagged, not silently skipped.
 
+## Two known gaps, and how to work around them
+
+Google deploy maps skills, URL MCP (with inline auth), and the built-in **web** tools. Two
+capabilities don't map — but both have a workaround that keeps the *same* neutral folder.
+
+### Built-in sandbox tools (`bash`/`files`/`glob`/`grep`) → expose them as an MCP server
+
+Agent Engine's hosted sandbox runs **Python/JS only** — there is no shell, and no workspace
+filesystem to glob/grep over. So `bash`/`edit`/`write`/`glob`/`grep`/`read` deploy *without*
+their built-in (a `google.builtin.degraded` warning, never a silent drop).
+
+The escape hatch: anything the sandbox tools would have done, a **URL MCP server can do** —
+and URL MCP *is* mapped on Google. Host a small MCP server that exposes the capability you
+need (a filesystem server over a bucket/volume, a shell-exec server, a code-search server),
+put it behind HTTPS, and add it to the agent's `mcp.json`. The agent then calls e.g.
+`fs.read`/`shell.run` as MCP tools instead of the built-in `read`/`bash`. agentlift wires it
+as an ADK `McpToolset` with a `tool_filter` allowlist, and inline auth resolves into an
+Agent Engine `env_vars` entry (see above), so a private server stays private. This is the
+provider-neutral way to give a Google deploy real filesystem/shell reach: the *definition*
+is still one folder; only the runtime substrate differs (rented MCP server vs Anthropic's
+built-in sandbox).
+
+> Reframe, not a TODO: emulating Anthropic's sandbox tools *inside* Agent Engine is an
+> explicit **non-goal** — the substrate is Python/JS, not a shell+FS, and pretending
+> otherwise would be the kind of silent degradation agentlift exists to surface. MCP is the
+> supported path.
+
+### `:ask` / per-tool approval → gate it in your runner (or keep the agent on Anthropic)
+
+A `:ask` suffix is a human-in-the-loop **permission policy**. On Anthropic it deploys as a
+real per-tool gate. On Google it is **not enforced** — ADK tool-confirmation doesn't ride
+through `VertexAiSessionService` on the deployed engine today — so a `:ask`-gated tool stays
+callable without a prompt (a `google.tool_approval.unsupported` warning at plan time).
+
+Two ways to keep the approval semantics:
+
+1. **Gate client-side.** You already invoke a deployed engine *from your own code*
+   (`remote.stream_query(...)`) — that loop is where approval belongs. Stream the events,
+   pause when the model requests a `:ask`-marked tool, prompt your operator, and only then
+   continue. The deployed engine is a callable; the human-in-the-loop lives in the caller,
+   exactly as it would for any hosted API. (This is the same "where does the loop run"
+   split as subagents: rented runtime, self-hosted control.)
+2. **Keep `:ask` agents on Anthropic.** If the gate must be enforced *by the runtime* rather
+   than your caller, deploy that agent to Anthropic, where the policy is native. The folder
+   is unchanged; only the target differs.
+
+Either way the `:ask` in the folder is never lost — it surfaces as a diagnostic, and the
+policy is honored at the boundary you control.
+
 ## Cost
 
 A deployed Agent Engine is billed compute (it provisions a managed `reasoningEngine`),
 
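The client-side gate described in the `:ask` section above can be sketched as a thin wrapper around the event stream. This is a sketch under stated assumptions: the event shape (a dict with `content.parts[].function_call.name`) is assumed rather than confirmed by the diff, and `GATED_TOOLS`, `ApprovalDenied`, `requested_tools`, and `gate_stream` are hypothetical names. It only shows where the check sits in the caller; it does not claim anything about what the deployed engine does after the caller stops reading.

```python
from typing import Callable, Iterable, Iterator, List, Set

GATED_TOOLS: Set[str] = {"send_email"}  # hypothetical: tools marked `:ask` in the folder


class ApprovalDenied(RuntimeError):
    """Raised when the operator rejects a gated tool call."""


def requested_tools(event: dict) -> List[str]:
    """Names of tools requested in one event (assumed dict event shape)."""
    parts = (event.get("content") or {}).get("parts") or []
    return [p["function_call"]["name"] for p in parts if p.get("function_call")]


def gate_stream(
    events: Iterable[dict],
    approve: Callable[[str], bool],
    gated: Set[str] = GATED_TOOLS,
) -> Iterator[dict]:
    """Yield events, asking `approve` before passing on any gated tool request."""
    for event in events:
        for name in requested_tools(event):
            if name in gated and not approve(name):
                raise ApprovalDenied(name)
        yield event
```

In a real caller, `events` would be the iterator returned by `remote.stream_query(...)` and `approve` would prompt the operator, for example with `input()`.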
@@ -60,7 +60,7 @@ same folder reaches every target. What differs is how far each runtime takes it:
 | Target | Status | Limits |
 |---|---|---|
 | Anthropic Managed Agents | Live deploy | Reference target; most complete mapping (skills, MCP, `:ask`, coordinator). |
-| Google Vertex AI Agent Engine | Live deploy, preview | Deployed as a real `reasoningEngine`; maps **skills** (embedded + ADK `load_skill_from_dir`), **URL MCP** (`McpToolset` + `tool_filter`, inline auth → Agent Engine `env_vars`), and the **built-in web tools** (`web_search` → Google Search grounding, `web_fetch` → URL Context, each a wrapped tool-agent), idempotent via a spec hash. **All six portability dimensions exercised live** (delegation, both MCP servers, both skills — see the [coverage matrix](tested-platforms.md#live-coverage-matrix--receipt-evidence-not-a-capability-ranking)); the web tools were separately exercised live (both tool-agents fired on a deployed engine). Not mapped: the built-in **sandbox** tools (`bash/files/glob-grep` — Vertex's sandbox is Python/JS only) and `:ask`/per-tool approval; stdio MCP refused; Claude models map to Gemini. |
+| Google Vertex AI Agent Engine | Live deploy, preview | Deployed as a real `reasoningEngine`; maps **skills** (embedded + ADK `load_skill_from_dir`), **URL MCP** (`McpToolset` + `tool_filter`, inline auth → Agent Engine `env_vars`), and the **built-in web tools** (`web_search` → Google Search grounding, `web_fetch` → URL Context, each a wrapped tool-agent), idempotent via a spec hash. **All six portability dimensions exercised live** (delegation, both MCP servers, both skills — see the [coverage matrix](tested-platforms.md#live-coverage-matrix--receipt-evidence-not-a-capability-ranking)); the web tools were separately exercised live (both tool-agents fired on a deployed engine). Not mapped, each with a workaround in [deploy-google.md](deploy-google.md#two-known-gaps-and-how-to-work-around-them): the built-in **sandbox** tools (`bash/files/glob-grep` — Vertex's sandbox is Python/JS only; expose equivalents via a URL MCP server, an explicit non-goal to emulate in-engine) and `:ask`/per-tool approval (gate it client-side); stdio MCP refused; Claude models map to Gemini (Claude-on-Vertex is an [offline-verified spike](../experiments/claude-on-vertex/), not shipped). |
 | OpenAI Agents SDK | Export / self-host | Subagents via agent-as-tool; the delegation loop runs in your app — no hosted-deploy target. |
 
 ## Cost numbers are estimates
 
@@ -53,11 +53,21 @@ never a silent drop) · ❌ refused / not applicable.
   **approximate**: URL Context decides what to fetch from the prompt rather than taking an
   explicit URL argument. Deploy pins `google-adk>=1.34.3` when any web tool is present.
 - **Built-in sandbox tools (Google).** Agent Engine's hosted sandbox is Python/JS only — no
-  shell, no glob/grep over a workspace (there is no workspace). Supply equivalents via an
-  MCP server. `bash/edit/write/glob/grep/read` deploy without the built-in, with a warning.
+  shell, no glob/grep over a workspace (there is no workspace). `bash/edit/write/glob/grep/read`
+  deploy without the built-in, with a warning. Emulating a shell+FS *inside* the engine is an
+  explicit **non-goal** (it would be the silent degradation the tool exists to surface); the
+  supported path is a URL MCP server, which does deploy — see
+  [the workaround](deploy-google.md#two-known-gaps-and-how-to-work-around-them).
 - **`:ask` (Google).** ADK tool-confirmation is not enforced under the Agent Engine
-  session service today, so a `:ask`-gated tool stays available without a gate. Keep
-  `:ask` agents on the Anthropic target where the gate is real.
+  session service today, so a `:ask`-gated tool stays available without a gate. Enforce
+  approval **client-side** in the loop that calls the engine, or keep `:ask` agents on the
+  Anthropic target where the gate is native — see
+  [the workaround](deploy-google.md#two-known-gaps-and-how-to-work-around-them).
+- **Model (Google).** Claude folder models map to Gemini (`gemini-2.5-flash`). Keeping a
+  Claude brain via **Claude-on-Vertex** is offline-verified but **not shipped** — ADK 1.34.3
+  resolves Claude on Vertex and the mixed-model shape composes (web sub-agents stay Gemini),
+  but with no live receipt a Claude `--google-model` is refused, not silently deployed. See
+  [`experiments/claude-on-vertex/`](../experiments/claude-on-vertex/).
 - **Subagents (per-agent IDs).** Anthropic gives each agent its own addressable id;
   Google deploys the whole roster as **one** `reasoningEngine` with server-side
   delegation, so the roster is not individually addressable (the A2A protocol across