Claude-on-Vertex spike: make the live probe turn-key on claude-sonnet-4-6

phuryn · claude · phuryn · commit e1132255fb92 · 2026-06-04T19:18:53.000+02:00
The Model Garden precondition is now satisfied (claude-sonnet-4-6 enabled), so the
deploy probe is no longer blocked on enablement -- only on being run.

- deploy probe: default CLAUDE_VERTEX_MODEL=claude-sonnet-4-6, engine region
  us-central1; add optional CLAUDE_VERTEX_REGION -> injected as a
  GOOGLE_CLOUD_LOCATION Agent Engine env var so the in-engine Claude client can
  target the global model endpoint while the engine resource stays regional (the
  one live unknown).
- construct probe: default to the bare claude-sonnet-4-6 id (verified to resolve in
  ADK 1.34.3 without an @version suffix).
- RESULTS.md: refresh console output, correct the "id must be @versioned" claim
  (bare resolves), and reframe the live-half preconditions around the region
  question.

Still offline-only: no live receipt, planner still refuses a Claude --google-model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
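
Reviewer note: the "bare id resolves" claim can be sanity-checked offline against the two patterns printed by `Claude.supported_models()` in RESULTS.md. The sketch below assumes the registry matches a model id against each pattern as a full-string regex; `matches_claude` is a hypothetical stand-in, not ADK's own resolver.

```python
import re

# Patterns copied from the Claude.supported_models() console output in RESULTS.md.
SUPPORTED_PATTERNS = ["claude-3-.*", "claude-.*-4.*"]


def matches_claude(model_id: str) -> bool:
    """Hypothetical helper: True if any pattern matches the whole id.

    Assumption: full-string matching. This illustrates the claim only;
    it is not the ADK registry implementation.
    """
    return any(re.fullmatch(pattern, model_id) for pattern in SUPPORTED_PATTERNS)


for model_id in ("claude-sonnet-4-6", "claude-sonnet-4-5@20250929", "gemini-2.5-flash"):
    print(f"{model_id}: {matches_claude(model_id)}")
```

Under that assumption both the bare and the `@version` forms match the second pattern, while the Gemini id matches neither, which is consistent with the registry output in the diff.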
diff --git a/experiments/claude-on-vertex/RESULTS.md b/experiments/claude-on-vertex/RESULTS.md
@@ -17,19 +17,19 @@ probe that would graduate it.
 
 ```
 registry:
-  'claude-sonnet-4-5@20250929' -> google.adk.models.anthropic_llm.Claude
+  'claude-sonnet-4-6' -> google.adk.models.anthropic_llm.Claude
   Claude.supported_models() = ['claude-3-.*', 'claude-.*-4.*']
 
 constructed (offline, no ADC):
-  parent  : lead  model=claude-sonnet-4-5@20250929  -> Claude
+  parent  : lead  model=claude-sonnet-4-6  -> Claude
   web sub : lead_web_search  model=gemini-2.5-flash  -> Gemini
 
 OK: Claude main agent + Gemini-pinned web sub-agent compose. Mixed-model invariant holds.
 ```
 
 Two facts established:
 
-1. **ADK natively resolves Claude on Vertex.** `LLMRegistry.resolve("claude-sonnet-4-5@20250929")`
+1. **ADK natively resolves Claude on Vertex.** `LLMRegistry.resolve("claude-sonnet-4-6")`
    returns `google.adk.models.anthropic_llm.Claude`, backed by `AsyncAnthropicVertex` —
    Claude served through Vertex AI, no extra package. An `LlmAgent(model="claude-…")` is a
    valid ADK agent.
@@ -38,27 +38,35 @@ Two facts established:
    `GoogleSearchTool`/`url_context`, which are **Gemini built-ins** — they cannot run on a
    Claude model. So a Claude parent must pin its wrapped web sub-agents to Gemini.
 
-### The Vertex Claude id is `@versioned`
+### The Vertex Claude id resolves bare (a `@version` suffix is optional)
 
-A Vertex Claude model id carries an `@version` suffix (`claude-sonnet-4-5@20250929`),
-unlike the Anthropic API ids the folder uses (`claude-haiku-4-5`). Any future passthrough
-would have to map folder ids → the `@versioned` Vertex ids, per region availability.
+ADK resolves the **bare** Vertex Claude id (`claude-sonnet-4-6`) — confirmed above against
+`Claude.supported_models() = ['claude-3-.*', 'claude-.*-4.*']` — and an `@versioned` form
+(`claude-sonnet-4-5@20250929`) resolves too. So a future passthrough maps folder ids →
+Vertex Claude ids subject to **per-region/Model-Garden availability**, not a mandatory
+version-pinning step.
 
 ## What this does NOT prove (the live half — blocked)
 
 `claude_on_vertex_deploy.py` is the live probe that would close the loop: it deploys ONE
 `reasoningEngine` with a Claude-on-Vertex root + Gemini web sub-agent, queries it (the
 instruction prepends a literal `CLAUDEVTX` token so the reply confirms which brain
 answered), and tears it down. It is **env-driven and committed without identifiers**, and
-has **not been run** — it needs preconditions that aren't satisfiable in this session:
+has **not been run yet**. Preconditions:
 
 - **Claude enabled in the project's Vertex AI Model Garden** — a one-time console action;
-  Claude on Vertex is an enable-per-project, region-gated partner model.
-- **A region that serves the chosen Claude model** (e.g. `us-east5` — not every region
-  serves every Claude model).
+  Claude on Vertex is an enable-per-project, region-gated partner model. *Now satisfied:*
+  `claude-sonnet-4-6` was enabled in this project (2026-06-04), so this is no longer the
+  blocker — only running the probe is.
+- **The model-call region (the one live unknown).** An Agent Engine *resource* deploys to a
+  real region; at runtime the in-engine ADK Claude client calls `AsyncAnthropicVertex` with
+  `GOOGLE_CLOUD_LOCATION`. If the model is served only at the **global** endpoint (the
+  Vertex quickstart uses `region="global"`), the probe injects `GOOGLE_CLOUD_LOCATION=global`
+  as an engine env var (`CLAUDE_VERTEX_REGION=global`) while the engine stays in a real
+  region. Whether the engine region alone suffices or the override is needed is what the live run settles.
 - **A billable project + staging bucket + ADC**, exactly like a normal Google deploy.
 
-Until that runs green, "Agent Engine will deploy *and run* a Claude-on-Vertex engine
+Until the probe runs green, "Agent Engine will deploy *and run* a Claude-on-Vertex engine
 end-to-end" is **NOT-PROVEN** — distinct from the offline-verified construction.
 
 ## What shipped in agentlift as a result of this spike
@@ -85,10 +93,11 @@ user-facing passthrough flag. Concretely:
 1. Run `claude_on_vertex_deploy.py` against a project with Claude enabled in Model Garden;
    capture the `CLAUDEVTX`-prefixed reply as a receipt (the unforgeable signal that the
    Claude brain — not the Gemini default — answered).
-2. Encode the wire behavior: the folder-id → `@versioned`-Vertex-id map (per region), and
-   whatever `requirements`/region constraints the live deploy revealed.
+2. Encode the wire behavior: the folder-id → Vertex-Claude-id map (per Model-Garden/region
+   availability; bare id ok), and whatever `requirements`/region constraints the live
+   deploy revealed (notably whether the model call needs the `global` endpoint).
 3. Replace the planner guard with a real passthrough (e.g. `--google-model claude-…` or a
    per-agent opt-in), keeping `web_model()` pinning the web sub-agents to Gemini.
 
 *Offline half confirmed 2026-06-04 with google-adk 1.34.3. Live half: NOT-PROVEN (Model
-Garden enablement required).*
+Garden now enabled for `claude-sonnet-4-6`; deploy probe not yet run).*
diff --git a/experiments/claude-on-vertex/claude_on_vertex_construct.py b/experiments/claude-on-vertex/claude_on_vertex_construct.py
@@ -22,9 +22,9 @@
 """
 from __future__ import annotations
 
-# A Vertex Claude model id carries an @version suffix (unlike the Anthropic API ids the
-# folder uses, e.g. claude-haiku-4-5). Pick one your project has enabled for the deploy half.
-CLAUDE_VERTEX_MODEL = "claude-sonnet-4-5@20250929"
+# ADK resolves the bare Vertex Claude id (an @version suffix also works). Pick one your
+# project has enabled for the deploy half; this matches claude_on_vertex_deploy.py's default.
+CLAUDE_VERTEX_MODEL = "claude-sonnet-4-6"
 GEMINI_WEB_MODEL = "gemini-2.5-flash"  # web grounding / URL Context are Gemini built-ins
 
 
diff --git a/experiments/claude-on-vertex/claude_on_vertex_deploy.py b/experiments/claude-on-vertex/claude_on_vertex_deploy.py
@@ -9,18 +9,29 @@
 
 PRECONDITIONS (all on you, the deployer):
   * Claude models ENABLED in your Vertex AI Model Garden (a one-time console action;
-    Claude on Vertex is an enable-per-project, region-gated partner model).
-  * A region that offers the chosen Claude model (e.g. us-east5 -- NOT every region
-    serves every Claude model; check Model Garden for availability).
+    Claude on Vertex is an enable-per-project, region-gated partner model). Confirmed
+    available for claude-sonnet-4-6 in this project (2026-06-04).
   * A billable GCP project + a Cloud Storage staging bucket + ADC, exactly like a
     normal Google deploy (see docs/deploy-google.md).
 
+THE REGION QUESTION (the one thing the live run is here to settle):
+  An Agent Engine *resource* deploys to a real region (GOOGLE_CLOUD_LOCATION). At
+  runtime the in-engine ADK Claude client builds AsyncAnthropicVertex(region=that same
+  location) -- so the model is called in the engine's region. If your Claude model is
+  served *regionally* (i.e. the engine's own region serves it), one knob is enough. If
+  it is served only at the **global** endpoint (the Vertex quickstart uses
+  region="global"), set CLAUDE_VERTEX_REGION=global: the engine still deploys to a real
+  region, but we inject GOOGLE_CLOUD_LOCATION=global as an engine env var so the model
+  call targets the global endpoint. Try the one-knob path first; reach for the override
+  only if the run fails to find the model.
+
 Environment:
     GOOGLE_CLOUD_PROJECT=your-project
-    GOOGLE_CLOUD_LOCATION=us-east5            # a region where your Claude model is served
+    GOOGLE_CLOUD_LOCATION=us-central1         # where the ENGINE deploys (Agent Engine region)
     GOOGLE_GENAI_USE_VERTEXAI=TRUE
     AGENTLIFT_GCP_STAGING_BUCKET=gs://your-bucket
-    CLAUDE_VERTEX_MODEL=claude-sonnet-4-5@20250929   # the @versioned Vertex Claude id
+    CLAUDE_VERTEX_MODEL=claude-sonnet-4-6      # the Vertex Claude id you enabled (bare id ok)
+    # CLAUDE_VERTEX_REGION=global             # OPTIONAL: model-call region if != engine region
     # ADC from `gcloud auth application-default login`, or GOOGLE_APPLICATION_CREDENTIALS
 
 Run:
@@ -87,19 +98,29 @@ def deploy() -> None:
     import vertexai
     from vertexai import agent_engines
 
-    env = _require("GOOGLE_CLOUD_PROJECT", "AGENTLIFT_GCP_STAGING_BUCKET", "CLAUDE_VERTEX_MODEL")
-    location = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-east5")
+    env = _require("GOOGLE_CLOUD_PROJECT", "AGENTLIFT_GCP_STAGING_BUCKET")
+    model = os.environ.get("CLAUDE_VERTEX_MODEL", "claude-sonnet-4-6")
+    location = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1")
     os.environ.setdefault("GOOGLE_GENAI_USE_VERTEXAI", "TRUE")
 
+    # The model-call region. If the Claude model is only served at the global endpoint
+    # (the Vertex quickstart uses region="global"), set CLAUDE_VERTEX_REGION=global: the
+    # engine still deploys to `location`, but we inject GOOGLE_CLOUD_LOCATION as an engine
+    # env var so the in-engine AsyncAnthropicVertex client targets that endpoint instead.
+    model_region = os.environ.get("CLAUDE_VERTEX_REGION")
+    env_vars = {"GOOGLE_CLOUD_LOCATION": model_region} if model_region else None
+
     vertexai.init(
         project=env["GOOGLE_CLOUD_PROJECT"],
         location=location,
         staging_bucket=env["AGENTLIFT_GCP_STAGING_BUCKET"],
     )
-    print(f"deploying Claude-on-Vertex engine: model={env['CLAUDE_VERTEX_MODEL']} region={location}")
+    region_note = f" model_region={model_region}" if model_region else ""
+    print(f"deploying Claude-on-Vertex engine: model={model} region={location}{region_note}")
     remote = agent_engines.create(
-        agent_engine=_build_app(env["CLAUDE_VERTEX_MODEL"]),
+        agent_engine=_build_app(model),
         requirements=["google-cloud-aiplatform[adk,agent_engines]", "google-adk>=1.34.3"],
+        env_vars=env_vars,
     )
     with open(STATE, "w", encoding="utf-8") as fh:
         fh.write(remote.resource_name)
@@ -129,7 +150,7 @@ def teardown() -> None:
         raise SystemExit("no state file; nothing to tear down (or delete the engine in the console).")
     resource_name = open(STATE, encoding="utf-8").read().strip()
     env = _require("GOOGLE_CLOUD_PROJECT")
-    location = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-east5")
+    location = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1")
     vertexai.init(project=env["GOOGLE_CLOUD_PROJECT"], location=location)
     print(f"deleting {resource_name} ...")
     agent_engines.get(resource_name).delete(force=True)
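
Reviewer note: the region handling added to `deploy()` can be exercised without touching GCP. This is a minimal sketch that mirrors the env-var logic in the hunk above; `resolve_regions` is a hypothetical helper introduced here for illustration and is not part of the committed script.

```python
from __future__ import annotations

from typing import Mapping, Optional


def resolve_regions(environ: Mapping[str, str]) -> tuple[str, Optional[dict[str, str]]]:
    """Hypothetical helper mirroring the env handling in deploy().

    Returns (engine_location, env_vars). env_vars is the mapping the probe
    would pass as agent_engines.create(env_vars=...), or None when
    CLAUDE_VERTEX_REGION is unset or empty.
    """
    location = environ.get("GOOGLE_CLOUD_LOCATION", "us-central1")
    model_region = environ.get("CLAUDE_VERTEX_REGION")
    env_vars = {"GOOGLE_CLOUD_LOCATION": model_region} if model_region else None
    return location, env_vars


# One-knob path: the model call is expected to use the engine's own region.
print(resolve_regions({}))
# Override path: engine stays regional, model call is pointed at "global".
print(resolve_regions({"CLAUDE_VERTEX_REGION": "global"}))
```

The sketch only shows which values the probe computes. Whether Agent Engine accepts a `GOOGLE_CLOUD_LOCATION` override in `env_vars`, and whether the in-engine client honors it, remains the open question the live run is meant to settle.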