Skip to content

feat(client): add dynamo_chat transport + routed_experts to renderer generate#79

Open
biswapanda wants to merge 26 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4
Open

feat(client): add dynamo_chat transport + routed_experts to renderer generate#79
biswapanda wants to merge 26 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4

Conversation

@biswapanda

@biswapanda biswapanda commented Jun 9, 2026

Copy link
Copy Markdown

Description

Adds a dynamo_chat transport to the renderer-based generate() client so it can run against NVIDIA Dynamo, which serves no /inference/v1/generate route. Selected per-call via transport=; defaults to the existing vLLM path, so behavior is unchanged unless opted in.

Two transports:

  • vllm_generate (default): unchanged — messages → render_ids() → POST /inference/v1/generate → parse_response() (vLLM TITO surface).
  • dynamo_chat: messages → render_ids() → POST /v1/chat/completions with nvext.token_data (pre-tokenized prompt) + nvext.extra_fields=["engine_data"]. Completion token IDs and logprobs are read back from nvext.engine_data.

Dynamo wire shape (_post_dynamo_chat)

Mirrors the verifiers token client so the payload is identical whether a rollout goes through the token client or the renderer client. nvext.token_data (Dynamo skips tokenization when present); cache_saltnvext.cache_salt, prioritynvext.agent_hints.priority; a single placeholder user message; sampling remap (max_tokensmax_completion_tokens, logprobs=Nlogprobs=true + top_logprobs=N); passthrough fields ride the Dynamo allowlist. Tools are baked into token_data by the renderer (not sent on the wire).

routed_experts (MoE expert replay) — now surfaced on dynamo_chat

(Supersedes the earlier "routed_experts intentionally NOT surfaced" note — it now is.) parse reads routed_experts from nvext.routed_experts (or nvext.engine_data.routed_experts) and maps it to the downstream RoutedExpertsPayload {data, shape, start, dtype}. The Dynamo worker returns full-sequence routing with start=0; the renderer row-trims the leading prompt rows only when the caller explicitly sets routed_experts_prompt_start — a first-turn request with no caller start stays full-sequence with start=0 (no phantom prefix). Completion logprobs prefer nvext.engine_data.completion_logprobs (the same authoritative source as the engine token IDs) over the chat echo; a present-but-empty engine list is authoritative and does not fall back to chat.

Other

  • Public RendererTransport = Literal["vllm_generate", "dynamo_chat"] alias. A present-but-empty completion_token_ids is a valid zero-token completion; only a fully absent field raises. Multimodal renderers raise NotImplementedError on dynamo_chat (vLLM path / token-client TITO remain available for VLMs).

Type of Change

  • New feature (non-breaking change which adds functionality)

Review

Codex adversarial review: SIGN-OFF (F1/F2/F3 + the N1 logprob-presence finding resolved; head 5f2a914). All review threads resolved.

Testing

tests/test_client.py covers the Dynamo request body shape (priority/detokenize/sampling remap), routed_experts parse + row-trim (explicit prompt_start vs first-turn full-sequence), engine-logprob preference incl. present-but-empty, and missing/empty completion IDs.


Note

Medium Risk
New Dynamo wire/parse path affects RL-critical completion IDs, logprobs, and MoE routed_experts; strict runtime errors and no Dynamo multimodal are new failure modes for opted-in rollouts.

Overview
Adds a per-call transport parameter to generate() ("vllm" default, "dynamo" opt-in). The existing vLLM TITO flow is moved into _VllmGenerateTransport; behavior stays the same when transport is omitted.

Dynamo uses _DynamoChatTransport: pre-tokenized prompts go to POST /v1/chat/completions via nvext.token_data, with cache_salt, priority, and routed_experts_prompt_start mapped into nvext and vLLM-only sampling keys dropped. Responses read nvext.engine_data for completion IDs and logprobs (not chat echo), normalize routed_experts, keep large blobs as zero-copy memoryview, and optionally client-trim prompt rows when an older worker returns full-sequence routing.

Multimodal on Dynamo raises NotImplementedError; missing engine completion IDs or logprob length mismatches raise RuntimeError. Tests cover wire shape, nvext merge, and parse edge cases.

Reviewed by Cursor Bugbot for commit 57846ec. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Add dynamo transport and routed-experts support to generate() in the renderer client

  • Adds a transport parameter to generate() in renderers/client.py, defaulting to 'vllm' (existing /inference/v1/generate path); passing 'dynamo' routes to OpenAI-compatible /v1/chat/completions with NVIDIA Dynamo nvext fields.
  • Introduces a _Transport ABC with _VllmGenerateTransport and _DynamoChatTransport implementations; each handles body construction, POST, and response normalization into a common _WireResult.
  • The Dynamo transport maps sampling params (dropping vLLM-only keys), moves cache_salt/priority into nvext, and prefers engine_data fields over chat-echo fields when parsing responses.
  • Adds client-side trimming of base64-encoded routed_experts via _trim_dynamo_routed_experts when routed_experts_prompt_start is set and the worker has not already trimmed.
  • Risk: generate() with transport='dynamo' raises RuntimeError on missing completion_token_ids or logprob/token-ID length mismatches, where the vLLM path does not.

Macroscope summarized 57846ec.

Comment thread renderers/client.py Outdated
Comment thread renderers/client.py Outdated
Comment thread renderers/client.py Outdated
@biswapanda biswapanda changed the title feat(client): add dynamo_chat_nvext transport to renderer generate() feat(client): add dynamo_chat_nvext transport to renderer Jun 9, 2026
…ols from dynamo body, raise on missing ids; rename transport to dynamo_chat
@biswapanda biswapanda changed the title feat(client): add dynamo_chat_nvext transport to renderer feat(client): add dynamo_chat transport to renderer generate() Jun 9, 2026
Comment thread renderers/client.py
@biswapanda biswapanda changed the title feat(client): add dynamo_chat transport to renderer generate() feat(client): add dynamo_chat transport to renderer generate Jun 10, 2026
Comment thread renderers/client.py Outdated
@biswapanda biswapanda changed the title feat(client): add dynamo_chat transport to renderer generate feat(client): add dynamo_chat transport + routed_experts to renderer generate Jun 10, 2026

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit fec0a81. Configure here.

Comment thread renderers/client.py
Comment thread renderers/client.py Outdated
Comment thread renderers/client.py Outdated
Comment thread renderers/client.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants