fix(orchestrator): Harden Grok v2 settlement and steer message visibility#3578
fix(orchestrator): Harden Grok v2 settlement and steer message visibility#3578mwolson wants to merge 2 commits into
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| hasRunningTool, | ||
| hasPendingRuntimeRequest, | ||
| hasActivePlan: context.plan !== null, | ||
| hasOutput: context.assistant.nextSegment > 0, |
There was a problem hiding this comment.
🟠 High Adapters/AcpAdapterV2.ts:2163
acpRootTurnIsIdle requires hasOutput to be true before settling, but hasOutput is set to context.assistant.nextSegment > 0 — it only counts assistant text segments. When the turn produces tool calls or plan updates but no assistant text, acpRootTurnIsIdle always returns false, so the turn never settles. Simultaneously, the liveness watchdog bails out because acpRootTurnHasIngestedOutput(context) returns true for tool/plan output. If runtime.prompt also never returns, the turn is stuck in running forever — it neither settles nor fails.
Also found in 1 other location(s)
apps/server/src/orchestration-v2/Adapters/GrokAdapterV2.ts:136
Enabling both
settleRootTurnWhenIdleandrootTurnPromptLivenessMsfor Grok leaves a turn stuckrunningwhen Grok emits only reasoning/tool/plan output and then strandssession/prompt. The idle-settle path only completes turns after assistant text (hasOutput: context.assistant.nextSegment > 0), while the watchdog is disabled by any ingested output (acpRootTurnHasIngestedOutputtreats reasoning/tools/plan as output). In that case neither path finalizes the turn, so the UI can stay on Working indefinitely.
🤖 Copy this AI Prompt to have your agent fix this:
In file @apps/server/src/orchestration-v2/Adapters/AcpAdapterV2.ts around line 2163:
`acpRootTurnIsIdle` requires `hasOutput` to be true before settling, but `hasOutput` is set to `context.assistant.nextSegment > 0` — it only counts assistant text segments. When the turn produces tool calls or plan updates but no assistant text, `acpRootTurnIsIdle` always returns false, so the turn never settles. Simultaneously, the liveness watchdog bails out because `acpRootTurnHasIngestedOutput(context)` returns true for tool/plan output. If `runtime.prompt` also never returns, the turn is stuck in `running` forever — it neither settles nor fails.
Also found in 1 other location(s):
- apps/server/src/orchestration-v2/Adapters/GrokAdapterV2.ts:136 -- Enabling both `settleRootTurnWhenIdle` and `rootTurnPromptLivenessMs` for Grok leaves a turn stuck `running` when Grok emits only reasoning/tool/plan output and then strands `session/prompt`. The idle-settle path only completes turns after assistant text (`hasOutput: context.assistant.nextSegment > 0`), while the watchdog is disabled by any ingested output (`acpRootTurnHasIngestedOutput` treats reasoning/tools/plan as output). In that case neither path finalizes the turn, so the UI can stay on Working indefinitely.
There was a problem hiding this comment.
Out of scope for this PR — idle settle intentionally requires assistant text to avoid early completion during tool-heavy turns. No repro in manual validation or grok replays. Listed under Out of scope in the PR description.
There was a problem hiding this comment.
Sorry, I'm unable to act on this request because you do not have permissions within this repository.
| () => new Set(serverProjection?.messages.map((message) => message.id) ?? []), | ||
| [serverProjection], | ||
| () => deriveCommittedServerUserMessageIds(serverVisibleTurnItems), | ||
| [serverVisibleTurnItems], |
There was a problem hiding this comment.
User-only count breaks env locking
Medium Severity
committedServerMessageIds now holds only visible user_message turn item ids, but activeMessageCount still uses that set’s size for envLocked and canOverrideServerThreadEnvMode. During the projection-vs-turn-item gap (and vs all projection.messages), the thread can look message-empty and allow env overrides that used to be blocked.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 61f4adb. Configure here.
There was a problem hiding this comment.
Out of scope — steer fix intentionally keys off visible user turn items. Brief projection gap only. Out of scope in PR description.
ApprovabilityVerdict: Needs human review 6 blocking correctness issues found. This PR introduces significant runtime behavior changes to Grok v2 orchestration including new settlement logic and liveness watchdogs. Multiple unresolved review comments identify potential bugs (tool-only turns never settling, interrupt handling issues) that warrant human attention. You can customize Macroscope's approvability policy. Learn more. |
| hasRunningTool, | ||
| hasPendingRuntimeRequest, | ||
| hasActivePlan: context.plan !== null, | ||
| hasOutput: context.assistant.nextSegment > 0, |
There was a problem hiding this comment.
Tool-only turns never settle
Medium Severity
Idle settlement requires context.assistant.nextSegment > 0, while the 30s liveness watchdog skips failure once hasRootActivity or acpRootTurnHasIngestedOutput is true. A root turn that only ingests tools/reasoning with a stranded session/prompt can stay running forever with neither idle settle nor the watchdog firing.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 1e43aa0. Configure here.
There was a problem hiding this comment.
Out of scope follow-up — not required for the shipped repro. Out of scope in PR description.
| driver: GROK_PROVIDER, | ||
| capabilities: GrokProviderCapabilitiesV2, | ||
| settleRootTurnWhenIdle: true, | ||
| interruptPromptOnCancel: false, |
There was a problem hiding this comment.
🟠 High Adapters/GrokAdapterV2.ts:137
makeGrokAdapterV2 sets interruptPromptOnCancel: false, so cancelling a Grok turn calls session/cancel without first interrupting the local prompt fiber. The interruptPromptOnCancel option exists specifically to unblock Grok's wedged session/prompt calls, but leaving it false means a stuck prompt keeps the turn in running after Stop/cancel. Consider setting it to true.
| interruptPromptOnCancel: false, | |
| interruptPromptOnCancel: true, |
🤖 Copy this AI Prompt to have your agent fix this:
In file @apps/server/src/orchestration-v2/Adapters/GrokAdapterV2.ts around line 137:
`makeGrokAdapterV2` sets `interruptPromptOnCancel: false`, so cancelling a Grok turn calls `session/cancel` without first interrupting the local prompt fiber. The `interruptPromptOnCancel` option exists specifically to unblock Grok's wedged `session/prompt` calls, but leaving it `false` means a stuck prompt keeps the turn in `running` after Stop/cancel. Consider setting it to `true`.
There was a problem hiding this comment.
Intentional: interruptPromptOnCancel stays false because true breaks message_steering/grok replay. PR description updated under Out of scope; idle settle uses runtime.cancel (session/cancel).
There was a problem hiding this comment.
Sorry, I'm unable to act on this request because you do not have permissions within this repository.
| * `_x.ai/session/prompt_complete` notifications (sessionId-matched). Subagent | ||
| * completions on foreign session ids are ignored by the pending-entry gate. | ||
| */ | ||
| export const makeXAiPromptCompletionRuntime = Effect.fn("makeXAiPromptCompletionRuntime")( |
There was a problem hiding this comment.
🟡 Medium acp/XAiAcpExtension.ts:455
The dedup guard in resolveXAiPromptCompletionFallback only checks completedPromptIdsRef when notification.promptId is defined. When a _x.ai/session/prompt_complete notification arrives without a promptId, the guard is skipped and the code resolves the first pending entry matching by sessionId alone. Because makeXAiPromptCompletionRuntime registers a pending entry for every new prompt before the RPC completes, a late duplicate notification from a previous turn (without promptId) will match the new prompt's pending entry and resolve it with the stale notification's stopReason and agentResult instead of waiting for the new prompt's own response. Consider tracking completed sessions alongside completed prompt IDs so that promptId-less notifications can be deduplicated per session.
🤖 Copy this AI Prompt to have your agent fix this:
In file @apps/server/src/provider/acp/XAiAcpExtension.ts around line 455:
The dedup guard in `resolveXAiPromptCompletionFallback` only checks `completedPromptIdsRef` when `notification.promptId` is defined. When a `_x.ai/session/prompt_complete` notification arrives without a `promptId`, the guard is skipped and the code resolves the first pending entry matching by `sessionId` alone. Because `makeXAiPromptCompletionRuntime` registers a pending entry for every new prompt before the RPC completes, a late duplicate notification from a previous turn (without `promptId`) will match the new prompt's pending entry and resolve it with the stale notification's `stopReason` and `agentResult` instead of waiting for the new prompt's own response. Consider tracking completed sessions alongside completed prompt IDs so that `promptId`-less notifications can be deduplicated per session.
There was a problem hiding this comment.
Out of scope — serialized root prompts limit blast radius today. Out of scope in PR description.
There was a problem hiding this comment.
Sorry, I'm unable to act on this request because you do not have permissions within this repository.
| cancel: Effect.gen(function* () { | ||
| const started = yield* runtime.start(); | ||
| yield* abortPendingPromptCompletions(pendingXAiPromptCompletionsRef, started.sessionId); | ||
| yield* runtime.cancel; | ||
| }), |
There was a problem hiding this comment.
🟡 Medium acp/XAiAcpExtension.ts:529
The cancel method calls runtime.start() unconditionally, which creates a new ACP session even when the session was never started. If a caller cancels a run before any prompt has been sent, this spawns a brand-new Grok session just to immediately cancel it, producing unnecessary provider-side side effects from a pure stop request. Consider guarding the start() call so it only runs when there are already pending prompt completions (which implies the session was previously started).
- cancel: Effect.gen(function* () {
- const started = yield* runtime.start();
- yield* abortPendingPromptCompletions(pendingXAiPromptCompletionsRef, started.sessionId);
- yield* runtime.cancel;
- }),
+ cancel: Effect.gen(function* () {
+ const pending = yield* Ref.get(pendingXAiPromptCompletionsRef);
+ if (pending.length > 0) {
+ const started = yield* runtime.start();
+ yield* abortPendingPromptCompletions(pendingXAiPromptCompletionsRef, started.sessionId);
+ }
+ yield* runtime.cancel;
+ }),🤖 Copy this AI Prompt to have your agent fix this:
In file @apps/server/src/provider/acp/XAiAcpExtension.ts around lines 529-533:
The `cancel` method calls `runtime.start()` unconditionally, which creates a new ACP session even when the session was never started. If a caller cancels a run before any prompt has been sent, this spawns a brand-new Grok session just to immediately cancel it, producing unnecessary provider-side side effects from a pure stop request. Consider guarding the `start()` call so it only runs when there are already pending prompt completions (which implies the session was previously started).
There was a problem hiding this comment.
Out of scope — low risk on the Grok path we validated. Out of scope in PR description.
There was a problem hiding this comment.
Sorry, I'm unable to act on this request because you do not have permissions within this repository.
Base optimistic eviction on visible user turn items instead of projection.messages so steer rows do not disappear between message.updated and turn-item.updated events.
4c4d0dd to
8f6ec37
Compare
| ) { | ||
| return; | ||
| } | ||
| yield* finalizeTurn(context, "completed", undefined, { drainTrailingChunks: true }); |
There was a problem hiding this comment.
Idle settle ignores user interrupt
Medium Severity
trySettleRootTurnWhenIdle always calls finalizeTurn with status completed. After the idle check, finalizeTurn can yield during trailing-chunk drain without re-reading context.interrupted, so a concurrent interruptTurn can still terminalize the run as completed instead of interrupted.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 9da925e. Configure here.
| ), | ||
| ); | ||
| }), | ||
| cancel: Effect.gen(function* () { |
There was a problem hiding this comment.
🟡 Medium acp/XAiAcpExtension.ts:529
The cancel method calls abortPendingPromptCompletions for the entire session, which completes every pending fallback deferred with a synthetic cancelled response. Because prompt() registers its fallback deferred before the base runtime's serialization semaphore allows the RPC to execute, a prompt queued behind an active one is also in that list. When two callers invoke prompt() concurrently and the user cancels the active run, the second queued prompt receives cancelled even though it was never the active prompt and the base runtime would have allowed it to run afterward. Consider tracking the active prompt separately so cancel only aborts that one, leaving queued prompts intact.
🤖 Copy this AI Prompt to have your agent fix this:
In file @apps/server/src/provider/acp/XAiAcpExtension.ts around line 529:
The `cancel` method calls `abortPendingPromptCompletions` for the entire session, which completes every pending fallback deferred with a synthetic `cancelled` response. Because `prompt()` registers its fallback deferred before the base runtime's serialization semaphore allows the RPC to execute, a prompt queued behind an active one is also in that list. When two callers invoke `prompt()` concurrently and the user cancels the active run, the second queued prompt receives `cancelled` even though it was never the active prompt and the base runtime would have allowed it to run afterward. Consider tracking the active prompt separately so `cancel` only aborts that one, leaving queued prompts intact.
46fc8ca to
938e815
Compare
| requestId: fallback.promptId, | ||
| }, | ||
| }) | ||
| .pipe(Effect.forkChild); |
There was a problem hiding this comment.
🟡 Medium acp/XAiAcpExtension.ts:504
Effect.forkChild attaches promptRpcFiber to the caller's scope, so when the outer effect is interrupted (timeout, scope shutdown, parent cancellation) the scope auto-interrupts the fiber. The catchCause on Fiber.join then converts that interrupt-only cause into a synthetic { stopReason: "cancelled" } success, swallowing the real caller interruption and making aborted runs look like clean provider cancellations. The ensuring block already calls Fiber.interrupt(promptRpcFiber) for cleanup, so using Effect.forkDetach instead would prevent the scope from auto-interrupting the fiber and let the interruption propagate normally.
- .pipe(Effect.forkChild);
+ .pipe(Effect.forkDetach);🤖 Copy this AI Prompt to have your agent fix this:
In file @apps/server/src/provider/acp/XAiAcpExtension.ts around line 504:
`Effect.forkChild` attaches `promptRpcFiber` to the caller's scope, so when the outer effect is interrupted (timeout, scope shutdown, parent cancellation) the scope auto-interrupts the fiber. The `catchCause` on `Fiber.join` then converts that interrupt-only cause into a synthetic `{ stopReason: "cancelled" }` success, swallowing the real caller interruption and making aborted runs look like clean provider cancellations. The `ensuring` block already calls `Fiber.interrupt(promptRpcFiber)` for cleanup, so using `Effect.forkDetach` instead would prevent the scope from auto-interrupting the fiber and let the interruption propagate normally.
There was a problem hiding this comment.
Deferring the forkDetach change for now. The post-completion Interrupt in provider logs on manual retest is intentional: idle settle calls finalizeTurn first, then runtime.cancel to tear down the hung session/prompt RPC. The prompt fork already no-ops when context.finalized is true, so the synthetic cancelled response from catchCause does not affect turn status.
Scope interruption swallowing is a separate hardening concern; added to the PR out-of-scope table for a follow-up pass if we get a repro.
There was a problem hiding this comment.
Sorry, I'm unable to act on this request because you do not have permissions within this repository.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 938e815. Configure here.
Settle root Grok turns from idle root-session output when session/prompt stays hung, with debounced completion and a liveness watchdog for silent post-prompt hangs. Restore root-matched prompt_complete racing in the xAI extension without treating subagent completions as root terminalization. Fix replay/CI regressions: use forkIn(sessionScope) instead of the removed forkDaemon API, drop drainEvents (orchestration projects updates via handleSessionUpdate, not getEvents), and avoid TestClock-blocking sleeps in the completion drain path. Keep interruptPromptOnCancel off so steering replay can cancel in-flight partial assistant output before restart.


Fixes #3580
What you'll notice
Steered messages stay visible
When you steer an in-flight Grok run, your message no longer flashes away between server events. The timeline keeps the optimistic user row until the committed
user_messageturn item lands.Grok runs finish instead of wedging on Working
Grok root turns that already streamed a full reply could stay
runningwhen the provider'ssession/promptRPC never returned. The next message was then auto-routed as a steer (restart_active), which felt like the app ignored what you typed.This PR terminalizes those root turns from root-session streaming (debounced idle settle) instead of treating xAI subagent
prompt_completenotifications as root completion. The next message starts a normal turn.Stacks on
t3code/codex-turn-mapping(#2829 orchestrator v2 integration).This PR only: two commits on
fix/grok-v2— (1)apps/websteer visibility (ChatView.tsx,ChatView.logic.ts); (2) Grok settlement (AcpAdapterV2.ts,GrokAdapterV2.ts,XAiAcpExtension.ts, tests). Everything else comes from the base branch.User-facing problems and fixes
message.updatedarrived beforeturn-item.updated; optimistic rows were evicted whenprojection.messagescontained the id, but rendering usesvisibleTurnItems.user_messageturn items (deriveCommittedServerUserMessageIds).session/promptleft the provider turnrunning; client restart policy kicked in._x.ai/session/prompt_completeracing inXAiAcpExtension(ignore foreign/subagent session ids) plus debounced idle settle when assistant output is already ingested.runtime.cancelinterrupted the hungsession/promptwhile tools were still in flight.hasToolHistory,hasRunningSubagentinacpRootTurnIsIdle).Server-side mechanics (same Grok wedge as above)
acpRootSessionUpdateIngestsOutput); keepalives are ignored.context.toolsis non-empty or a subagent is running; assistant-only turns still use idle settle for hung-prompt recovery.Effect.sleepdrain — replay/TestClock deadlock).xAI
prompt_completescopePer upstream review on the base branch: many
prompt_completenotifications are subagent completions, not root-turn completion. We did not port the v1 wholesale race. This PR only resolves root pending prompts whennotification.sessionIdmatches the root entry. Child-session routing for subagent visibility stays on thecodex-turn-mappingbase.Review-driven fixes (not from the original repro)
These came from Macroscope/Bugbot review and manual retest of the settlement path. They tighten correctness without changing the user-visible goals above.
returnbefore scheduler)breakinstead ofreturnon chunk handlers.currentopen, blocking idle settle after debouncetrySettleRootTurnWhenIdle.hasActivePlantrue foreverhasActivePlanidle guard (running tools + pending requests already gate settle).session/cancel, risking concurrent promptsruntime.cancelbefore idlefinalizeTurn.rearmRootTurnRecoveryTimersre-schedules both idle settle and the liveness watchdog when pending requests clear.session/updatetraffic re-armed settle debounce without ingestible outputacpRootSessionUpdateIngestsOutputgateshasRootActivityand settle scheduling to real chunks, tools, and plan updates.session/promptRPC completedacpRootTurnIsIdlereturns false when tool history or a running subagent exists.Out of scope (follow-up PRs or upstream)
session/prompt(live chunks never land in T3)interruptPromptOnCancel: truefor Grokfalse—truebreaksmessage_steering/grokreplay. User Stop still sendssession/cancel; idle settle also callsruntime.cancel.session/updatebeyond keepalives suppressing the liveness watchdogcommittedServerMessageIdsfrom env-lockactiveMessageCountprompt_completededup withoutpromptId; guardcancelfrom callingstart()_meta.sessionIdEffect.forkChild→Effect.forkDetachfor the hungsession/promptRPC fiber inXAiAcpExtension{ stopReason: "cancelled" }. Not the cause of post-completion provider-logInterrupt(that is intentional idle-settleruntime.cancelafterfinalizeTurn). Revisit as a separate hardening pass if scope-shutdown misclassification shows up in repro.Validation
vp check: passvp test apps/web/src/components/ChatView.logic.test.ts: passvp test apps/server/src/orchestration-v2/Adapters/GrokAdapterV2.test.ts: passvp test apps/server/src/provider/acp/XAiAcpExtension.test.ts: passvp test apps/server/src/orchestration-v2/testkit/OrchestratorReplayFixtures.integration.test.ts(grok +message_steering/grok): passorchestrator-v2+ this branch @c64ce4c53): manual pass on Grok session019f067e-5178-70b1-86b5-34f0ab0f2a40/ T3 thread93bb87ad-2108-4ea4-9990-45f7056c75a6:turn_start): assistant-only replay, ~15s, full reasoning + assistant in DBturn_start): tool turn (prettier on notes), ~8s, file_search + command_execution + assistant in DBcompleted; zeroinputIntent: steerin provider logKnown limitations
completed; harmless but noisy.session/promptending withInterruptin the provider log (idle settle callsruntime.cancelon the hung RPC after content is already ingested). Runs complete with full content; this is log noise, not the run-4 failure mode.file_searchitem failed on a very large notes file in manual testing; Grok recovered and completed normally.Note
Medium Risk
Changes core Grok/ACP turn lifecycle (finalize, cancel, timers) and client optimistic-message timing; wrong idle gates could complete turns early or leave hangs, but behavior is heavily unit-tested and Grok-scoped.
Overview
Grok runs that already streamed a reply but left
session/prompthanging no longer stay on Working forever. The ACP v2 adapter gains debounced idle settlement (root assistant output quiescent, no tools/subagents/pending approvals), a 30s liveness watchdog for silent hangs, keepalive filtering, and re-arming of those timers when approval/user-input clears. Grok opts in via flavor flags; idle settle callsruntime.cancelon the wedged RPC.xAI
prompt_completehandling is reworked:makeXAiPromptCompletionRuntimeraces the prompt RPC against root-session_x.ai/session/prompt_complete(session-matched), injectspromptId/requestIdin_meta, and ignores foreign session ids.Steer UX: optimistic user rows stop flashing away when
message.updatedbeatsturn-item.updated—ChatViewnow treats committed ids from visibleuser_messageturn items (deriveCommittedServerUserMessageIds) instead of allprojection.messages.Tests updated for settlement helpers and xAI extension; minor test imports move to
vite-plus/test.Reviewed by Cursor Bugbot for commit c64ce4c. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Harden Grok v2 root-turn settlement with idle detection and liveness watchdog
AcpAdapterV2that debounces root-turn finalization when no streams, tools, or subagents are active, with a configurable drain period to catch trailing chunks.provider_errorif no root-session output arrives within a configured window and no runtime request (elicitation/permission) is pending.makeXAiPromptCompletionRuntimein XAiAcpExtension.ts so Grok prompts race the underlying RPC against a_x.ai/session/prompt_completenotification fallback, guarded bysessionIdto ignore foreign sessions.settleRootTurnWhenIdle=trueandinterruptPromptOnCancel=false.ChatViewto use committed user turn items instead of all projection messages, preventing premature dropping of steer rows.Macroscope summarized c64ce4c.