feat(daemon): split opaque execution_failed into close-reason sub-details #4502
Open
lefarcen wants to merge 1 commit into
An `AGENT_EXECUTION_FAILED` whose text matches no classifier pattern currently collapses into a single opaque `failure_detail: execution_failed`. PostHog shows this is the largest current-version failure detail (~4.5k/wk across providers), and it hides three genuinely distinct shapes that the daemon already distinguishes via the `runtime_close` diagnostic's `rpc_close_reason`: mid-stream agent error (`stream_error`, ~51%), bare non-zero exit (`exit_nonzero`, ~44%), and ACP fatal close (`fatal_rpc_error`). That split lived only on PostHog's `rpc_close_reason` field, so the canonical `failure_detail` (and therefore the Langfuse sink, which classifies through the same `classifyRunFailure`) stayed opaque, and the two sinks disagreed.

`classifyRunFailure` already receives the run events, so it reads the `runtime_close` diagnostic and promotes the generic `execution_failed` detail to the specific close reason. Only the opaque catch-all is refined; specific details (`exit_code`, `cli_not_installed`, …) and retry behavior are untouched, so this is observational-only.

This is the first step of the #3408 follow-up that turns the opaque bucket into named sub-buckets; mining the `stream_error` texts for real provider/tool error patterns is a Langfuse-backed follow-up. Relates to #3408.
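A minimal TypeScript sketch of the promotion step described above. It assumes a simplified event shape: the `RunEvent` interface, tagging the diagnostic as `type: "diagnostic"` with `diagnostic: "runtime_close"`, and the helper names `latestCloseReason` / `refineFailureDetail` are all hypothetical, not the daemon's actual code. Only the three close-reason values, the example specific details, and the "refine only the catch-all" rule come from the PR text.

```typescript
// HYPOTHETICAL shapes: the real types live in apps/daemon and packages/contracts
// and almost certainly differ. The real failure-detail union has more members.
type CloseReason = "stream_error" | "exit_nonzero" | "fatal_rpc_error";

type FailureDetail =
  | "execution_failed" // opaque catch-all for AGENT_EXECUTION_FAILED
  | "exit_code"
  | "cli_not_installed"
  | CloseReason;

// ASSUMED event shape: a diagnostic event named "runtime_close" carrying rpc_close_reason.
interface RunEvent {
  type: string;
  diagnostic?: string;
  rpc_close_reason?: string;
}

const CLOSE_REASONS: ReadonlySet<string> = new Set([
  "stream_error",
  "exit_nonzero",
  "fatal_rpc_error",
]);

// Narrows an arbitrary value to one of the known close reasons.
function isCloseReason(value: unknown): value is CloseReason {
  return typeof value === "string" && CLOSE_REASONS.has(value);
}

// Returns rpc_close_reason from the most recent runtime_close diagnostic, if any.
function latestCloseReason(events: readonly RunEvent[]): string | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    const e = events[i];
    if (e.type === "diagnostic" && e.diagnostic === "runtime_close") {
      return e.rpc_close_reason;
    }
  }
  return undefined;
}

// Only the opaque catch-all is refined; every specific detail passes through unchanged.
// A missing diagnostic or an unrecognized close reason also leaves execution_failed as-is.
function refineFailureDetail(
  detail: FailureDetail,
  events: readonly RunEvent[],
): FailureDetail {
  if (detail !== "execution_failed") return detail;
  const reason = latestCloseReason(events);
  return isCloseReason(reason) ? reason : detail;
}

// Usage: one run that ended with a bare non-zero exit.
const events: RunEvent[] = [
  { type: "diagnostic", diagnostic: "runtime_close", rpc_close_reason: "exit_nonzero" },
];
console.log(refineFailureDetail("execution_failed", events)); // prints exit_nonzero
console.log(refineFailureDetail("exit_code", events)); // prints exit_code
```

In this sketch the early return on `detail !== "execution_failed"` is what keeps the change observational-only: a specific detail is returned before the events are ever inspected.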
PerishCode (Contributor) approved these changes on Jun 18, 2026 and left a comment:
@lefarcen I reviewed the changed classifier path, the added daemon regression coverage, and the analytics contract enum extension. The implementation keeps the refinement scoped to the generic `AGENT_EXECUTION_FAILED` / `execution_failed` bucket, preserves specific process-exit details, and aligns the new `failure_detail` values with the existing `runtime_close` close-reason vocabulary. CI is green; my local focused test/typecheck attempt could not run because this prepared worktree has no installed dependencies. Nice work tightening the analytics signal without broadening runtime behavior.
lefarcen pushed a commit that referenced this pull request on Jun 18, 2026:
…n_failed)
Combines the P0-b fix_config fix and the P1 process_exit/execution_failed
deepening into one spec under 'engineering-view failure reduction' (the ~7% we
can actually fix). Slice 1 = fix_config: codex writes service_tier="default"
(Langfuse-confirmed), but codex-config-normalize.ts only handles "priority" —
generalize to normalize any value not in {fast,flex}. Slice 2 = execution_failed
deepening (#4502 did the close-reason split; next is Langfuse-mining the
stream_error texts to add real classifier patterns). Slice 3 = the already-named
real bugs in process_exit (spawn_ebadf/eperm 149/wk, agent_protocol_error 255,
fabricated_role_marker 310). Grounded with the 7d process_exit breakdown + the
confirmed 'default' value + code anchors.
Why
I'm taking over the run-reliability lane (#3408). Profiling current-version (0.10.0+) failures in PostHog shows that the single largest `failure_detail` is the opaque `execution_failed`: ~4.5k/week across providers (opencode, codex_cli, claude_code, gemini, …). It's the `AGENT_EXECUTION_FAILED` catch-all that fires when no text pattern matched, so on the dashboard it reads as "the agent failed and we don't know why."

But the daemon already knows more: the `runtime_close` diagnostic carries `rpc_close_reason`, which splits this bucket into three genuinely distinct shapes: mid-stream agent error (`stream_error`, ~51%), bare non-zero exit (`exit_nonzero`, ~44%), and ACP fatal close (`fatal_rpc_error`). That split only lived on PostHog's `rpc_close_reason` field, so the canonical `failure_detail` (and therefore the Langfuse sink, which classifies through the same `classifyRunFailure`) stayed opaque, and the two sinks disagreed per run.

What users will see
No user-facing UI change. This is internal failure analytics: dashboards and Langfuse traces keyed on `failure_detail` now see `stream_error` / `exit_nonzero` / `fatal_rpc_error` instead of one undifferentiated `execution_failed`, so the largest failure bucket becomes actionable.

How
`classifyRunFailure` already receives the run's events, so it reads the latest `runtime_close` diagnostic and promotes the generic `execution_failed` detail to the matching close reason. Only the opaque catch-all is refined; specific details (`exit_code`, `cli_not_installed`, …) and all retry behavior are untouched, so the change is observational-only. Three values are added to `TrackingRunFailureDetail`.

This is the first step of the #3408 follow-up; mining the `stream_error` texts (1,082/wk of them carry a `langfuse_trace_id`) for real provider/tool error patterns is a Langfuse-backed follow-up PR.

Surface area
- `stream_error` / `exit_nonzero` / `fatal_rpc_error` added to `TrackingRunFailureDetail` in `packages/contracts`
- A more specific `failure_detail` value is emitted for a subset of failed runs (analytics only)

Validation
- `pnpm --dir apps/daemon exec vitest run -c vitest.config.ts tests/run-failure-classification.test.ts`: 44/44 pass (6 new red specs: each close reason maps to its detail, plus guards that a missing/unknown diagnostic stays `execution_failed` and a specific detail is never relabeled)
- `pnpm --filter @open-design/contracts typecheck`, `pnpm --filter @open-design/daemon typecheck`
- `pnpm guard`

Relates to #3408.
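The guard cases listed under Validation can be restated as a table-driven check. This is a self-contained sketch, not the repository's `tests/run-failure-classification.test.ts`: it re-declares a compact hypothetical `refine` function and a simplified `CloseEvent` shape (both assumptions), treats the last event as the latest diagnostic, and throws on mismatch instead of using vitest.

```typescript
// HYPOTHETICAL compact stand-in for the daemon classifier; names and shapes are assumptions.
type Detail =
  | "execution_failed"
  | "exit_code"
  | "stream_error"
  | "exit_nonzero"
  | "fatal_rpc_error";

// ASSUMED simplified event: only runtime_close diagnostics, newest last.
interface CloseEvent {
  diagnostic: "runtime_close";
  rpc_close_reason?: string;
}

// Close reasons allowed to replace the opaque catch-all. A Map avoids
// accidental hits on Object.prototype keys such as "toString".
const PROMOTABLE = new Map<string, Detail>([
  ["stream_error", "stream_error"],
  ["exit_nonzero", "exit_nonzero"],
  ["fatal_rpc_error", "fatal_rpc_error"],
]);

function refine(detail: Detail, events: readonly CloseEvent[]): Detail {
  if (detail !== "execution_failed") return detail; // specific details are never relabeled
  const last = events.length > 0 ? events[events.length - 1] : undefined;
  const reason = last?.rpc_close_reason;
  const promoted = reason !== undefined ? PROMOTABLE.get(reason) : undefined;
  return promoted ?? detail; // missing or unknown reason stays execution_failed
}

const closeEvent = (reason: string): CloseEvent[] => [
  { diagnostic: "runtime_close", rpc_close_reason: reason },
];

// [label, input detail, events, expected detail]
const cases: Array<[string, Detail, CloseEvent[], Detail]> = [
  ["stream_error promotes", "execution_failed", closeEvent("stream_error"), "stream_error"],
  ["exit_nonzero promotes", "execution_failed", closeEvent("exit_nonzero"), "exit_nonzero"],
  ["fatal_rpc_error promotes", "execution_failed", closeEvent("fatal_rpc_error"), "fatal_rpc_error"],
  ["missing diagnostic stays opaque", "execution_failed", [], "execution_failed"],
  ["unknown reason stays opaque", "execution_failed", closeEvent("something_else"), "execution_failed"],
  ["specific detail never relabeled", "exit_code", closeEvent("stream_error"), "exit_code"],
];

for (const [label, input, evts, expected] of cases) {
  const actual = refine(input, evts);
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}
console.log(`${cases.length} cases passed`); // prints 6 cases passed
```

Keeping the fallback rows (no diagnostic, unrecognized reason) in the same table as the promotions makes the default explicit: anything the lookup does not recognize stays `execution_failed`.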