fix(analytics): count run artifacts agent-agnostically + emit ACP tool events#4520
fix(analytics): count run artifacts agent-agnostically + emit ACP tool events#4520elihahhahahah11 wants to merge 5 commits into
Conversation
…l events run_finished.artifact_count was 0 for every agent except claude_code. The counter reconstructs file writes from each agent's tool-call stream, but only claude-style tool names (Write/Edit/create_file) are recognized, and the ACP adapter never emitted tool_use/tool_result at all — so codex / opencode / gemini / cursor / amr / hermes / … all reported artifact_count: 0 and the behaviour funnel's "produced an artifact" step silently measured claude_code only. - ACP adapter (acp.ts) now mirrors artifact-write tool calls into canonical tool_use/tool_result events (also feeds UI / transcript / langfuse). - New agent-agnostic counter (run-artifact-fs.ts): snapshot the project's artifact files before the run and diff at finish. A created OR modified file counts, so an edit-only turn that rewrites an existing file still reports >0. Falls back to the tool-stream counter for no-project runs. - design_system_created and preview_module_count migrated to the same diff. - contracts: run_finished gains artifacts_created / artifacts_modified (created ~= activation, modified ~= iteration).
|
Hey @elihahhahahah11! 👋 Thanks for digging into the analytics gap here — the write-up makes the Claude-vs-other-agents mismatch very clear, and the ACP + filesystem-diff direction looks thoughtfully scoped. I've queued the normal maintainer review flow for this PR and will keep the orchestration moving as CI and reviewer feedback come in. 💡 To drive this PR to merge hands-free, paste this to your AI coding agent (Claude Code / Codex / opencode / Cursor …): Thanks for making open-design better! 🙌 |
nettee
left a comment
There was a problem hiding this comment.
Found one blocking correctness issue in the new ACP/no-project fallback path: pathless ACP write calls still collapse to artifact_count: 0, so this PR does not fully repair the analytics gap for that slice.
…lback Review follow-up (PR nexu-io#4520, nettee): a pathless ACP write (title only, e.g. "edit", no locations/path) fell back to `file_path: toolCallId`, which has no artifact extension, so `countNewArtifacts` (`isArtifactPath`) rejected it. Since server.ts skips the filesystem baseline when cwd is null, no-project ACP runs kept reporting artifact_count: 0 for that shape. - acpArtifactWritePath now also extracts a path from rawInput and from a filename token in the title (real extensions let isArtifactPath correctly exclude non-artifact writes like config.json). - When still pathless, fall back to a synthetic `acp-write/<id>.html` so the already-classified artifact write stays countable (deduped per tool-call). - Tests: pathless ACP write counts; title naming a non-artifact file does not.
|
Thanks for the quick follow-up in @nettee's review thread is the current blocking one on this PR; once that pathless ACP case is confirmed resolved on the latest head, the normal review flow can keep moving. |
nettee
left a comment
There was a problem hiding this comment.
Found one blocking correctness issue in the updated ACP no-project fallback: pathless generic edits are being coerced into .html artifact writes, which can now inflate artifact_count for non-artifact files.
…tic .html Review follow-up (PR nexu-io#4520, nettee, round 2): the previous fix emitted the mirrored tool_use on the first frame with a synthetic `.html` fallback, which (a) coerced generic pathless edits (config.json, README.md) into counted artifacts — false positives — and (b) locked in the guess so a real path on a later frame could not correct it. - Defer the mirrored tool_use/tool_result to the TERMINAL frame, accumulating the best concrete path across frames (ACP often sends `locations` only on the completing update). - Never fabricate an artifact extension: classification is left to `isArtifactPath`, which correctly excludes non-artifact edits and includes real artifacts. A write that never carries a concrete path stays keyed on its extension-less toolCallId and is simply not counted (honest under-count beats a false positive). - Tests: late-arriving path counts; truly pathless write is not coerced into an artifact; non-artifact filename stays uncounted.
nettee
left a comment
There was a problem hiding this comment.
I found one merge-safe coverage gap in the new filesystem-based artifact counter tests. The implementation is otherwise coherent, but this path still has one unpinned edge around same-size rewrites.
🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.
nettee
left a comment
There was a problem hiding this comment.
Found one blocking correctness issue in the new project-wide baseline/diff path: overlapping runs in the same cwd can attribute each other's artifact writes.
🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.|
Thanks for the quick iteration on The current blocking item on this head is @nettee's latest same- |
…ze rewrites Review follow-up (PR nexu-io#4520, nettee, round 3): Blocking — concurrent runs cross-attributed artifacts. The filesystem diff is taken over the whole project tree, and the daemon allows overlapping runs (cf. the antigravity lock), so two runs sharing a cwd would each claim the other's new/modified files. Introduce a baseline registry (createRunArtifactBaselines) that flags any run overlapping another in the same cwd as `contended`; contended runs skip the whole-tree diff and fall back to the per-run tool-stream count, so no run claims another's artifact_count / design_system_created / preview_module_count / activation milestones. Non-blocking — same-size rewrites. The size+mtime fingerprint could miss an edit that keeps the byte length and preserves mtime. Add a bounded content hash (sha1 for files <= 1 MiB) to the fingerprint so such rewrites are still detected; large media keep size+mtime to bound run-finish cost. Tests: concurrent same-cwd baselines are both flagged contended (distinct cwd stays clean); a same-size rewrite with mtime pinned back is detected as modified.
|
Thanks for the quick follow-up in The current gating item on this head is still @nettee's blocking review from the previous round; once that same- |
nettee
left a comment
There was a problem hiding this comment.
Found one blocking correctness issue in the new filesystem-diff path: it classifies native snapshot paths with slash-only helpers, so the new design_system_created / preview_module_count signals still break on Windows project runs.
Review follow-up (PR nexu-io#4520, nettee, round 4): snapshotProjectArtifacts stores native paths (path.join → backslashes on Windows), but the diff classified them with isPreviewModulePath / isDesignSystemFile, which match forward slashes only. So design_system_created and preview_module_count silently reported false on Windows project runs. Normalize separators (\ → /) before those classifiers. Regression test builds a synthetic backslash snapshot and asserts DESIGN.md + preview/*.html are detected.
|
Thanks for the quick Windows-path follow-up in The current gating item on this head is @nettee's latest blocking review around the Windows snapshot-path classification path; once that gets rechecked on |
nettee
left a comment
There was a problem hiding this comment.
@elihahhahahah11 I re-reviewed the current head and verified the follow-up fixes in the changed ranges: ACP write events now wait for the terminal frame so late-arriving paths classify correctly, the filesystem diff covers same-size rewrites, shared-cwd runs fall back instead of cross-attributing artifacts, and the Windows path normalization keeps the design-system signals accurate. I also checked the accompanying regression coverage for those cases. This is a solid cleanup of a tricky analytics path.
🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.|
Thanks for the re-review, @nettee. The blocking review on this head is cleared now. From here the remaining step is just the normal mergeability/check bookkeeping on the current head before maintainer merge flow can take over. |
Why
While drilling the user-behaviour funnel down by runtime, I found
run_finished.artifact_countwas 0 for every agent exceptclaude_code(line-by-line in PostHog: codex_cli, opencode, gemini_cli, cursor_agent, amr, hermes, copilot, pi, kimi, qoder, qwen — all 0.0%; only claude_code ~45.7%). The "produced an artifact" step of the behaviour funnel was therefore silently measuring claude_code only, which made AMR (and every non-Claude runtime) look like it never generated anything.Two root causes:
run-artifacts.ts#countNewArtifacts) reconstructs file writes from each agent's tool-call stream, but only recognizes claude-style tool names (Write/Edit/create_file). opencode/gemini/codex/etc. use different names or shapes.acp.ts, used by amr / hermes / kilo / kiro / devin / vibe / trae / reasonix) never emittedtool_use/tool_resultevents at all — it recognized file-write tool calls internally (for UI text suppression) but dropped them, so those runs had no countable events.What users will see
No user-facing UI change — this is analytics instrumentation behind existing flows. Internally,
run_finishednow reports a correctartifact_countfor all agents, plus two new properties:artifacts_created(new files ≈ activation) andartifacts_modified(edited files ≈ iteration). An edit-only turn that rewrites an existing file now reportsartifact_count > 0(previously the directory file-count was unchanged so it would have looked like nothing happened).Surface area
packages/contracts:run_finishedgainsartifacts_created/artifacts_modified.artifact_countis now derived from a filesystem snapshot diff (agent-agnostic) instead of the claude-only tool stream. The value semantics are unchanged for claude_code (Write+Edit already both counted); other agents go from a constant 0 to real counts. Analytics-only; no behavioral change to runs themselves. Not backfilled — takes effect for runs after this ships and clients update.How it works
tool_use/tool_resultevents (also feeds UI / transcript / langfuse, not just the counter).run-artifact-fs.ts: snapshot the project's artifact files (+DESIGN.md) before the run, diff at finish. Created or modified files count (fingerprint = size + mtime). Falls back to the tool-stream counter for no-project runs / runs that predate this code.design_system_createdandpreview_module_countmigrated to the same diff.Bug fix verification
Red-spec first, per AGENTS.md:
apps/daemon/tests/acp.test.ts— an ACP write tool call now surfaces aWritetool_use+ pairedtool_result, socountNewArtifactsreturns 1; read-only tool calls do not count. Red onmain(notool_useemitted) → green.apps/daemon/tests/run-artifact-fs.test.ts— pins the snapshot/diff: an edit-only turn reports{ created:0, modified:1, touched:1 }(file count unchanged), created vs modified split, DESIGN.md →designSystemCreated, preview modules counted, ignored dirs/non-artifacts excluded.Validation
pnpm --filter @open-design/daemon typecheck+pnpm --filter @open-design/contracts build✅pnpm guard✅ (61 passed)🤖 Generated with Claude Code