fix(analytics): count run artifacts agent-agnostically + emit ACP tool events by elihahhahahah11 · Pull Request #4520 · nexu-io/open-design

elihahhahahah11 · 2026-06-18T10:10:49Z

Why

While drilling the user-behaviour funnel down by runtime, I found run_finished.artifact_count was 0 for every agent except claude_code (line-by-line in PostHog: codex_cli, opencode, gemini_cli, cursor_agent, amr, hermes, copilot, pi, kimi, qoder, qwen — all 0.0%; only claude_code ~45.7%). The "produced an artifact" step of the behaviour funnel was therefore silently measuring claude_code only, which made AMR (and every non-Claude runtime) look like it never generated anything.

Two root causes:

The artifact counter (run-artifacts.ts#countNewArtifacts) reconstructs file writes from each agent's tool-call stream, but only recognizes claude-style tool names (Write/Edit/create_file). opencode/gemini/codex/etc. use different names or shapes.
The ACP adapter (acp.ts, used by amr / hermes / kilo / kiro / devin / vibe / trae / reasonix) never emitted tool_use/tool_result events at all — it recognized file-write tool calls internally (for UI text suppression) but dropped them, so those runs had no countable events.

What users will see

No user-facing UI change — this is analytics instrumentation behind existing flows. Internally, run_finished now reports a correct artifact_count for all agents, plus two new properties: artifacts_created (new files ≈ activation) and artifacts_modified (edited files ≈ iteration). An edit-only turn that rewrites an existing file now reports artifact_count > 0 (previously the directory file-count was unchanged so it would have looked like nothing happened).

Surface area

API / contract — packages/contracts: run_finished gains artifacts_created / artifacts_modified.
Default behavior change — artifact_count is now derived from a filesystem snapshot diff (agent-agnostic) instead of the claude-only tool stream. The value semantics are unchanged for claude_code (Write+Edit already both counted); other agents go from a constant 0 to real counts. Analytics-only; no behavioral change to runs themselves. Not backfilled — takes effect for runs after this ships and clients update.

How it works

ACP adapter mirrors artifact-write tool calls into canonical tool_use/tool_result events (also feeds UI / transcript / langfuse, not just the counter).
New run-artifact-fs.ts: snapshot the project's artifact files (+ DESIGN.md) before the run, diff at finish. Created or modified files count (fingerprint = size + mtime). Falls back to the tool-stream counter for no-project runs / runs that predate this code.
design_system_created and preview_module_count migrated to the same diff.

Bug fix verification

Red-spec first, per AGENTS.md:

apps/daemon/tests/acp.test.ts — an ACP write tool call now surfaces a Write tool_use + paired tool_result, so countNewArtifacts returns 1; read-only tool calls do not count. Red on main (no tool_use emitted) → green.
apps/daemon/tests/run-artifact-fs.test.ts — pins the snapshot/diff: an edit-only turn reports { created:0, modified:1, touched:1 } (file count unchanged), created vs modified split, DESIGN.md → designSystemCreated, preview modules counted, ignored dirs/non-artifacts excluded.

Validation

pnpm --filter @open-design/daemon typecheck + pnpm --filter @open-design/contracts build ✅
daemon run/analytics/artifact/design suites: 753 passed (55 files)
pnpm guard ✅ (61 passed)

🤖 Generated with Claude Code

…l events run_finished.artifact_count was 0 for every agent except claude_code. The counter reconstructs file writes from each agent's tool-call stream, but only claude-style tool names (Write/Edit/create_file) are recognized, and the ACP adapter never emitted tool_use/tool_result at all — so codex / opencode / gemini / cursor / amr / hermes / … all reported artifact_count: 0 and the behaviour funnel's "produced an artifact" step silently measured claude_code only. - ACP adapter (acp.ts) now mirrors artifact-write tool calls into canonical tool_use/tool_result events (also feeds UI / transcript / langfuse). - New agent-agnostic counter (run-artifact-fs.ts): snapshot the project's artifact files before the run and diff at finish. A created OR modified file counts, so an edit-only turn that rewrites an existing file still reports >0. Falls back to the tool-stream counter for no-project runs. - design_system_created and preview_module_count migrated to the same diff. - contracts: run_finished gains artifacts_created / artifacts_modified (created ~= activation, modified ~= iteration).

lefarcen · 2026-06-18T10:13:33Z

Hey @elihahhahahah11! 👋

Thanks for digging into the analytics gap here — the write-up makes the Claude-vs-other-agents mismatch very clear, and the ACP + filesystem-diff direction looks thoughtfully scoped.

I've queued the normal maintainer review flow for this PR and will keep the orchestration moving as CI and reviewer feedback come in.

💡 To drive this PR to merge hands-free, paste this to your AI coding agent (Claude Code / Codex / opencode / Cursor …):
Take over nexu-io/open-design#4520 until it merges — read https://raw.githubusercontent.com/nexu-io/looper/main/skills/pr-takeover/SKILL.md and follow it.

Thanks for making open-design better! 🙌

nettee

Found one blocking correctness issue in the new ACP/no-project fallback path: pathless ACP write calls still collapse to artifact_count: 0, so this PR does not fully repair the analytics gap for that slice.

_{🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.}

…lback Review follow-up (PR nexu-io#4520, nettee): a pathless ACP write (title only, e.g. "edit", no locations/path) fell back to `file_path: toolCallId`, which has no artifact extension, so `countNewArtifacts` (`isArtifactPath`) rejected it. Since server.ts skips the filesystem baseline when cwd is null, no-project ACP runs kept reporting artifact_count: 0 for that shape. - acpArtifactWritePath now also extracts a path from rawInput and from a filename token in the title (real extensions let isArtifactPath correctly exclude non-artifact writes like config.json). - When still pathless, fall back to a synthetic `acp-write/<id>.html` so the already-classified artifact write stays countable (deduped per tool-call). - Tests: pathless ACP write counts; title naming a non-artifact file does not.

lefarcen · 2026-06-18T13:52:04Z

Thanks for the quick follow-up in 3198843e1.

@nettee's review thread is the current blocking one on this PR; once that pathless ACP case is confirmed resolved on the latest head, the normal review flow can keep moving.

nettee

Found one blocking correctness issue in the updated ACP no-project fallback: pathless generic edits are being coerced into .html artifact writes, which can now inflate artifact_count for non-artifact files.

_{🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.}

…tic .html Review follow-up (PR nexu-io#4520, nettee, round 2): the previous fix emitted the mirrored tool_use on the first frame with a synthetic `.html` fallback, which (a) coerced generic pathless edits (config.json, README.md) into counted artifacts — false positives — and (b) locked in the guess so a real path on a later frame could not correct it. - Defer the mirrored tool_use/tool_result to the TERMINAL frame, accumulating the best concrete path across frames (ACP often sends `locations` only on the completing update). - Never fabricate an artifact extension: classification is left to `isArtifactPath`, which correctly excludes non-artifact edits and includes real artifacts. A write that never carries a concrete path stays keyed on its extension-less toolCallId and is simply not counted (honest under-count beats a false positive). - Tests: late-arriving path counts; truly pathless write is not coerced into an artifact; non-artifact filename stays uncounted.

nettee

I found one merge-safe coverage gap in the new filesystem-based artifact counter tests. The implementation is otherwise coherent, but this path still has one unpinned edge around same-size rewrites.

_{🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.}

nettee

Found one blocking correctness issue in the new project-wide baseline/diff path: overlapping runs in the same cwd can attribute each other's artifact writes.

_{🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.}

lefarcen · 2026-06-18T14:14:06Z

Thanks for the quick iteration on 4b4b7d776.

The current blocking item on this head is @nettee's latest same-cwd overlap review thread around whole-tree snapshot attribution. Once that one is addressed and rechecked, the rest of the flow can move forward.

…ze rewrites Review follow-up (PR nexu-io#4520, nettee, round 3): Blocking — concurrent runs cross-attributed artifacts. The filesystem diff is taken over the whole project tree, and the daemon allows overlapping runs (cf. the antigravity lock), so two runs sharing a cwd would each claim the other's new/modified files. Introduce a baseline registry (createRunArtifactBaselines) that flags any run overlapping another in the same cwd as `contended`; contended runs skip the whole-tree diff and fall back to the per-run tool-stream count, so no run claims another's artifact_count / design_system_created / preview_module_count / activation milestones. Non-blocking — same-size rewrites. The size+mtime fingerprint could miss an edit that keeps the byte length and preserves mtime. Add a bounded content hash (sha1 for files <= 1 MiB) to the fingerprint so such rewrites are still detected; large media keep size+mtime to bound run-finish cost. Tests: concurrent same-cwd baselines are both flagged contended (distinct cwd stays clean); a same-size rewrite with mtime pinned back is detected as modified.

lefarcen · 2026-06-18T14:25:01Z

Thanks for the quick follow-up in d901a417d.

The current gating item on this head is still @nettee's blocking review from the previous round; once that same-cwd contention fix gets rechecked on d901a417d, the PR can move to the next step.

nettee

Found one blocking correctness issue in the new filesystem-diff path: it classifies native snapshot paths with slash-only helpers, so the new design_system_created / preview_module_count signals still break on Windows project runs.

_{🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.}

Review follow-up (PR nexu-io#4520, nettee, round 4): snapshotProjectArtifacts stores native paths (path.join → backslashes on Windows), but the diff classified them with isPreviewModulePath / isDesignSystemFile, which match forward slashes only. So design_system_created and preview_module_count silently reported false on Windows project runs. Normalize separators (\ → /) before those classifiers. Regression test builds a synthetic backslash snapshot and asserts DESIGN.md + preview/*.html are detected.

lefarcen · 2026-06-18T14:35:27Z

Thanks for the quick Windows-path follow-up in af52081db.

The current gating item on this head is @nettee's latest blocking review around the Windows snapshot-path classification path; once that gets rechecked on af52081db, the PR can move to the next step.

nettee

@elihahhahahah11 I re-reviewed the current head and verified the follow-up fixes in the changed ranges: ACP write events now wait for the terminal frame so late-arriving paths classify correctly, the filesystem diff covers same-size rewrites, shared-cwd runs fall back instead of cross-attributing artifacts, and the Windows path normalization keeps the design-system signals accurate. I also checked the accompanying regression coverage for those cases. This is a solid cleanup of a tricky analytics path.

_{🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.}

lefarcen · 2026-06-18T14:48:24Z

Thanks for the re-review, @nettee.

The blocking review on this head is cleared now. From here the remaining step is just the normal mergeability/check bookkeeping on the current head before maintainer merge flow can take over.

lefarcen requested a review from nettee June 18, 2026 10:13

lefarcen added size/L PR changes 300-700 lines risk/high High risk: apps/desktop, daemon, auth, migration, workflows, package deps type/bugfix Bug fix labels Jun 18, 2026

Conversation

elihahhahahah11 commented Jun 18, 2026

Why

What users will see

Surface area

How it works

Bug fix verification

Validation

Uh oh!

lefarcen commented Jun 18, 2026

Uh oh!

nettee left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lefarcen commented Jun 18, 2026

Uh oh!

nettee left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

nettee left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

nettee left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lefarcen commented Jun 18, 2026

Uh oh!

lefarcen commented Jun 18, 2026

Uh oh!

nettee left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lefarcen commented Jun 18, 2026

Uh oh!

nettee left a comment

Choose a reason for hiding this comment

Uh oh!

lefarcen commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants