Skip to content

fix(analytics): count run artifacts agent-agnostically + emit ACP tool events#4520

Open
elihahhahahah11 wants to merge 5 commits into
nexu-io:mainfrom
elihahhahahah11:posthogai-1.-2/3/4
Open

fix(analytics): count run artifacts agent-agnostically + emit ACP tool events#4520
elihahhahahah11 wants to merge 5 commits into
nexu-io:mainfrom
elihahhahahah11:posthogai-1.-2/3/4

Conversation

@elihahhahahah11

Copy link
Copy Markdown
Contributor

Why

While drilling the user-behaviour funnel down by runtime, I found run_finished.artifact_count was 0 for every agent except claude_code (line-by-line in PostHog: codex_cli, opencode, gemini_cli, cursor_agent, amr, hermes, copilot, pi, kimi, qoder, qwen — all 0.0%; only claude_code ~45.7%). The "produced an artifact" step of the behaviour funnel was therefore silently measuring claude_code only, which made AMR (and every non-Claude runtime) look like it never generated anything.

Two root causes:

  1. The artifact counter (run-artifacts.ts#countNewArtifacts) reconstructs file writes from each agent's tool-call stream, but only recognizes claude-style tool names (Write/Edit/create_file). opencode/gemini/codex/etc. use different names or shapes.
  2. The ACP adapter (acp.ts, used by amr / hermes / kilo / kiro / devin / vibe / trae / reasonix) never emitted tool_use/tool_result events at all — it recognized file-write tool calls internally (for UI text suppression) but dropped them, so those runs had no countable events.

What users will see

No user-facing UI change — this is analytics instrumentation behind existing flows. Internally, run_finished now reports a correct artifact_count for all agents, plus two new properties: artifacts_created (new files ≈ activation) and artifacts_modified (edited files ≈ iteration). An edit-only turn that rewrites an existing file now reports artifact_count > 0 (previously the directory file-count was unchanged so it would have looked like nothing happened).

Surface area

  • API / contractpackages/contracts: run_finished gains artifacts_created / artifacts_modified.
  • Default behavior changeartifact_count is now derived from a filesystem snapshot diff (agent-agnostic) instead of the claude-only tool stream. The value semantics are unchanged for claude_code (Write+Edit already both counted); other agents go from a constant 0 to real counts. Analytics-only; no behavioral change to runs themselves. Not backfilled — takes effect for runs after this ships and clients update.

How it works

  • ACP adapter mirrors artifact-write tool calls into canonical tool_use/tool_result events (also feeds UI / transcript / langfuse, not just the counter).
  • New run-artifact-fs.ts: snapshot the project's artifact files (+ DESIGN.md) before the run, diff at finish. Created or modified files count (fingerprint = size + mtime). Falls back to the tool-stream counter for no-project runs / runs that predate this code.
  • design_system_created and preview_module_count migrated to the same diff.

Bug fix verification

Red-spec first, per AGENTS.md:

  • apps/daemon/tests/acp.test.ts — an ACP write tool call now surfaces a Write tool_use + paired tool_result, so countNewArtifacts returns 1; read-only tool calls do not count. Red on main (no tool_use emitted) → green.
  • apps/daemon/tests/run-artifact-fs.test.ts — pins the snapshot/diff: an edit-only turn reports { created:0, modified:1, touched:1 } (file count unchanged), created vs modified split, DESIGN.md → designSystemCreated, preview modules counted, ignored dirs/non-artifacts excluded.

Validation

  • pnpm --filter @open-design/daemon typecheck + pnpm --filter @open-design/contracts build
  • daemon run/analytics/artifact/design suites: 753 passed (55 files)
  • pnpm guard ✅ (61 passed)

🤖 Generated with Claude Code

…l events

run_finished.artifact_count was 0 for every agent except claude_code. The
counter reconstructs file writes from each agent's tool-call stream, but only
claude-style tool names (Write/Edit/create_file) are recognized, and the ACP
adapter never emitted tool_use/tool_result at all — so codex / opencode /
gemini / cursor / amr / hermes / … all reported artifact_count: 0 and the
behaviour funnel's "produced an artifact" step silently measured claude_code
only.

- ACP adapter (acp.ts) now mirrors artifact-write tool calls into canonical
  tool_use/tool_result events (also feeds UI / transcript / langfuse).
- New agent-agnostic counter (run-artifact-fs.ts): snapshot the project's
  artifact files before the run and diff at finish. A created OR modified file
  counts, so an edit-only turn that rewrites an existing file still reports >0.
  Falls back to the tool-stream counter for no-project runs.
- design_system_created and preview_module_count migrated to the same diff.
- contracts: run_finished gains artifacts_created / artifacts_modified
  (created ~= activation, modified ~= iteration).
@lefarcen

Copy link
Copy Markdown
Contributor

Hey @elihahhahahah11! 👋

Thanks for digging into the analytics gap here — the write-up makes the Claude-vs-other-agents mismatch very clear, and the ACP + filesystem-diff direction looks thoughtfully scoped.

I've queued the normal maintainer review flow for this PR and will keep the orchestration moving as CI and reviewer feedback come in.

💡 To drive this PR to merge hands-free, paste this to your AI coding agent (Claude Code / Codex / opencode / Cursor …):
Take over nexu-io/open-design#4520 until it merges — read https://raw.githubusercontent.com/nexu-io/looper/main/skills/pr-takeover/SKILL.md and follow it.

Thanks for making open-design better! 🙌

@lefarcen lefarcen requested a review from nettee June 18, 2026 10:13
@lefarcen lefarcen added size/L PR changes 300-700 lines risk/high High risk: apps/desktop, daemon, auth, migration, workflows, package deps type/bugfix Bug fix labels Jun 18, 2026

@nettee nettee left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one blocking correctness issue in the new ACP/no-project fallback path: pathless ACP write calls still collapse to artifact_count: 0, so this PR does not fully repair the analytics gap for that slice.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/src/acp.ts Outdated
…lback

Review follow-up (PR nexu-io#4520, nettee): a pathless ACP write (title only, e.g.
"edit", no locations/path) fell back to `file_path: toolCallId`, which has no
artifact extension, so `countNewArtifacts` (`isArtifactPath`) rejected it. Since
server.ts skips the filesystem baseline when cwd is null, no-project ACP runs
kept reporting artifact_count: 0 for that shape.

- acpArtifactWritePath now also extracts a path from rawInput and from a
  filename token in the title (real extensions let isArtifactPath correctly
  exclude non-artifact writes like config.json).
- When still pathless, fall back to a synthetic `acp-write/<id>.html` so the
  already-classified artifact write stays countable (deduped per tool-call).
- Tests: pathless ACP write counts; title naming a non-artifact file does not.
@lefarcen

Copy link
Copy Markdown
Contributor

Thanks for the quick follow-up in 3198843e1.

@nettee's review thread is the current blocking one on this PR; once that pathless ACP case is confirmed resolved on the latest head, the normal review flow can keep moving.

@nettee nettee left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one blocking correctness issue in the updated ACP no-project fallback: pathless generic edits are being coerced into .html artifact writes, which can now inflate artifact_count for non-artifact files.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/src/acp.ts Outdated
…tic .html

Review follow-up (PR nexu-io#4520, nettee, round 2): the previous fix emitted the
mirrored tool_use on the first frame with a synthetic `.html` fallback, which
(a) coerced generic pathless edits (config.json, README.md) into counted
artifacts — false positives — and (b) locked in the guess so a real path on a
later frame could not correct it.

- Defer the mirrored tool_use/tool_result to the TERMINAL frame, accumulating
  the best concrete path across frames (ACP often sends `locations` only on the
  completing update).
- Never fabricate an artifact extension: classification is left to
  `isArtifactPath`, which correctly excludes non-artifact edits and includes
  real artifacts. A write that never carries a concrete path stays keyed on its
  extension-less toolCallId and is simply not counted (honest under-count beats
  a false positive).
- Tests: late-arriving path counts; truly pathless write is not coerced into an
  artifact; non-artifact filename stays uncounted.

@nettee nettee left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one merge-safe coverage gap in the new filesystem-based artifact counter tests. The implementation is otherwise coherent, but this path still has one unpinned edge around same-size rewrites.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/tests/run-artifact-fs.test.ts

@nettee nettee left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one blocking correctness issue in the new project-wide baseline/diff path: overlapping runs in the same cwd can attribute each other's artifact writes.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/src/server.ts
@lefarcen

Copy link
Copy Markdown
Contributor

Thanks for the quick iteration on 4b4b7d776.

The current blocking item on this head is @nettee's latest same-cwd overlap review thread around whole-tree snapshot attribution. Once that one is addressed and rechecked, the rest of the flow can move forward.

…ze rewrites

Review follow-up (PR nexu-io#4520, nettee, round 3):

Blocking — concurrent runs cross-attributed artifacts. The filesystem diff is
taken over the whole project tree, and the daemon allows overlapping runs (cf.
the antigravity lock), so two runs sharing a cwd would each claim the other's
new/modified files. Introduce a baseline registry (createRunArtifactBaselines)
that flags any run overlapping another in the same cwd as `contended`;
contended runs skip the whole-tree diff and fall back to the per-run tool-stream
count, so no run claims another's artifact_count / design_system_created /
preview_module_count / activation milestones.

Non-blocking — same-size rewrites. The size+mtime fingerprint could miss an
edit that keeps the byte length and preserves mtime. Add a bounded content hash
(sha1 for files <= 1 MiB) to the fingerprint so such rewrites are still detected;
large media keep size+mtime to bound run-finish cost.

Tests: concurrent same-cwd baselines are both flagged contended (distinct cwd
stays clean); a same-size rewrite with mtime pinned back is detected as modified.
@lefarcen

Copy link
Copy Markdown
Contributor

Thanks for the quick follow-up in d901a417d.

The current gating item on this head is still @nettee's blocking review from the previous round; once that same-cwd contention fix gets rechecked on d901a417d, the PR can move to the next step.

@nettee nettee left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one blocking correctness issue in the new filesystem-diff path: it classifies native snapshot paths with slash-only helpers, so the new design_system_created / preview_module_count signals still break on Windows project runs.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/src/run-artifact-fs.ts Outdated
Review follow-up (PR nexu-io#4520, nettee, round 4): snapshotProjectArtifacts stores
native paths (path.join → backslashes on Windows), but the diff classified them
with isPreviewModulePath / isDesignSystemFile, which match forward slashes only.
So design_system_created and preview_module_count silently reported false on
Windows project runs.

Normalize separators (\ → /) before those classifiers. Regression test builds a
synthetic backslash snapshot and asserts DESIGN.md + preview/*.html are detected.
@lefarcen

Copy link
Copy Markdown
Contributor

Thanks for the quick Windows-path follow-up in af52081db.

The current gating item on this head is @nettee's latest blocking review around the Windows snapshot-path classification path; once that gets rechecked on af52081db, the PR can move to the next step.

@nettee nettee left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elihahhahahah11 I re-reviewed the current head and verified the follow-up fixes in the changed ranges: ACP write events now wait for the terminal frame so late-arriving paths classify correctly, the filesystem diff covers same-size rewrites, shared-cwd runs fall back instead of cross-attributing artifacts, and the Windows path normalization keeps the design-system signals accurate. I also checked the accompanying regression coverage for those cases. This is a solid cleanup of a tricky analytics path.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

@lefarcen

Copy link
Copy Markdown
Contributor

Thanks for the re-review, @nettee.

The blocking review on this head is cleared now. From here the remaining step is just the normal mergeability/check bookkeeping on the current head before maintainer merge flow can take over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

risk/high High risk: apps/desktop, daemon, auth, migration, workflows, package deps size/L PR changes 300-700 lines type/bugfix Bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants