docs(examples): three-persona cleanup — newcomer/senior/junior by drewstone · Pull Request #376 · tangle-network/agent-runtime

drewstone · 2026-06-24T15:09:04Z

Applies the genuine-defect fixes from a three-persona (newcomer / senior / junior) review of the 22 examples. Scope: fix the things that actively mislead a reader, and clean the rough edges that teach the wrong habit — without churning the examples judged good/great. Every example still typechecks (typecheck:examples), the held-out coding-benchmark anti-cheat + its smoke test stay intact, and docs:check stays green.

Genuine defects fixed (these mislead a reader)

self-improving-loop: the demo gates at n=3 yet flatly described itself as "the production held-out gate's statistical core", re-teaching this repo's documented #1 failure mode (the small-n mirage). Added a loud minimum-evidence-floor caveat at the gate() definition and in the README: the production gate floors the evidence (heldoutSignificance won't report a pair under minSamples, default 8; HeldOutGate rejects below minProductiveRuns with few_runs), so never ship a real change on n=3. (Helps the junior most, as the persona most likely to lift gate() verbatim.)
delegate — was a Tier-1 example with no README, and the file was a CI test (E2E PASSED, process.exit) wearing an example's clothes, with internal history in the header. Added README.md; split a lean teaching delegate.ts + reusable shared.ts from the regression proof, which moved to tests/delegate-example.test.ts (env-gated: a paid live e2e when TANGLE_API_KEY is set, an always-on offline fail-loud assertion otherwise); stripped the history per the repo's no-history-in-source rule. (newcomer + senior).
improve — README named ImprovementDriver / gepaDriver; improve() actually builds SurfaceProposer / gepaProposer. Corrected both, plus the stale test path (src/improvement/improve.test.ts). (a README grep now resolves).
knowledge-gating — README featured adapter.onKnowledgeBlocked as the headline hook but the adapter never defined it (doc/code drift). Wired the hook into the adapter so the blocked run demonstrates it (converts the gap into a "would ask the user" decision that flows through as the stop reason).
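
To make the minimum-evidence floor from the self-improving-loop item concrete, here is a minimal sketch of a gate that refuses to decide below a sample floor. Everything in it is an assumption for illustration: `gateWithFloor`, `GateDecision`, and `DEFAULT_MIN_SAMPLES` are hypothetical names, not the repo's `gate()`, `heldoutSignificance`, or `HeldOutGate`, and a plain mean delta stands in for the real significance test.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the repo's API.
type GateDecision =
  | { promoted: true; meanDelta: number; n: number }
  | { promoted: false; reason: 'few_runs' | 'no_improvement'; n: number };

// Assumed default, mirroring the minSamples default of 8 described above.
const DEFAULT_MIN_SAMPLES = 8;

function gateWithFloor(
  baseline: number[],
  candidate: number[],
  minSamples: number = DEFAULT_MIN_SAMPLES,
): GateDecision {
  // Paired comparison: only runs present on both sides count as evidence.
  const n = Math.min(baseline.length, candidate.length);

  // The floor comes first: below it, no score difference is trusted.
  if (n < minSamples) {
    return { promoted: false, reason: 'few_runs', n };
  }

  // Stand-in for a real significance test: mean of the paired deltas.
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += candidate[i]! - baseline[i]!;
  }
  const meanDelta = sum / n;

  return meanDelta > 0
    ? { promoted: true, meanDelta, n }
    : { promoted: false, reason: 'no_improvement', n };
}
```

With three paired runs the sketch returns `few_runs` however large the apparent gain is, which is the behavior the caveat asks readers to expect from the production gate.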

Quality wins (remove ceremony / amateur tells, no behavior change)

coding-benchmark — renamed offlineSolutions → offlineAgentScripts with a clear 2-line header; kept the rate-limiter cheat/real pair inline (the one anti-cheat teaching moment), moved the round-invariant csv-parser / lru-cache real impls to fixtures.ts as readable template literals (no + '\n' + escaped-string ceremony). The held-out anti-cheat, the firewall test, the reps-don't-fake-n regression, and the BH-corrected stats are all unchanged and still pass.
supervise — wrapped the flagship in main().catch (matching the sibling examples) and uncommented the completion-oracle deliverable so the headline models the safe path.
ui-audit — LENSES_TO_RUN → lensesToRun (the publish-safe module-global convention; an UPPERCASE module-global trips the Tangle obfuscator).
driver-loop / researcher-loop — one-line justification at the offline-box as unknown as SandboxInstance casts (the other casts already carried one).
strategy-evolution README — one line noting promoted: false at toy scale is the gate working, not a break.
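
As an illustration of the coding-benchmark fixture change, the sketch below builds the same source text two ways: with the `+ '\n' +` concatenation style and as a template literal. The `add` function is a hypothetical stand-in; the actual csv-parser and lru-cache fixtures are not reproduced in this description.

```typescript
// Hypothetical fixture text, used only to compare the two styles.

// Before: each line is a separate string joined with explicit newlines.
const concatenated =
  'export function add(a: number, b: number): number {' + '\n' +
  '  return a + b;' + '\n' +
  '}' + '\n';

// After: one template literal that reads like the file it represents.
const templated = `export function add(a: number, b: number): number {
  return a + b;
}
`;

// Both forms produce the same string, so the change is presentation only.
const fixturesMatch = concatenated === templated;
```

Because the two strings are identical, moving a fixture to a template literal should not change what the offline agent script writes.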

Left alone (already good/great — no churn)

driver-loop, strategy-suite, supervisor-loop, chat-handler, recursive-supervisor, runtime-run, stream-backends, sanitized-telemetry-streaming, mcp-delegation, fleet-delegation, intelligence-recommend, intelligence-drop-in, agents-of-all-shapes, product-eval. The verdict's older snapshot flagged a few of these (mcp-delegation's delegate_ui_audit, the fleet-delegation casts) but they already carry the right framing/justification on current main; forcing fake-complete SandboxInstance helpers would add ceremony against the cleanup goal.

Verification

pnpm run build — clean
pnpm run typecheck (src + examples) — clean
pnpm run lint — clean (333 files)
pnpm run docs:check — green (0 errors; freshness OK)
pnpm test — 115 files / 1120 passed, 2 skipped (the env-gated live e2es)
ran each materially-changed example offline: knowledge-gating (hook fires), self-improving-loop, ui-audit, coding-benchmark (identical 0.944 leaderboard)
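
The 2 skipped tests are the env-gated live e2es. A minimal sketch of that kind of gating decision, assuming the only switch is whether `TANGLE_API_KEY` is set to a non-empty value; `shouldRunLiveE2e` is a hypothetical helper, not a function from the repo:

```typescript
// Hypothetical helper: decides whether the paid live e2e should run.
// The offline fail-loud assertion is not gated and runs in every mode.
function shouldRunLiveE2e(env: Record<string, string | undefined>): boolean {
  const key = env['TANGLE_API_KEY'];
  // Treat a missing or blank key as "not set".
  return key !== undefined && key.trim() !== '';
}
```

A test file could pass `process.env` to this helper and skip the live case when it returns `false`, which is why a run without the key reports the live e2es as skipped rather than failed.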

Apply the genuine-defect fixes from the three-persona example verdict, leaving good/great examples untouched.

- self-improving-loop: add a loud minimum-evidence-floor caveat at the gate and in the README. The demo gates at n=3 for runnability, but the production gate floors at minSamples (8 in heldoutSignificance) / minProductiveRuns; never ship a real change on n=3 (the small-n mirage).
- delegate: add a README, split a lean teaching delegate.ts + shared.ts from the regression proof (moved to tests/delegate-example.test.ts: env-gated live e2e + an always-on offline fail-loud assertion); drop the test-in-example clothing (E2E PASSED / process.exit) and the internal history from the header.
- improve: README symbol drift: ImprovementDriver → SurfaceProposer, gepaDriver → gepaProposer (match what improve() actually builds); fix the test path to src/improvement/improve.test.ts.
- knowledge-gating: wire the headline onKnowledgeBlocked hook into the adapter so the README's documented hook is demonstrated (the blocked run now converts the gap into a "would ask the user" decision).
- coding-benchmark: simplify offlineSolutions → offlineAgentScripts; keep the rate-limiter cheat/real pair inline (the one anti-cheat teaching moment), move csv/lru real impls to fixtures.ts as readable template literals (no escaped strings). The held-out anti-cheat + smoke + firewall tests stay intact.
- supervise: wrap the flagship in main().catch (match the siblings) and uncomment the completion-oracle deliverable so the headline models the safe path.
- ui-audit: LENSES_TO_RUN → lensesToRun (the publish-safe module-global convention).
- driver-loop / researcher-loop: one-line justification at the offline-box casts.
- strategy-evolution README: note promoted:false at toy scale is the gate working.

tangletools

✅ Auto-approved PR — `efb6b428`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T15:09:11Z

tangletools approved these changes Jun 24, 2026


drewstone merged commit 322211d into main Jun 24, 2026
1 check passed

drewstone merged 1 commit into main from examples/three-persona-cleanup
