
docs(examples): three-persona cleanup — newcomer/senior/junior #376

Merged
drewstone merged 1 commit into main from examples/three-persona-cleanup on Jun 24, 2026

@drewstone

Contributor

Applies the genuine-defect fixes from a three-persona (newcomer / senior / junior) review of the 22 examples. Scope: fix the things that actively mislead a reader, and clean the rough edges that teach the wrong habit — without churning the examples judged good/great. Every example still typechecks (typecheck:examples), the held-out coding-benchmark anti-cheat + its smoke test stay intact, and docs:check stays green.

Genuine defects fixed (these mislead a reader)

  • self-improving-loop — the demo gates at n=3 yet called it "the production held-out gate's statistical core" flatly, re-teaching this repo's documented #1 failure mode (the small-n mirage). Added a loud minimum-evidence-floor caveat at the gate() definition AND in the README: the production gate floors the evidence (heldoutSignificance won't report a pair under minSamples, default 8; HeldOutGate rejects below minProductiveRuns with few_runs) — never ship a real change on n=3. (helps the junior most — the persona most likely to lift gate() verbatim).
  • delegate — was a Tier-1 example with no README, and the file was a CI test (E2E PASSED, process.exit) wearing an example's clothes, with internal history in the header. Added README.md; split a lean teaching delegate.ts + reusable shared.ts from the regression proof, which moved to tests/delegate-example.test.ts (env-gated: a paid live e2e when TANGLE_API_KEY is set, an always-on offline fail-loud assertion otherwise); stripped the history per the repo's no-history-in-source rule. (newcomer + senior).
  • improve — README named ImprovementDriver / gepaDriver; improve() actually builds SurfaceProposer / gepaProposer. Corrected both, plus the stale test path (src/improvement/improve.test.ts). (a README grep now resolves).
  • knowledge-gating — README featured adapter.onKnowledgeBlocked as the headline hook but the adapter never defined it (doc/code drift). Wired the hook into the adapter so the blocked run demonstrates it (converts the gap into a "would ask the user" decision that flows through as the stop reason).
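
To make the minimum-evidence floor from the self-improving-loop fix concrete, here is a minimal sketch of a gate that refuses to decide below a sample floor. Everything in it is an assumption for illustration: `gateWithFloor`, `GateDecision`, and `DEFAULT_MIN_SAMPLES` are hypothetical names, not the signatures of `gate()`, `heldoutSignificance`, or `HeldOutGate`, and the mean-delta check is only a placeholder for a real significance test.

```typescript
// Hypothetical sketch: none of these names are the repo's real API.
type GateDecision =
  | { promoted: true; meanDelta: number; n: number }
  | { promoted: false; reason: "few_runs" | "no_improvement"; n: number };

// Assumed floor of 8 paired samples, mirroring the minSamples default quoted above.
const DEFAULT_MIN_SAMPLES = 8;

function gateWithFloor(
  baseline: number[],
  candidate: number[],
  minSamples: number = DEFAULT_MIN_SAMPLES,
): GateDecision {
  // Only paired observations count as evidence.
  const n = Math.min(baseline.length, candidate.length);

  // The floor comes first: with too few pairs, refuse to decide at all,
  // however large the observed improvement looks.
  if (n < minSamples) {
    return { promoted: false, reason: "few_runs", n };
  }

  // Placeholder decision rule (NOT a significance test): mean paired delta > 0.
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += candidate[i] - baseline[i];
  }
  const meanDelta = sum / n;

  return meanDelta > 0
    ? { promoted: true, meanDelta, n }
    : { promoted: false, reason: "no_improvement", n };
}
```

Under these assumptions, a candidate that looks far better on three pairs is still rejected with `few_runs`, which is the behavior the caveat asks readers to expect from the production gate.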

Quality wins (remove ceremony / amateur tells, no behavior change)

  • coding-benchmark — renamed offlineSolutions → offlineAgentScripts with a clear 2-line header; kept the rate-limiter cheat/real pair inline (the one anti-cheat teaching moment), moved the round-invariant csv-parser / lru-cache real impls to fixtures.ts as readable template literals (no + '\n' + escaped-string ceremony). The held-out anti-cheat, the firewall test, the reps-don't-fake-n regression, and the BH-corrected stats are all unchanged and still pass.
  • supervise — wrapped the flagship in main().catch (matching the sibling examples) and uncommented the completion-oracle deliverable so the headline models the safe path.
  • ui-audit — LENSES_TO_RUN → lensesToRun (the publish-safe module-global convention; an UPPERCASE module-global trips the Tangle obfuscator).
  • driver-loop / researcher-loop — one-line justification at the offline-box as unknown as SandboxInstance casts (the other casts already carried one).
  • strategy-evolution README — one line noting promoted: false at toy scale is the gate working, not a break.
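
The coding-benchmark fixture change above is purely presentational, which a small comparison can illustrate. The fixture text below is made up for the example (it is not the content of the repo's fixtures.ts); the point is only that an escaped-string concatenation and a template literal can hold byte-identical source.

```typescript
// Hypothetical fixture content, not the repo's actual fixtures.ts.

// Before: escaped newlines stitched together with `+`.
const concatenated =
  "export function parseRow(line: string): string[] {\n" +
  "  return line.split(',');\n" +
  "}\n";

// After: the same source as a template literal, readable as code.
const templated = `export function parseRow(line: string): string[] {
  return line.split(',');
}
`;

// Both forms produce exactly the same string, so downstream behavior is unchanged.
const sameSource: boolean = concatenated === templated;
```

One caveat when converting by hand: a fixture that itself contains backticks or `${` needs those escaped inside a template literal.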

Left alone (already good/great — no churn)

driver-loop, strategy-suite, supervisor-loop, chat-handler, recursive-supervisor, runtime-run, stream-backends, sanitized-telemetry-streaming, mcp-delegation, fleet-delegation, intelligence-recommend, intelligence-drop-in, agents-of-all-shapes, product-eval. The verdict's older snapshot flagged a few of these (mcp-delegation's delegate_ui_audit, the fleet-delegation casts) but they already carry the right framing/justification on current main; forcing fake-complete SandboxInstance helpers would add ceremony against the cleanup goal.

Verification

  • pnpm run build — clean
  • pnpm run typecheck (src + examples) — clean
  • pnpm run lint — clean (333 files)
  • pnpm run docs:check — green (0 errors; freshness OK)
  • pnpm test — 115 files / 1120 passed, 2 skipped (the env-gated live e2es)
  • ran each materially-changed example offline: knowledge-gating (hook fires), self-improving-loop, ui-audit, coding-benchmark (identical 0.944 leaderboard)
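
The two skipped tests above are the env-gated live e2es. As a rough sketch of that gating decision, assuming only the `TANGLE_API_KEY` variable named in this PR, the mode selection could look like the following. `selectTestMode` and `TestMode` are hypothetical names, and the environment is passed in as a plain record so the sketch does not depend on any particular runtime or test runner.

```typescript
// Hypothetical helper, not code from tests/delegate-example.test.ts.
type TestMode = "live-e2e" | "offline-assertion";

// The environment is injected rather than read from a global,
// which keeps the decision pure and easy to test.
function selectTestMode(env: Record<string, string | undefined>): TestMode {
  const key = env["TANGLE_API_KEY"];
  // A missing or blank key means the paid live run must not be attempted.
  return key !== undefined && key.trim() !== "" ? "live-e2e" : "offline-assertion";
}
```

In a real suite the `"live-e2e"` branch would typically be wired to the test runner's conditional-skip facility, while the offline assertion always runs.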

Apply the genuine-defect fixes from the three-persona example verdict, leaving
good/great examples untouched.

- self-improving-loop: add a loud minimum-evidence-floor caveat at the gate and
  in the README — the demo gates at n=3 for runnability, but the production gate
  floors at minSamples (8 in heldoutSignificance) / minProductiveRuns; never
  ship a real change on n=3 (the small-n mirage).
- delegate: add a README, split a lean teaching delegate.ts + shared.ts from the
  regression proof (moved to tests/delegate-example.test.ts — env-gated live e2e
  + an always-on offline fail-loud assertion); drop the test-in-example clothing
  (E2E PASSED / process.exit) and the internal history from the header.
- improve: README symbol drift — ImprovementDriver → SurfaceProposer,
  gepaDriver → gepaProposer (match what improve() actually builds); fix the test
  path to src/improvement/improve.test.ts.
- knowledge-gating: wire the headline onKnowledgeBlocked hook into the adapter so
  the README's documented hook is demonstrated (the blocked run now converts the
  gap into a "would ask the user" decision).
- coding-benchmark: simplify offlineSolutions → offlineAgentScripts; keep the
  rate-limiter cheat/real pair inline (the one anti-cheat teaching moment), move
  csv/lru real impls to fixtures.ts as readable template literals (no escaped
  strings). The held-out anti-cheat + smoke + firewall tests stay intact.
- supervise: wrap the flagship in main().catch (match the siblings) and uncomment
  the completion-oracle deliverable so the headline models the safe path.
- ui-audit: LENSES_TO_RUN → lensesToRun (the publish-safe module-global convention).
- driver-loop / researcher-loop: one-line justification at the offline-box casts.
- strategy-evolution README: note promoted:false at toy scale is the gate working.
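
The knowledge-gating item in the list above describes a hook that turns a knowledge gap into a stop reason. A minimal sketch of that flow follows. Only the hook name `onKnowledgeBlocked` and the "would ask the user" wording come from this PR; `KnowledgeGap`, `StopDecision`, `SketchAdapter`, and `runBlocked` are assumed shapes, and the real adapter interface may differ.

```typescript
// Hypothetical shapes; the repo's real adapter interface may differ.
interface KnowledgeGap {
  topic: string;
  detail: string;
}

interface StopDecision {
  stop: true;
  reason: string;
}

interface SketchAdapter {
  // Optional, so an adapter that never defines the hook still type-checks.
  onKnowledgeBlocked?: (gap: KnowledgeGap) => StopDecision;
}

// An adapter that wires the hook: the gap becomes a "would ask the user" decision.
const adapter: SketchAdapter = {
  onKnowledgeBlocked: (gap) => ({
    stop: true,
    reason: `would ask the user: ${gap.topic}`,
  }),
};

// A blocked run consults the hook and surfaces its reason as the stop reason.
function runBlocked(a: SketchAdapter, gap: KnowledgeGap): string {
  const decision = a.onKnowledgeBlocked?.(gap);
  return decision ? decision.reason : "blocked (no hook defined)";
}
```

Making the hook optional in the sketch mirrors the drift this PR fixes: a README can describe a hook that the adapter silently omits, and nothing fails until the blocked path is actually exercised.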

@tangletools tangletools left a comment

Contributor

✅ Auto-approved PR — efb6b428

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T15:09:11Z

@drewstone drewstone merged commit 322211d into main Jun 24, 2026
1 check passed