docs: two-agent research-loop A/B — result + how to run #34
Merged
Public-facing writeup of the equal-compute A/B between the two-agent research loop (worker + verifying driver) and a single-agent accumulate-only loop. Records the real result (n=9 ML topics, glm-5.2: 2.33 fewer admitted sources/topic at identical coverage, 95% CI [1.78, 2.89] via pairedBootstrap), the honest nuance (the win is mostly de-duplication, not relevance filtering), threats to validity, and the verified offline + live run commands. Links it from the README's two-agent research loop section.
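The 95% CI above comes from a paired bootstrap over topics. The sketch below roughly illustrates what such an interval computes. It is not agent-eval's actual `pairedBootstrap` signature, which this PR does not show. It takes per-topic admitted-source counts from both arms and forms the per-topic deltas. It then resamples whole topics with replacement and reports the 2.5th and 97.5th percentiles of the resampled mean delta. `pairedBootstrapCI`, the `mulberry32` seeded PRNG, and the example counts are all hypothetical, and the example numbers are not the doc's data.

```typescript
// Hypothetical sketch of a percentile paired bootstrap; not agent-eval's API.

// mulberry32: a tiny seeded PRNG so the resampling is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// single[i] and two[i] are admitted-source counts for the same topic i.
function pairedBootstrapCI(
  single: number[],
  two: number[],
  iterations = 10000,
  seed = 1,
): { mean: number; lo: number; hi: number } {
  if (single.length === 0 || single.length !== two.length) {
    throw new Error("arms must be non-empty and paired by topic");
  }
  // Per-topic delta: how many fewer sources the two-agent arm admitted.
  const deltas = single.map((s, i) => s - two[i]);
  const rand = mulberry32(seed);
  const means: number[] = [];
  for (let b = 0; b < iterations; b++) {
    // Resample whole topics with replacement, keeping each topic's pair intact.
    const sample = deltas.map(() => deltas[Math.floor(rand() * deltas.length)]);
    means.push(mean(sample));
  }
  means.sort((x, y) => x - y);
  const quantile = (q: number) =>
    means[Math.min(means.length - 1, Math.floor(q * means.length))];
  return { mean: mean(deltas), lo: quantile(0.025), hi: quantile(0.975) };
}

// Made-up counts for 9 topics (illustration only, not the writeup's table).
const singleArm = [5, 4, 6, 5, 4, 5, 6, 4, 5];
const twoArm = [3, 2, 3, 2, 2, 3, 3, 2, 2];
const ci = pairedBootstrapCI(singleArm, twoArm);
console.log(`mean delta=${ci.mean.toFixed(2)}, 95% CI [${ci.lo.toFixed(2)}, ${ci.hi.toFixed(2)}]`);
```

Pairing matters here. Resampling whole topics keeps each topic's two arms together, so topic-to-topic difficulty does not widen the interval the way an unpaired comparison of the two arms would.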
tangletools previously approved these changes on Jun 24, 2026.
tangletools (Contributor) left a comment:
✅ Auto-approved PR — c4ae34fe
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T22:50:00Z
tangletools approved these changes on Jun 24, 2026.
tangletools (Contributor) left a comment:
✅ Auto-approved PR — 367bbe4f
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T23:02:55Z
A public-facing writeup of the two-agent research-loop A/B, so a Twitter/X post can link to it.
What the doc covers
- Code: `runTwoAgentResearchLoop` (`src/two-agent-research-loop.ts`), `createWebResearchWorker` + `createVerifyingResearchDriver` (`src/web-research-worker.ts`), and the A/B harness (`tests/loops/research-loop-equal-compute.test.ts`).
- Result on `glm-5.2` (n=9 ML topics): the two-agent loop admits 2.33 fewer sources/topic at identical (1.00) coverage, 95% CI [1.78, 2.89] via agent-eval `pairedBootstrap`. Includes the per-topic delta table.
- Live run: `AGENT_KNOWLEDGE_LIVE=1` + `TANGLE_API_KEY` + `AGENT_KNOWLEDGE_LIVE_GOALS` (`|`-separated, ~$0.20 for the full 9-topic sweep).

Verification
- `pnpm run lint`: clean (2 pre-existing warnings in `wikilinks.ts`, unrelated).
- `pnpm exec vitest run tests/loops/research-loop-equal-compute.test.ts`: 2 passed, 1 skipped (the live arm). The documented offline command ran green:
  `[A/B @ B<=6 passes] two-agent: passes=2 junk=0 coverage=1.00 | single-agent: passes=2 junk=2 coverage=1.00`

Links the doc from the README's two-agent research loop section. Docs-only plus one README line; no code changes.
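The test run above reports the live arm as skipped, which is the expected outcome without credentials. Below is a minimal sketch of such a gate. It assumes only the three variables the doc names. `liveArmConfig`, `LiveArmConfig`, the trimming of goals, and treating an empty goal list as "skip" are hypothetical choices, not read from the test file. The real harness may, for instance, fall back to default topics when no goals are given.

```typescript
// Hypothetical gate for the live A/B arm; not the harness's actual code.
type Env = Record<string, string | undefined>;

interface LiveArmConfig {
  apiKey: string;
  goals: string[];
}

// Returns null (meaning: skip the live arm) unless the live run is opted in,
// an API key is present, and at least one goal is listed.
function liveArmConfig(env: Env): LiveArmConfig | null {
  if (env.AGENT_KNOWLEDGE_LIVE !== "1") return null;
  const apiKey = env.TANGLE_API_KEY?.trim();
  if (!apiKey) return null;
  // Goals are '|'-separated; trim each and drop empties (e.g. a trailing '|').
  const goals = (env.AGENT_KNOWLEDGE_LIVE_GOALS ?? "")
    .split("|")
    .map((g) => g.trim())
    .filter((g) => g.length > 0);
  return goals.length > 0 ? { apiKey, goals } : null;
}

// Example: two goals, the second followed by a stray separator.
const cfg = liveArmConfig({
  AGENT_KNOWLEDGE_LIVE: "1",
  TANGLE_API_KEY: "example-key",
  AGENT_KNOWLEDGE_LIVE_GOALS: "goal one | goal two|",
});
console.log(JSON.stringify(cfg?.goals)); // ["goal one","goal two"]
```

In a Vitest file, a check like this could drive `it.skipIf(liveArmConfig(process.env) === null)`. With that gate, the offline arms still run on every invocation, while the paid live arm (~$0.20 per full sweep) stays opt-in.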