docs: two-agent research-loop A/B — result + how to run by drewstone · Pull Request #34 · tangle-network/agent-knowledge

drewstone · 2026-06-24T22:49:52Z

A public-facing writeup of the two-agent research-loop A/B, so a Twitter/X post can link to it.

What the doc covers

- What it is: a two-agent research loop that grows one knowledge base. A worker finds web sources for the open knowledge gaps; a verifying driver (a second agent) judges each fetched source before it is saved, checking on-topic relevance and near-duplication, then gates on a readiness spec. It is compared against a single-agent loop that just accumulates. The doc points at the real code: `runTwoAgentResearchLoop` (`src/two-agent-research-loop.ts`), `createWebResearchWorker` and `createVerifyingResearchDriver` (`src/web-research-worker.ts`), and the A/B harness (`tests/loops/research-loop-equal-compute.test.ts`).
- The result: a real run over n=9 ML topics at equal compute with glm-5.2. The two-agent loop admits 2.33 fewer sources per topic at identical (1.00) coverage, 95% CI [1.78, 2.89] via agent-eval `pairedBootstrap`. The doc includes the per-topic delta table.
- The honest nuance: the win is mostly de-duplication (the same canonical paper mirrored across arxiv/openreview/neurips/blogs), which is band-independent and fires on every topic. Off-scope rejection is real but accounts for the minority of the gain. A cheap content-hash/canonical-URL dedup would capture most of the value without an LLM verifier.
- Threats to validity: the verifier is also the judge (admitted-count is a proxy); deltas are conservative (the single-agent loop stops early on readiness); n=2 clean offline controls is thin; the result is glm-5.2-specific; web run-to-run variance is high.
- How to run: the offline A/B (deterministic, no creds) and the live sweep (`AGENT_KNOWLEDGE_LIVE=1` plus `TANGLE_API_KEY` plus a `|`-separated `AGENT_KNOWLEDGE_LIVE_GOALS`; about $0.20 for the full 9-topic sweep).
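
To make the "cheap dedup" idea in the nuance item concrete, here is a minimal sketch of a canonical-URL plus content-hash filter. It is not code from this repository: `FetchedSource`, `canonicalizeUrl`, `contentHash`, and `SourceDeduper` are hypothetical names, and the list of tracking parameters is an assumption chosen for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a fetched source; not the repo's actual type.
interface FetchedSource {
  url: string;
  text: string;
}

// Assumed set of tracking parameters to ignore when comparing URLs.
const TRACKING_PARAM = /^(utm_.*|fbclid|gclid)$/i;

// Reduce a URL to a comparable form: no fragment, no "www.", no tracking
// params, sorted query keys, no trailing slash.
function canonicalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.hash = "";
  url.hostname = url.hostname.replace(/^www\./, "");
  const kept: [string, string][] = [];
  url.searchParams.forEach((value, key) => {
    if (!TRACKING_PARAM.test(key)) kept.push([key, value]);
  });
  kept.sort((a, b) => a[0].localeCompare(b[0]));
  url.search = new URLSearchParams(kept).toString();
  url.pathname = url.pathname.replace(/\/+$/, "") || "/";
  return url.toString();
}

// Hash of the text after lowercasing and collapsing whitespace.
function contentHash(text: string): string {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// Admits a source only if neither its canonical URL nor its content hash
// has been seen before.
class SourceDeduper {
  private seen = new Set<string>();

  admit(source: FetchedSource): boolean {
    const keys = [
      `url:${canonicalizeUrl(source.url)}`,
      `text:${contentHash(source.text)}`,
    ];
    if (keys.some((key) => this.seen.has(key))) return false;
    keys.forEach((key) => this.seen.add(key));
    return true;
  }
}

const deduper = new SourceDeduper();
deduper.admit({ url: "https://arxiv.org/abs/1706.03762", text: "Attention Is All You Need" }); // true
deduper.admit({ url: "https://www.arxiv.org/abs/1706.03762/?utm_source=feed", text: "mirror copy" }); // false: same canonical URL
```

A filter like this only catches exact mirrors. The same paper hosted on different domains with different page chrome would need a key derived from something more stable, such as a normalized title, which is left out of this sketch.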

Verification

- `pnpm run lint`: clean (2 pre-existing warnings in wikilinks.ts, unrelated).
- `pnpm exec vitest run tests/loops/research-loop-equal-compute.test.ts`: 2 passed, 1 skipped (the live arm). The documented offline command ran green: `[A/B @ B<=6 passes] two-agent: passes=2 junk=0 coverage=1.00 | single-agent: passes=2 junk=2 coverage=1.00`.

The README's two-agent research loop section now links to the doc. Docs-only plus one README line; no code changes.

Public-facing writeup of the equal-compute A/B between the two-agent research loop (worker + verifying driver) and a single-agent accumulate-only loop. Records the real result (n=9 ML topics, glm-5.2: 2.33 fewer admitted sources/topic at identical coverage, 95% CI [1.78, 2.89] via pairedBootstrap), the honest nuance (the win is mostly de-duplication, not relevance filtering), threats to validity, and the verified offline + live run commands. Links it from the README's two-agent research loop section.
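
The confidence interval quoted above is attributed to agent-eval's `pairedBootstrap`. For readers unfamiliar with the technique, the following is a minimal sketch of a percentile paired bootstrap over per-topic deltas. It is not agent-eval's implementation or API: the name `pairedBootstrapCI`, its parameters, the seeded `mulberry32` generator, and the sample numbers are all assumptions made for illustration, and the numbers are invented rather than taken from the run.

```typescript
// Small deterministic PRNG (mulberry32) so resampling is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface BootstrapResult {
  mean: number; // observed mean of the paired deltas
  lo: number; // lower percentile bound
  hi: number; // upper percentile bound
}

// Percentile bootstrap on paired differences a[i] - b[i].
// Pairs are resampled together, which keeps per-topic difficulty matched.
function pairedBootstrapCI(
  a: number[],
  b: number[],
  resamples = 10000,
  alpha = 0.05,
  seed = 1,
): BootstrapResult {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("paired samples must be non-empty and of equal length");
  }
  const deltas = a.map((x, i) => x - b[i]);
  const n = deltas.length;
  const rand = mulberry32(seed);
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += deltas[Math.floor(rand() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);
  const lo = means[Math.floor((alpha / 2) * (resamples - 1))];
  const hi = means[Math.ceil((1 - alpha / 2) * (resamples - 1))];
  const mean = deltas.reduce((s, x) => s + x, 0) / n;
  return { mean, lo, hi };
}

// Invented admitted-source counts per topic, for illustration only.
const singleAgent = [6, 5, 7, 6, 5];
const twoAgent = [4, 3, 4, 4, 3];
console.log(pairedBootstrapCI(singleAgent, twoAgent));
```

Resampling whole pairs rather than each arm independently is what makes the interval "paired": topic-level variation that affects both loops cancels out in each delta.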

tangletools

✅ Auto-approved PR — `c4ae34fe`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T22:50:00Z


tangletools

✅ Auto-approved PR — `367bbe4f`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T23:02:55Z

tangletools previously approved these changes Jun 24, 2026


docs: note the independent re-run reproduced the A/B (+2.67, CI [2.22,3.00]) (367bbe4)

drewstone dismissed tangletools’s stale review via 367bbe4 June 24, 2026 23:02

drewstone merged commit c8beaf6 into main Jun 24, 2026

tangletools approved these changes Jun 24, 2026

drewstone merged 2 commits into main from docs/two-agent-research-ab
