feat(research): real web-research worker + genuinely-live A/B arm by drewstone · Pull Request #32 · tangle-network/agent-knowledge

drewstone · 2026-06-24T22:16:31Z

What

Makes the two-agent research loop's live A/B arm genuinely live. Before this, tests/loops/research-loop-equal-compute.test.ts's describe.skipIf(!AGENT_KNOWLEDGE_LIVE) block was a skeleton — it ran the SAME offline naive proposer over a hardcoded chicken-coop pool with a junk/-prefix verifier, so AGENT_KNOWLEDGE_LIVE=1 proved nothing about real research. The repo had no committed real web-research worker.

New module — `src/web-research-worker.ts` (general, any-topic, no hardcoded corpus)

- `createWebResearchWorker`: given the open knowledge gaps, glm-5.2 forms focused search queries → real web search over the Tangle router (POST /v1/search, the same endpoint tcloud mcp's web_search tool forwards to) → fetches each top hit with the repo's politeFetch → reduces with htmlToText → proposes citing knowledge/*.md pages via buildPages. Conforms to the loop's ResearchWorker contract.
- `createVerifyingResearchDriver`: the differentiated driver role. A glm-5.2 verifySource pass judges each fetched source for on-topic relevance to the goal + open gaps and for near-duplication against the round. Fail-closed (reject) on router/parse failure so an unverified source never poisons the KB.
- `createTangleRouterClient`: dependency-free router client over fetch (search + chat), so it works with or without the tcloud CLI installed. glm-5.2 calls get max_tokens >= 1200 so visible content isn't starved by hidden reasoning_content.
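
The fail-closed verdict and the max_tokens floor described above can be sketched roughly as follows. This is an illustration only, not the code in `src/web-research-worker.ts`: the names `Verdict`, `withReasoningFloor`, `parseVerdict`, and `verifySourceFailClosed`, and the JSON verdict shape `{ admit, reason }`, are all assumptions made for the example.

```typescript
// Hypothetical sketch; names and the JSON verdict shape are assumptions.
type Verdict = { admit: boolean; reason: string };

// Floor taken from the PR description (max_tokens >= 1200 for glm-5.2).
const REASONING_FLOOR = 1200;

// Raise a caller's max_tokens to the floor so hidden reasoning
// cannot consume the whole completion budget.
function withReasoningFloor(maxTokens?: number): number {
  return Math.max(maxTokens ?? 0, REASONING_FLOOR);
}

// Only well-formed JSON with `admit === true` admits a source;
// every other reply (malformed, missing field, false) rejects.
function parseVerdict(raw: string): Verdict {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const obj = parsed as { admit?: unknown; reason?: unknown };
      if (obj.admit === true) {
        return { admit: true, reason: typeof obj.reason === "string" ? obj.reason : "" };
      }
    }
    return { admit: false, reason: "model did not admit" };
  } catch {
    return { admit: false, reason: "unparseable verdict" };
  }
}

// Fail-closed wrapper: a router error rejects the source as well.
// `ask` stands in for whatever call reaches the model.
async function verifySourceFailClosed(ask: () => Promise<string>): Promise<Verdict> {
  try {
    return parseVerdict(await ask());
  } catch {
    return { admit: false, reason: "router failure" };
  }
}
```

The point of the design is that every failure path maps to rejection, so a flaky router can only shrink the KB, never pollute it.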

Live-arm wiring

The live arm injects the real worker + real verifier + topic-relevant readiness specs, runs both arms at equal agent-pass budget, cost-gates with a cheap glm-5.2 smoke first, asserts the worker actually web-searched (fails loud on zero sources = a false null), and reports admitted-source count + coverage per arm with agent-eval's pairedBootstrap. The offline arm is unchanged (CI, $0, deterministic) — the arm-runners only gain defaulted parameters.
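
The "fails loud on zero sources" guard could look something like the sketch below. The `ArmResult` shape and the `assertWebSearched` name are hypothetical, chosen for illustration; the actual test file may track fetched sources differently.

```typescript
// Hypothetical sketch of the zero-sources guard; field names are assumptions.
type ArmResult = {
  name: string;
  sourcesFetched: number; // pages the worker actually retrieved from the web
  admitted: number;       // sources admitted into the KB
  coverage: number;       // fraction of readiness specs covered, 0..1
};

// A run with zero fetched sources says nothing about the verifier,
// so it is surfaced as an error instead of being reported as "no difference".
function assertWebSearched(result: ArmResult): void {
  if (result.sourcesFetched === 0) {
    throw new Error(
      `${result.name}: worker fetched zero sources; this is a false null, not a result`,
    );
  }
}
```

Calling the guard on each arm before computing any deltas keeps an outage or a misconfigured key from being read as a null finding.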

For real research there is no planted-junk oracle, so the live cleanliness signal is admitted-source COUNT: the verifying driver rejects off-topic fetches, so the two-agent KB admits FEWER sources at equal-or-higher coverage — the real-world analogue of the offline junk count.

Verification (all run)

- `pnpm run lint` / `typecheck` / `build` / offline `pnpm test`: all green (131 passed, 7 creds-gated skips; offline arm byte-identical: two-agent junk=0 coverage=1.00 | single-agent junk=2 coverage=1.00).
- Live run (AGENT_KNOWLEDGE_LIVE=1, glm-5.2, goal "self-speculative decoding", B≤4): the worker really web-searched (Google Research / NVIDIA / BentoML pages on speculative decoding) and the A/B reported:
  - two-agent (real worker + LLM verifier): admitted=2, coverage=1.00
  - single-agent (real worker, no verifier): admitted=3, coverage=1.00
  - paired delta (single−two admitted) = 1 → the verifying driver kept the KB cleaner at equal coverage and equal compute.
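
The paired delta reported above is plain arithmetic over the two arms' reports. A minimal sketch, using the numbers from this live run; the `ArmReport` type and `pairedDelta` function are hypothetical names, and this does not reproduce agent-eval's pairedBootstrap:

```typescript
// Hypothetical sketch; reproduces only the point delta from the live run.
type ArmReport = { admitted: number; coverage: number };

function pairedDelta(single: ArmReport, twoAgent: ArmReport) {
  return {
    // Positive means the verifying driver admitted fewer sources.
    admittedDelta: single.admitted - twoAgent.admitted,
    // Zero or positive means coverage was not sacrificed for cleanliness.
    coverageDelta: twoAgent.coverage - single.coverage,
  };
}

const delta = pairedDelta(
  { admitted: 3, coverage: 1.0 }, // single-agent (real worker, no verifier)
  { admitted: 2, coverage: 1.0 }, // two-agent (real worker + LLM verifier)
);
console.log(delta); // { admittedDelta: 1, coverageDelta: 0 }
```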

DO NOT MERGE — review first.

Replace the live A/B arm's skeleton (which ran the same offline naive proposer over a hardcoded pool with a junk/-prefix verifier) with a real, any-topic web-research worker and a real LLM verifying driver, so AGENT_KNOWLEDGE_LIVE=1 runs a genuine experiment.

src/web-research-worker.ts (general, no hardcoded corpus):
- createWebResearchWorker: glm-5.2 turns open gaps into search queries, runs real web search over the router (/v1/search, the endpoint tcloud mcp's web_search forwards to), fetches with politeFetch, reduces with htmlToText, proposes citing pages via buildPages. Conforms to the loop's ResearchWorker contract.
- createVerifyingResearchDriver: a glm-5.2 verifySource pass judging each fetched source's on-topic relevance + near-duplicate against the round, fail-closed on parse/router failure.
- createTangleRouterClient: dependency-free router client over fetch (search + chat), so it works with or without the tcloud CLI installed. Reasoning-model floor: glm-5.2 calls get max_tokens >= 1200 so visible content isn't starved by reasoning_content.

Live arm: injects the real worker + verifier + topic-relevant readiness specs, runs both arms at equal agent-pass budget, cost-gates with a cheap glm-5.2 smoke first, asserts the worker actually web-searched, and reports admitted-source count + coverage per arm with agent-eval's pairedBootstrap.

The offline arm is unchanged (CI, $0, deterministic); the arm-runners gain defaulted parameters only.

tangletools

✅ Auto-approved PR — `113f6efc`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-24T22:16:38Z

tangletools approved these changes Jun 24, 2026


drewstone merged commit 99e3bc8 into main Jun 24, 2026
1 check passed

drewstone mentioned this pull request Jun 24, 2026

chore(release): agent-knowledge 1.9.0 — real web-research worker + live A/B #33

Merged

drewstone merged 1 commit into main from feat/real-web-research-worker
