Agent Knowledge Forge is a multi-agent knowledge pipeline for frontier AI-agent engineering. It searches recent sources, screens noisy results, extracts source-grounded knowledge cards, and exports artifacts that other agents or humans can use.
It produces:
- source-grounded knowledge_index.md/json
- RAG-ready knowledge_chunks.jsonl
- compact and uncompressed agent memory packs
- human-readable learning reports
- a read-only MCP server over completed runs
- evaluation metrics and next-run improvement plans
The project focuses on practical agent-development knowledge: memory, RAG, MCP/tool protocols, multi-agent handoffs, planning/reasoning, task hardness, tool routing, durable state/runtime, observability, coding agents, browser/computer use, context engineering, identity/access, human review, guardrails, cost/latency, safety, and production hardening.
Requirements:
- Python 3.11+
- One OpenAI-compatible LLM API key for LLM stages
- One search API key for automatic discovery: Tavily, Brave Search, SerpAPI, or Exa
- Playwright Chromium dependencies for Crawl4AI fallback ingestion
The shortest path is pip install -r requirements.txt. For development, install the package in editable mode with the dev extra.
git clone https://github.com/redflyingfish/Agent-Knowledge-Forge.git
cd Agent-Knowledge-Forge
python -m venv .venv
.venv/bin/python -m pip install --upgrade pip setuptools wheel
.venv/bin/python -m pip install -r requirements.txt
.venv/bin/python -m pip install -e ".[dev]"
.venv/bin/python -m playwright install --with-deps chromium
If you use uv:
uv sync --extra dev
bash scripts/install_playwright_deps.sh
Create .env from the example:
cp .env.example .env
Minimal LLM configuration:
AKH_LLM_API_KEY=your_provider_key
AKH_LLM_BASE_URL=https://api.deepseek.com/v1
AKH_LLM_FAST_MODEL=deepseek-chat
AKH_LLM_PRO_MODEL=deepseek-reasoner
Minimal search configuration:
AKH_SEARCH_PROVIDER=tavily
AKH_SEARCH_API_KEY=your_search_key
AKH_SEARCH_MAX_QUERIES=80
AKH_SEARCH_RESULTS_PER_QUERY=5
Provider-specific search keys are also supported:
AKH_TAVILY_API_KEY=...
AKH_BRAVE_SEARCH_API_KEY=...
AKH_SERPAPI_API_KEY=...
AKH_EXA_API_KEY=...
Check the LLM configuration without spending tokens:
.venv/bin/akf llm-check --stage screening
Make a tiny test call:
.venv/bin/akf llm-check --stage screening --ping
The legacy harvester command is kept as a compatibility alias for akf.
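Before spending tokens, you may want to confirm that the required variables are set at all. The sketch below is a hypothetical standalone helper, not part of akf, and rests on several assumptions:

- It reads the process environment, so it only sees .env values after something has exported them.
- It treats the generic AKH_SEARCH_API_KEY and the provider-specific keys as interchangeable.
- It assumes the provider values tavily, brave, serpapi and exa.

Check .env.example for the names the project actually accepts.

```python
import os
from collections.abc import Mapping

# Hypothetical mapping: AKH_SEARCH_PROVIDER value -> provider-specific key name.
# The provider values (especially "brave") are assumptions.
PROVIDER_KEYS = {
    "tavily": "AKH_TAVILY_API_KEY",
    "brave": "AKH_BRAVE_SEARCH_API_KEY",
    "serpapi": "AKH_SERPAPI_API_KEY",
    "exa": "AKH_EXA_API_KEY",
}
LLM_SETTINGS = ("AKH_LLM_API_KEY", "AKH_LLM_BASE_URL", "AKH_LLM_FAST_MODEL", "AKH_LLM_PRO_MODEL")


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the settings that still need a value."""
    missing = [name for name in LLM_SETTINGS if not env.get(name)]
    provider = env.get("AKH_SEARCH_PROVIDER", "").lower()
    if provider not in PROVIDER_KEYS:
        missing.append("AKH_SEARCH_PROVIDER")
    # Assumption: either the generic key or the provider-specific key is enough.
    elif not (env.get("AKH_SEARCH_API_KEY") or env.get(PROVIDER_KEYS[provider])):
        missing.append(f"AKH_SEARCH_API_KEY or {PROVIDER_KEYS[provider]}")
    return missing


if __name__ == "__main__":
    print(missing_settings(os.environ) or "all required settings present")
```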
Run automatic discovery only:
.venv/bin/akf discover \
--year 2026 \
--max-queries 40 \
--results-per-query 5 \
--out data/discovery-run
Run the complete multi-agent pipeline:
.venv/bin/akf run-team \
--discover \
--discovery-year 2026 \
--use-llm-agents \
--report-language both \
--llm-max-candidates 25 \
--max-selected-urls 80 \
--include-review \
--concurrency 4 \
--llm-extraction-concurrency 2 \
--ingestion-timeout 45 \
--max-markdown-chars 120000 \
--out data/broad-run
Run a tiny explicit-URL smoke test:
.venv/bin/akf run-team \
--url https://modelcontextprotocol.io/specification/2025-06-18 \
--use-llm-agents \
--out data/team-run
After a run, inspect:
data/<run>/03_knowledge_base/knowledge_index.md
data/<run>/03_knowledge_base/knowledge_chunks.jsonl
data/<run>/04_memory_packs/agent_memory_pack.compact.md
data/<run>/05_human_report/frontier_learning_report.md
data/<run>/06_evaluation/evaluation_metrics.md
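knowledge_chunks.jsonl stores one JSON object per line, so you can inspect it without a vector store. Below is a minimal keyword-search sketch. It assumes the chunks have claim, evidence, topics and source_url fields; those names are a guess based on the fields this README lists, so open one line of the real file and adjust them.

```python
import json
from pathlib import Path


def load_chunks(path: Path) -> list[dict]:
    """Read JSON Lines: one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def keyword_search(chunks: list[dict], query: str, limit: int = 5) -> list[dict]:
    """Rank chunks by how many query terms appear in their text fields."""
    terms = {term.lower() for term in query.split()}
    scored = []
    for chunk in chunks:
        # Assumed field names; verify them against a real chunk first.
        text = " ".join(
            [chunk.get("claim", ""), chunk.get("evidence", ""), " ".join(chunk.get("topics", []))]
        ).lower()
        hits = sum(term in text for term in terms)
        if hits:
            scored.append((hits, chunk))
    # sort() is stable, so ties keep their original file order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:limit]]


if __name__ == "__main__":
    chunks = load_chunks(Path("data/broad-run/03_knowledge_base/knowledge_chunks.jsonl"))
    for chunk in keyword_search(chunks, "memory compaction"):
        print(chunk.get("claim"), "->", chunk.get("source_url"))
```

Substring matching is deliberately crude. For real retrieval, embed the same chunks into a vector store or expose them through the MCP server.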
Mine candidate emerging topics from the ingested articles:
.venv/bin/akf topic-discovery \
--in-dir data/broad-run/02_ingested \
--search-report data/broad-run/00_discovery/search_results.json \
--out data/broad-run/07_topic_discovery
This is a bounded topic-mining check. It does not mutate the fixed taxonomy or
keep searching forever. It reports known-topic coverage, candidate new topics,
promoted cross-source topics, source buckets such as frontier_scout and
stop_signal, and suggested follow-up queries. To test semantic extraction,
add --use-llm-topic-mining.
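The "bounded" part comes from making a single pass over already-ingested sources instead of searching again. Below is a hypothetical illustration of cross-source promotion; it is not the project's actual algorithm:

- The input is a mapping from each source to its extracted topic labels.
- Labels already in the known taxonomy are skipped.
- A new label is promoted only when at least min_sources distinct sources mention it.

The threshold of 2 is an assumed default.

```python
from collections import defaultdict


def promote_topics(
    candidates: dict[str, list[str]],
    known_topics: set[str],
    min_sources: int = 2,
) -> tuple[list[str], list[str]]:
    """Split new topic labels into (promoted, single_source) in one bounded pass."""
    sources_per_topic: dict[str, set[str]] = defaultdict(set)
    for source, topics in candidates.items():
        for topic in topics:
            label = topic.strip().lower()  # normalize case and whitespace
            if label and label not in known_topics:
                sources_per_topic[label].add(source)
    promoted = sorted(t for t, s in sources_per_topic.items() if len(s) >= min_sources)
    single_source = sorted(t for t, s in sources_per_topic.items() if len(s) < min_sources)
    return promoted, single_source
```

Single-source labels are still worth reporting, for example as seeds for suggested follow-up queries. They are just not strong enough evidence to count as an emerging topic.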
The same workflow is available through make:
make install-dev
make playwright
make llm-check
make check
make discover OUT=data/discovery-run YEAR=2026
make run OUT=data/broad-run YEAR=2026
make topic-discovery RUN_DIR=data/broad-run
make smoke
make mcp-server RUN_DIR=data/broad-run
Expose a completed run as a read-only MCP knowledge server:
.venv/bin/akf mcp-server \
--run-dir data/broad-run
The default transport is stdio, which is the usual mode for local MCP clients. The server exposes:
- get_corpus_summary
- list_topics
- search_agent_knowledge
- get_knowledge_card
- read_memory_pack
- read_human_report
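MCP clients launch stdio servers themselves, often from a different working directory. Client configurations like the example that follows should therefore use absolute paths. This hypothetical helper resolves them for you. It assumes a POSIX virtualenv layout (.venv/bin/akf); on Windows the script directory is Scripts instead.

```python
import json
from pathlib import Path


def mcp_client_config(venv_dir: str, run_dir: str, name: str = "agent-knowledge-forge") -> dict:
    """Build an mcpServers entry that launches `akf mcp-server` over stdio."""
    # resolve() turns relative paths into absolute ones (the paths need not exist yet).
    command = Path(venv_dir).resolve() / "bin" / "akf"
    return {
        "mcpServers": {
            name: {
                "command": str(command),
                "args": ["mcp-server", "--run-dir", str(Path(run_dir).resolve())],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(mcp_client_config(".venv", "data/broad-run"), indent=2))
```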
Example MCP client command configuration:
{
"mcpServers": {
"agent-knowledge-forge": {
"command": "/absolute/path/to/.venv/bin/akf",
"args": [
"mcp-server",
"--run-dir",
"/absolute/path/to/data/broad-run"
]
}
}
}
For quick local HTTP inspection:
.venv/bin/akf mcp-server \
--run-dir data/broad-run \
--transport streamable-http \
--host 127.0.0.1 \
--port 8000
Agent frameworks and best practices change quickly. A coding agent often has stale built-in knowledge, while raw web search is noisy and hard to reuse. This project builds a repeatable pipeline that finds recent high-signal sources, filters them, extracts source-grounded patterns, and exports both machine-readable and human-readable knowledge artifacts.
The design is intentionally close to recent automated research and survey systems: staged retrieval/synthesis, schema-checked extraction, evidence-preserving outputs, frozen local evaluation sets, and explicit next-run feedback are treated as first-class engineering requirements.
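"Schema-checked extraction" means an LLM's output is validated before it becomes a knowledge card; free-form text is not trusted as-is. The sketch below shows the idea with a frozen dataclass. The field names are hypothetical: they are modeled on the card contents listed in this README, and scores are omitted for brevity. The project's real schema may differ.

```python
from dataclasses import dataclass


# Hypothetical card schema; the project's real field names may differ.
@dataclass(frozen=True)
class KnowledgeCard:
    claim: str
    why_it_matters: str
    implementation_takeaway: str
    topics: tuple[str, ...]
    source_url: str
    evidence: str


REQUIRED_TEXT_FIELDS = ("claim", "why_it_matters", "implementation_takeaway", "source_url", "evidence")


def validate_card(raw: dict) -> KnowledgeCard:
    """Turn a parsed LLM extraction into a card, or raise ValueError."""
    for name in REQUIRED_TEXT_FIELDS:
        value = raw.get(name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {name}")
    url = raw["source_url"].strip()
    if not url.startswith(("http://", "https://")):
        raise ValueError("source_url must be an http(s) URL")
    topics = raw.get("topics", [])
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics must be a list of strings")
    # Rejecting cards without evidence or a source keeps every output source-grounded.
    return KnowledgeCard(
        claim=raw["claim"].strip(),
        why_it_matters=raw["why_it_matters"].strip(),
        implementation_takeaway=raw["implementation_takeaway"].strip(),
        topics=tuple(topics),
        source_url=url,
        evidence=raw["evidence"].strip(),
    )
```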
- Unified multi-agent runner: one command runs automatic search discovery, filtering, deep reading, memory synthesis, human report writing, and evaluation.
- Provider-neutral search discovery: executes expanded frontier-agent queries through Tavily, Brave Search, SerpAPI, or Exa, then writes deduplicated URL candidates.
- Emerging topic mining: reads ingested frontier/stop-signal sources, extracts candidate new topics with deterministic or LLM-assisted parsing, and reports coverage/yield metrics without unbounded re-search.
- LLM expert stages: optional OpenAI-compatible LLM calls for semantic screening, knowledge-card extraction, compact memory synthesis, human learning reports, and quality reflection.
- 2025+ broad frontier discovery policy: plans broad searches across recent papers, repos, docs, and technical blogs, then screens candidates with relevance, authority, preview text, and optional LLM judgment.
- Robust ingestion: Jina Reader first, Crawl4AI fallback, per-URL hard timeout, Markdown cleanup, and token/character budgets.
- Source screening: combines relevance, authority, freshness, novelty, GitHub metadata, and optional LLM judgment.
- Knowledge cards: each card records the claim, why it matters, implementation takeaway, topics, scores, source URL, and evidence.
- RAG-ready chunks: knowledge_chunks.jsonl/json/md preserve claim, source, evidence, topics, scores, and retrieval-query hints for vector stores or file search.
- Topic clusters: knowledge_clusters.md/json groups cards into survey-style themes for browsing, gap analysis, and next-run planning.
- Memory layers: agent_memory_pack.uncompressed.md/json is the long-term memory/RAG/review layer with evidence; agent_memory_pack.md/json is bounded working memory; agent_memory_pack.compact.md, ultra_compact.md, and llm_compact.md are direct context-injection layers.
- Human learning report: a study guide (English, Chinese, or both, selected with --report-language) with thematic sections, source URLs, comparison tables, a glossary, a checklist, a reading path, and practice questions.
- Evaluation loop: baseline metrics, evidence/source-diversity checks, optional LLM reflection, and a deterministic next-run plan for tuning discovery, screening, reading, memory, and evaluation.
- Recovery path: if a long run stops after ingestion, analysis and finalization can resume from saved artifacts.
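Resuming works because every stage writes its outputs to disk. The hypothetical helper below suggests which documented command to run next. It infers the answer from a few marker files in the run layout described in this README; which markers each command actually guarantees is an assumption.

```python
from pathlib import Path


# Hypothetical resume check based on the documented run-directory layout.
def resume_step(run_dir: Path) -> str:
    """Suggest the next akf command for a partially completed run."""
    ingested = run_dir / "02_ingested"
    if not ingested.is_dir() or not any(ingested.glob("*.md")):
        return "run-team"  # nothing ingested yet, so start over
    if not (run_dir / "03_knowledge_base" / "knowledge_index.json").is_file():
        return "analyze"  # ingestion finished, extraction did not
    if not (run_dir / "06_evaluation" / "evaluation_metrics.json").is_file():
        return "finalize-run"  # memory packs, report, and evaluation still pending
    return "done"
```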
Pipeline stages and their outputs:
search_discovery
-> query_plan.md/json, search_results.md/json, candidate_urls.txt
discovery_filter
-> source_screening.json, selected_urls.txt
deep_reader
-> *.knowledge.md/json, knowledge_index.md/json, retrieval_manifest.md/json,
knowledge_chunks.jsonl, knowledge_clusters.md/json
memory_synthesizer
-> uncompressed, working, compact, ultra-compact, and optional LLM compact memory
human_learning_writer
-> frontier_learning_report.md
quality_evaluator
-> evaluation_metrics.md/json, quality_reflection.md/json, next_run_plan.md/json
The specialist agents are role stages inside one orchestrated command. Users do not need to start separate background agents.
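Conceptually, "role stages inside one orchestrated command" is a sequential runner with a shared state:

- Each stage reads the accumulated state and returns only its new artifacts.
- The runner merges those artifacts back in and records a trace entry, similar in spirit to team_run_trace.

This is a hypothetical sketch. The stage callables here are stand-ins, not the project's real agents.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass, field

Stage = Callable[[dict], dict]


@dataclass
class TraceEntry:
    stage: str
    seconds: float
    outputs: list[str]


@dataclass
class TeamRun:
    stages: list[tuple[str, Stage]]
    trace: list[TraceEntry] = field(default_factory=list)

    def run(self, state: dict) -> dict:
        for name, stage in self.stages:
            started = time.perf_counter()
            produced = stage(state)        # a stage returns only its new artifacts
            state = {**state, **produced}  # later stages see earlier outputs
            elapsed = time.perf_counter() - started
            self.trace.append(TraceEntry(name, elapsed, sorted(produced)))
        return state
```

For example, a toy pipeline can chain a discovery stage and a filter stage:

`TeamRun([("search_discovery", lambda s: {"candidate_urls": [...]}), ("discovery_filter", lambda s: {"selected_urls": s["candidate_urls"][:80]})]).run({})`

Keeping stages as plain functions over a state dict makes each one easy to test in isolation and to re-run from saved artifacts.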
Generate an expanded search plan without network calls:
.venv/bin/akf query-plan \
--year 2026 \
--topic memory \
--topic rag \
--topic agent_hardening \
--out data/query-plan
If a long run finishes ingestion but stops before analysis, rebuild the knowledge base from the ingested files:
.venv/bin/akf analyze \
--in-dir data/broad-run/02_ingested \
--out data/broad-run/03_knowledge_base \
--use-llm-extraction \
--llm-extraction-concurrency 2
Then finalize the run from the saved artifacts:
.venv/bin/akf finalize-run \
--run-dir data/broad-run \
--use-llm-agents \
--report-language both \
--memory-max-entries 80
knowledge_index.md/json keeps every extracted card by default. Use --max-index-entries only for small smoke tests or intentionally bounded review lists; compact context budgets are handled later by the memory-pack outputs.
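Keeping the index complete while bounding the memory packs separates recall from context cost. One simple way to fill a fixed character budget is greedy selection by score. The sketch below is hypothetical and is not the project's compaction logic. The card fields score, claim and source_url are assumed names.

```python
def render_line(card: dict) -> str:
    """One memory-pack line per card (assumed field names)."""
    return f"- {card['claim']} ({card['source_url']})"


def select_for_budget(cards: list[dict], max_chars: int) -> list[dict]:
    """Greedily keep high-scoring cards while the rendered pack fits max_chars."""
    chosen: list[dict] = []
    used = 0
    for card in sorted(cards, key=lambda c: c.get("score", 0.0), reverse=True):
        cost = len(render_line(card)) + (1 if chosen else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            continue  # a shorter, lower-ranked card may still fit
        chosen.append(card)
        used += cost
    return chosen


def render_pack(cards: list[dict]) -> str:
    return "\n".join(render_line(card) for card in cards)
```

Skipping rather than stopping at the first card that does not fit uses the budget more fully. The trade-off is that the pack may include a low-ranked card ahead of a slightly higher-ranked long one.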
Each run directory has this layout:
data/<run>/
├── 00_discovery/
│ ├── query_plan.md/json
│ ├── search_results.md/json
│ ├── discovery_stats.json
│ └── candidate_urls.txt
├── 01_screening/
│ ├── source_screening.md/json
│ └── selected_urls.txt
├── 02_ingested/
│ ├── *.md
│ ├── *.json
│ └── run_stats.json
├── 03_knowledge_base/
│ ├── *.knowledge.md/json
│ ├── knowledge_index.md/json
│ ├── knowledge_index.rich.md
│ ├── frontier_brief.md/json
│ ├── retrieval_manifest.md/json
│ ├── knowledge_chunks.md/json/jsonl
│ └── knowledge_clusters.md/json
├── 04_memory_packs/
│ ├── agent_memory_pack.uncompressed.md/json
│ ├── agent_memory_pack.md/json
│ ├── agent_memory_pack.compact.md/json
│ ├── agent_memory_pack.ultra_compact.md/json
│ └── agent_memory_pack.llm_compact.md/json
├── 05_human_report/
│ ├── frontier_learning_report.md
│ ├── frontier_learning_report.en.md
│ └── frontier_learning_report.zh.md
├── 06_evaluation/
│ ├── evaluation_metrics.md/json
│ ├── quality_reflection.md/json
│ └── next_run_plan.md/json
└── team_run_trace.md/json
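To spot incomplete runs quickly, you can check for a few representative files from the layout above. This is a hypothetical helper, and the files chosen are a subset; the layout's "md/json" entries are expanded to their .json or .md files. Runs started from explicit URLs may not produce 00_discovery, so treat that entry as optional there.

```python
from pathlib import Path

# Representative artifacts per stage directory, taken from the documented layout.
EXPECTED = {
    "00_discovery": ["candidate_urls.txt"],
    "01_screening": ["source_screening.json", "selected_urls.txt"],
    "03_knowledge_base": ["knowledge_index.json", "knowledge_chunks.jsonl"],
    "04_memory_packs": ["agent_memory_pack.compact.md"],
    "05_human_report": ["frontier_learning_report.md"],
    "06_evaluation": ["evaluation_metrics.json", "next_run_plan.json"],
}


def missing_artifacts(run_dir: Path) -> list[str]:
    """List expected files (as 'stage/file') that are absent from run_dir."""
    return [
        f"{stage}/{name}"
        for stage, names in EXPECTED.items()
        for name in names
        if not (run_dir / stage / name).is_file()
    ]


if __name__ == "__main__":
    print(missing_artifacts(Path("data/broad-run")) or "all expected artifacts present")
```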
Compute baseline metrics:
.venv/bin/akf evaluate \
--screening-report data/broad-run/01_screening/source_screening.json \
--knowledge-index data/broad-run/03_knowledge_base/knowledge_index.json \
--markdown-dir data/broad-run/04_memory_packs \
--out data/broad-run/06_evaluation
Curated evaluation sets can be kept locally under data/ to calibrate screening and coverage, but they are ignored by default because they often contain local judgments, source snapshots, and paid LLM outputs.
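Two of the baseline checks mentioned earlier, evidence coverage and source diversity, are easy to reason about on their own. The sketch below computes simple versions over a list of cards. The metric names and the evidence and source_url fields are assumptions, and evaluation_metrics.json may define these metrics differently.

```python
from collections import Counter
from urllib.parse import urlparse


def evidence_and_diversity(cards: list[dict]) -> dict[str, float]:
    """Share of cards with evidence, distinct source domains, and top-domain concentration."""
    if not cards:
        return {"evidence_coverage": 0.0, "distinct_domains": 0, "top_domain_share": 0.0}
    with_evidence = sum(1 for c in cards if str(c.get("evidence", "")).strip())
    # netloc is the host part of the URL; cards without a URL count under "".
    domains = Counter(urlparse(c.get("source_url", "")).netloc.lower() for c in cards)
    return {
        "evidence_coverage": with_evidence / len(cards),
        "distinct_domains": len(domains),
        "top_domain_share": domains.most_common(1)[0][1] / len(cards),
    }
```

A high top_domain_share is a useful warning sign: the corpus may lean heavily on one vendor's blog or documentation site.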
Run lint and tests:
.venv/bin/ruff check src/agent_knowledge_harvester tests
.venv/bin/python -m pytest -q
Current test coverage includes ingestion timeout behavior, source screening, novelty checks, LLM JSON parsing, LLM knowledge-card extraction, query expansion, multi-agent runtime traces, memory-pack generation, retrieval manifests, human-report prompts, and quality reflection.
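LLM JSON parsing needs tests because models often wrap JSON in prose or a Markdown code fence. Below is a hypothetical tolerant parser of the kind such tests would exercise; it is not the project's implementation. It tries three things in order:

1. It looks for a fenced block and parses its contents.
2. Otherwise it parses the whole reply.
3. If that fails, it falls back to the outermost {...} span.

```python
import json
import re

FENCE = "`" * 3  # a literal triple backtick, built here so this snippet nests cleanly in Markdown
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_llm_json(text: str):
    """Extract a JSON value from a model reply that may include prose or a code fence."""
    match = _FENCED.search(text)
    candidate = match.group(1) if match else text
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span; re-raise if there is none.
        start, end = candidate.find("{"), candidate.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(candidate[start : end + 1])
```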
This is a research/engineering prototype intended to demonstrate a production-oriented agent knowledge pipeline. It is suitable for experimentation, portfolio review, and further extension into a hosted RAG/MCP knowledge service.
The current open-source pipeline supports automatic search discovery, explicit URLs, optional local URL files, and GitHub Trending. The next natural milestones are stronger recall evaluation, richer connector coverage, and hosted retrieval/MCP serving.