Agent Knowledge Forge

Agent Knowledge Forge is a multi-agent knowledge pipeline for frontier AI-agent engineering. It searches recent sources, screens noisy results, extracts source-grounded knowledge cards, and exports artifacts that other agents or humans can use.

It produces:

source-grounded knowledge_index.md/json
RAG-ready knowledge_chunks.jsonl
compact and uncompressed agent memory packs
human-readable learning reports
a read-only MCP server over completed runs
evaluation metrics and next-run improvement plans

The project focuses on practical agent-development knowledge: memory, RAG, MCP/tool protocols, multi-agent handoffs, planning/reasoning, task hardness, tool routing, durable state/runtime, observability, coding agents, browser/computer use, context engineering, identity/access, human review, guardrails, cost/latency, safety, and production hardening.

Requirements

Python 3.11+
One OpenAI-compatible LLM API key for LLM stages
One search API key for automatic discovery: Tavily, Brave Search, SerpAPI, or Exa
Playwright Chromium dependencies for Crawl4AI fallback ingestion

The shortest path is pip install -r requirements.txt. For development, also install the package in editable mode with the dev extra (pip install -e ".[dev]").

Install

git clone https://github.com/redflyingfish/Agent-Knowledge-Forge.git
cd Agent-Knowledge-Forge

python -m venv .venv
.venv/bin/python -m pip install --upgrade pip setuptools wheel
.venv/bin/python -m pip install -r requirements.txt
.venv/bin/python -m pip install -e ".[dev]"
.venv/bin/python -m playwright install --with-deps chromium

If you use uv:

uv sync --extra dev
bash scripts/install_playwright_deps.sh

Configure

Create .env from the example:

cp .env.example .env

Minimal LLM configuration:

AKH_LLM_API_KEY=your_provider_key
AKH_LLM_BASE_URL=https://api.deepseek.com/v1
AKH_LLM_FAST_MODEL=deepseek-chat
AKH_LLM_PRO_MODEL=deepseek-reasoner

Minimal search configuration:

AKH_SEARCH_PROVIDER=tavily
AKH_SEARCH_API_KEY=your_search_key
AKH_SEARCH_MAX_QUERIES=80
AKH_SEARCH_RESULTS_PER_QUERY=5

Provider-specific search keys are also supported:

AKH_TAVILY_API_KEY=...
AKH_BRAVE_SEARCH_API_KEY=...
AKH_SERPAPI_API_KEY=...
AKH_EXA_API_KEY=...
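Before a long run, it can help to confirm that a usable search key is present. The following is a minimal sketch, not akf's actual config loader. It assumes the accepted AKH_SEARCH_PROVIDER values are the lowercase names below, and that a provider-specific key takes precedence over AKH_SEARCH_API_KEY. The README does not state either rule, and resolve_search_key is a hypothetical helper.

```python
from typing import Mapping

# Provider -> provider-specific key variable, taken from the list above.
# ASSUMPTION: the accepted AKH_SEARCH_PROVIDER values are these lowercase names.
PROVIDER_KEY_VARS = {
    "tavily": "AKH_TAVILY_API_KEY",
    "brave": "AKH_BRAVE_SEARCH_API_KEY",
    "serpapi": "AKH_SERPAPI_API_KEY",
    "exa": "AKH_EXA_API_KEY",
}


def resolve_search_key(env: Mapping[str, str]) -> tuple[str, str]:
    """Return (provider, api_key) from AKH_* variables or raise ValueError.

    ASSUMPTION: a provider-specific key takes precedence over the generic
    AKH_SEARCH_API_KEY; akf's actual precedence rule may differ.
    """
    provider = env.get("AKH_SEARCH_PROVIDER", "").strip().lower()
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"unsupported search provider: {provider!r}")
    key = env.get(PROVIDER_KEY_VARS[provider]) or env.get("AKH_SEARCH_API_KEY")
    if not key:
        raise ValueError(f"no search API key set for provider {provider!r}")
    return provider, key
```

With the .env values exported into the environment, calling resolve_search_key(os.environ) fails fast on a missing key before any query is sent.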

Check LLM configuration without spending tokens:

.venv/bin/akf llm-check --stage screening

Make a tiny test call:

.venv/bin/akf llm-check --stage screening --ping

The legacy harvester command is kept as a compatibility alias for akf.

Quick Start

Run automatic discovery only:

.venv/bin/akf discover \
  --year 2026 \
  --max-queries 40 \
  --results-per-query 5 \
  --out data/discovery-run

Run the complete multi-agent pipeline:

.venv/bin/akf run-team \
  --discover \
  --discovery-year 2026 \
  --use-llm-agents \
  --report-language both \
  --llm-max-candidates 25 \
  --max-selected-urls 80 \
  --include-review \
  --concurrency 4 \
  --llm-extraction-concurrency 2 \
  --ingestion-timeout 45 \
  --max-markdown-chars 120000 \
  --out data/broad-run
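The --ingestion-timeout and --concurrency flags bound the ingestion stage, which the Features section describes as Jina Reader first, Crawl4AI fallback, and a per-URL hard timeout. The asyncio sketch below shows that pattern under a few assumptions. The timeout is in seconds and applies to each reader attempt, so one URL takes at most about twice the timeout. The reader functions are hypothetical placeholders rather than the Jina or Crawl4AI clients, and akf's real retry policy may differ.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A reader turns a URL into Markdown. In akf these would be Jina Reader and
# Crawl4AI; here they are HYPOTHETICAL stand-ins passed in by the caller.
Reader = Callable[[str], Awaitable[str]]


async def read_with_fallback(
    url: str, primary: Reader, fallback: Reader, timeout: float
) -> Optional[str]:
    """Try each reader in order; each attempt is cut off after `timeout` seconds."""
    for reader in (primary, fallback):
        try:
            markdown = await asyncio.wait_for(reader(url), timeout=timeout)
        except (asyncio.TimeoutError, OSError, RuntimeError):
            continue  # timeout or fetch error: fall through to the next reader
        if markdown.strip():
            return markdown
    return None  # both readers failed; the URL is skipped, not retried forever


async def ingest(
    urls: list[str], primary: Reader, fallback: Reader,
    timeout: float, concurrency: int,
) -> dict[str, Optional[str]]:
    """Ingest URLs with at most `concurrency` reads in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def one(url: str) -> tuple[str, Optional[str]]:
        async with semaphore:
            return url, await read_with_fallback(url, primary, fallback, timeout)

    return dict(await asyncio.gather(*(one(u) for u in urls)))
```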

Run a tiny explicit-URL smoke test:

.venv/bin/akf run-team \
  --url https://modelcontextprotocol.io/specification/2025-06-18 \
  --use-llm-agents \
  --out data/team-run

After a run, inspect:

data/<run>/03_knowledge_base/knowledge_index.md
data/<run>/03_knowledge_base/knowledge_chunks.jsonl
data/<run>/04_memory_packs/agent_memory_pack.compact.md
data/<run>/05_human_report/frontier_learning_report.md
data/<run>/06_evaluation/evaluation_metrics.md
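knowledge_chunks.jsonl is JSON Lines, one record per line, so it can be loaded without extra dependencies. The sketch below assumes each record stores its topics as a list under a "topics" key and uses a "claim" field. Those key names are assumptions, and the helpers are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable


def parse_chunks(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def load_chunks(path: Path) -> list[dict]:
    with path.open(encoding="utf-8") as fh:
        return parse_chunks(fh)


def chunks_for_topic(chunks: list[dict], topic: str) -> list[dict]:
    # ASSUMPTION: each chunk stores its topics as a list under "topics".
    return [c for c in chunks if topic in c.get("topics", [])]
```

For example, chunks_for_topic(load_chunks(Path("data/broad-run/03_knowledge_base/knowledge_chunks.jsonl")), "memory") returns the memory-related records, ready to embed into a vector store.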

Mine candidate emerging topics from the ingested articles:

.venv/bin/akf topic-discovery \
  --in-dir data/broad-run/02_ingested \
  --search-report data/broad-run/00_discovery/search_results.json \
  --out data/broad-run/07_topic_discovery

This is a bounded topic-mining check. It does not modify the fixed topic taxonomy or launch open-ended follow-up searches. It reports known-topic coverage, candidate new topics, promoted cross-source topics, source buckets such as frontier_scout and stop_signal, and suggested follow-up queries. To use LLM-assisted semantic extraction instead of deterministic parsing, add --use-llm-topic-mining.
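One plausible reading of "promoted cross-source topics" is that a candidate topic is kept only when several distinct sources mention it. The sketch below illustrates that idea. It is not akf's documented rule: the two-source threshold, the normalization, and promote_topics are all hypothetical.

```python
from collections import defaultdict


def promote_topics(
    mentions: list[tuple[str, str]], known: set[str], min_sources: int = 2
) -> dict[str, list[str]]:
    """Promote candidate topics that are new and cited by >= min_sources distinct URLs.

    `mentions` holds (candidate_topic, source_url) pairs extracted from articles.
    HYPOTHETICAL rule: akf's real promotion criteria are not documented here.
    """
    sources: dict[str, set[str]] = defaultdict(set)
    for topic, url in mentions:
        key = " ".join(topic.lower().split())  # normalize case and whitespace
        if key and key not in known:
            sources[key].add(url)
    return {
        topic: sorted(urls)
        for topic, urls in sorted(sources.items())
        if len(urls) >= min_sources
    }
```

Requiring independent sources is a cheap guard against a single noisy article inventing a "trend".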

Common Make Commands

The same workflow is available through make:

make install-dev
make playwright
make llm-check
make check
make discover OUT=data/discovery-run YEAR=2026
make run OUT=data/broad-run YEAR=2026
make topic-discovery RUN_DIR=data/broad-run
make smoke
make mcp-server RUN_DIR=data/broad-run

MCP Server

Expose a completed run as a read-only MCP knowledge server:

.venv/bin/akf mcp-server \
  --run-dir data/broad-run

The default transport is stdio, which is the usual mode for local MCP clients. The server exposes:

get_corpus_summary
list_topics
search_agent_knowledge
get_knowledge_card
read_memory_pack
read_human_report

Example MCP client command configuration:

{
  "mcpServers": {
    "agent-knowledge-forge": {
      "command": "/absolute/path/to/.venv/bin/akf",
      "args": [
        "mcp-server",
        "--run-dir",
        "/absolute/path/to/data/broad-run"
      ]
    }
  }
}
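The client configuration needs absolute paths because an MCP client typically launches the server from its own working directory. The small helper below builds the same structure from relative paths. It is a hypothetical convenience, not part of akf, and it assumes a POSIX-style .venv/bin layout as in the install steps.

```python
import json
from pathlib import Path


def mcp_client_config(
    venv_dir: Path, run_dir: Path, name: str = "agent-knowledge-forge"
) -> dict:
    """Build the mcpServers entry with absolute paths to akf and the run folder."""
    return {
        "mcpServers": {
            name: {
                "command": str((venv_dir / "bin" / "akf").resolve()),
                "args": ["mcp-server", "--run-dir", str(run_dir.resolve())],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(mcp_client_config(Path(".venv"), Path("data/broad-run")), indent=2))
```

Run it from the repository root and paste the printed JSON into your client's MCP configuration file, whose location depends on the client.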

For quick local HTTP inspection:

.venv/bin/akf mcp-server \
  --run-dir data/broad-run \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8000

Why This Exists

Agent frameworks and best practices change quickly. A coding agent often has stale built-in knowledge, while raw web search is noisy and hard to reuse. This project builds a repeatable pipeline that finds recent high-signal sources, filters them, extracts source-grounded patterns, and exports both machine-readable and human-readable knowledge artifacts.

The design deliberately follows recent automated research and survey systems, treating staged retrieval and synthesis, schema-checked extraction, evidence-preserving outputs, frozen local evaluation sets, and explicit next-run feedback as first-class engineering requirements.

Features

Unified multi-agent runner: one command runs automatic search discovery, filtering, deep reading, memory synthesis, human report writing, and evaluation.
Provider-neutral search discovery: executes expanded frontier-agent queries through Tavily, Brave Search, SerpAPI, or Exa, then writes deduplicated URL candidates.
Emerging topic mining: reads ingested frontier/stop-signal sources, extracts candidate new topics with deterministic or LLM-assisted parsing, and reports coverage/yield metrics without unbounded re-search.
LLM expert stages: optional OpenAI-compatible LLM calls for semantic screening, knowledge-card extraction, compact memory synthesis, human learning reports, and quality reflection.
2025+ broad frontier discovery policy: plans broad searches across recent papers, repos, docs, and technical blogs, then screens candidates with relevance, authority, preview text, and optional LLM judgment.
Robust ingestion: Jina Reader first, Crawl4AI fallback, per-URL hard timeout, Markdown cleanup, and token/character budgets.
Source screening: combines relevance, authority, freshness, novelty, GitHub metadata, and optional LLM judgment.
Knowledge cards: each card records the claim, why it matters, implementation takeaway, topics, scores, source URL, and evidence.
RAG-ready chunks: knowledge_chunks.jsonl/json/md preserve claim, source, evidence, topics, scores, and retrieval-query hints for vector stores or file search.
Topic clusters: knowledge_clusters.md/json groups cards into survey-style themes for browsing, gap analysis, and next-run planning.
Memory layers:
- agent_memory_pack.uncompressed.md/json: long-term memory/RAG/review layer with evidence.
- agent_memory_pack.md/json: bounded working memory.
- agent_memory_pack.compact.md, ultra_compact.md, llm_compact.md: direct context-injection layers.
Human learning report: a study guide with richer themes, source URLs, comparison tables, glossary, checklist, reading path, and practice questions; --report-language both also writes English (.en.md) and Chinese (.zh.md) versions.
Evaluation loop: baseline metrics, evidence/source-diversity checks, optional LLM reflection, and a deterministic next-run plan for tuning discovery, screening, reading, memory, and evaluation.
Recovery path: if a long run stops after ingestion, analysis and finalization can resume from saved artifacts.
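The knowledge-card fields listed above map naturally onto a small record type. The sketch below is illustrative only. The field names are hypothetical and may not match the keys in akf's *.knowledge.json files, and the "grounded" check (a source URL plus at least one non-empty evidence quote) is an assumed definition.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeCard:
    # HYPOTHETICAL field names mirroring the card contents listed above;
    # the keys in akf's *.knowledge.json files may differ.
    claim: str
    why_it_matters: str
    implementation_takeaway: str
    source_url: str
    evidence: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    scores: dict[str, float] = field(default_factory=dict)

    def is_grounded(self) -> bool:
        """Grounded = has a source URL and at least one non-empty evidence quote."""
        return bool(self.source_url) and any(quote.strip() for quote in self.evidence)

    def to_chunk(self) -> dict:
        """Flatten into a RAG record that keeps claim, source, evidence, topics, scores."""
        return {
            "text": f"{self.claim}\nWhy it matters: {self.why_it_matters}\n"
                    f"Takeaway: {self.implementation_takeaway}",
            "source_url": self.source_url,
            "evidence": list(self.evidence),
            "topics": list(self.topics),
            "scores": dict(self.scores),
        }
```

Keeping evidence and source_url on every chunk lets a downstream agent cite or re-verify a claim instead of trusting the summary.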

Architecture

search_discovery
  -> query_plan.md/json, search_results.md/json, candidate_urls.txt

discovery_filter
  -> source_screening.json, selected_urls.txt

deep_reader
  -> *.knowledge.md/json, knowledge_index.md/json, retrieval_manifest.md/json,
     knowledge_chunks.jsonl, knowledge_clusters.md/json

memory_synthesizer
  -> uncompressed, working, compact, ultra-compact, and optional LLM compact memory

human_learning_writer
  -> frontier_learning_report.md

quality_evaluator
  -> evaluation_metrics.md/json, quality_reflection.md/json, next_run_plan.md/json

The specialist agents are role stages inside one orchestrated command. Users do not need to start separate background agents.
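"Role stages inside one orchestrated command" means the stages run sequentially over shared state, and team_run_trace.md/json records what happened. The following sketch shows that general shape. The names run_stages and Stage are hypothetical, the stop-on-first-error policy is assumed, and the trace entries do not follow akf's actual trace schema.

```python
import time
from typing import Callable

# A stage reads the shared run state and returns an updated copy.
Stage = tuple[str, Callable[[dict], dict]]


def run_stages(stages: list[Stage], state: dict) -> tuple[dict, list[dict]]:
    """Run role stages in order in one process and record a trace entry per stage."""
    trace: list[dict] = []
    for name, stage in stages:
        started = time.perf_counter()
        try:
            state = stage(state)
        except Exception as exc:  # record the failure, then stop the run
            trace.append({"stage": name, "status": f"error: {exc}"})
            break  # later stages consume earlier outputs, so they cannot run
        trace.append({
            "stage": name,
            "status": "ok",
            "seconds": round(time.perf_counter() - started, 3),
        })
    return state, trace
```

Because every stage writes its outputs to disk in the real pipeline, a failed run can be resumed from the last completed stage rather than restarted.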

Generate an expanded search plan without network calls:

.venv/bin/akf query-plan \
  --year 2026 \
  --topic memory \
  --topic rag \
  --topic agent_hardening \
  --out data/query-plan
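Query expansion of this kind is essentially a topics-by-templates product with deduplication and a cap (compare --max-queries). The sketch below uses made-up templates, since the README does not show akf's real ones. It loops over templates on the outside, so the cap is spread across topics instead of exhausting the first topic.

```python
from itertools import product

# HYPOTHETICAL query templates; the templates akf actually uses are not shown here.
TEMPLATES = [
    "{topic} AI agents {year}",
    "{topic} agent engineering best practices {year}",
    "{topic} LLM agent open-source github {year}",
]


def expand_queries(topics: list[str], year: int, max_queries: int) -> list[str]:
    """Expand topics x templates into deduplicated queries, capped at max_queries.

    Templates form the outer loop, so every topic gets its first query before
    any topic gets a second one.
    """
    queries: list[str] = []
    seen: set[str] = set()
    for template, topic in product(TEMPLATES, topics):
        if len(queries) >= max_queries:
            break
        query = template.format(topic=topic.replace("_", " "), year=year)
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries
```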

Recovery Commands

If a long run finishes ingestion but stops before analysis:

.venv/bin/akf analyze \
  --in-dir data/broad-run/02_ingested \
  --out data/broad-run/03_knowledge_base \
  --use-llm-extraction \
  --llm-extraction-concurrency 2

.venv/bin/akf finalize-run \
  --run-dir data/broad-run \
  --use-llm-agents \
  --report-language both \
  --memory-max-entries 80

knowledge_index.md/json keeps every extracted card by default. Use --max-index-entries only for small smoke tests or intentionally bounded review lists; compact context budgets are handled later by memory-pack outputs.
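The split between a complete index and bounded memory packs can be illustrated with a greedy budgeted renderer: rank cards by score, then keep lines until an entry limit (compare --memory-max-entries) or a character budget is reached. This is a sketch under assumptions. The "claim", "source_url", and "score" fields are hypothetical, and akf's memory synthesizer may rank and compress differently.

```python
def compact_memory(cards: list[dict], max_entries: int, max_chars: int) -> str:
    """Render the best-scoring cards as bullet lines within entry and character budgets.

    HYPOTHETICAL field names ("claim", "source_url", "score"); greedy selection.
    """
    ranked = sorted(cards, key=lambda card: card.get("score", 0.0), reverse=True)
    lines: list[str] = []
    used = 0
    for card in ranked[:max_entries]:
        line = f"- {card['claim']} ({card['source_url']})"
        cost = len(line) + 1  # +1 for the newline separator
        if used + cost > max_chars:
            continue  # skip this card; a shorter one later may still fit
        lines.append(line)
        used += cost
    return "\n".join(lines)
```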

Main Outputs

data/<run>/
├── 00_discovery/
│   ├── query_plan.md/json
│   ├── search_results.md/json
│   ├── discovery_stats.json
│   └── candidate_urls.txt
├── 01_screening/
│   ├── source_screening.md/json
│   └── selected_urls.txt
├── 02_ingested/
│   ├── *.md
│   ├── *.json
│   └── run_stats.json
├── 03_knowledge_base/
│   ├── *.knowledge.md/json
│   ├── knowledge_index.md/json
│   ├── knowledge_index.rich.md
│   ├── frontier_brief.md/json
│   ├── retrieval_manifest.md/json
│   ├── knowledge_chunks.md/json/jsonl
│   └── knowledge_clusters.md/json
├── 04_memory_packs/
│   ├── agent_memory_pack.uncompressed.md/json
│   ├── agent_memory_pack.md/json
│   ├── agent_memory_pack.compact.md/json
│   ├── agent_memory_pack.ultra_compact.md/json
│   └── agent_memory_pack.llm_compact.md/json
├── 05_human_report/
│   ├── frontier_learning_report.md
│   ├── frontier_learning_report.en.md
│   └── frontier_learning_report.zh.md
├── 06_evaluation/
│   ├── evaluation_metrics.md/json
│   ├── quality_reflection.md/json
│   └── next_run_plan.md/json
└── team_run_trace.md/json
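Because each stage writes into its own numbered folder, the layout above also shows how far an interrupted run got. The helper below checks which stage folders are non-empty and suggests one of the recovery commands. It is a hypothetical sketch, and its mapping from stage to command is an assumption based on the Recovery Commands section. It scans every folder instead of stopping at the first gap, because a run may legitimately have no 00_discovery output (for example, an explicit-URL run).

```python
from pathlib import Path
from typing import Optional

# Stage folders in pipeline order, from the layout above.
STAGE_DIRS = [
    "00_discovery", "01_screening", "02_ingested", "03_knowledge_base",
    "04_memory_packs", "05_human_report", "06_evaluation",
]


def last_stage_reached(run_dir: Path) -> Optional[str]:
    """Return the latest stage folder that exists and is non-empty."""
    reached = None
    for name in STAGE_DIRS:
        folder = run_dir / name
        if folder.is_dir() and any(folder.iterdir()):
            reached = name
    return reached


def suggest_recovery(run_dir: Path) -> str:
    # ASSUMPTION: mapping based on the Recovery Commands section; adjust as needed.
    reached = last_stage_reached(run_dir)
    if reached == "06_evaluation":
        return "run complete"
    if reached == "02_ingested":
        return "akf analyze, then akf finalize-run"
    if reached in {"03_knowledge_base", "04_memory_packs", "05_human_report"}:
        return "akf finalize-run"
    return "rerun akf run-team"
```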

Evaluation

Compute baseline metrics:

.venv/bin/akf evaluate \
  --screening-report data/broad-run/01_screening/source_screening.json \
  --knowledge-index data/broad-run/03_knowledge_base/knowledge_index.json \
  --markdown-dir data/broad-run/04_memory_packs \
  --out data/broad-run/06_evaluation

Curated evaluation sets can be kept locally under data/ to calibrate screening and coverage, but they are ignored by default because they often contain local judgments, source snapshots, and paid LLM outputs.
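The evidence and source-diversity checks can be approximated with a few ratios over the knowledge cards. The sketch below uses assumed definitions: the share of cards with evidence, the number of distinct source domains, and the share taken by the most common domain. It also assumes hypothetical "source_url" and "evidence" fields, so it will not reproduce akf's evaluation_metrics.json exactly.

```python
from collections import Counter
from urllib.parse import urlparse


def baseline_metrics(cards: list[dict]) -> dict:
    """Evidence coverage and source diversity over knowledge cards.

    HYPOTHETICAL metric definitions and field names ("source_url", "evidence");
    akf's evaluation_metrics.json may define these differently.
    """
    total = len(cards)
    if total == 0:
        return {"cards": 0, "evidence_rate": 0.0, "distinct_domains": 0, "top_domain_share": 0.0}
    domains = Counter(urlparse(card.get("source_url", "")).netloc.lower() for card in cards)
    with_evidence = sum(1 for card in cards if card.get("evidence"))
    return {
        "cards": total,
        "evidence_rate": with_evidence / total,  # share of cards backed by evidence
        "distinct_domains": len(domains),  # breadth of sources
        "top_domain_share": domains.most_common(1)[0][1] / total,  # concentration risk
    }
```

A high top_domain_share is a useful early warning that discovery is leaning on one site, which the next-run plan can then correct.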

Development

.venv/bin/ruff check src/agent_knowledge_harvester tests
.venv/bin/python -m pytest -q

Current test coverage includes ingestion timeout behavior, source screening, novelty checks, LLM JSON parsing, LLM knowledge-card extraction, query expansion, multi-agent runtime traces, memory-pack generation, retrieval manifests, human-report prompts, and quality reflection.

Project Status

This is a research/engineering prototype intended to demonstrate a production-oriented agent knowledge pipeline. It is suitable for experimentation, portfolio review, and further extension into a hosted RAG/MCP knowledge service.

The current open-source pipeline supports automatic search discovery, explicit URLs, optional local URL files, and GitHub Trending. The next natural milestones are stronger recall evaluation, richer connector coverage, and hosted retrieval/MCP serving.
