Skip to content

redflyingfish/Agent-Knowledge-Forge

Repository files navigation

Agent Knowledge Forge

Agent Knowledge Forge is a multi-agent knowledge pipeline for frontier AI-agent engineering. It searches recent sources, screens noisy results, extracts source-grounded knowledge cards, and exports artifacts that other agents or humans can use.

It produces:

  • source-grounded knowledge_index.md/json
  • RAG-ready knowledge_chunks.jsonl
  • compact and uncompressed agent memory packs
  • human-readable learning reports
  • a read-only MCP server over completed runs
  • evaluation metrics and next-run improvement plans

The project focuses on practical agent-development knowledge: memory, RAG, MCP/tool protocols, multi-agent handoffs, planning/reasoning, task hardness, tool routing, durable state/runtime, observability, coding agents, browser/computer use, context engineering, identity/access, human review, guardrails, cost/latency, safety, and production hardening.

Requirements

  • Python 3.11+
  • One OpenAI-compatible LLM API key for LLM stages
  • One search API key for automatic discovery: Tavily, Brave Search, SerpAPI, or Exa
  • Playwright Chromium dependencies for Crawl4AI fallback ingestion

The shortest path is pip install -r requirements.txt. For development, install the package in editable mode with the dev extra.

Install

git clone https://github.com/redflyingfish/Agent-Knowledge-Forge.git
cd Agent-Knowledge-Forge

python -m venv .venv
.venv/bin/python -m pip install --upgrade pip setuptools wheel
.venv/bin/python -m pip install -r requirements.txt
.venv/bin/python -m pip install -e ".[dev]"
.venv/bin/python -m playwright install --with-deps chromium

If you use uv:

uv sync --extra dev
bash scripts/install_playwright_deps.sh

Configure

Create .env from the example:

cp .env.example .env

Minimal LLM configuration:

AKH_LLM_API_KEY=your_provider_key
AKH_LLM_BASE_URL=https://api.deepseek.com/v1
AKH_LLM_FAST_MODEL=deepseek-chat
AKH_LLM_PRO_MODEL=deepseek-reasoner

Minimal search configuration:

AKH_SEARCH_PROVIDER=tavily
AKH_SEARCH_API_KEY=your_search_key
AKH_SEARCH_MAX_QUERIES=80
AKH_SEARCH_RESULTS_PER_QUERY=5

Provider-specific search keys are also supported:

AKH_TAVILY_API_KEY=...
AKH_BRAVE_SEARCH_API_KEY=...
AKH_SERPAPI_API_KEY=...
AKH_EXA_API_KEY=...

Check LLM configuration without spending tokens:

.venv/bin/akf llm-check --stage screening

Make a tiny test call:

.venv/bin/akf llm-check --stage screening --ping

The legacy harvester command is kept as a compatibility alias for akf.

Quick Start

Run automatic discovery only:

.venv/bin/akf discover \
  --year 2026 \
  --max-queries 40 \
  --results-per-query 5 \
  --out data/discovery-run

Run the complete multi-agent pipeline:

.venv/bin/akf run-team \
  --discover \
  --discovery-year 2026 \
  --use-llm-agents \
  --report-language both \
  --llm-max-candidates 25 \
  --max-selected-urls 80 \
  --include-review \
  --concurrency 4 \
  --llm-extraction-concurrency 2 \
  --ingestion-timeout 45 \
  --max-markdown-chars 120000 \
  --out data/broad-run

Run a tiny explicit-URL smoke test:

.venv/bin/akf run-team \
  --url https://modelcontextprotocol.io/specification/2025-06-18 \
  --use-llm-agents \
  --out data/team-run

After a run, inspect:

data/<run>/03_knowledge_base/knowledge_index.md
data/<run>/03_knowledge_base/knowledge_chunks.jsonl
data/<run>/04_memory_packs/agent_memory_pack.compact.md
data/<run>/05_human_report/frontier_learning_report.md
data/<run>/06_evaluation/evaluation_metrics.md

Mine candidate emerging topics from the ingested articles:

.venv/bin/akf topic-discovery \
  --in-dir data/broad-run/02_ingested \
  --search-report data/broad-run/00_discovery/search_results.json \
  --out data/broad-run/07_topic_discovery

This is a bounded topic-mining check. It does not mutate the fixed taxonomy or keep searching forever. It reports known-topic coverage, candidate new topics, promoted cross-source topics, source buckets such as frontier_scout and stop_signal, and suggested follow-up queries. To test semantic extraction, add --use-llm-topic-mining.

Common Make Commands

The same workflow is available through make:

make install-dev
make playwright
make llm-check
make check
make discover OUT=data/discovery-run YEAR=2026
make run OUT=data/broad-run YEAR=2026
make topic-discovery RUN_DIR=data/broad-run
make smoke
make mcp-server RUN_DIR=data/broad-run

MCP Server

Expose a completed run as a read-only MCP knowledge server:

.venv/bin/akf mcp-server \
  --run-dir data/broad-run

The default transport is stdio, which is the usual mode for local MCP clients. The server exposes:

  • get_corpus_summary
  • list_topics
  • search_agent_knowledge
  • get_knowledge_card
  • read_memory_pack
  • read_human_report

Example MCP client command configuration:

{
  "mcpServers": {
    "agent-knowledge-forge": {
      "command": "/absolute/path/to/.venv/bin/akf",
      "args": [
        "mcp-server",
        "--run-dir",
        "/absolute/path/to/data/broad-run"
      ]
    }
  }
}

For quick local HTTP inspection:

.venv/bin/akf mcp-server \
  --run-dir data/broad-run \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8000

Why This Exists

Agent frameworks and best practices change quickly. A coding agent often has stale built-in knowledge, while raw web search is noisy and hard to reuse. This project builds a repeatable pipeline that finds recent high-signal sources, filters them, extracts source-grounded patterns, and exports both machine-readable and human-readable knowledge artifacts.

The design is intentionally close to recent automated research and survey systems: staged retrieval/synthesis, schema-checked extraction, evidence-preserving outputs, frozen local evaluation sets, and explicit next-run feedback are treated as first-class engineering requirements.

Features

  • Unified multi-agent runner: one command runs automatic search discovery, filtering, deep reading, memory synthesis, human report writing, and evaluation.
  • Provider-neutral search discovery: executes expanded frontier-agent queries through Tavily, Brave Search, SerpAPI, or Exa, then writes deduplicated URL candidates.
  • Emerging topic mining: reads ingested frontier/stop-signal sources, extracts candidate new topics with deterministic or LLM-assisted parsing, and reports coverage/yield metrics without unbounded re-search.
  • LLM expert stages: optional OpenAI-compatible LLM calls for semantic screening, knowledge-card extraction, compact memory synthesis, human learning reports, and quality reflection.
  • 2025+ broad frontier discovery policy: plans broad searches across recent papers, repos, docs, and technical blogs, then screens candidates with relevance, authority, preview text, and optional LLM judgment.
  • Robust ingestion: Jina Reader first, Crawl4AI fallback, per-URL hard timeout, Markdown cleanup, and token/character budgets.
  • Source screening: combines relevance, authority, freshness, novelty, GitHub metadata, and optional LLM judgment.
  • Knowledge cards: each card records the claim, why it matters, implementation takeaway, topics, scores, source URL, and evidence.
  • RAG-ready chunks: knowledge_chunks.jsonl/json/md preserve claim, source, evidence, topics, scores, and retrieval-query hints for vector stores or file search.
  • Topic clusters: knowledge_clusters.md/json groups cards into survey-style themes for browsing, gap analysis, and next-run planning.
  • Memory layers:
    • agent_memory_pack.uncompressed.md/json: long-term memory/RAG/review layer with evidence.
    • agent_memory_pack.md/json: bounded working memory.
    • agent_memory_pack.compact.md, ultra_compact.md, llm_compact.md: direct context-injection layers.
  • Human learning report: an English study guide with richer themes, source URLs, comparison tables, glossary, checklist, reading path, and practice questions.
  • Evaluation loop: baseline metrics, evidence/source-diversity checks, optional LLM reflection, and a deterministic next-run plan for tuning discovery, screening, reading, memory, and evaluation.
  • Recovery path: if a long run stops after ingestion, analysis and finalization can resume from saved artifacts.

Architecture

search_discovery
  -> query_plan.md/json, search_results.md/json, candidate_urls.txt

discovery_filter
  -> source_screening.json, selected_urls.txt

deep_reader
  -> *.knowledge.md/json, knowledge_index.md/json, retrieval_manifest.md/json,
     knowledge_chunks.jsonl, knowledge_clusters.md/json

memory_synthesizer
  -> uncompressed, working, compact, ultra-compact, and optional LLM compact memory

human_learning_writer
  -> frontier_learning_report.md

quality_evaluator
  -> evaluation_metrics.md/json, quality_reflection.md/json, next_run_plan.md/json

The specialist agents are role stages inside one orchestrated command. Users do not need to start separate background agents.

Generate an expanded search plan without network calls:

.venv/bin/akf query-plan \
  --year 2026 \
  --topic memory \
  --topic rag \
  --topic agent_hardening \
  --out data/query-plan

Recovery Commands

If a long run finishes ingestion but stops before analysis:

.venv/bin/akf analyze \
  --in-dir data/broad-run/02_ingested \
  --out data/broad-run/03_knowledge_base \
  --use-llm-extraction \
  --llm-extraction-concurrency 2

.venv/bin/akf finalize-run \
  --run-dir data/broad-run \
  --use-llm-agents \
  --report-language both \
  --memory-max-entries 80

knowledge_index.md/json keeps every extracted card by default. Use --max-index-entries only for small smoke tests or intentionally bounded review lists; compact context budgets are handled later by memory-pack outputs.

Main Outputs

data/<run>/
├── 00_discovery/
│   ├── query_plan.md/json
│   ├── search_results.md/json
│   ├── discovery_stats.json
│   └── candidate_urls.txt
├── 01_screening/
│   ├── source_screening.md/json
│   └── selected_urls.txt
├── 02_ingested/
│   ├── *.md
│   ├── *.json
│   └── run_stats.json
├── 03_knowledge_base/
│   ├── *.knowledge.md/json
│   ├── knowledge_index.md/json
│   ├── knowledge_index.rich.md
│   ├── frontier_brief.md/json
│   ├── retrieval_manifest.md/json
│   ├── knowledge_chunks.md/json/jsonl
│   └── knowledge_clusters.md/json
├── 04_memory_packs/
│   ├── agent_memory_pack.uncompressed.md/json
│   ├── agent_memory_pack.md/json
│   ├── agent_memory_pack.compact.md/json
│   ├── agent_memory_pack.ultra_compact.md/json
│   └── agent_memory_pack.llm_compact.md/json
├── 05_human_report/
│   ├── frontier_learning_report.md
│   ├── frontier_learning_report.en.md
│   └── frontier_learning_report.zh.md
├── 06_evaluation/
│   ├── evaluation_metrics.md/json
│   ├── quality_reflection.md/json
│   └── next_run_plan.md/json
└── team_run_trace.md/json

Evaluation

Compute baseline metrics:

.venv/bin/akf evaluate \
  --screening-report data/broad-run/01_screening/source_screening.json \
  --knowledge-index data/broad-run/03_knowledge_base/knowledge_index.json \
  --markdown-dir data/broad-run/04_memory_packs \
  --out data/broad-run/06_evaluation

Curated evaluation sets can be kept locally under data/ to calibrate screening and coverage, but they are ignored by default because they often contain local judgments, source snapshots, and paid LLM outputs.

Development

.venv/bin/ruff check src/agent_knowledge_harvester tests
.venv/bin/python -m pytest -q

Current test coverage includes ingestion timeout behavior, source screening, novelty checks, LLM JSON parsing, LLM knowledge-card extraction, query expansion, multi-agent runtime traces, memory-pack generation, retrieval manifests, human-report prompts, and quality reflection.

Project Status

This is a research/engineering prototype intended to demonstrate a production-oriented agent knowledge pipeline. It is suitable for experimentation, portfolio review, and further extension into a hosted RAG/MCP knowledge service.

The current open-source pipeline supports automatic search discovery, explicit URLs, optional local URL files, and GitHub Trending. The next natural milestones are stronger recall evaluation, richer connector coverage, and hosted retrieval/MCP serving.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages