A curated list of truly exceptional autonomous AI agent frameworks, tools, benchmarks, and papers.
Frameworks · OpenClaw Ecosystem · Coding Agents · Multi-Agent · Web & Browser · Chinese Contributions · Benchmarks · Papers
- The AI agent space exploded in 2023 and hasn't stopped.
- This list only includes projects that moved the field forward.
- Every entry was verified for active development status, genuine technical contribution, and community adoption.
- Chinese-origin projects are marked with 🇨🇳 because their impactful work deserves first-class visibility.

## Contents

- 🏗️ General-Purpose Frameworks
- 🐾 The OpenClaw Ecosystem
- 💻 Coding Agents
- 👥 Multi-Agent Systems
- Web & Browser Agents
- 🖥️ Computer Use & GUI Agents
- 🔬 Research & Science Agents
- 🧠 Memory & Knowledge Systems
- 🔧 Tool-Use & Function Calling
- 🧩 Planning & Reasoning Frameworks
- 🇨🇳 Chinese Contributions
- Benchmarks & Evaluation
- Protocols & Standards
- Foundational Papers
- Surveys & Curated Lists
- Contributing

## 🏗️ General-Purpose Frameworks

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| AutoGPT | ~183k ⭐ | The original autonomous GPT-4 agent (April 2023), now evolved into a visual agent-builder platform with a marketplace and cloud deployment. | Pioneered autonomous agents and defined the category; one of the most-starred agent repos on GitHub. |
| LangGraph | ~27k ⭐ | Low-level, graph-based orchestration for stateful agents with durable execution, human-in-the-loop, and streaming. | Most-downloaded agent framework (~34.5M monthly PyPI downloads); execution model inspired by Pregel and Apache Beam. |
| AutoGen / AG2 | ~54k ⭐ | Microsoft's multi-agent conversation framework; AG2 (ag2ai/ag2) is the community fork that continues active development. | Best Paper at the ICLR 2024 LLM Agents Workshop; defined the multi-agent conversation paradigm. Now merging with Semantic Kernel into the Microsoft Agent Framework. |
| CrewAI | ~44k ⭐ | Standalone framework for orchestrating role-playing AI agents with "Crews" (collaborative teams) and "Flows" (event-driven pipelines). | Claims 1.4B agentic automations and adoption by 60% of the Fortune 500. Fully independent of LangChain. |
| Semantic Kernel | ~27k ⭐ | Microsoft's enterprise SDK for .NET, Python, and Java with planners, function calling, and deep Azure integration. | Enterprise-first; merging with AutoGen into the unified Microsoft Agent Framework (GA Q1 2026). |
| smolagents | ~25k ⭐ | Hugging Face's minimal agent library in which agents write Python code as their actions. ~1,000 lines of core logic. | Radical simplicity: code-as-action paradigm with Hub integration for sharing tools. Sandboxed execution via E2B or Docker. |
| Agno | ~26k ⭐ | High-performance multimodal agent runtime (formerly Phidata) with memory, knowledge graphs, and MCP support. | Claims to be 5,000× faster than LangGraph; defines levels 0–3 of progressive agent autonomy. |
| PydanticAI | ~15k ⭐ | Type-safe, code-first agent framework built on Python type hints; the "FastAPI for agents." | Zero magic: built for senior Python teams who want explicit control over every agent decision. |
| Open Interpreter | ~62k ⭐ | Natural-language interface that lets LLMs run code locally; an open-source alternative to OpenAI's Code Interpreter. | Full local access with no internet restrictions, file-size limits, or time limits. Supports `--os` mode for computer control. |
| OpenAI Agents SDK | ~19k ⭐ | Lightweight, production-ready Python SDK (the successor to Swarm) with built-in web search, file search, and computer-use tools. | Official OpenAI agent primitive, designed as the canonical way to build agents with OpenAI models. |
| Google ADK | ~17k ⭐ | Code-first agent toolkit optimized for Gemini, with Agent-to-Agent (A2A) protocol support and MCP Toolbox for Databases. | A2A-native: the first framework with built-in support for Google's agent interoperability protocol. Model-agnostic via LiteLLM. |
| Mastra | ~19k ⭐ | TypeScript-first agent framework from the Gatsby team with workflows, RAG, evals, and tool calling. 300k+ weekly npm downloads. | Fills the JS/TS gap with native OpenTelemetry, a workflow engine, and first-class TypeScript support. |
| Strands Agents | Growing | AWS's model-agnostic agent framework with optional deep Bedrock integrations and first-class OpenTelemetry tracing. | AWS-native and production-ready, with deep AWS service integrations while remaining model-agnostic. |
| DSPy | ~33k ⭐ | Stanford's framework for programming (not prompting) LLMs; compiles declarative calls into self-improving pipelines with optimizers. | Paradigm-shifting: replaces prompt engineering with compiled, optimizable programs. ICLR 2024 paper. MIPROv2 optimizer. |
| Haystack | ~23k ⭐ | Production-ready orchestration framework for NLP/AI pipelines with retrieval, generation, and agent components. | Mature production tooling, battle-tested since the pre-LLM era; comprehensive pipeline abstraction for RAG and agents. |
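
The code-as-action paradigm that smolagents builds on can be sketched in a few lines: the model emits a Python snippet that calls tools directly, and the runtime executes it and returns what it printed. This is a minimal illustration under stated assumptions, not the smolagents API: `get_weather`, `add`, `run_code_action`, and the hard-coded "model output" are all hypothetical, and emptying `__builtins__` is not a security boundary (real deployments use E2B, Docker, or a similar sandbox).

```python
import contextlib
import io


# Hypothetical tools exposed to the model-written code.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


def add(a: float, b: float) -> float:
    return a + b


TOOLS = {"get_weather": get_weather, "add": add}


def run_code_action(code: str) -> dict:
    """Run a model-written snippet with only the tools and print() in scope.

    Returns captured stdout plus any variables the snippet defined.
    WARNING: stripping __builtins__ is NOT a sandbox; use a container or
    remote sandbox for untrusted code.
    """
    namespace = {"__builtins__": {"print": print}, **TOOLS}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    variables = {
        name: value
        for name, value in namespace.items()
        if name != "__builtins__" and name not in TOOLS
    }
    return {"stdout": buffer.getvalue(), "variables": variables}


# Stand-in for what an LLM might emit as a single "action" step.
model_action = """
total = add(2, 3)
print(get_weather("Paris"), total)
"""

result = run_code_action(model_action)
print(result["stdout"].strip())  # Sunny in Paris 5
```

One commonly cited advantage of code actions is composition: a single step can chain several tool calls and keep intermediate results in variables (`total` above), which a one-call-per-step JSON format expresses less directly.
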

## 🐾 The OpenClaw Ecosystem

OpenClaw is the fastest-growing open-source project in GitHub history; as of March 2026 it has spawned 52+ derivatives across 9 programming languages. Community directory: shelldex.com.

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| OpenClaw | ~337k ⭐ | Messaging-native autonomous personal AI agent that connects through 20+ platforms (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, WeChat, etc.). 8 core agents, SOUL.md personality config, and the ClawHub skill marketplace (13,700+ skills). | Defined a new category: always-on, messaging-native autonomous agents. Beat React's 10-year star record in ~60 days. Created by Peter Steinberger; transferred to an independent foundation after he joined OpenAI. |
| NemoClaw | ~16k ⭐ | NVIDIA's enterprise security wrapper for OpenClaw: adds OS-level sandboxing (Landlock + seccomp + network namespaces), default-deny networking, a YAML policy engine, and privacy routing to local Nemotron models. | Enterprise-grade security layer; not a standalone agent but the missing security infrastructure for OpenClaw. Launched at GTC 2026. Hardware-agnostic despite its NVIDIA origin. |
| Hermes Agent | ~14k ⭐ | NousResearch's self-improving personal AI agent with built-in RL training via Atropos, GEPA-based skill evolution, and cross-session memory. Multi-platform gateway (Telegram, Discord, Slack, WhatsApp, Signal). | The only agent in this list with a closed-loop learning system: uses GEPA (ICLR 2026 Oral) + DSPy to optimize its own skills from execution traces. `hermes claw migrate` imports from OpenClaw. |

### Derivatives

| Project | Stars | Language | Key Differentiator |
|---|---|---|---|
| Nanobot | ~33k ⭐ | Python | Ultra-lightweight (~4,000 lines vs OpenClaw's ~430,000); research-ready with minimal dependencies. |
| ZeroClaw | ~27k ⭐ | Rust | 99% less RAM than OpenClaw; 10 ms cold boot; targets edge/IoT devices. |
| NanoClaw | ~26k ⭐ | TypeScript | Container-first security: every agent runs in an isolated Linux container by default. |
| PicoClaw | ~13k ⭐ | Go | Runs on $10 hardware with <10 MB of RAM; built for embedded and edge deployments. |
| OpenFang | Growing | Rust | "Agent Operating System" with 7 autonomous modules, 38 tools, and 40 messaging channels. |
| Moltis | ~2k ⭐ | Rust | Enterprise Rust: 150K lines with zero `unsafe` blocks; native Prometheus/Grafana observability. |

### Community Resources

| Project | Description |
|---|---|
| awesome-openclaw-agents | 187 production-ready agent templates across 24 categories. |
| hermes-agent-self-evolution | GEPA-based self-improvement module for Hermes Agent. |
| claw0 | Educational 10-section tutorial for building an OpenClaw-compatible agent from scratch. |
| ClawWork | Agents that earn income from professional tasks and pay for their own token usage; reportedly earned $15K in 11 hours. |

## 💻 Coding Agents

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| OpenHands | ~69k ⭐ | Open-source AI software development platform that modifies code, runs terminal commands, browses the web, and executes multi-step dev tasks in Docker sandboxes. | Most-starred coding agent on GitHub. 77.6% on SWE-bench Verified. Model-agnostic. $18.8M Series A (All Hands AI). |
| Aider | ~41k ⭐ | AI pair programming in your terminal; every AI edit is auto-committed to git. Builds a map of your entire codebase for context. | Git-native workflow: terminal-first, bring-your-own-model (BYOM), top SWE-bench scores. Aider wrote 21% of its own recent code. |
| GPT-Engineer | ~55k ⭐ | Describe what you want built and it generates the entire codebase: it asks clarifying questions, writes specs, then codes. | Precursor to Lovable.dev and one of the first "prompt-to-codebase" tools. Now mostly archived; the commercial product was spun out. |
| GPT-Pilot | ~33k ⭐ | AI developer built on a "95% AI / 5% human" paradigm: writes full features while keeping a human in the loop for review. Agent roles include Architect and Developer. | Human-in-the-loop by design; backed by YC (W24). Powers the Pythagora VS Code extension. |
| SWE-agent | ~19k ⭐ | Takes a GitHub issue and automatically fixes it. Introduced the Agent-Computer Interface (ACI) concept. Also handles cybersecurity CTFs via EnIGMA mode. | NeurIPS 2024 paper. SOTA on SWE-bench with Claude 3.7. Open-weights SWE-agent-LM-32b. Princeton + Stanford. |
| Devin | N/A | First widely publicized fully autonomous AI software engineer; plans, codes, debugs, and deploys with its own shell, editor, and browser. | Proprietary but category-defining. 13.86% on SWE-bench at launch (vs. 1.96% prior SOTA). Cognition is valued in the billions of dollars and acquired Windsurf. |
| Cursor | N/A | AI-native code editor (VS Code fork) with Tab completion, Composer multi-file edits, and agent mode. | $1B ARR in <24 months; $29.3B valuation; used by 50%+ of the Fortune 500. Proprietary, but sets the standard for AI-assisted coding. |

## 👥 Multi-Agent Systems

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| MetaGPT 🇨🇳 | ~58k ⭐ | Simulates an entire software company: product managers, architects, and engineers collaborate via SOPs to turn one-line requirements into working code. | ICLR 2024 Oral (top 1.2%). Philosophy: "Code = SOP(Team)." The team also produced AFlow (ICLR 2025 Oral). DeepWisdom (Chinese company). |
| ChatDev 🇨🇳 | ~32k ⭐ | Chat-powered virtual software company. ChatDev 2.0 is a zero-code multi-agent orchestration platform for "Developing Everything." | NeurIPS 2025 paper on evolving orchestration. From Tsinghua/OpenBMB. Bilingual docs (CN/EN). |
| CAMEL 🇨🇳 | ~15k ⭐ | Pioneering role-playing framework for multi-agent cooperation: agents assume roles and collaborate via inception prompting. | First LLM-based multi-agent framework (March 2023). NeurIPS 2023. Its training data was used in Microsoft Phi and OpenHermes. The OWL subproject hit #1 on the GAIA benchmark. |
| AgentVerse 🇨🇳 | ~5k ⭐ | Dual-framework design: task-solving mode (collaborative problem-solving) and simulation mode (observing emergent behaviors). | Unique simulation capabilities for observing and studying emergent multi-agent behaviors. OpenBMB/Tsinghua. |
| Generative Agents | ~21k ⭐ | 25 AI agents inhabit "Smallville," where they form relationships, gossip, and coordinate events, all emerging from memory + reflection + planning. | Landmark UIST 2023 paper. Stanford (Park, Liang, Bernstein). Introduced "believable simulacra of human behavior." |
| Agency Swarm | Growing | Framework for creating collaborative AI agent swarms with role-based organization and inter-agent communication. | Swarm-native: agents self-organize with customizable communication topologies. |
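
The Generative Agents paper retrieves memories by combining three signals: recency (exponential decay over game hours since the memory was last retrieved, with a decay factor of 0.995), importance (an LLM-assigned score from 1 to 10), and relevance (embedding cosine similarity to the query). Each signal is min-max normalized to [0, 1] and the three are summed with equal weights. The sketch below follows that description; the `Memory` class, `retrieve`, and the toy 2-D embeddings are hypothetical stand-ins for a real memory stream and embedding model.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]      # toy vector; a real system would use an embedding model
    importance: float           # 1-10, rated by an LLM in the paper
    hours_since_access: float   # game hours since this memory was last retrieved


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(values: list[float]) -> list[float]:
    lo, hi = min(values), max(values)
    if hi == lo:  # all equal: avoid dividing by zero
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def retrieve(memories: list[Memory], query: list[float], k: int = 2,
             decay: float = 0.995) -> list[str]:
    """Rank memories by recency + importance + relevance (equal weights)."""
    recency = min_max([decay ** m.hours_since_access for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(m.embedding, query) for m in memories])
    scores = [r + i + v for r, i, v in zip(recency, importance, relevance)]
    ranked = sorted(zip(scores, memories), key=lambda pair: pair[0], reverse=True)
    return [m.text for _, m in ranked[:k]]


memories = [
    Memory("Isabella is planning a Valentine's Day party", [1.0, 0.0], 8, 2.0),
    Memory("Ate breakfast", [0.0, 1.0], 2, 1.0),
    Memory("Klaus is writing a research paper", [0.6, 0.8], 5, 30.0),
]
print(retrieve(memories, query=[1.0, 0.1]))
```

In this toy run "Ate breakfast" is the most recent memory but ranks last, because it scores lowest on both importance and relevance to the query.
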

## Web & Browser Agents

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| Browser Use | ~55k ⭐ | Open-source framework that lets LLMs control browsers for web automation. Claims 89% on WebVoyager (SOTA). | Y Combinator-backed. Released the open-source bu-30b model. MIT licensed. Cloud offering available. |
| GPT Researcher | ~26k ⭐ | Autonomous deep research agent: scrapes and synthesizes 20+ web sources to produce factual, cited research reports. | Original open-source deep research agent (May 2023, predating the "deep research" trend by ~2 years). Created by the Tavily founder. |
| BrowserGym | Growing | Unified gym-like environment for web agent research, with standardized observation/action spaces across WebArena, MiniWoB, WorkArena, etc. | TMLR 2025 paper. ServiceNow. De facto standard infrastructure for web agent evaluation. |
| MindSearch 🇨🇳 | ~5k ⭐ | LLM-based multi-agent web search engine (similar to Perplexity Pro). Supports DuckDuckGo, Bing, Brave, Google, and Tencent search backends. | Open-source Perplexity alternative from Shanghai AI Lab. Bilingual (CN/EN) query support. |

## 🖥️ Computer Use & GUI Agents

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| Agent TARS / UI-TARS 🇨🇳 | ~10k ⭐ | ByteDance's multimodal AI agent stack: CLI and Web UI for browser automation, plus native GUI automation with remote computer/browser operators. | Pioneering GUI agent paper (arXiv:2501.12326). Ships as both a CLI agent and a desktop application. |
| AppAgent 🇨🇳 | ~5k ⭐ | Multimodal agent that operates smartphone apps through taps and swipes, using a two-phase approach: exploration (learning), then deployment (executing). | Mobile-first agent from Tencent. Tested on 50 tasks across 10 apps. Supports GPT-4V and Qwen-VL-Max. |
| Voyager | ~7k ⭐ | First LLM-powered embodied lifelong learning agent in Minecraft; continuously explores, acquires skills, and makes discoveries without human intervention. | NeurIPS 2023 paper. NVIDIA (Jim Fan). 3.3× more unique items and up to 15.3× faster tech-tree milestones vs. prior SOTA. Uses code as its action space. |
| OSWorld | Research | First scalable real-computer environment benchmark for multimodal agents: 369 tasks across Ubuntu, Windows, and macOS. | NeurIPS 2024. Best agents achieve ~30%, well below the human baseline (~72% in the original paper). The hardest computer-use benchmark. |

## 🔬 Research & Science Agents

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| BabyAGI | ~20k ⭐ | Pioneering AI task management system (140 lines of code) that creates, prioritizes, and executes tasks in a loop. Now on BabyAGI 3 with scheduling. | Defined task-driven agents (March 2023). Created by VC Yohei Nakajima. One of the first viral autonomous agent concepts. |
| Tongyi DeepResearch 🇨🇳 | Growing | Leading open-source deep research agent trained with on-policy RL using Group Relative Policy Optimization (GRPO). | End-to-end RL approach to deep research rather than prompt engineering. Alibaba NLP. Compatible with the ReAct and IterResearch paradigms. |
| OpenAgents | ~5k ⭐ | Three specialized agents (Data Agent, Plugins Agent with 200+ tools, Web Agent) forming an open-source ChatGPT Plus alternative. | COLM 2024 paper. XLANG Lab (HKU). First attempt at replicating the full ChatGPT Plus experience as open source. |
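
BabyAGI's core loop, as described in its original release, uses three roles: an execution agent completes the next task, a task-creation agent proposes follow-up tasks from the result, and a prioritization agent reorders the queue. Below is a minimal sketch with the three LLM calls replaced by deterministic stubs; `run_task_loop`, `fake_execute`, `fake_create`, and `fake_prioritize` are hypothetical, and the vector store the original used to give the execution agent context from earlier results is omitted.

```python
from collections import deque
from typing import Callable


def run_task_loop(
    objective: str,
    first_task: str,
    execute: Callable[[str, str], str],
    create_tasks: Callable[[str, str, str, list], list],
    prioritize: Callable[[str, list], list],
    max_iterations: int = 10,
) -> list:
    """Execute -> create -> prioritize, until the queue empties or the budget runs out."""
    queue = deque([first_task])
    completed = []
    for _ in range(max_iterations):
        if not queue:
            break
        task = queue.popleft()
        result = execute(objective, task)                               # execution agent
        completed.append((task, result))
        new_tasks = create_tasks(objective, task, result, list(queue))  # task-creation agent
        queue = deque(prioritize(objective, list(queue) + new_tasks))   # prioritization agent
    return completed


# Deterministic stubs standing in for LLM calls (hypothetical).
def fake_execute(objective, task):
    return f"done: {task}"


def fake_create(objective, task, result, pending):
    # Only the first task spawns follow-ups in this toy run.
    return ["write summary", "collect sources"] if task == "plan research" else []


def fake_prioritize(objective, tasks):
    return sorted(tasks)  # a real prioritizer would ask the LLM to reorder by objective


log = run_task_loop("Write a short report", "plan research",
                    fake_execute, fake_create, fake_prioritize)
for task, result in log:
    print(task, "->", result)
```

The `max_iterations` cap matters in practice: because the creation step can keep generating tasks, an uncapped loop may never terminate.
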

## 🧠 Memory & Knowledge Systems

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| Letta (MemGPT) | ~22k ⭐ | Platform for stateful agents with self-editing memory inspired by OS virtual memory; two-tier architecture: core memory + archival memory. | Foundational memory paper ("MemGPT: Towards LLMs as Operating Systems"). UC Berkeley. $10M seed (Felicis). Transitioning to Letta V1 for GPT-5/Claude 4.5. |
| Mem0 | Growing | Memory layer for AI agents with personalization: short-term + long-term memory via conversation history and user preferences. | Plug-and-play memory with flexible embedding and search configs. Works with any agent framework. |
| Zep | Growing | Session-based memory with automatic summarization, graph-based knowledge extraction, and fact extraction. | Graph-based knowledge memory; maintains conversation transcripts as memory blocks with automated summarization. |
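
The OS-style two-tier memory in Letta (MemGPT) can be illustrated with a toy structure: a small core memory that always stays in the prompt, and an unbounded store the agent reaches only through explicit search. This is a hypothetical sketch, not Letta's API. It assumes a word budget instead of a token count, evicts oldest-first, searches by keyword overlap, and collapses MemGPT's separate recall and archival stores into one; MemGPT itself lets the LLM edit its memory through function calls and uses embedding search.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierMemory:
    """Toy two-tier memory loosely inspired by MemGPT/Letta (not its API)."""
    core_budget_words: int = 12                     # stand-in for a token budget
    core: list = field(default_factory=list)        # always placed in the prompt
    archival: list = field(default_factory=list)    # out of context; search only

    def _core_words(self) -> int:
        return sum(len(item.split()) for item in self.core)

    def remember(self, item: str) -> None:
        self.core.append(item)
        # Page the oldest entries out when over budget, like an OS evicting
        # pages from RAM to disk. The newest item always stays in core.
        while self._core_words() > self.core_budget_words and len(self.core) > 1:
            self.archival.append(self.core.pop(0))

    def search_archival(self, query: str) -> list:
        terms = set(query.lower().split())
        return [item for item in self.archival if terms & set(item.lower().split())]

    def context(self) -> str:
        return "\n".join(self.core)


mem = TwoTierMemory()
mem.remember("user name is Ada")                   # 4 words in core
mem.remember("user likes Rust and Haskell")        # 9 words
mem.remember("meeting moved to Friday afternoon")  # 14 > 12, oldest item is paged out
print(mem.context())
print(mem.search_archival("what is the user name"))
```
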

## 🔧 Tool-Use & Function Calling

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| Gorilla | ~12k ⭐ | LLM fine-tuned for API/function calling. Hosts the Berkeley Function Calling Leaderboard (BFCL V4 Agentic). GoEx runtime for safe execution. | BFCL is the industry standard for evaluating function calling in LLMs. Apache 2.0. UC Berkeley. |
| ToolBench / ToolLLM 🇨🇳 | Growing | Training framework that teaches LLMs to use 16,000+ real-world APIs with DFSDT (Depth-First Search-based Decision Tree) reasoning. | ICLR 2024. Introduced backtracking in tool-use reasoning. Three-stage pipeline: API collection → instruction generation → solution annotation. |
| Composio | Growing | Integration platform with 90+ tool connections for AI agents, managed authentication, and hierarchical task execution. | Managed auth + 90+ integrations solve the "connecting agents to real tools" problem out of the box. |
| AgentLego 🇨🇳 | ~500 ⭐ | Versatile tool API library that extends agents with multimodal capabilities: vision, image generation/editing, speech, and VL reasoning. | Multimodal tool library from Shanghai AI Lab. Integrates with LangChain, Transformers Agents, and Lagent. |
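
ToolLLM's DFSDT treats tool use as a search tree: each node is a partial sequence of API calls, and when a branch hits an error or dead end the agent backtracks and tries a sibling instead of committing to its first choice, as a single ReAct-style chain would. A minimal depth-first sketch follows; the callbacks `propose`, `is_solved`, and `is_dead_end` are hypothetical stand-ins for the LLM and the API responses, and the paper's prompting details for generating diverse child nodes are omitted.

```python
from typing import Callable, Optional

Path = tuple  # a sequence of action names taken so far


def dfsdt(
    propose: Callable[[Path], list],
    is_solved: Callable[[Path], bool],
    is_dead_end: Callable[[Path], bool],
    max_depth: int = 4,
    max_expansions: int = 50,
) -> Optional[Path]:
    """Depth-first search over action sequences that backtracks from failures."""
    budget = [max_expansions]  # mutable counter shared across recursive calls

    def visit(path: Path) -> Optional[Path]:
        if is_solved(path):
            return path
        if is_dead_end(path) or len(path) >= max_depth or budget[0] <= 0:
            return None  # abandon this branch; the caller tries the next sibling
        budget[0] -= 1
        for action in propose(path):
            found = visit(path + (action,))
            if found is not None:
                return found
        return None

    return visit(())


# Hypothetical toy task: the first API version errors out, so the search must backtrack.
def propose(path):
    if not path:
        return ["flights_api_v1", "flights_api_v2"]
    if path[-1] == "flights_api_v2":
        return ["format_answer"]
    return []


def is_dead_end(path):
    return bool(path) and path[-1] == "flights_api_v1"  # simulated API error


def is_solved(path):
    return path[-2:] == ("flights_api_v2", "format_answer")


print(dfsdt(propose, is_solved, is_dead_end))  # ('flights_api_v2', 'format_answer')
```

A greedy chain that always takes the first proposal would stop at the failing `flights_api_v1` call; the depth and expansion limits keep the search bounded when no solution exists.
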

## 🧩 Planning & Reasoning Frameworks

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| DeerFlow 2.0 🇨🇳 | ~45k ⭐ | ByteDance's "SuperAgent harness" for research, coding, and creative tasks. v2.0 is a complete rewrite with sandboxed execution, persistent memory, and sub-agent orchestration. | Hit #1 on GitHub Trending and reached 45k stars within weeks. Built on LangGraph. MIT license. Supports Doubao, DeepSeek, Kimi, GPT-4o, and Claude. |
| XAgent 🇨🇳 | ~8k ⭐ | Autonomous agent with a three-component architecture: Dispatcher (routing), Planner (milestones), Actor (execution). Proactive human collaboration via the AskForHumanHelp tool. | Dispatcher-Planner-Actor paradigm. OpenBMB/Tsinghua. Docker-sandboxed. ToolServer with file editor, Python notebooks, shell, and web browser. |
| TaskWeaver | ~6k ⭐ | Microsoft's code-first agent for data analytics: converts natural language to executable Python with two-layer planning (Planner → Code Generator → Executor). | Data analytics specialist with a unique focus on rich data structures (pandas DataFrames). Stateful conversations. |
| SuperAGI | ~17k ⭐ | Full-featured agent platform with a toolkit marketplace, GUI dashboard, Docker deployment, and multi-agent workflows. | Early marketplace approach for agent toolkits and templates. Development has slowed significantly since mid-2024. |
| AgentGPT | ~36k ⭐ | First browser-based autonomous agent UI: name a custom AI and set it on any goal. YC-backed. | Pioneered browser-based agent UX. |
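
XAgent's Dispatcher-Planner-Actor split can be illustrated with a toy pipeline: the dispatcher routes a task, the planner breaks it into milestones, and the actor executes each one, escalating to a human where the plan calls for it (the role XAgent's AskForHumanHelp tool plays). Everything here is hypothetical: keyword routing and hard-coded plans stand in for LLM calls, and the `ask_human` callback stands in for real human interaction.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Milestone:
    description: str
    needs_human: bool = False


def dispatcher(task: str) -> str:
    """Pick a planner profile for the task (hypothetical keyword routing)."""
    return "data" if ".csv" in task.lower() else "general"


def planner(task: str, profile: str) -> list:
    """Break a task into ordered milestones (hard-coded stand-in for an LLM)."""
    if profile == "data":
        return [
            Milestone("load the CSV"),
            Milestone("compute summary statistics"),
            Milestone("confirm which columns matter", needs_human=True),
        ]
    return [Milestone(task)]


def actor(milestone: Milestone, ask_human: Callable[[str], str]) -> str:
    """Execute one milestone, escalating to a human when the plan requires it."""
    if milestone.needs_human:
        return f"{milestone.description}: {ask_human(milestone.description)}"
    return f"{milestone.description}: done"


def run(task: str, ask_human: Callable[[str], str]) -> list:
    profile = dispatcher(task)
    return [actor(m, ask_human) for m in planner(task, profile)]


for line in run("Analyze sales.csv", ask_human=lambda question: "use revenue"):
    print(line)
```

Keeping routing, planning, and execution in separate functions is the point of the pattern: each role can be swapped for a different model or prompt without touching the others.
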

## 🇨🇳 Chinese Contributions

China has produced some of the most impactful agent research and production frameworks. This section highlights projects not already featured above.

### Alibaba

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| AgentScope | ~20k ⭐ | Production-ready multi-agent framework with ReAct agents, MCP/A2A support, real-time voice, memory management, and evaluation. | Broadest Chinese agent ecosystem: AgentScope-Java (2.2k ⭐), ReMe memory kit, CoPaw assistant, Trinity-RFT for agentic RL. Biweekly community meetings. |
| CoPaw | ~13k ⭐ | Personal AI assistant workstation: multi-channel (DingTalk, Feishu, QQ, Discord, iMessage), local LLM support via llama.cpp/MLX, and cron scheduling. | Native to Chinese platforms, with first-class support for DingTalk, Feishu, QQ, and WeCom. Built on AgentScope. |
| Qwen-Agent | ~13k ⭐ | Official agent framework for the Qwen model family: function calling, MCP, Code Interpreter, RAG (1M+ token context), and a Chrome extension. | The Qwen ecosystem's agent layer. Supports Qwen3, Qwen3-Coder, and QwQ. Includes the DeepPlanning evaluation benchmark. |
| ModelScope-Agent | ~4k ⭐ | Lightweight framework with MCP support, deep research (55.43 on DeepResearch Bench), code generation, and AgentFabric for custom agent creation. | Alibaba's model-hub-native agent with Anthropic Agent Skills protocol support. |
| Qwen Code | Growing | Terminal AI coding agent (CLI), forked from Gemini CLI and optimized for Qwen models. | Qwen's answer to Claude Code: a terminal-first coding agent tuned for Qwen. |

### ByteDance

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| Trae Agent | ~2k ⭐ | LLM-based agent for general software engineering, featuring "Lakeview" for concise step summarization, trajectory recording, and YAML config. | Research-friendly design with trajectory recording for academic analysis. Multi-LLM support, including Doubao. |

### Shanghai AI Lab

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| Lagent | ~2k ⭐ | Lightweight, PyTorch-inspired agent framework with an imperative, Pythonic style, an async/sync dual interface, and ReAct/ReWOO agents. | PyTorch-like API for building agents. InternLM ecosystem integration. HTTP deployment for distributed multi-agent apps. |
| HuixiangDou | Growing | Domain-specific QA agent: a technical group-chat assistant that answers questions in specialized domains. | Domain QA specialist from Shanghai AI Lab's InternLM team. |

### Tencent

| Project | Stars | Description | Key Differentiator |
|---|---|---|---|
| C3-Benchmark | Recent | Comprehensive agent evaluation covering all action spaces, with bilingual (CN/EN) data generation and controllable task generation. | Most analysis dimensions among agent evaluation frameworks. Tencent Hunyuan. Multi-agent data generation. |

### Community Resources

| Project | Description |
|---|---|
| Hello-Agents | Chinese-language educational tutorial for building AI agents from scratch. Datawhale community. |
| awesome-hermes-agent | Curated resource list for the Hermes Agent ecosystem. |

## Benchmarks & Evaluation

| Benchmark | Paper / Source | Description | Key Differentiator |
|---|---|---|---|
| SWE-bench | arXiv:2310.06770 | ICLR 2024 Oral. 2,294 real GitHub issue-PR pairs. Gold standard for coding agents. Variants: Lite, Verified (500 human-validated), Multimodal. | The benchmark every coding agent cites. |
| AgentBench 🇨🇳 | arXiv:2308.03688 | ICLR 2024. First comprehensive multi-domain agent benchmark: 8 environments (OS, DB, KG, web, games). | Revealed a large gap between commercial and open-source LLMs on agent tasks. |
| τ-bench | arXiv:2406.12045 | Emulates user-agent-tool conversations with domain-specific APIs and policy guidelines. Introduces the pass^k reliability metric. | Even GPT-4o succeeds on <50% of tasks. Sierra AI (Shunyu Yao). Expanded to τ²-bench and τ³-bench. |
| WebArena | arXiv:2307.13854 | Realistic web environment with functional website replicas. 812 tasks. | Spawned VisualWebArena, WorkArena, and TheAgentCompany. BrowserGym integration. |
| GAIA | Meta AI | 466 real-world questions requiring reasoning, multimodality, web browsing, and tool use. Humans: 92%; GPT-4 with plugins: 15%. | Conceptually simple for humans yet devastatingly hard for AI. |
| ToolBench 🇨🇳 | arXiv:2307.16789 | ICLR 2024. 16,000+ real-world APIs from RapidAPI with DFSDT reasoning. | Introduced backtracking reasoning for tool use. ToolLLaMA model. |
| OSWorld | arXiv:2404.07972 | NeurIPS 2024. Real computer environment (Ubuntu/Windows/macOS). 369 tasks. Best agents ~30%. | Most challenging computer-use benchmark. Requires actual VM execution. |
| MLAgentBench | Stanford | Evaluates agents on ML research tasks: running experiments, analyzing results, writing code. | Tests the "AI researcher" use case end to end. |
| Mind2Web | NeurIPS 2023 | 2,000+ crowdsourced web tasks across 137 real websites. | Large-scale real-website benchmark. |
| BFCL | UC Berkeley / Gorilla | Berkeley Function Calling Leaderboard; V4 Agentic evaluates tool calling in real-world agentic settings. | Industry standard for function-calling evaluation. |
| ScienceAgentBench | ICLR 2025 | 102 tasks for data-driven scientific discovery. | First rigorous benchmark for science agents. |
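
τ-bench's pass^k asks a stricter question than the familiar pass@k: the probability that all k independent trials of a task succeed, rather than at least one. With n ≥ k trials and c successes on a task, the per-task estimate is C(c, k) / C(n, k), averaged over tasks; pass@k instead uses 1 − C(n − c, k) / C(n, k). A small sketch of both (the function names and toy results are hypothetical):

```python
from math import comb


def pass_hat_k(results: list, k: int) -> float:
    """pass^k: chance that all k i.i.d. trials succeed, averaged over tasks."""
    total = 0.0
    for trials in results:
        n, c = len(trials), sum(trials)
        if n < k:
            raise ValueError("each task needs at least k trials")
        total += comb(c, k) / comb(n, k)  # comb returns 0 when k > c
    return total / len(results)


def pass_at_k(results: list, k: int) -> float:
    """pass@k for contrast: chance that at least one of k trials succeeds."""
    total = 0.0
    for trials in results:
        n, c = len(trials), sum(trials)
        if n < k:
            raise ValueError("each task needs at least k trials")
        total += 1 - comb(n - c, k) / comb(n, k)
    return total / len(results)


# Two hypothetical tasks, four trials each.
results = [[True, True, True, True], [True, True, False, False]]
print(pass_hat_k(results, 1), pass_hat_k(results, 2), pass_at_k(results, 2))
```

On this toy data both metrics equal 0.75 at k = 1, but at k = 2 pass^k falls to 7/12 while pass@k rises to 11/12. That divergence is why pass^k works as a reliability metric: an agent that succeeds only sometimes is penalized more as k grows.
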

## Protocols & Standards

| Protocol | Origin | Description |
|---|---|---|
| MCP (Model Context Protocol) | Anthropic (Nov 2024) → Linux Foundation (Dec 2025) | Open standard for connecting AI models to tools and data sources. 97M+ monthly SDK downloads. Supported by ChatGPT, Claude, Gemini, Cursor, and VS Code. |
| A2A (Agent-to-Agent Protocol) | Google → Linux Foundation | Protocol for agent discovery, capability advertisement, and collaboration across frameworks. 150+ supporting organizations. |
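
MCP messages are JSON-RPC 2.0. A client invokes a server tool with a `tools/call` request carrying the tool `name` and its `arguments`, and the server replies with a `result` containing a `content` array and an `isError` flag; protocol-level problems such as an unknown tool come back as JSON-RPC `error` objects instead. The hand-rolled sketch below only illustrates those message shapes: `get_weather` is a hypothetical tool, and the initialization handshake, `tools/list` discovery, and stdio/HTTP transport that the official SDKs handle are omitted.

```python
import json

# Hypothetical tool registry; a real server would also describe each tool's
# inputSchema so clients can discover it via tools/list.
TOOLS = {
    "get_weather": lambda args: f"Weather in {args['location']}: sunny",
}


def make_tools_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def handle(raw: str) -> str:
    """Toy server-side dispatcher for tools/call requests."""
    msg = json.loads(raw)
    reply = {"jsonrpc": "2.0", "id": msg.get("id")}
    if msg.get("method") != "tools/call":
        reply["error"] = {"code": -32601, "message": "Method not found"}
        return json.dumps(reply)
    params = msg.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        # Unknown tools are protocol errors, reported as JSON-RPC errors.
        reply["error"] = {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}
        return json.dumps(reply)
    try:
        text, is_error = tool(params.get("arguments", {})), False
    except Exception as exc:  # tool failures are reported inside the result
        text, is_error = f"Tool failed: {exc}", True
    reply["result"] = {"content": [{"type": "text", "text": text}], "isError": is_error}
    return json.dumps(reply)


print(handle(make_tools_call(1, "get_weather", {"location": "Paris"})))
```

Separating tool-execution failures (`isError: true` inside `result`) from protocol errors lets the model see and react to a failed tool call, while malformed requests are still rejected at the protocol layer.
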

## Foundational Papers

These papers introduced paradigms now embedded in virtually every agent framework.

| Paper | Year | Venue | Key Contribution |
|---|---|---|---|
| ReAct: Synergizing Reasoning and Acting | 2023 | ICLR 2023 | The agent paradigm: interleaved reasoning traces and actions, the foundation of modern LLM agents. Shunyu Yao et al. (Princeton). |
| Chain-of-Thought Prompting | 2022 | NeurIPS 2022 | Step-by-step reasoning; a foundational technique underlying most agent reasoning. Jason Wei et al. (Google). |
| Reflexion: Verbal Reinforcement Learning | 2023 | NeurIPS 2023 | Self-correction without weight updates: agents learn from verbal self-reflection. 91% pass@1 on HumanEval. Noah Shinn et al. (Princeton). |
| Tree of Thoughts | 2023 | NeurIPS 2023 | Search-based reasoning. Extends CoT to explore multiple reasoning paths with BFS/DFS. Shunyu Yao et al. (Princeton). |
| Toolformer | 2023 | NeurIPS 2023 | Self-supervised tool learning: LLMs learn when and how to call tools. A 6.7B model (GPT-J) outperforms the 175B GPT-3 on math benchmarks. Meta AI. |
| HuggingGPT | 2023 | NeurIPS 2023 | LLM as model orchestrator: ChatGPT plans and dispatches tasks to expert models from the Hugging Face Hub. |
| LATS: Language Agent Tree Search | 2024 | ICML 2024 | Unifies reasoning, acting, and planning via Monte Carlo Tree Search (MCTS). 92.7% pass@1 on HumanEval with GPT-4. Andy Zhou et al. (UIUC). |
| GEPA: Reflective Prompt Evolution | 2026 | ICLR 2026 Oral | Self-improving agents. Outperforms GRPO by 6% with 35× fewer rollouts. Powers Hermes Agent's self-evolution. |
| AutoGen | 2024 | ICLR 2024 Workshop (Best Paper) | Multi-agent conversation paradigm: defined how agents collaborate through structured dialogue. Microsoft. |
| MetaGPT | 2024 | ICLR 2024 Oral | SOP-driven multi-agent collaboration. "Code = SOP(Team)": encodes human workflows as agent prompts. |
| CAMEL | 2023 | NeurIPS 2023 | Role-playing agents. First LLM-based framework for multi-agent cooperation via inception prompting. |
| Generative Agents | 2023 | UIST 2023 | Believable simulacra of human behavior. Memory streams + reflection + planning = emergent social behaviors. Stanford. |
| OpenHands | 2024 | – | AI software developers as generalist agents. Defines the platform architecture for coding agents. |
| SWE-agent (ACI) | 2024 | NeurIPS 2024 | Agent-Computer Interfaces: how interface design changes agent performance. Princeton/Stanford. |
| Voyager | 2023 | NeurIPS 2023 | Embodied lifelong learning: automatic curriculum + skill library + self-verification in Minecraft. NVIDIA. |
| MemGPT | 2023 | – | LLMs as operating systems: self-editing memory with virtual context management. UC Berkeley. |
| A Survey on LLM-based Autonomous Agents | 2023 | Updated Mar 2025 | Most-cited agent survey. Unified framework: Profile + Memory + Planning + Action modules. |
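
The ReAct loop from the paper above alternates model-written Thought/Action steps with tool Observations until the model emits a `Finish[...]` action. The sketch uses the paper's `Tool[argument]` action syntax; the scripted `fake_llm`, the single `Search` tool, and the regex-based parser are hypothetical stand-ins for a real model and toolset.

```python
import re
from typing import Callable

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react(question: str, llm: Callable[[str], str],
          tools: dict, max_steps: int = 5) -> tuple:
    """Alternate LLM Thought/Action steps with tool Observations until Finish[...]."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # expected: "Thought: ...\nAction: Tool[arg]"
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: could not parse an action\n"
            continue
        name, arg = match.groups()
        if name == "Finish":
            return arg, transcript
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return None, transcript  # step budget exhausted without an answer


# Scripted stand-in for a model (hypothetical).
scripted_steps = iter([
    "Thought: I need France's capital.\nAction: Search[France]",
    "Thought: The observation answers the question.\nAction: Finish[Paris]",
])


def fake_llm(prompt: str) -> str:
    return next(scripted_steps)


tools = {"Search": lambda query: {"France": "The capital of France is Paris."}.get(query, "no result")}
answer, transcript = react("What is the capital of France?", fake_llm, tools)
print(answer)  # Paris
print(transcript)
```

The growing `transcript` is the key design choice: each new Thought is conditioned on every earlier action and observation, which is what lets the model revise its plan mid-task.
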

## Surveys & Curated Lists

| Resource | Description |
|---|---|
| awesome-ai-agents | 1,500+ resources on AI agents; the largest general-purpose collection. |
| awesome-ai-agents | 300+ agentic resources with star counts and categorization. |
| AI Agent Benchmark Compendium | 50+ benchmarks categorized and compared. |
| shelldex.com | Community directory tracking 52+ OpenClaw derivatives across 9 languages. Weekly growth rankings. |
| Agentic AI Survey | Dual-paradigm framework (Symbolic vs Neural); PRISMA-based review of 90 studies. |
| LLM Agent Methodology Survey | March 2025 methodology-centered taxonomy of agent architectures. |

## Contributing

Contributions welcome! Please read the guidelines below before submitting a PR.

What to include:

- Open-source frameworks with a real GitHub repo and either >500 stars or a published paper
- Academic papers that introduced a technique now widely used in agent systems
- Benchmarks that are actively used by the research community
- Production deployments that are open source and demonstrate a novel architecture

What not to include:

- Thin API wrappers with no novel architecture
- Closed-source products without open-source components (exceptions: category-defining products such as Devin and Cursor)
- Projects with no activity for 12+ months and <1,000 stars
- Tutorials, courses, or blog posts (except as supplementary links)

How to submit:

1. Fork this repository.
2. Add your entry to the appropriate category using the format `| [Project Name](URL) | stars | One-line description | Key differentiator |`.
3. Verify that the GitHub URL is live and the project is active.
4. Submit a PR with a brief explanation of why this project is notable.

⭐ Star this repo if you found it useful!

Last updated: March 2026 · Maintained with care · CC0 1.0 Universal