This repository curates recent research papers on the applications and architectures of AI agents. We search arXiv weekly with a fixed set of keywords and select only the papers we find particularly interesting. Rather than aiming for comprehensiveness, we add a paper when it introduces a distinctly new approach or concept that stands out from existing methods.
An AI Agent is an autonomous system powered by large language models that can perceive its environment, reason through complex tasks, and use tools to take actions in pursuit of specific goals. It combines reasoning, planning, memory, and tool-use capabilities to operate independently or as part of a multi-agent system.
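As a rough illustration of the definition above, the perceive, reason, act, and remember cycle can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an implementation from any listed paper: `Agent`, `toy_policy`, and the `add` tool are hypothetical names, and the rule-based `toy_policy` stands in for what would be a language model call in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A decision is either ("call", tool_name, argument) or ("finish", answer, "").
Decision = Tuple[str, str, str]


@dataclass
class Agent:
    # `policy` stands in for the LLM: it maps (goal, memory) to the next decision.
    policy: Callable[[str, List[str]], Decision]
    tools: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 5) -> Optional[str]:
        for _ in range(max_steps):
            # Reason/plan: choose the next step from the goal and past observations.
            kind, name, arg = self.policy(goal, self.memory)
            if kind == "finish":
                return name
            # Act: call the chosen tool and perceive its result.
            observation = self.tools[name](arg)
            # Remember: store the observation for later steps.
            self.memory.append(f"{name}({arg}) -> {observation}")
        return None  # step budget exhausted without an answer


def toy_policy(goal: str, memory: List[str]) -> Decision:
    # Hypothetical rule-based stand-in for a model call.
    if not memory:
        return ("call", "add", "2,3")
    return ("finish", memory[-1].split("-> ")[1], "")


def add(arg: str) -> str:
    # Hypothetical tool: adds two comma-separated integers.
    a, b = arg.split(",")
    return str(int(a) + int(b))


agent = Agent(policy=toy_policy, tools={"add": add})
result = agent.run("What is 2 + 3?")
```

Swapping `toy_policy` for a function that prompts a model, and registering more entries in `tools`, gives the general shape that many of the papers below extend with planning, long-term memory, and multi-agent coordination.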
🔥: Recommended papers
📖: Survey papers
⚖️: Benchmark papers
- Agent Capabilities
- AI Agents Architecture
- AI Agents Applications
- GenAI Agents Presentations
- "ReFlect: An Effective Harness System for Complex Long-Horizon LLM Reasoning" [paper]
- "PriorZero: Bridging Language Priors and World Models for Decision Making" [paper]
- "AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents" [paper]
- "Harnessing Agentic Evolution" [paper]
- ⚖️ "Auditing Agent Harness Safety" [paper]
- 🔥 "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search" [paper]
- "Harnessing LLM Agents with Skill Programs" [paper]
- "Code as Agent Harness" [paper]
- "A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents" [paper]
- "Harnesses for Inference-Time Alignment over Execution Trajectories" [paper]
- "Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents" [paper]
- "Polar: Agentic RL on Any Harness at Scale" [paper]
- "Meta-Engineering Harnesses for AI-Native Software Production" [paper]
- "From Model Scaling to System Scaling: Scaling the Harness in Agentic AI" [paper]
- "Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems" [paper]
- "SIA: Self Improving AI with Harness & Weight Updates" [paper]
- 📖 "Agent Harness Engineering: A Survey" [paper]
- "Interactive Evaluation Requires a Design Science"
- "HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools" [paper]
- ⚖️ "Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows" [paper]
- "HEAVYSKILL: Heavy Thinking as the Inner Skill in Agentic Harness" [paper]
- "SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills" [paper]
- "SkillOS: Learning Skill Curation for Self-Evolving Agents" [paper]
- 📖 "A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications" [paper]
- "Counterfactual Trace Auditing of LLM Agent Skills" [paper]
- "SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration" [paper]
- "SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution" [paper]
- ⚖️ "SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents" [paper]
- "SkillOpt: Executive Strategy for Self-Evolving Agent Skills" [paper]
- "MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation" [paper]
- "Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems" [paper]
- "Toward User Comprehension Supports for LLM Agent Skill Specifications" [paper]
- "You Live More Than Once: Towards Hierarchical Skill Meta-Evolving" [paper]
- "SkillGrad: Optimizing Agent Skills Like Gradient Descent" [paper]
- "CODESKILL: Learning Self-Evolving Skills for Coding Agents" [paper]
- 📖 "Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning" [paper]
- 📖 "A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications" [paper]
- 📖 "Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems" [paper]
- 📖 "Planning in the LLM Era: Building for Reliability and Efficiency" [paper]
- 📖 "Agent Harness Engineering: A Survey" [paper]
- "CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery" [paper]
- "EvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification" [paper]
- "SkillX: Automatically Constructing Skill Knowledge Bases for Agents" [paper]
- "SkillClaw: Let Skills Evolve Collectively with Agentic Evolver" [paper]
- ⚖️ "SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment" [paper]
- "Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks" [paper]
- ⚖️ "Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization" [paper]
- "EVOSPARK: Endogenous Interactive Agent Societies for Unified Long-Horizon Narrative Evolution" [paper]
- "Discovering Novel LLM Experts via Task-Capability Convolution" [paper]
- "PolicyBank: Evolving Policy Understanding for LLM Agents" [paper]
- "Bilevel Optimization of Agent Skills via Monte Carlo Tree Search" [paper]
- ⚖️ "HORIZONBENCH: Long-Horizon Personalization with Evolving Preferences" [paper]
- "Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration" [paper]
- "Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence" [paper]
- "A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression" [paper]
- "Prism: An Evolutionary Memory Substrate for Multi-Agent Open-Ended Discovery" [paper]
- "EVOAGENT: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation" [paper]
- "Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks" [paper]
- "AEL: Agent Evolving Learning for Open-Ended Environments" [paper]
- "SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization" [paper]
- "EvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification" [paper]
- "How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings" [paper]
- "SkillX: Automatically Constructing Skill Knowledge Bases for Agents" [paper]
- "SkillClaw: Let Skills Evolve Collectively with Agentic Evolver" [paper]
- "Red Skills or Blue Skills? A Dive Into Skills Published on ClawHub" [paper]
- "Bilevel Optimization of Agent Skills via Monte Carlo Tree Search" [paper]
- "Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents" [paper]
- ⚖️ "SKILLFLOW: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents" [paper]
- ⚖️ "SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks" [paper]
- "EVOAGENT: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation" [paper]
- "Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks" [paper]
- "From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills" [paper]
- "Skill Retrieval Augmentation for Agentic AI" [paper]
- "Agentic Scientific Simulation: Execution-Grounded Model Construction and Reconstruction" [paper]
- "DeepXiv-SDK: An Agentic Data Interface for Scientific Literature" [paper]
- "EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery" [paper]
- "HLER: Human-in-the-Loop Economic Research via Multi-Agent Pipelines for Empirical Discovery" [paper]
- "Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 LLM-Guided ML Experiments" [paper]
- "AwesomeLit: Towards Hypothesis Generation with Agent-Supported Literature Research" [paper]
- "VILLA: Versatile Information Retrieval From Scientific Literature Using Large LAnguage Models" [paper]
- "Can AI Scientist Agents Learn from Lab-in-the-Loop Feedback? Evidence from Iterative Perturbation Discovery" [paper]
- ⚖️ "SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents" [paper]
- "FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration" [paper]
- "A Visionary Look at Vibe Researching" [paper]
- "ViviDoc: Generating Interactive Documents through Human-Agent Collaboration" [paper]
- "An Empirical Study of Interaction Smells in Multi-Turn Human-LLM Collaborative Code Generation" [paper]
- "InterDeepResearch: Enabling Human-Agent Collaborative Information Seeking through Interactive Deep Research" [paper]
- "Invisible Failures in Human–AI Interactions" [paper]
- "Argumentative Human-AI Decision-Making: Toward AI Agents That Reason With Us, Not For Us" [paper]
- "Agentic AI and the next intelligence explosion" [paper]
- "Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents" [paper]
- "Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities" [paper]
- "Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned" [paper]
- "AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents" [paper]
- "OpenClaw-RL: Train Any Agent Simply by Talking" [paper]
- "MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild" [paper]
- 📖 "The Evolution of Tool Use in LLM Agents: From Single-Tool Call to Multi-Tool Orchestration" [paper]
- "EvoSkill: Automated Skill Discovery for Multi-Agent Systems" [paper]
- "Evolving Deception: When Agents Evolve, Deception Wins" [paper]
- "EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery" [paper]
- "AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents" [paper]
- "Trajectory-Informed Memory Generation for Self-Improving Agent Systems" [paper]
- "MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild" [paper]
- "AgentFactory: A Self-Evolving Framework Through Executable Subagent Accumulation and Reuse" [paper]
- "HyperAgents" [paper]
- "Self-Evolving Multi-Agent Framework for Efficient Decision Making in Real-Time Strategy Scenarios" [paper]
- "Experiential Reflective Learning for Self-Improving LLM Agents" [paper]
- "ASI-Evolve: AI Accelerates AI" [paper]
- Deep Research Trends Report (JA)
- Ideation Trends Report (JA)
- Prediction Trends Report (JA)
- Skills Trends Report (JA)
- "From Human-Human Collaboration to Human-Agent Collaboration: A Vision, Design Philosophy, and an Empirical Framework for Achieving Successful Partnerships Between Humans and LLM Agents" [paper]
- "Intelligent AI Delegation" [paper]
- "The Next Paradigm Is User-Centric Agent, Not Platform-Centric Service" [paper]
- "SKILLRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning" [paper]
- "Agent Skills: A Data-Driven Analysis of Claude Skills for Extending Large Language Model Functionality" [paper]
- 📖 "Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward" [paper]
- ⚖️ "SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks" [paper]
- 📖 "SoK: Agentic Skills — Beyond Tool Use in LLM Agents" [paper]
- "Self-Consolidation for Self-Evolving Agents" [paper]
- "Live-Evo: Online Evolution of Agentic Memory from Continuous Feedback" [paper]
- "MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents" [paper]
- ⚖️ "AGENTRX: Diagnosing AI Agent Failures from Execution Trajectories" [paper]
- "Empirical-MCTS: Continuous Agent Evolution via Dual-Experience Monte Carlo Tree Search" [paper]
- "AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection" [paper]
- "AORCHESTRA: Automating Sub-Agent Creation for Agentic Orchestration" [paper]
- "Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs" [paper]
- ⚖️ "VeRO: An Evaluation Harness for Agents to Optimize Agents" [paper]
- ⚖️ "FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights" [paper]
- "DeltaEvolve: Accelerating Scientific Discovery through Momentum-Driven Evolution" [paper]
- "Accelerating Scientific Research with Gemini: Case Studies and Common Techniques" [paper]
- 📖 "Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science" [paper]
- ⚖️ "AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents" [paper]
- "IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery" [paper]
- "DeepInnovator: Triggering the Innovative Capabilities of LLMs" [paper]
- "Grounding LLMs in Scientific Discovery via Embodied Actions" [paper]
- 📖 "Agentic Reasoning for Large Language Models" [paper]
- 📖 "Toward Efficient Agents: Memory, Tool learning, and Planning" [paper]
- "JENIUS AGENT: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios" [paper]
- "EvoRoute: Experience-Driven Self-Routing LLM Agent Systems" [paper]
- "MEMRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory" [paper]
- "PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution" [paper]
- "Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning" [paper]
- "WISE-Flow: Workflow-Induced Structured Experience for Self-Evolving Conversational Service Agents" [paper]
- "To Retrieve or To Think? An Agentic Approach for Context Evolution" [paper]
- "Controlled Self-Evolution for Algorithmic Code Optimization" [paper]
- "Learn Like Humans: Use Meta-cognitive Reflection for Efficient Self-Improvement" [paper]
- 📖 "From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms" [paper]
- "Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification" [paper]
- "Optimizing Agentic Workflows using Meta-tools" [paper]
- "Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks" [paper]
- "Large Language Model Agents Are Not Always Faithful Self-Evolvers" [paper]
- "Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents" [paper]
- "SimpleMem: Efficient Lifelong Memory for LLM Agents" [paper]
- "MEMRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory" [paper]
- "Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning" [paper]
- "Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction" [paper]
- "Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems" [paper]
- "MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents" [paper]
- "PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution" [paper]
- "The AI Hippocampus: How Far are We From Human Memory?" [paper]
- "MemoBrain: Executive Memory as an Agentic Brain for Reasoning" [paper]
- "AtomMem: Learnable Dynamic Agentic Memory with Atomic Memory Operation" [paper]
- "Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management" [paper]
- "Structured Episodic Event Memory" [paper]
- "Active Context Compression: Autonomous Memory Management in LLM Agents" [paper]
- 📖 "From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms" [paper]
- "AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement" [paper]
- "Progressive Ideation using an Agentic AI Framework for Human-AI Co-Creation" [paper]
- "OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment" [paper]
- "Sci-Reasoning: A Dataset Decoding AI Innovation Patterns" [paper]
- "SuS: Strategy-aware Surprise for Intrinsic Exploration" [paper]
- "Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments" [paper]
- "LLM Review: Enhancing Creative Writing via Blind Peer Review Feedback" [paper]
- "Agentic AI and Machine Learning for Accelerated Materials Discovery and Applications" [paper]
- "Who Owns Creativity and Who Does the Work? Trade-offs in LLM-Supported Research Ideation" [paper]
- "Improved Bug Localization with AI Agents Leveraging Hypothesis and Dynamic Cognition" [paper]
- "Rethinking the AI Scientist: Interactive Multi-Agent Workflows for Scientific Discovery" [paper]
- "Learning to Discover at Test Time" [paper]
- "Insight Agents: An LLM-Based Multi-Agent System for Data Insights" [paper]
- "Probing the Future of Meta-Analysis: Eliciting Design Principles via an Agentic Research IDE" [paper]
- "Generating Literature-Driven Scientific Theories at Scale" [paper]
- "Improved Bug Localization with AI Agents Leveraging Hypothesis and Dynamic Cognition" [paper]
- "LLM-in-Sandbox Elicits General Agentic Intelligence" [paper]
- "SERA: Soft-Verified Efficient Repository Agents" [paper]
- "Who Writes the Docs in SE 3.0? Agent vs. Human Documentation Pull Requests" [paper]
- "How do Agents Refactor: An Empirical Study" [paper]
- "Beyond Bug Fixes: An Empirical Investigation of Post-Merge Code Quality Issues in Agent-Generated Pull Requests" [paper]
- "Are We All Using Agents the Same Way? An Empirical Study of Core and Peripheral Developers’ Use of Coding Agents" [paper]
- "Strategic Self-Improvement for Competitive Agents in AI Labour Markets" [paper]
- "Guided Self-Evolving LLMs with Minimal Human Supervision" [paper]
- "Evolving Excellence: Automated Optimization of LLM-based Agents" [paper]
- "Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution" [paper]
- "Beyond Training: Enabling Self-Evolution of Agents with MOBIMEM" [paper]
- "SCOPE: Prompt Evolution for Enhancing Agent Effectiveness" [paper]
- "Reinforcement Learning for Self-Improving Agent with Skill Library" [paper]
- "MemEvolve: Meta-Evolution of Agent Memory Systems" [paper]
- 📖 "Memory in the Age of AI Agents: A Survey Forms, Functions and Dynamics" [paper]
- 📖 "Adaptation of Agentic AI: A Survey of Post-Training, Memory, and Skills" [paper]
- 📖 "Deep Research: A Systematic Survey" [paper]
- 🔥 "Measuring Agents in Production" [paper]
- 🔥 "Towards a Science of Scaling Agent Systems" [paper]
- ⚖️ "Evaluating Large Language Models in Scientific Discovery" [paper]
- 🔥 "How Far Are We from Genuinely Useful Deep Research Agents?" [paper]
- "Can Agentic AI Match the Performance of Human Data Scientists?" [paper]
04/25 ~ 12/25 [link]
