High-performance skill aggregation, classification & routing platform for AI agents.
Aggregate 100+ skill repositories → Deduplicate 8000+ workflows → Classify semantically → Route to AI agents
- Overview
- Features
- Prerequisites
- Quick Start
- FreeLLMAPI Integration
- Commands Reference
- Classification Architecture
- Project Structure
- Environment Variables
- Supported AI Agents
- Troubleshooting
- Contributing
- License
skills-bank is a unified skill aggregation platform designed to solve the "skill discovery bottleneck" for AI agents. Instead of agents searching 100+ distributed GitHub repositories for workflows, skills-bank:
- Aggregates 8000+ `SKILL.md` files from distributed repositories.
- Deduplicates by name and description, catching cross-repo clones.
- Classifies skills into 12 domain hubs and 40+ sub-hubs using a hybrid keyword + LLM approach.
- Routes skills to 12+ AI agent platforms (Claude, Copilot, Cursor, Hermes, etc.).
- Maintains a canonical cache in `lib/` with automatic updates via file watcher.
Result: Agents access 3,600+ curated, deduplicated skills through a single unified interface.
- Source-of-Truth Loading: Agents load canonical `SKILL.md` files directly from source repositories, not from catalogs (eliminates hallucination, optimizes tokens).
- Hybrid Classification: Combines fast keyword rules (Step A) with optional LLM semantic classification (Step B).
- Smart Deduplication: Two-key strategy catches both exact collisions and renamed cross-repo clones.
- Multi-Tool Support: Syncs to GitHub Copilot, Claude, Cursor, Hermes, Windsurf, Antigravity, and more.
- Token Efficiency: Load minimal metadata first, then source files on-demand.
- Parallel Cloning: Shallow clone optimization (80% faster, lower disk footprint).
- Smart Deduplication: Normalization and Levenshtein-based matching of name OR description catches cross-repo clones.
- Incremental Updates: Checks for remote updates first via `git fetch --depth 1` and compares the `HEAD` and `FETCH_HEAD` hashes before executing resets, cleans, or pulls, avoiding disk and network overhead on already up-to-date repositories.
- Legacy Migration: Automatic legacy repository directory cleanup.
- Hybrid Pipeline: Pre-filter keyword rules (90% confidence) + LLM semantic classification (60-95% confidence).
- Domain Hubs (12): `code-quality`, `server-side`, `frontend`, `business`, and others.
- Sub-Hubs (40+): `security`, `testing-qa`, `performance`, `ux-design`, etc.
- Conflict Resolution: Special conflict table ensures domain specialists (e.g., `security`, `testing-qa`) always win over generic programming language hubs.
- Confidence Scoring: Path-based (98%), keyword-based (90%), LLM-based (60-95%).
- Multi-Target Sync: Automatic distribution to Claude, GitHub Copilot, Hermes, Cursor, Windsurf, Antigravity, and more.
- Atomic updates: Sync is all-or-nothing; failures don't corrupt state.
- Junction management: Safe symlink and junction manipulation on Windows, macOS, and Linux.
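The two-key deduplication described in the features above (normalization plus Levenshtein matching on name OR description) could look roughly like the sketch below. It is an illustration, not the project's actual code: the `Skill` struct, the helper names, and the reading of the thresholds as normalized similarity scores in the range 0 to 1 are all assumptions.

```rust
use std::collections::HashSet;

// Hypothetical skill record; the real struct likely carries more fields.
struct Skill {
    name: String,
    description: String,
}

fn skill(name: &str, description: &str) -> Skill {
    Skill { name: name.to_string(), description: description.to_string() }
}

/// Lowercase, replace punctuation with spaces, and collapse whitespace.
fn normalize(text: &str) -> String {
    let spaced: String = text
        .to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect();
    spaced.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Levenshtein edit distance, computed row by row over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1);
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Normalized similarity in [0, 1]; 1.0 means identical after normalization.
fn similarity(a: &str, b: &str) -> f64 {
    let (a, b) = (normalize(a), normalize(b));
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(&a, &b) as f64 / max_len as f64
}

/// Keep the first occurrence of each skill and drop later ones whose
/// name OR description is close enough to an already kept skill.
fn deduplicate(skills: Vec<Skill>, name_threshold: f64, desc_threshold: f64) -> Vec<Skill> {
    let mut seen_names: HashSet<String> = HashSet::new();
    let mut kept: Vec<Skill> = Vec::new();
    for candidate in skills {
        let key = normalize(&candidate.name);
        // Key 1: exact collision on the normalized name (cheap HashSet lookup).
        if seen_names.contains(&key) {
            continue;
        }
        // Key 2: fuzzy match on name OR description (catches renamed clones).
        let is_clone = kept.iter().any(|k| {
            similarity(&k.name, &candidate.name) >= name_threshold
                || similarity(&k.description, &candidate.description) >= desc_threshold
        });
        if is_clone {
            continue;
        }
        seen_names.insert(key);
        kept.push(candidate);
    }
    kept
}

fn main() {
    let skills = vec![
        skill("Code Review", "Review pull requests for bugs and style issues."),
        skill("code-review", "Checks diffs."),
        skill("pr-checker", "Review pull requests for bugs and style issues!"),
        skill("SQL Tuning", "Optimize slow database queries."),
    ];
    for s in deduplicate(skills, 0.7, 0.8) {
        println!("kept: {}", s.name);
    }
}
```

The fuzzy pass compares each candidate against every kept skill, which is quadratic; at 8000+ files a real implementation would probably bucket or index candidates first.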
- Rust 1.70+ (Install)
- Git (for repository cloning)
- ~2GB disk space (for aggregated skills cache)
- Node.js 20+ (optional, only if running FreeLLMAPI locally)
🚄 FreeLLMAPI Integration
When classifying 8000+ skills, your LLM provider will hit rate limits. Individual providers (Groq, OpenAI, Anthropic) each have free-tier limits (e.g., Groq: ~120 requests/min), which can cause cascading failures during aggregation.
FreeLLMAPI is a unified proxy that aggregates 12+ free LLM providers behind a single OpenAI-compatible endpoint. It automatically falls back when a provider rate-limits, eliminating rate-limit errors during aggregation.
skills-bank (classification)
│
▼
FreeLLMAPI (unified proxy)
│ (automatic fallback/rotation)
├─ Google Gemini (1,500 RPM)
├─ Groq (120 RPM)
├─ Cerebras (200 RPM)
├─ SambaNova (100 RPM)
├─ Mistral (100 RPM)
├─ OpenRouter (20 RPM)
├─ GitHub Models (100 RPM)
├─ Cloudflare (10,000 req/day)
├─ Cohere (100 RPM)
└─ (Others: Z.ai, NVIDIA, HuggingFace)
Total capacity: ~1.3 billion tokens/month (free tier combined)
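The fallback behaviour pictured above can be sketched as a loop that tries providers in order and moves on when one reports a rate limit. This is only a model of the idea: FreeLLMAPI itself is a separate Node.js service, and the `Provider` type, its `remaining` counter, and `call_with_fallback` are hypothetical names invented for this illustration.

```rust
/// Error returned when a provider has exhausted its quota (hypothetical).
#[derive(Debug, PartialEq)]
struct RateLimited;

/// Simulated provider with a fixed number of requests left in its window.
struct Provider {
    name: &'static str,
    remaining: u32,
}

impl Provider {
    /// Pretend to call the provider; fails once the quota is used up.
    fn call(&mut self, prompt: &str) -> Result<String, RateLimited> {
        if self.remaining == 0 {
            return Err(RateLimited);
        }
        self.remaining -= 1;
        Ok(format!("{}: {}", self.name, prompt))
    }
}

/// Try each provider in order and return the first successful response.
/// Only when every provider is rate-limited does the caller see an error.
fn call_with_fallback(providers: &mut [Provider], prompt: &str) -> Result<String, RateLimited> {
    for provider in providers.iter_mut() {
        if let Ok(response) = provider.call(prompt) {
            return Ok(response);
        }
        // Rate-limited: fall through to the next provider.
    }
    Err(RateLimited)
}

fn main() {
    let mut providers = vec![
        Provider { name: "groq", remaining: 1 },
        Provider { name: "cerebras", remaining: 1 },
    ];
    for _ in 0..3 {
        match call_with_fallback(&mut providers, "classify skill") {
            Ok(response) => println!("{response}"),
            Err(RateLimited) => println!("all providers are rate-limited"),
        }
    }
}
```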
Setup FreeLLMAPI
The entire workspace is consolidated under a single source-of-truth configuration file at `freellmapi/.env`. Other components, such as `skills-bank`, read it through native Unix symlinks pointing back to this centralized environment file.
To configure the proxy and automatically seed your provider keys:
- Consolidated environment config is managed centrally at `freellmapi/.env`.
- A database seeding script automatically reads upstream provider API keys, encrypts them using FreeLLMAPI's native AES-256-GCM, and seeds them into the SQLite database.
- The unified API key is synchronized automatically to `freellmapi`.
- Start FreeLLMAPI: `cd freellmapi && npm run dev`
- Run `skills-bank` aggregation: `./target/release/skills-bank aggregate`
### 1. Build
```bash
cd skills-bank/
cargo build --release
```
### 2. Run Aggregation
```bash
# First-time interactive setup
./target/release/skills-bank setup
# Execute the aggregation pipeline
./target/release/skills-bank aggregate
# Distribute aggregated skills to the configured AI tool directories
./target/release/skills-bank sync
```

| Command | Purpose | When to Use |
|---|---|---|
| `setup` | Interactive configuration wizard | Initial setup or configuration change |
| `aggregate` | Collect, deduplicate, classify, and route skills | First run or when repositories change |
| `sync` | Distribute aggregated skills to configured AI tool directories | After aggregation completes |
| `run` | Execute the full pipeline (aggregate → sync) in sequence | Automated daily updates |
| `watch` | Auto-trigger aggregation on config changes (file watcher) | Local development / hot-reload |
| `add-repo <URL>` | Add a new skill repository | Onboarding a new source repository |
| `doctor` | Validate installation and report repository state | Troubleshooting or pre-cleanup inspection |
| `release-gate` | Validate aggregation output integrity | Before releases or production sync |
| `cleanup-legacy-duplicates` | Remove legacy repository folders | Migration from older versions |
8000+ SKILL.md files
│
▼
┌──────────────┐
│ YAML Parse │ Extract name, description, triggers
└──────┬───────┘
│
▼
┌──────────────┐
│ Keyword │ Fast token-based routing to hub/sub-hub
│ Rules │ (fallback if LLM unavailable)
└──────┬───────┘
│
▼
┌──────────────┐
│ Dedup Check │ Name OR Description Levenshtein HashSet
└──────┬───────┘
│
▼
┌──────────────────────────────────────┐
│ Hybrid Exclusion + LLM Classify │
│ Step A: Keyword pre-filter │
│ Step B: LLM semantic classify │
│ (supports JSON object parser)│
└──────┬───────────────────────────────┘
│
▼
┌──────────────┐
│ Output │ routing.csv, per-hub manifests,
│ Artifacts │ subhub-index.json
└──────────────┘
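The hybrid step in the diagram (Step A keyword pre-filter, Step B optional LLM classification) might be structured along these lines. This is a sketch under assumptions: the `Classification` struct, the rule table, and `stub_llm` are hypothetical, and only the `security` rule reflects a mapping named in this README. The 0.90 keyword confidence is taken from the feature list.

```rust
/// Result of classifying one skill (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Classification {
    hub: &'static str,
    sub_hub: &'static str,
    confidence: f64,
    source: &'static str,
}

/// Signature of the optional LLM step; `None` models "LLM unavailable".
type LlmFn = fn(&str) -> Option<Classification>;

/// Illustrative keyword rules: (token, hub, sub-hub).
/// The parent hubs for `testing` and `ux` are assumptions.
const KEYWORD_RULES: &[(&str, &str, &str)] = &[
    ("security", "code-quality", "security"),
    ("testing", "code-quality", "testing-qa"),
    ("ux", "frontend", "ux-design"),
];

/// Step A: token-based match against the rule table, fixed 0.90 confidence.
fn keyword_classify(text: &str) -> Option<Classification> {
    let lower = text.to_lowercase();
    let tokens: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .collect();
    for &(keyword, hub, sub_hub) in KEYWORD_RULES {
        if tokens.contains(&keyword) {
            return Some(Classification { hub, sub_hub, confidence: 0.90, source: "keyword" });
        }
    }
    None
}

/// Step A first; Step B only runs when no rule matched and an LLM is available.
fn classify(text: &str, llm: Option<LlmFn>) -> Option<Classification> {
    keyword_classify(text).or_else(|| llm.and_then(|ask| ask(text)))
}

/// Stand-in for a real LLM call; always returns the same hypothetical answer.
fn stub_llm(_text: &str) -> Option<Classification> {
    Some(Classification { hub: "business", sub_hub: "marketing", confidence: 0.75, source: "llm" })
}

fn main() {
    let by_rule = classify("Scan dependencies for security issues", None);
    let by_llm = classify("Write onboarding emails", Some(stub_llm as LlmFn));
    let unrouted = classify("Write onboarding emails", None);
    println!("{by_rule:?}\n{by_llm:?}\n{unrouted:?}");
}
```

Running the keyword rules first means the LLM is only consulted for skills the rules cannot place, which keeps request volume down when classifying thousands of files.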
- Repository Name Substring Matching: Repository directory names are analyzed using substring matching (e.g., `anthropic-cybersecurity-skills` matches `security` and routes to `code-quality/security`). This runs first with a near-deterministic 98% confidence score.
- Sub-Hub Conflict Resolution: Precedence rules resolve overlap (e.g., `security` wins over programming languages like `python` or `rust`).
- Robust Object Response Support: The LLM parser supports both standard JSON strings and direct JSON objects in `choices[0].message.content` responses, avoiding parsing crashes.
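Repository-name routing with precedence, as described in the first two points above, could be sketched as follows. Modelling the conflict table as a numeric priority per rule is an assumption made for brevity, and the `languages` hub used for `python` and `rust` is a hypothetical name; only the `security` mapping to `code-quality/security` comes from this README.

```rust
/// (substring, hub, sub-hub, priority). A higher priority wins a conflict.
/// Priorities and the `languages` hub are illustrative assumptions.
const REPO_RULES: &[(&str, &str, &str, u8)] = &[
    ("security", "code-quality", "security", 10),
    ("python", "languages", "python", 1),
    ("rust", "languages", "rust", 1),
];

/// Route a repository directory name to (hub, sub-hub) by substring matching.
/// When several rules match, the one with the highest priority is chosen.
fn route_by_repo_name(repo_dir: &str) -> Option<(&'static str, &'static str)> {
    let name = repo_dir.to_lowercase();
    let mut best: Option<(u8, &'static str, &'static str)> = None;
    for &(needle, hub, sub_hub, priority) in REPO_RULES {
        if !name.contains(needle) {
            continue;
        }
        let wins = match best {
            Some((current, _, _)) => priority > current,
            None => true,
        };
        if wins {
            best = Some((priority, hub, sub_hub));
        }
    }
    best.map(|(_, hub, sub_hub)| (hub, sub_hub))
}

fn main() {
    for repo in ["anthropic-cybersecurity-skills", "python-security-tools", "misc-notes"] {
        println!("{repo} -> {:?}", route_by_repo_name(repo));
    }
}
```

Plain substring matching is deliberately loose (`cybersecurity` contains `security`), so rule order and priorities matter more here than in the token-based keyword step.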
skills-bank/
├── src/ # Rust source code
│ ├── components/
│ │ ├── llm/ # LLM classification & provider logic
│ │ │ ├── providers/ # Provider implementations (custom, groq, etc.)
│ │ │ │ └── custom.rs # FreeLLMAPI provider
│ │ ├── aggregator/ # Aggregation engine
│ │ │ ├── mod.rs # Aggregator lifecycle
│ │ │ └── rules.rs # Keyword & conflict resolution rules
│ │ ├── native_pipeline.rs # Core pipeline orchestration
│ │ ├── syncer.rs # Sync files to AI tools
│ │ └── diagnostics.rs # Health checks
│ ├── main.rs # CLI entry point
│ └── lib.rs # Library root
├── Cargo.toml # Cargo manifest
├── config.json # User config (repos, sync targets)
├── skills-aggregated/ # Output directory (generated)
│ ├── routing.csv # Master routing table
│ ├── hub-manifests.csv # Hub manifests registry
│ └── subhub-index.json # Sub-hub index
└── README.md # Project documentation
The project loads environment variables via a symlink skills-bank/.env pointing to the centralized configuration at freellmapi/.env. Update settings inside freellmapi/.env to configure:
PARALLEL_JOBS=8 # CPU threads for parallel cloning (default: auto)
SKILL_MANAGE_DEBUG=1 # Enable debug logging
# Deduplication thresholds
DEDUP_DESC_THRESHOLD=0.8 # Levenshtein distance for description matching
DEDUP_NAME_THRESHOLD=0.7 # Levenshtein distance for name matching

LLM_PROVIDER=freellmapi # 'freellmapi', 'groq', 'openai', 'claude', 'gemini'
LLM_API_KEY=freellmapi-... # API Key (unified key if using freellmapi)
LLM_API_URL=http://localhost:3001/v1 # API Endpoint (default: local proxy)
LLM_MODEL=auto # Model identifier ('auto' for proxy routing)
# Batch processing
LLM_BATCH_SIZE=10 # Number of skills to process in a single batch
LLM_CONCURRENCY=5 # Concurrent batch request limit
LLM_MAX_RETRIES=3 # Max retry attempts for failed batches
LLM_INITIAL_BACKOFF_MS=500 # Exponential backoff base duration

| Agent | Project Local Path | User Global Path |
|---|---|---|
| Claude | `.claude/skills/` | `~/.claude/skills/` |
| Hermes | `.hermes/skills/` | `~/.hermes/skills/` |
| GitHub Copilot | `.agents/skills/` | `~/.agents/skills/` |
| Cursor | `.cursor/skills/` | `~/.cursor/skills/` |
| Windsurf | `.windsurf/skills/` | `~/.codeium/windsurf/skills/` |
| Antigravity | `.agent/skills/` | `~/.agent/skills/` |
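As a rough illustration of how a sync step might resolve the user-global paths in the table above, the sketch below maps an agent name to its directory under a given home directory. The function name and the accepted spellings of agent names are assumptions; the relative paths are copied from the table.

```rust
use std::path::PathBuf;

/// Resolve the user-global skills directory for an agent under `home`.
/// Returns `None` for agents that are not in the table.
fn global_skills_dir(agent: &str, home: &str) -> Option<PathBuf> {
    let relative = match agent.to_lowercase().as_str() {
        "claude" => ".claude/skills",
        "hermes" => ".hermes/skills",
        "github copilot" | "copilot" => ".agents/skills",
        "cursor" => ".cursor/skills",
        // Windsurf is the one agent whose global path differs from its project-local one.
        "windsurf" => ".codeium/windsurf/skills",
        "antigravity" => ".agent/skills",
        _ => return None,
    };
    Some(PathBuf::from(home).join(relative))
}

fn main() {
    for agent in ["Claude", "Windsurf", "Unknown"] {
        match global_skills_dir(agent, "/home/dev") {
            Some(path) => println!("{agent}: {}", path.display()),
            None => println!("{agent}: not a supported agent"),
        }
    }
}
```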
- Solution: Configure `LLM_PROVIDER=freellmapi` and add multiple provider keys via the FreeLLMAPI dashboard.
Issue: "Connection refused" to http://localhost:3001
- Solution: Verify FreeLLMAPI is running via `npm run dev` and port `3001` is open.
- Solution: The tool handles junctions atomically. If conflicting locks occur, manually clean target directories:
rm -rf ~/.claude/skills/* # Linux/macOS
- Fork the repository.
- Create a feature branch: `git checkout -b feature/your-feature`.
- Add test coverage for new keyword rules or functionality.
- Run `cargo test` and verify format with `cargo fmt`.
- Submit a pull request.
This project is licensed under the MIT License. See LICENSE for details.