🏦 skills-bank

High-performance skill aggregation, classification & routing platform for AI agents.

Aggregate 100+ skill repositories → Deduplicate 8000+ workflows → Classify semantically → Route to AI agents

Features • Quick Start • Documentation • FreeLLMAPI Integration • Troubleshooting

📚 Table of Contents

Overview
Features
Prerequisites
Quick Start
FreeLLMAPI Integration
Commands Reference
Classification Architecture
Project Structure
Environment Variables
Supported AI Agents
Troubleshooting
Contributing
License

🎯 Overview

skills-bank is a unified skill aggregation platform designed to solve the "skill discovery bottleneck" for AI agents. Instead of agents searching 100+ distributed GitHub repositories for workflows, skills-bank:

Aggregates 8000+ SKILL.md files from distributed repositories.
Deduplicates by name and description, catching cross-repo clones.
Classifies skills into 12 domain hubs and 40+ sub-hubs using a hybrid keyword + LLM approach.
Routes skills to 12+ AI agent platforms (Claude, Copilot, Cursor, Hermes, etc.).
Maintains a canonical cache in lib/ with automatic updates via file watcher.

Result: Agents access 3,600+ curated, deduplicated skills through a single unified interface.

Core Design Principles

Source-of-Truth Loading: Agents load canonical SKILL.md files directly from source repositories rather than from catalog summaries, which reduces the risk of hallucinated skill content and saves tokens.
Hybrid Classification: Combines fast keyword rules (Step A) with optional LLM semantic classification (Step B).
Smart Deduplication: Two-key strategy catches both exact collisions and renamed cross-repo clones.
Multi-Tool Support: Syncs to GitHub Copilot, Claude, Cursor, Hermes, Windsurf, Antigravity, and more.
Token Efficiency: Load minimal metadata first, then source files on-demand.

✨ Features

Aggregation & Deduplication

Parallel Cloning: Repositories are cloned in parallel as shallow clones (about 80% faster, with a lower disk footprint).
Smart Deduplication: Normalization and Levenshtein-based matching of name OR description catches cross-repo clones.
Incremental Updates: Checks for remote updates first via git fetch --depth 1 and compares HEAD with FETCH_HEAD hashes before executing resets, cleans, or pulls, avoiding disk and network overhead on already up-to-date repositories.
Legacy Migration: Automatic legacy repository directory cleanup.
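
The incremental update check described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names `rev_parse`, `hashes_differ`, and `needs_update` are hypothetical, and it assumes `git` is available on the PATH and that `repo` is an existing clone.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Runs `git rev-parse <rev>` inside `repo` and returns the trimmed hash.
fn rev_parse(repo: &Path, rev: &str) -> io::Result<String> {
    let out = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["rev-parse", rev])
        .output()?;
    if !out.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("git rev-parse {rev} failed"),
        ));
    }
    Ok(String::from_utf8_lossy(&out.stdout).trim().to_string())
}

/// Pure comparison, kept separate so it can be tested without a repository.
fn hashes_differ(head: &str, fetch_head: &str) -> bool {
    head.trim() != fetch_head.trim()
}

/// Shallow-fetches the remote tip and reports whether HEAD differs from
/// FETCH_HEAD. Only when this returns `true` would a reset/clean/pull run.
fn needs_update(repo: &Path) -> io::Result<bool> {
    let status = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["fetch", "--depth", "1"])
        .status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "git fetch failed"));
    }
    let head = rev_parse(repo, "HEAD")?;
    let fetch_head = rev_parse(repo, "FETCH_HEAD")?;
    Ok(hashes_differ(&head, &fetch_head))
}
```

Repositories for which `needs_update` returns `false` can be skipped entirely, which is where the disk and network savings come from.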

Classification & Routing

Hybrid Pipeline: Pre-filter keyword rules (90% confidence) + LLM semantic classification (60-95% confidence).
Domain Hubs (12): code-quality, server-side, frontend, business, and others.
Sub-Hubs (40+): security, testing-qa, performance, ux-design, etc.
Conflict Resolution: Special conflict table ensures domain specialists (e.g., security, testing-qa) always win over generic programming language hubs.
Confidence Scoring: Path-based (98%), keyword-based (90%), LLM-based (60-95%).
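
To make the interplay between confidence scoring and conflict resolution concrete, here is a small sketch. The type and function names (`Source`, `Candidate`, `resolve`) and the two-entry specialist list are illustrative assumptions; the real conflict table lives in the project's rules and may differ.

```rust
use std::cmp::Ordering;

/// Where a classification came from, using the confidence levels listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Path,     // path-based match: 0.98
    Keyword,  // keyword pre-filter: 0.90
    Llm(f32), // LLM-reported score, clamped to 0.60..=0.95
}

impl Source {
    fn confidence(self) -> f32 {
        match self {
            Source::Path => 0.98,
            Source::Keyword => 0.90,
            Source::Llm(score) => score.clamp(0.60, 0.95),
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    sub_hub: &'static str,
    source: Source,
}

/// Hypothetical conflict table: specialist sub-hubs that beat language hubs.
const SPECIALISTS: [&str; 2] = ["security", "testing-qa"];

fn is_specialist(sub_hub: &str) -> bool {
    SPECIALISTS.iter().any(|s| *s == sub_hub)
}

/// Picks a winner: specialists first, then the highest confidence.
fn resolve(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by(|a, b| {
        is_specialist(a.sub_hub)
            .cmp(&is_specialist(b.sub_hub))
            .then(
                a.source
                    .confidence()
                    .partial_cmp(&b.source.confidence())
                    .unwrap_or(Ordering::Equal),
            )
    })
}
```

In this sketch a keyword match on `security` (0.90) beats a path match on `python` (0.98), because the specialist check is compared before the confidence score.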

Sync & Distribution

Multi-Target Sync: Automatic distribution to Claude, GitHub Copilot, Hermes, Cursor, Windsurf, Antigravity, and more.
Atomic updates: Sync is all-or-nothing; failures don't corrupt state.
Junction management: Safe symlink and junction manipulation on Windows, macOS, and Linux.
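
One common way to get all-or-nothing behavior is to stage the new content next to the target and swap it in with a rename. The sketch below illustrates that idea under stated assumptions: `sync_dir` and the `.staging` / `.backup` suffixes are hypothetical, symlinks and junctions are not handled, and the two-step swap is close to atomic rather than strictly atomic.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `files` (relative name, contents) into `target`.
/// If writing fails, an existing `target` is left untouched.
fn sync_dir(target: &Path, files: &[(&str, &str)]) -> io::Result<()> {
    let staging = target.with_extension("staging");
    let backup = target.with_extension("backup");

    // Start from a clean staging directory.
    if staging.exists() {
        fs::remove_dir_all(&staging)?;
    }
    fs::create_dir_all(&staging)?;

    // Write everything into staging first; abort without touching `target`.
    for (name, contents) in files {
        if let Err(e) = fs::write(staging.join(name), contents) {
            let _ = fs::remove_dir_all(&staging);
            return Err(e);
        }
    }

    // Swap: move the old target aside, then move staging into place.
    if backup.exists() {
        fs::remove_dir_all(&backup)?;
    }
    if target.exists() {
        fs::rename(target, &backup)?;
    }
    if let Err(e) = fs::rename(&staging, target) {
        // Roll back to the previous content if the swap failed.
        if backup.exists() {
            let _ = fs::rename(&backup, target);
        }
        return Err(e);
    }
    if backup.exists() {
        fs::remove_dir_all(&backup)?;
    }
    Ok(())
}
```

Because the staging directory sits beside the target, both renames stay on the same filesystem, which is what keeps the swap cheap.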

🔧 Prerequisites

Rust 1.70+ (Install)
Git (for repository cloning)
~2GB disk space (for aggregated skills cache)
Node.js 20+ (optional, only if running FreeLLMAPI locally)

🚄 FreeLLMAPI Integration

The Problem: LLM Rate Limits

When classifying 8000+ skills, your LLM provider will hit rate limits. Individual providers (Groq, OpenAI, Anthropic) each have free-tier limits (e.g., Groq: ~120 requests/min), which can cause cascading failures during aggregation.

The Solution: FreeLLMAPI

FreeLLMAPI is a unified proxy that aggregates 12+ free LLM providers behind a single OpenAI-compatible endpoint. When one provider rate-limits a request, the proxy automatically falls back to another, so rate-limit errors no longer interrupt aggregation.

skills-bank (classification)
     │
     ▼
FreeLLMAPI (unified proxy)
     │ (automatic fallback/rotation)
     ├─ Google Gemini (1,500 RPM)
     ├─ Groq (120 RPM)
     ├─ Cerebras (200 RPM)
     ├─ SambaNova (100 RPM)
     ├─ Mistral (100 RPM)
     ├─ OpenRouter (20 RPM)
     ├─ GitHub Models (100 RPM)
     ├─ Cloudflare (10,000 req/day)
     ├─ Cohere (100 RPM)
     └─ (Others: Z.ai, NVIDIA, HuggingFace)

Total capacity: ~1.3 billion tokens/month (free tier combined)
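
The fallback behavior in the diagram can be pictured with a short sketch. This is not FreeLLMAPI's implementation: `CallError` and `call_with_fallback` are hypothetical names, and the closure stands in for a real HTTP request to a provider.

```rust
/// Outcome of a single provider call in this sketch.
#[derive(Debug, PartialEq)]
enum CallError {
    RateLimited,
    Other(String),
}

/// Tries each provider in order. A rate limit moves on to the next provider;
/// any other error is returned immediately.
/// On success, returns (provider name, reply).
fn call_with_fallback<F>(
    providers: &[&str],
    mut call: F,
) -> Result<(String, String), CallError>
where
    F: FnMut(&str) -> Result<String, CallError>,
{
    for &name in providers {
        match call(name) {
            Ok(reply) => return Ok((name.to_string(), reply)),
            Err(CallError::RateLimited) => continue,
            Err(other) => return Err(other),
        }
    }
    // Every provider was rate-limited.
    Err(CallError::RateLimited)
}
```

From the caller's point of view (skills-bank in this case) there is only one endpoint; the rotation happens inside the proxy.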

Setup FreeLLMAPI

The entire workspace is configured from a single source-of-truth file at freellmapi/.env. Component-level files such as skills-bank/.env are native Unix symlinks pointing back to this centralized environment file.

To configure the proxy and automatically seed your provider keys:

Consolidated environment config is managed centrally at freellmapi/.env.
A database seeding script automatically reads upstream provider API keys, encrypts them using FreeLLMAPI's native AES-256-GCM, and seeds them into the SQLite database.
The unified API key is synchronized automatically to freellmapi.

Next Steps:

Start FreeLLMAPI:
```
cd freellmapi && npm run dev
```
Run skills-bank aggregation:
```
./target/release/skills-bank aggregate
```

🚀 Quick Start

1. Build

```bash
cd skills-bank/
cargo build --release
```

2. Run Aggregation

```bash
# First-time interactive setup
./target/release/skills-bank setup

# Execute the aggregation pipeline
./target/release/skills-bank aggregate
```

3. Sync to Your Tools

```bash
./target/release/skills-bank sync
```

🎮 Commands Reference

| Command | Purpose | When to Use |
| --- | --- | --- |
| `setup` | Interactive configuration wizard | Initial setup or configuration change |
| `aggregate` | Collect, deduplicate, classify, and route skills | First run or when repositories change |
| `sync` | Distribute aggregated skills to configured AI tool directories | After aggregation completes |
| `run` | Execute the full pipeline (`aggregate` → `sync`) in sequence | Automated daily updates |
| `watch` | Auto-trigger aggregation on config changes (file watcher) | Local development / hot-reload |
| `add-repo <URL>` | Add a new skill repository | Onboarding a new source repository |
| `doctor` | Validate installation and report repository state | Troubleshooting or pre-cleanup inspection |
| `release-gate` | Validate aggregation output integrity | Before releases or production sync |
| `cleanup-legacy-duplicates` | Remove legacy repository folders | Migration from older versions |

🏗️ Classification Architecture

Multi-Stage Pipeline

 8000+ SKILL.md files
        │
        ▼
 ┌──────────────┐
 │  YAML Parse  │  Extract name, description, triggers
 └──────┬───────┘
        │
        ▼
 ┌──────────────┐
 │  Keyword     │  Fast token-based routing to hub/sub-hub
 │  Rules       │  (fallback if LLM unavailable)
 └──────┬───────┘
        │
        ▼
 ┌──────────────┐
 │  Dedup Check │  Name OR Description Levenshtein HashSet
 └──────┬───────┘
        │
        ▼
 ┌──────────────────────────────────────┐
 │  Hybrid Exclusion + LLM Classify     │
 │  Step A: Keyword pre-filter          │
 │  Step B: LLM semantic classify       │
 │         (supports JSON object parser)│
 └──────┬───────────────────────────────┘
        │
        ▼
 ┌──────────────┐
 │  Output      │  routing.csv, per-hub manifests,
 │  Artifacts   │  subhub-index.json
 └──────────────┘

Classification Improvements (v2.0+)

Repository Name Substring Matching: Repository directory names are analyzed using substring matching (e.g., anthropic-cybersecurity-skills -> matches security -> routes to code-quality/security). This runs first with a near-deterministic 98% confidence score.
Sub-Hub Conflict Resolution: Precedence rules resolve overlap (e.g., security wins over programming languages like python or rust).
Robust Object Response Support: The LLM parser supports both standard JSON strings and direct JSON objects in choices[0].message.content responses, avoiding parsing crashes.
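
Repository name substring matching can be sketched as a small lookup. The names `Route`, `RULES`, and `route_by_repo_name` are hypothetical, and only the `security` row is taken from the example above; the second row is a placeholder added for illustration.

```rust
/// A routing decision: hub, sub-hub, and confidence score.
#[derive(Debug, PartialEq)]
struct Route {
    hub: &'static str,
    sub_hub: &'static str,
    confidence: f32,
}

/// (substring, hub, sub-hub). Earlier rows take precedence.
const RULES: [(&str, &str, &str); 2] = [
    ("security", "code-quality", "security"),  // from the example above
    ("testing", "code-quality", "testing-qa"), // hypothetical row
];

/// Returns a route if the repository directory name contains a known
/// substring. Matching is case-insensitive.
fn route_by_repo_name(repo_dir: &str) -> Option<Route> {
    let name = repo_dir.to_lowercase();
    RULES
        .iter()
        .find(|rule| name.contains(rule.0))
        .map(|&(_, hub, sub_hub)| Route {
            hub,
            sub_hub,
            confidence: 0.98,
        })
}
```

A repository whose name matches no rule returns `None` here and would fall through to the keyword and LLM stages.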

📁 Project Structure

skills-bank/
├── src/                              # Rust source code
│   ├── components/
│   │   ├── llm/                     # LLM classification & provider logic
│   │   │   ├── providers/           # Provider implementations (custom, groq, etc.)
│   │   │   │   └── custom.rs        # FreeLLMAPI provider
│   │   ├── aggregator/              # Aggregation engine
│   │   │   ├── mod.rs               # Aggregator lifecycle
│   │   │   └── rules.rs             # Keyword & conflict resolution rules
│   │   ├── native_pipeline.rs       # Core pipeline orchestration
│   │   ├── syncer.rs                # Sync files to AI tools
│   │   └── diagnostics.rs           # Health checks
│   ├── main.rs                      # CLI entry point
│   └── lib.rs                       # Library root
├── Cargo.toml                       # Cargo manifest
├── config.json                      # User config (repos, sync targets)
├── skills-aggregated/               # Output directory (generated)
│   ├── routing.csv                 # Master routing table
│   ├── hub-manifests.csv           # Hub manifests registry
│   └── subhub-index.json           # Sub-hub index
└── README.md                        # Project documentation

⚙️ Environment Variables

The project loads environment variables via a symlink skills-bank/.env pointing to the centralized configuration at freellmapi/.env. Update settings inside freellmapi/.env to configure:

Core Configuration

PARALLEL_JOBS=8                    # CPU threads for parallel cloning (default: auto)
SKILL_MANAGE_DEBUG=1              # Enable debug logging

# Deduplication thresholds
DEDUP_DESC_THRESHOLD=0.8          # Normalized Levenshtein threshold (0-1) for description matching
DEDUP_NAME_THRESHOLD=0.7          # Normalized Levenshtein threshold (0-1) for name matching
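
As a rough illustration of how these thresholds could drive the two-key deduplication, the sketch below treats both values as normalized similarity scores (1.0 means identical). That interpretation is an assumption, as are the names `Skill`, `similarity`, and `is_duplicate`; the project's own normalization may differ.

```rust
/// Levenshtein edit distance over characters (two-row dynamic programming).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1).min(curr[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Lowercases and collapses whitespace before comparison.
fn normalize(s: &str) -> String {
    s.to_lowercase().split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Similarity in 0.0..=1.0. Empty fields count as non-matching so that
/// blank descriptions do not collapse unrelated skills.
fn similarity(a: &str, b: &str) -> f64 {
    let (a, b) = (normalize(a), normalize(b));
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 0.0;
    }
    1.0 - levenshtein(&a, &b) as f64 / max_len as f64
}

struct Skill<'a> {
    name: &'a str,
    description: &'a str,
}

/// Two-key check: a match on name OR description marks a duplicate.
fn is_duplicate(a: &Skill, b: &Skill, name_threshold: f64, desc_threshold: f64) -> bool {
    similarity(a.name, b.name) >= name_threshold
        || similarity(a.description, b.description) >= desc_threshold
}
```

With the values shown above, `is_duplicate(&a, &b, 0.7, 0.8)` would flag a renamed clone whose description is nearly unchanged, even if the names differ.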

LLM Classification Configuration

LLM_PROVIDER=freellmapi           # 'freellmapi', 'groq', 'openai', 'claude', 'gemini'
LLM_API_KEY=freellmapi-...        # API Key (unified key if using freellmapi)
LLM_API_URL=http://localhost:3001/v1 # API Endpoint (default: local proxy)
LLM_MODEL=auto                    # Model identifier ('auto' for proxy routing)

# Batch processing
LLM_BATCH_SIZE=10                 # Number of skills to process in a single batch
LLM_CONCURRENCY=5                 # Concurrent batch request limit
LLM_MAX_RETRIES=3                 # Max retry attempts for failed batches
LLM_INITIAL_BACKOFF_MS=500        # Exponential backoff base duration
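
The batch and retry settings above suggest a loop along the following lines. This is a sketch under assumptions: the doubling schedule (initial delay times 2 to the power of the attempt number) is one common reading of "exponential backoff", and `backoff_delay`, `retry`, and `batches` are hypothetical helper names.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): initial * 2^attempt.
/// Saturates instead of overflowing for large attempt counts.
fn backoff_delay(initial_ms: u64, attempt: u32) -> Duration {
    let factor = 2u64.checked_pow(attempt).unwrap_or(u64::MAX);
    Duration::from_millis(initial_ms.saturating_mul(factor))
}

/// Runs `op` until it succeeds or `max_retries` retries have been used,
/// sleeping with exponential backoff between attempts.
fn retry<T, E, F>(max_retries: u32, initial_ms: u64, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                std::thread::sleep(backoff_delay(initial_ms, attempt));
                attempt += 1;
            }
        }
    }
}

/// Splits skills into batches of at most `size` items (as LLM_BATCH_SIZE would).
fn batches<T>(items: &[T], size: usize) -> Vec<&[T]> {
    items.chunks(size.max(1)).collect()
}
```

With `LLM_INITIAL_BACKOFF_MS=500` and `LLM_MAX_RETRIES=3`, this schedule would wait 500 ms, 1 s, and 2 s before giving up on a batch.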

🌐 Supported AI Agents

| Agent | Project Local Path | User Global Path |
| --- | --- | --- |
| Claude | `.claude/skills/` | `~/.claude/skills/` |
| Hermes | `.hermes/skills/` | `~/.hermes/skills/` |
| GitHub Copilot | `.agents/skills/` | `~/.agents/skills/` |
| Cursor | `.cursor/skills/` | `~/.cursor/skills/` |
| Windsurf | `.windsurf/skills/` | `~/.codeium/windsurf/skills/` |
| Antigravity | `.agent/skills/` | `~/.agent/skills/` |
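
The path table above maps naturally onto a small lookup. The sketch below is illustrative only: the `Agent` enum and the `sync_target` function are hypothetical names, and the paths are copied from the table.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq)]
enum Agent {
    Claude,
    Hermes,
    Copilot,
    Cursor,
    Windsurf,
    Antigravity,
}

impl Agent {
    /// Skills directory relative to a project root.
    fn local_dir(self) -> &'static str {
        match self {
            Agent::Claude => ".claude/skills",
            Agent::Hermes => ".hermes/skills",
            Agent::Copilot => ".agents/skills",
            Agent::Cursor => ".cursor/skills",
            Agent::Windsurf => ".windsurf/skills",
            Agent::Antigravity => ".agent/skills",
        }
    }

    /// Skills directory relative to the user's home directory.
    fn global_dir(self) -> &'static str {
        match self {
            // Windsurf is the only agent whose global path differs in shape.
            Agent::Windsurf => ".codeium/windsurf/skills",
            other => other.local_dir(),
        }
    }
}

/// Resolves the sync target: project-local when a project root is given,
/// otherwise the user-global directory under `home`.
fn sync_target(agent: Agent, project_root: Option<&Path>, home: &Path) -> PathBuf {
    match project_root {
        Some(root) => root.join(agent.local_dir()),
        None => home.join(agent.global_dir()),
    }
}
```

Passing the home directory in as a parameter, rather than reading it from the environment inside the function, keeps the lookup easy to test.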

🔧 Troubleshooting

Issue: Rate Limit Errors (429 Too Many Requests)

Solution: Configure LLM_PROVIDER=freellmapi and add multiple provider keys via the FreeLLMAPI dashboard.

Issue: "Connection refused" to http://localhost:3001

Solution: Verify that FreeLLMAPI is running (cd freellmapi && npm run dev) and that port 3001 is reachable.

Issue: Sync Failing with Symlink/Junction Errors

Solution: The tool handles junctions atomically. If a conflicting lock still blocks the sync, manually clear the affected target directory and run sync again:
```
rm -rf ~/.claude/skills/*  # Linux/macOS
```

🤝 Contributing

Fork the repository.
Create a feature branch: git checkout -b feature/your-feature.
Add test coverage for new keyword rules or functionality.
Run cargo test and verify format with cargo fmt.
Submit a pull request.

📄 License

This project is licensed under the MIT License. See LICENSE for details.
