14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.
Updated Apr 1, 2026 · Python
97% token reduction for AI coding sessions — zero deps, 31 languages, MCP server
Cut your Claude / OpenAI / Gemini bill by 70–95% on AI coding. A local proxy that compresses context, keeps provider caches hot, and verifies LLM output ($0 hallucination guard). Drop-in for Cursor, Claude Code, Codex, Aider, 34 more tools, and custom providers: 30-second setup, no code changes.
Portable CC-inspired skills for memory, verification, multi-agent coordination, context compression, and proactive coding-agent workflows.
A drop-in proxy that compresses bloated code context in real-time, cutting LLM API costs by 50–80% without losing what the model actually needs to know.
The official repo for "LLoCo: Learning Long Contexts Offline"
Local streaming reverse proxy between AI coding agents (Claude Code, Cursor, Codex) and model APIs (Anthropic, OpenAI, Gemini, MiniMax). Meters every token and its USD cost, compacts bloated context to cut pay-per-token API spend, and runs shadow evals to show that quality held. ccusage-style metering plus a live local dashboard.
PyTorch implementation of "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
Context compression plugin for Claude Code. Trims large JSON, logs, stack traces, and source files before they enter the context window.
State-aware knowledge compression, ingestion, and hybrid retrieval engine. Zero dependencies. Sub-100ms queries.
Cursor uses AI to edit code — we use AI to edit AI's context. 🪆 Context map + compression + version control for LLM context windows.
A unified CLI to install and update token-saving plugins — RTK, Caveman, CodeGraph, and Context-Mode — for Claude Code, OpenCode, Codex, and Antigravity. Minimal setup. Any OS.
Convert long AI conversations into portable conversation state graphs for LLM handoffs.
LLM reliability layer: keeps agents alive with smart routing, context compaction, and local fallback.
Unified agent memory and context compression stack for 2026 NVIDIA + edge (Vera CPU, Grace, Jetson Thor, 3090). Glues busyBee-cpu, honey-comb, and rust-brain. Better effective reasoning per token.
🦞 LobsterPress (龙虾饼): Cognitive Memory System for AI Agents. A permanent memory engine for LLMs grounded in cognitive science.
⚡ Cut Claude token usage by 90%+ — free, open-source, local-first context compression for Claude Code. Hybrid RAG (BM25 + ONNX vectors), AST chunking, reranking. No API needed.
Local-first Model Context Protocol (MCP) memory layer for Codex CLI/Desktop, Claude Code, Gemini CLI, Qwen/DeepSeek/Ollama and agent workflows. SQLite + FTS5 compact context packs, token savings, read-only mode, no external memory server.
Rolling context compression for Claude Code — never hit the context wall. Auto-compresses old messages while keeping recent context verbatim. Zero config, zero latency. Works as a Claude Code plugin.
Local-first context compression for AI coding tools. One binary saves 85-93% of redundant tokens across every LLM call.