compressed spec-driven development for claude code
one file · one loop · zero sub-agents
Plan-then-execute forgets. SDD remembers — but most SDD frameworks bury that value under agent swarms, dashboards, and ceremony that costs more tokens than it saves.
Cavekit is the simplest full loop: grill → spec → research → review →
build, over one SPEC.md file, no sub-agents. Three commands you run
every time; four more you reach for only when the change earns it.
The spine is three properties that earn their tokens:
- durable spec —
SPEC.mdat repo root survives context resets. It is the agent's long-term memory: lose the window, reload the spec, keep going. - caveman encoding — ~75% fewer tokens than prose. Symbols, fragments, pipe tables. All nine skill descriptions cost ~1.1k context — 16× lighter than spec-kit's 18.6k. That is the whole point.
- backprop reflex — every test failure becomes a
§Bentry; classes of bug become§Vinvariants the spec never forgets.
And one rule that keeps it from bloating into the frameworks it replaces:
right-size. A one-line fix is just /build. The full chain is for
genuinely uncertain or high-blast-radius work — never for a typo.
the loop — run these every time:
| cmd | job |
|---|---|
/ck:spec |
create / amend / backprop SPEC.md. Sole mutator. |
/ck:build |
native plan → execute against spec. Names which test proves each §V. Auto-backprops on failure. |
/ck:check |
read-only drift report. Lists §V / §I / §T violations. The drift detector. |
reach for these — only when the change earns the ceremony:
| cmd | job |
|---|---|
/ck:grill |
interrogate a fuzzy idea into a sharp §G/§C, one question at a time, before you spec. |
/ck:research |
gather external knowledge into §R so build grounds in facts, not hallucinations. Every finding cites a source. |
/ck:review |
adversarial senior review of the spec before build. Refutes, hardens §V, ends in a go/no-go gate. |
/ck:deepen |
spare-budget design pass — make one shallow module deep. Behavior held, tests green before & after. |
One line, via the skills CLI:
npx skills add JuliusBrussee/cavekitInstalls nine skills into ~/.claude/skills/: spec, build, check
(the loop), grill, research, review, deepen (reach-for), plus
caveman and backprop (the utilities). Claude activates each when its
trigger context matches — e.g. "write a spec for…" invokes spec, a fuzzy
idea invokes grill, a risky change before build invokes review. Claude
Code picks them up on next launch.
Or via the Claude Code marketplace (also adds the /ck:spec, /ck:build,
/ck:check, /ck:grill, /ck:research, /ck:review, /ck:deepen slash
commands):
/plugin marketplace add juliusbrussee/cavekit
/plugin install ck@cavekitOr clone directly:
git clone https://github.com/juliusbrussee/cavekit.git ~/.claude/plugins/cavekitSee FORMAT.md. Sections: §G goal, §C constraints, §I
interfaces, §R research (optional, pipe table), §V invariants, §T tasks
(pipe table), §B bugs (pipe table). Each verb owns specific sections —
no verb rewrites a section it does not own.
FORMAT.md spec schema + caveman encoding + sectioned ownership
commands/ seven thin slash-command entry points → the skills (loop + reach-for)
skills/spec spec mutator — sole writer
skills/build plan-execute, verification contract
skills/check drift report
skills/grill sharpen a fuzzy idea → §G/§C before spec
skills/research external knowledge → §R, every finding sourced
skills/review adversarial senior review of the spec → hardens §V
skills/deepen spare-budget design pass — make one module deep
skills/caveman encoding utility
skills/backprop bug → spec protocol (six steps)
- no sub-agents. Main Claude does the work.
- no dashboards.
cat SPEC.mdis the dashboard. - no parallel workers. One thread, one spec, one diff.
- no JSON / YAML spec bodies. Markdown + pipe tables.
- no hooks, no orchestration binaries, no TypeScript helpers.
The previous generation is not deprecated — it is frozen at tag
v3.1.0 and
remains a fully working plugin.
What it is:
Spec-driven AI development with an autonomous execution loop. Four-command Hunt lifecycle (
/ck:sketch→/ck:map→/ck:make→/ck:check), plus/ck:ship,/ck:review,/ck:revise,/ck:status,/ck:design,/ck:research,/ck:init,/ck:config,/ck:resume,/ck:help— 16 slash commands total. 12 named sub-agents. Per-task token budgets, stop-hook state machine, model-tier routing, auto-backpropagation from test failures, tool-result caching, Codex peer review, Karpathy behavioral guardrails, caveman token compression, knowledge-graph integration, and design-system enforcement. Parallel wave execution and team mode.
Pick v3.1.0 if you want the full autonomous loop, parallel agents, peer review, or design-system workflow. Pick v4 if you want the distilled loop — one spec, no orchestration, right-sized ceremony.
Marketplace:
/plugin marketplace add juliusbrussee/cavekit@v3.1.0
/plugin install ck@cavekitGit:
git clone -b v3.1.0 https://github.com/juliusbrussee/cavekit.gitFull docs live at the tag — git checkout v3.1.0 and read the README
there for command reference, skill catalog, and the Hunt lifecycle guide.
See UPGRADE.md. Honest framing:
- Stay on v3.1.0 if your project has active
context/kits/investment. - Move to v4 if you want fewer moving parts and smaller token bills.
- It is a two-way door —
SPEC.mdis plain markdown; nothing traps you in either direction.
Cavekit is one rock in the caveman family:
| repo | what |
|---|---|
| caveman | output compression skill — why use many token when few do trick |
| cavemem | cross-agent persistent memory — why agent forget when agent can remember |
| cavekit (you here) | spec-driven build loop — why agent guess when agent can know |
| cavegemma | Gemma 4 31B fine-tuned on caveman pairs — why prompt every turn when weight remember |
The spec is the only artifact that earns its tokens. Everything else that costs tokens must either save more tokens later, or the user's attention, or it gets cut.
See CHANGELOG.md for the full v3 → v4 break.
MIT.