Codebase Oracle

Ask your codebase what your product actually does. Get PM-language answers with confidence levels, not code dumps.

A product-aware RAG system where the corpus is code, the index is organized by product behavior, and the consumer is a PM.

What I found when I ran this on a real codebase

I pointed the oracle at Documenso, an open-source e-signature platform (~50K lines of code). Here's what it found that no spec, user manual, or dashboard could tell me:

A signer whose deadline expires mid-session gets a generic error with no explanation.

The backend defines a specific, helpful message: "Recipient signing window has expired." But the frontend error handler doesn't inspect the error code — it displays "An error occurred while signing the document." The signer has no idea what went wrong. They'll retry, fail again, and contact support. The backend team built the right error. The frontend never wires it up.

The oracle caught this because it traces to the rendering layer, not just the backend. Every user-facing claim is tagged:

"An error occurred while signing the document."  [RENDERED — traced to document-signing-text-field.tsx]
"Recipient signing window has expired"            [BACKEND-DEFINED — frontend does not surface this]

Two other findings from the same codebase:

→ When two signers try to sign the same field simultaneously, the system is safe by design — fields are permanently bound to one recipient at the data model level. But there's a theoretical TOCTOU race condition for same-signer double-submits on non-signature fields (text, checkboxes) because the field.inserted check runs outside the database transaction.

→ When a signer opens a link for an already-completed document, there's no error — they're silently redirected to a "Document Signed" completion page. The experience is seamless, but confusing if the signer didn't actually sign. A removed-and-re-added recipient would see "Everyone has signed" without ever having signed themselves.

None of these are in any spec. None show up in any dashboard. They live in the code, and until now, only engineers could find them.

What this is

A Claude Code toolkit with six skills that build a product intelligence cache on top of any codebase:

You ask:     "What happens when a signing link expires mid-session?"
Oracle reads: 14 source files across 3 modules
Oracle answers: PM-formatted behavior with exact UI text, edge cases,
                risk ratings, and confidence level
Oracle caches: Answer saved — next time anyone asks, instant response

The system has three tiers (think CPU cache hierarchy):

L1  Product Map     ~2K tokens    Routes questions        Always loaded
L2  Flow Documents  1-3K each     Answers from cache      On-demand, growing
L3  Live Trace      75-100K       Explores codebase       Fallback, writes back to L2

Every query that hits L3 writes back to L2. The system gets faster and more comprehensive with every question asked.

Installation

Prerequisites: Claude Code (install guide) with a Claude Pro ($20/month) or Max subscription.

# Clone the repo
git clone https://github.com/ajinkya-t/codebase-oracle

# Run the setup script (symlinks skills into ~/.claude/skills/)
cd codebase-oracle
./setup

# Verify the installation
./setup --verify

# Navigate to any codebase
cd your-project

# Recommended: run /init first for better results
claude
/init

# Build the product intelligence cache
/oracle-init

# Start asking questions
/oracle-ask What happens when a user tries to sign an expired document?

Updating: Just run git pull inside your clone — the setup script installs a post-merge hook that re-runs automatically, so new and updated skills are always reflected.

Optional: install the git hook for automatic staleness tracking

bash ~/path/to/codebase-oracle/scripts/install-git-hook.sh

Run this from inside the project you want to track (not the oracle repo). Replace the path with wherever you cloned codebase-oracle.

Available skills

Skill	Purpose	When to use
`/oracle-init`	Scan the codebase, build L1 product map + starter L2 flow docs	First time setup on any project (~8 min)
`/oracle-ask`	Ask product behavior questions with cached answers	Daily PM work — "what does X actually do?"
`/oracle-refresh`	Regenerate stale docs or refresh the product map	After significant code changes
`/oracle-eval`	View cache hit rates, usage patterns, cost savings	Weekly review of oracle performance
`/oracle-diagram` (planned)	Generate PM-readable mermaid flowcharts from cache	When you need a visual for a stakeholder brief
`/oracle-review-spec` (planned)	Cross-reference a new spec against known product behavior	Before engineering starts on a new feature

How it works

The query flow

Question asked
     │
     ▼
  Load L1 product map → route to relevant flow
     │
     ├── L2 doc exists + fresh ──────► Answer from cache (42s)
     │
     ├── L2 doc exists + partial ────► Ask user: [1] cache-only or [2] trace?
     │                                  └─► [2]: targeted trace → merge → answer
     │
     ├── L2 doc exists + stale ──────► Answer with ⚠ warning
     │
     └── No L2 doc (cache miss) ─────► Full trace → write new L2 doc → answer

Cache discipline

The oracle does NOT re-read code on every question. Cache hits answer exclusively from stored docs. The Explore subagent only launches on genuine cache misses or when the user explicitly approves an enrichment trace. This is enforced through explicit prohibition rules in the skill, not left to the model's judgment.

Anti-hallucination

For fullstack apps, the backend may define one error message while the frontend displays a completely different one. The oracle traces user-facing text to the rendering layer (the React component, toast handler, or error boundary that actually displays it) and tags every claim:

[RENDERED] — traced to the frontend component that displays this text
[BACKEND-DEFINED] — found only in backend code; what the user actually sees may differ

This prevents the most dangerous class of AI hallucination: stating technically-grounded but user-invisible text as fact.

Self-improving loop

Every L3 trace and every user-approved enrichment writes findings back to the L2 cache. The cache grows with use:

After oracle-init:        5-8 starter flow docs
After 8 queries:          3 additional docs, 87.5% cache-served rate
After enrichment cycle:   Same question → instant cache hit (42s vs 3 min)

Observed performance

Tested on Documenso (~50K LOC, TypeScript/Remix) with Claude Opus 4.6:

Metric	Observed
oracle-init time	8m 17s (4 parallel Explore agents, ~356K tokens)
Cache hit response	9–47s wall time
Cache miss response	2-3 min (includes write-back)
Enrichment response	~3 min (targeted trace + merge)
Cache-served ratio (8 queries)	87.5% (62.5% full hits + 25% enriched)
Self-improving loop	Confirmed — enrichments persist across sessions

From the eval report (8 queries):

Actual time:   (5 × ~15s) + (2 × ~157s) + (1 × ~96s) = 75 + 314 + 96 ≈ 485s (~8 min)
Without cache: 8 × ~120s = 960s (~16 min)
Time saved:    ~475s (~8 minutes)

(avg cache hit ~15s; enriched ~157s = 2m 37s observed; miss ~96s = 1m 36s observed; without-cache baseline ~120s = midpoint of 2–3 min full trace)

Confidence levels

Every answer includes a confidence indicator so you know how much to trust it:

Level	Meaning
✅ VERIFIED	Fresh L2 doc, all source files unchanged since analysis
⚠ LIKELY ACCURATE	L2 doc exists, minor source files changed (not the entry-point)
⚠ NEEDS VERIFICATION	Primary entry-point file changed, or complex multi-module trace
⚠ LOW CONFIDENCE	No test coverage on this code path, or dynamic behavior that can't be statically resolved

Calibration note: Any answer where a user-facing text claim is tagged [BACKEND-DEFINED] should be treated as LIKELY ACCURATE at most — the frontend rendering layer was not verified. HIGH confidence is only warranted when all "what the user sees" claims carry [RENDERED] tags.

Directory structure generated

.product-oracle/
├── product-map.md              # L1: Feature domains, flows, roles, error codes
├── .staleness.json             # Tracks which L2 docs need refreshing
├── .eval-log.jsonl             # Query-by-query usage metrics
├── flows/                      # L2: One doc per product flow
│   ├── recipient-signing.md
│   ├── document-sending.md
│   └── ...
├── globals/                    # L2: Cross-cutting behaviors
│   ├── authentication.md
│   ├── error-formatting.md
│   └── notification-triggers.md
├── diagrams/                   # Generated mermaid diagrams
└── reviews/                    # Spec review reports

Framework agnostic

The oracle detects your application type and adapts its scanning strategy:

Application type	Entry points scanned	Output vocabulary
Web app (SPA, SSR, API)	Routes, page components, API handlers	"User sees," "page displays"
CLI tool	Command definitions, argument parsers	"Output shows," "terminal displays"
Library / SDK	Public API surface, exported functions	"Consumer gets," "function returns"
Event-driven service	Event handlers, queue consumers	"System processes," "handler produces"
Data pipeline	Pipeline stages, transformations	"Stage outputs," "pipeline produces"

Works with any language or framework. Tested on TypeScript/Remix. Designed to work on Python, Go, Java, Rust, and anything else Claude Code can read.

Known limitations

Initial L2 docs can contain hallucinated behavior. The [RENDERED]/[BACKEND-DEFINED] tagging mitigates this for user-facing text. Enrichment traces are more accurate than init scans. For production use, add a "PM verified" flag for human-checked docs.
Source file dependency tracking is best-effort. If the Explore subagent reads a file but doesn't list it in structured output, the staleness checker won't track it. Consequence: some stale docs may appear fresh.
Doesn't trace non-code product logic. Feature flags in a config service, business rules in a database, or behavior from environment variables won't be found unless they're visible in the source code.
Cache discipline depends on skill instruction quality. Claude may occasionally auto-escalate on ambiguous partial-coverage scenarios. The explicit prohibition language reduces this significantly but doesn't eliminate it.
Token usage on init is substantial. oracle-init uses ~356K tokens on a 50K LOC codebase. On a Pro subscription this may consume a significant portion of your usage window. Plan accordingly.
Git hook is not installed automatically. You must run install-git-hook.sh manually. Without it, stale L2 docs won't be flagged after code changes.
Enrichment traces are expensive. A user-approved enrichment (Path A-Enrich) costs ~104K tokens — comparable to a full cache miss. The user-gating mechanism ensures this cost is intentional, not accidental.

Roadmap

/oracle-diagram — Generate PM-readable mermaid flowcharts from L2 cache with risk color-coding
/oracle-review-spec — Cross-reference new specs against accumulated oracle knowledge
Playwright validation — Runtime verification via browser automation, DOM text extraction for ground-truth checking
PM verified flag — Human toggle on L2 docs after manual spot-checking
Cross-codebase testing — Verify framework-agnostic claims on Python CLI, Go API, React component library
Confidence calibration tightening — Cap confidence at MEDIUM when any user-facing claim is [BACKEND-DEFINED]

How this was built

Designed and tested in one intensive day. Five design iterations, each driven by specific evaluation findings — not assumptions:

Version	What changed	What broke that led to this
V1	Flat product directory	Too expensive to load and maintain
V2	Three-tier cache	Cache hits were auto-escalating to live traces
V3	Cache discipline + write-back	Enrichments weren't persisting across sessions
V4	User-gated escalation (interrupt before answer)	Trace offers after the answer were being ignored
V5	Anti-hallucination ([RENDERED] tags)	Backend-defined error text stated as user-visible

The full design document, evaluation framework, and iteration history are available at: [link to reference doc / blog post]

Architecture deep dive

See docs/architecture.md for the complete technical design including the three-tier cache, cache invalidation via git hooks, the fork/write-back constraint, and the anti-hallucination system.

See docs/why-this-matters.md for the structural argument: why specs, user manuals, and dashboards systematically fail to capture actual product behavior.

Using this for content

If you're building in public and want to create content from your oracle findings, see docs/for-content-creators.md for guidance on capturing compelling screenshots, structuring LinkedIn posts, and framing the narrative around information asymmetry rather than tool usage.

Contributing

Issues and PRs welcome. The most valuable contributions:

Test on different codebases — especially non-TypeScript, non-web-app projects. The framework-agnostic claims need broader validation.
Improve skill instructions — if the oracle misbehaves on your codebase (breaks cache discipline, misformats answers, misidentifies entry points), the fix is usually in the SKILL.md instructions.
Add to the eval framework — new test questions, new failure types, regression test cases.

License

MIT — see LICENSE

Built with Claude Code by Ajinkya. The oracle is a Claude Code toolkit — it uses Claude's intelligence but the architecture, cache design, eval framework, and anti-hallucination system are original work.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Codebase Oracle

What I found when I ran this on a real codebase

What this is

Installation

Available skills

How it works

The query flow

Cache discipline

Anti-hallucination

Self-improving loop

Observed performance

Confidence levels

Directory structure generated

Framework agnostic

Known limitations

Roadmap

How this was built

Architecture deep dive

Using this for content

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
docs		docs
examples		examples
hooks		hooks
scripts		scripts
skills		skills
LICENSE		LICENSE
README.md		README.md
setup		setup

Folders and files

Latest commit

History

Repository files navigation

Codebase Oracle

What I found when I ran this on a real codebase

What this is

Installation

Available skills

How it works

The query flow

Cache discipline

Anti-hallucination

Self-improving loop

Observed performance

Confidence levels

Directory structure generated

Framework agnostic

Known limitations

Roadmap

How this was built

Architecture deep dive

Using this for content

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages