| 1 | +# AGENTS.md |
| 2 | + |
| 3 | +Guidance for any coding agent (Codex, Claude Code, etc.) working on this repository. |
| 4 | + |
| 5 | +> **Naming note.** This project *analyzes* Claude Code's local usage logs, so "Claude Code" below always refers to that product (the source of the JSONL data) — not to the agent reading this file. The agent working on the codebase is referred to as "the coding agent" or just "you". |
| 6 | + |
| 7 | +## Project shape |
| 8 | + |
| 9 | +Three Python files, stdlib only, no `pip install` step. Python 3.8+. |
| 10 | + |
| 11 | +- [scanner.py](scanner.py) — parses Claude Code JSONL transcripts into a SQLite DB at `~/.claude/usage.db`. |
| 12 | +- [cli.py](cli.py) — terminal commands (`scan` / `today` / `week` / `stats` / `dashboard`). |
| 13 | +- [dashboard.py](dashboard.py) — single-file `http.server` serving an embedded HTML/JS SPA on `localhost:8080`. |
| 14 | + |
| 15 | +Use `python` on Windows, `python3` on macOS/Linux. Both work the same. |
| 16 | + |
| 17 | +## Common commands |
| 18 | + |
| 19 | +``` |
| 20 | +python cli.py scan # incremental scan (fast on re-run) |
| 21 | +python cli.py today # today's usage by model |
| 22 | +python cli.py week # last 7 days, per-day + by-model |
| 23 | +python cli.py stats # all-time stats |
| 24 | +python cli.py dashboard # scan + open http://localhost:8080 |
| 25 | +python cli.py scan --projects-dir PATH # scan a custom transcripts dir |
| 26 | +HOST=0.0.0.0 PORT=9000 python cli.py dashboard |
| 27 | + |
| 28 | +python -m unittest discover -s tests -v # full test suite (CI runs this) |
| 29 | +python -m unittest tests.test_scanner -v # one file |
| 30 | +python -m unittest tests.test_scanner.TestProjectNameFromCwd.test_windows_path # one test |
| 31 | +``` |
| 32 | + |
| 33 | +CI ([.github/workflows/tests.yml](.github/workflows/tests.yml)) runs the suite on Python 3.9 / 3.11 / 3.12 against `main` and PRs. |
| 34 | + |
| 35 | +## Architecture |
| 36 | + |
| 37 | +### Data flow |
| 38 | + |
| 39 | +``` |
| 40 | +~/.claude/projects/**/*.jsonl → scanner.parse_jsonl_file() |
| 41 | +~/Library/.../Xcode/... ↓ |
| 42 | + aggregate_sessions() → upsert_sessions() + insert_turns() |
| 43 | + ↓ |
| 44 | + ~/.claude/usage.db (SQLite) |
| 45 | + ↓ |
| 46 | + cli.py queries ←──────────→ dashboard.py /api/data |
| 47 | +``` |
| 48 | + |
| 49 | +By default the scanner walks both `~/.claude/projects/` and the Xcode coding-assistant directory; missing dirs are silently skipped. Override with `--projects-dir`. |
| 50 | + |
| 51 | +### SQLite schema (created/migrated in [scanner.py](scanner.py) `init_db`) |
| 52 | + |
| 53 | +- **`turns`** — one row per assistant API response. The source of truth for tokens and per-model attribution. |
| 54 | +- **`sessions`** — aggregated per session (denormalized totals + chosen primary model). |
| 55 | +- **`processed_files`** — incremental-scan tracking: `(path, mtime, lines)`. A file is skipped if its mtime matches; if it grew, only lines past the stored `lines` count are processed. |
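The skip-or-resume rule for `processed_files` can be sketched roughly as follows. This is a minimal illustration, not the code in `scanner.py`: the helper name `lines_to_process` and its signature are hypothetical, and the behaviour when the mtime changed but the file did not grow is an assumption, since the text above does not describe that case.

```python
from typing import Optional, Tuple


def lines_to_process(
    stored: Optional[Tuple[float, int]], mtime: float, total_lines: int
) -> range:
    """Return the line indexes that still need parsing (hypothetical helper).

    `stored` stands in for the (mtime, lines) pair kept in processed_files,
    or None when the file has never been scanned.
    """
    if stored is None:
        return range(0, total_lines)  # new file: parse everything
    stored_mtime, stored_lines = stored
    if stored_mtime == mtime:
        return range(0, 0)  # mtime matches: skip the file
    if total_lines > stored_lines:
        return range(stored_lines, total_lines)  # file grew: only the new tail
    # Assumption: mtime changed but no new lines, so there is nothing to add.
    return range(0, 0)


new_file = list(lines_to_process(None, 1.0, 3))
unchanged = list(lines_to_process((1.0, 3), 1.0, 3))
grown = list(lines_to_process((1.0, 3), 2.0, 5))
```

Returning a `range` keeps the caller simple: it can iterate over the result without special-casing the "skip" outcome.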
| 56 | + |
| 57 | +A conditional unique index on `turns.message_id` (where non-empty) lets `INSERT OR IGNORE` cheaply dedupe replays across rescans. |
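To illustrate how a conditional (partial) unique index interacts with `INSERT OR IGNORE`, here is a small self-contained sketch against an in-memory database. The two-column table and the index name `idx_turns_message_id` are simplified stand-ins; the real schema in `init_db` may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real turns table.
conn.execute("CREATE TABLE turns (message_id TEXT, output_tokens INTEGER)")
# Uniqueness is enforced only for rows with a non-empty message_id.
conn.execute(
    "CREATE UNIQUE INDEX idx_turns_message_id ON turns(message_id) "
    "WHERE message_id != ''"
)

rows = [("msg_1", 10), ("msg_1", 10), ("", 5), ("", 7)]
# The replayed msg_1 row is ignored; both empty-id rows are kept.
conn.executemany("INSERT OR IGNORE INTO turns VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM turns").fetchone()[0]
conn.close()
```

Because rows with an empty `message_id` fall outside the index, they are never treated as duplicates of each other.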
| 58 | + |
| 59 | +### Non-obvious invariants |
| 60 | + |
| 61 | +These three things will bite you if you don't know them: |
| 62 | + |
| 63 | +1. **Streaming dedupe by `message.id`.** Claude Code writes multiple JSONL records per API response — only the *last* one for a given `message.id` has the final usage tallies. `parse_jsonl_file` keeps the last record per `message_id` in a dict; earlier records are discarded. Don't sum across records of the same `message_id`. |
| 64 | + |
| 65 | +2. **Session totals are recomputed from `turns` at the end of `scan()`.** During an incremental scan, `upsert_sessions` adds tokens to the existing session totals, while `insert_turns` uses `INSERT OR IGNORE` against the `message_id` unique index. A duplicate turn is therefore dropped from `turns` but still counted in the session, so session totals would drift. The final `UPDATE sessions ... (SELECT SUM ... FROM turns)` block reconciles this. Preserve it if you refactor scan logic. |
| 66 | + |
| 67 | +3. **Session primary model priority is opus > sonnet > haiku** (`_model_priority` in [scanner.py](scanner.py)). This prevents a subagent's haiku turn from overwriting the session's opus model when an existing session is updated. Per-turn model is always honored in the `turns` table; only the session-level summary uses the priority. |
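The keep-last rule from invariant 1 can be sketched as below. The record shape (`message.id`, `message.usage.output_tokens`) and the function name `last_record_per_message` are assumptions made for illustration; the real `parse_jsonl_file` may read different fields.

```python
import json


def last_record_per_message(lines):
    """Keep only the final JSONL record seen for each message id."""
    latest = {}
    for line in lines:
        record = json.loads(line)
        message = record.get("message") or {}
        message_id = message.get("id")
        if message_id:
            latest[message_id] = record  # a later record overwrites an earlier one
    return latest


# Hypothetical streamed records: msg_1 appears twice, the second is final.
sample = [
    '{"message": {"id": "msg_1", "usage": {"output_tokens": 3}}}',
    '{"message": {"id": "msg_1", "usage": {"output_tokens": 42}}}',
    '{"message": {"id": "msg_2", "usage": {"output_tokens": 7}}}',
]
final = last_record_per_message(sample)
total = sum(r["message"]["usage"]["output_tokens"] for r in final.values())
```

Summing every record instead would count the partial tally of `msg_1` on top of its final one.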
| 68 | + |
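The drift described in invariant 2, and the reconciliation that fixes it, can be reproduced in a few lines. The table layouts, column names, and the `ingest` helper are hypothetical simplifications, not the schema from `init_db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE turns (message_id TEXT UNIQUE, session_id TEXT, output_tokens INTEGER);
    CREATE TABLE sessions (session_id TEXT PRIMARY KEY, total_output_tokens INTEGER);
    """
)


def ingest(message_id, session_id, tokens):
    """Hypothetical stand-in for insert_turns + upsert_sessions."""
    # A duplicate turn is silently ignored here...
    conn.execute(
        "INSERT OR IGNORE INTO turns VALUES (?, ?, ?)",
        (message_id, session_id, tokens),
    )
    # ...but its tokens are still added to the session total.
    conn.execute("INSERT OR IGNORE INTO sessions VALUES (?, 0)", (session_id,))
    conn.execute(
        "UPDATE sessions SET total_output_tokens = total_output_tokens + ? "
        "WHERE session_id = ?",
        (tokens, session_id),
    )


ingest("msg_1", "s1", 10)
ingest("msg_1", "s1", 10)  # replay of the same turn on a rescan
drifted = conn.execute("SELECT total_output_tokens FROM sessions").fetchone()[0]

# Reconcile: recompute the session total from the deduplicated turns table.
conn.execute(
    """
    UPDATE sessions SET total_output_tokens = (
        SELECT COALESCE(SUM(output_tokens), 0) FROM turns
        WHERE turns.session_id = sessions.session_id
    )
    """
)
reconciled = conn.execute("SELECT total_output_tokens FROM sessions").fetchone()[0]
conn.close()
```

After the replay the session holds a doubled total, and the final `UPDATE` brings it back in line with `turns`.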
| 69 | +### Cost calculation |
| 70 | + |
| 71 | +Costs are computed **per turn** (each turn knows its own model), then summed. This is true in both the CLI ([cli.py](cli.py) `calc_cost`) and the dashboard JS ([dashboard.py](dashboard.py) `calcCost` inside the embedded HTML). Aggregating tokens first and applying a single price is wrong for sessions that span multiple models. |
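A small numeric sketch shows why the order of operations matters. The model names and per-million-token prices below are made up for illustration and are not the values in `PRICING`.

```python
# Hypothetical output prices in dollars per million tokens.
PRICE_PER_MTOK = {"model-a": 10.0, "model-b": 1.0}

# One session with two turns on different models: (model, output tokens).
turns = [("model-a", 1_000_000), ("model-b", 1_000_000)]

# Per-turn: price each turn with its own model, then sum the costs.
per_turn_total = sum(
    tokens / 1_000_000 * PRICE_PER_MTOK[model] for model, tokens in turns
)

# Aggregate-first: sum the tokens, then apply a single model's price.
all_tokens = sum(tokens for _, tokens in turns)
single_price_total = all_tokens / 1_000_000 * PRICE_PER_MTOK["model-a"]
```

Here the per-turn total is $11, while pricing all tokens at the first model's rate gives $20.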
| 72 | + |
| 73 | +Pricing is duplicated in two places that **must stay in sync**: |
| 74 | +- [cli.py](cli.py) `PRICING` dict (Python) |
| 75 | +- [dashboard.py](dashboard.py) `PRICING` const inside `HTML_TEMPLATE` (JavaScript) |
| 76 | + |
| 77 | +`get_pricing` / `getPricing` resolve in three tiers: exact match → `startswith` (handles date-suffixed model IDs like `claude-opus-4-7-20260215`) → substring fallback on `opus` / `sonnet` / `haiku`. For a model that matches no tier, the lookup returns `None` and the model is billed at $0 (shown as `n/a`). This is intentional, so that local/3rd-party models (gemma, glm, etc.) aren't charged at Sonnet rates. |
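The three-tier lookup could look roughly like this. It is a sketch under stated assumptions: `EXAMPLE_PRICING`, `resolve_pricing`, the `claude-sonnet-x` / `claude-haiku-x` keys, and all prices are placeholders, and mapping a family name to the first table key that contains it is a guess at how the fallback picks an entry.

```python
from typing import Dict, Optional

# Placeholder table: keys and prices are illustrative, not the real PRICING.
EXAMPLE_PRICING: Dict[str, Dict[str, float]] = {
    "claude-opus-4-7": {"input": 3.0, "output": 30.0},
    "claude-sonnet-x": {"input": 2.0, "output": 20.0},
    "claude-haiku-x": {"input": 1.0, "output": 10.0},
}


def resolve_pricing(model: Optional[str]) -> Optional[Dict[str, float]]:
    """Resolve a price entry: exact match, then prefix, then family substring."""
    if not model:
        return None
    # Tier 1: exact match.
    if model in EXAMPLE_PRICING:
        return EXAMPLE_PRICING[model]
    # Tier 2: prefix match, e.g. a date-suffixed model ID.
    for key, price in EXAMPLE_PRICING.items():
        if model.startswith(key):
            return price
    # Tier 3: family substring fallback.
    lowered = model.lower()
    for family in ("opus", "sonnet", "haiku"):
        if family in lowered:
            for key, price in EXAMPLE_PRICING.items():
                if family in key:
                    return price
    # Unknown models get no price, so they are not billed at a default rate.
    return None


dated = resolve_pricing("claude-opus-4-7-20260215")
family = resolve_pricing("my-sonnet-proxy")
unknown = resolve_pricing("gemma-2b")
```

Whatever shape the lookup takes, the Python and JavaScript versions need to agree on tier order, or the CLI and dashboard will report different costs for the same model.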
| 78 | + |
| 79 | +### Dashboard server |
| 80 | + |
| 81 | +`http.server.BaseHTTPRequestHandler`-based, two endpoints: |
| 82 | +- `GET /api/data` → JSON snapshot from `get_dashboard_data()`. Returns *all* history; filtering by date range and model happens client-side. |
| 83 | +- `POST /api/rescan` → deletes the DB and runs a full rescan. The handler passes `db_path` and `projects_dirs` to `scan` explicitly so that tests that monkey-patch the module globals work. `scan`'s default argument values are bound at function-definition time, so don't switch to bare defaults. |
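The default-argument pitfall behind the `/api/rescan` note is plain Python behaviour and can be shown in isolation. The `config` namespace below stands in for a module's globals, and `scan`, `rescan_bare`, and `rescan_explicit` are simplified hypothetical functions, not the project's real ones.

```python
import types

# Stand-in for module-level globals such as a DB path constant.
config = types.SimpleNamespace(DB_PATH="real.db")


def scan(db_path=config.DB_PATH):
    """The default is evaluated once, when this def statement runs."""
    return db_path


def rescan_bare():
    # Relies on the default captured at definition time.
    return scan()


def rescan_explicit():
    # Reads the current value on every call.
    return scan(db_path=config.DB_PATH)


# What a test's monkey-patch effectively does.
config.DB_PATH = "test.db"

bare_result = rescan_bare()
explicit_result = rescan_explicit()
```

The bare call still sees `real.db`, while the explicit call picks up the patched `test.db`.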
| 84 | + |
| 85 | +The entire UI lives in `HTML_TEMPLATE` as a raw string. Chart.js is loaded from CDN. |
| 86 | + |
| 87 | +## Testing notes |
| 88 | + |
| 89 | +- `tests/test_scanner.py` and `tests/test_dashboard.py` use `tempfile.NamedTemporaryFile` for an isolated DB; tests must never touch the user's real `~/.claude/usage.db`. |
| 90 | +- The `/api/rescan` test patches `dashboard.DB_PATH` and `scanner.DEFAULT_PROJECTS_DIRS` — keep that contract intact (see commit 8ae2664). |
| 91 | +- On Windows, `~/.claude/` may not exist on a fresh checkout. `get_db` creates the parent dir (`mkdir(parents=True, exist_ok=True)`) — don't remove that or `sqlite3.connect` will fail in CI / fresh installs (commit b5d1e15). |
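A minimal sketch of the parent-directory guard, assuming a helper shaped like the one below. `open_db` is a hypothetical name, and the real `get_db` may do more, such as schema setup.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_db(db_path: Path) -> sqlite3.Connection:
    """Create the parent directory if needed, then open the database."""
    # Without this, sqlite3.connect fails when the directory is missing.
    db_path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(str(db_path))


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a fresh machine where the .claude directory does not exist yet.
    target = Path(tmp) / "fresh-home" / ".claude" / "usage.db"
    conn = open_db(target)
    created = target.parent.is_dir()
    conn.close()
```

`exist_ok=True` keeps the call idempotent, so it is safe to run on every connection.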