You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CLAUDE.md
+3-3Lines changed: 3 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -49,9 +49,9 @@ There are **three** kinds of tests, and it matters which is which:
49
49
50
50
1.**`npm test` — grok-free unit/DOM/integration suite (414 tests).** Pure logic, happy-dom tests that drive the real `media/chat.js`, a real-`/bin/sh` TerminalManager smoke, and a fake-CLI ACP integration suite (`test/fixtures/fake-grok-acp.cjs`). **Never spawns the real `grok` binary.** Runs in <2s with no network, no login, no subscription. This is the floor — every change keeps it green.
51
51
2.**CI — the *same* suite.**`.github/workflows/ci.yml` runs `npm ci && npm test && npm run package` on a clean Ubuntu box. **CI ≡ layer 1, verbatim** — there is no separate CI-only set. CI has no `grok` binary, no auth, no SuperGrok subscription, so it *cannot* run anything that touches the real CLI. That's the whole reason layer 1 is grok-free.
52
-
3. **`npm run test:live` — on-demand pre-release suite against REAL grok (`scripts/live-tests.cjs`).** Spawns the actual `grok agent stdio` and exercises the surfaces layers 1–2 can't: the real ACP handshake, a prompt round-trip, session restore, plan-mode enforcement, and the v1.4.x generative features (image gen, video gen; the subagent path is exercised opportunistically and SKIPs when grok doesn't delegate — it's deferred/research-only). It **reuses the real compiled modules** (`out/acp-dispatch.js`, `out/plan-gate.js`, `media/webview-helpers.js`) — it feeds genuine wire output through the same `isMediaGenToolCall`/`extractGeneratedMediaPaths`/`isSubagentToolCall`/`shouldBlockWrite` the extension uses, not a re-implementation. **Run it manually before every release-to-`main`** (the user triggers it; it needs a logged-in grok + subscription and burns credits, so it must never be in `npm test` or CI). Flags: `--quick` (skip the slow generative tests), `--only=`, `--skip=`, `GROK_BIN=…`. A SKIP (no subscription, grok chose not to delegate, etc.) does not fail the gate — only a FAIL does. Real-grok **diagnostic probes** (`research/*.cjs`) remain manual one-offs for capturing wire shapes; the live suite is the repeatable gate.
52
+
3. **`npm run test:live` — on-demand pre-release suite against REAL grok (`scripts/live-tests.cjs`).** Spawns the actual `grok agent stdio` and exercises the surfaces layers 1–2 can't: the real ACP handshake, a prompt round-trip, session restore, plan-mode enforcement, and the v1.4.x generative features (image gen, video gen; the subagent path is exercised opportunistically and SKIPs when grok doesn't delegate — it's deferred/research-only). It **reuses the real compiled modules** (`out/acp-dispatch.js`, `out/plan-gate.js`, `media/webview-helpers.js`) — it feeds genuine wire output through the same `isMediaGenToolCall`/`extractGeneratedMediaPaths`/`isSubagentToolCall`/`shouldBlockWrite` the extension uses, not a re-implementation. **Always run it before every release-to-`main` — it's a non-negotiable, standing part of the release gate; run it without asking** (it needs a logged-in grok + subscription and burns credits, so it must never be in `npm test` or CI, but it is mandatory before any tag/release). Flags: `--quick` (skip the slow generative tests), `--only=`, `--skip=`, `GROK_BIN=…`. A SKIP (no subscription, grok chose not to delegate, etc.) does not fail the gate — only a FAIL does. Real-grok **diagnostic probes** (`research/*.cjs`) remain manual one-offs for capturing wire shapes; the live suite is the repeatable gate.
53
53
54
-
**So:** local == CI (both grok-free). The real-grok tests are a separate, manual, pre-release gate — run on request, not on commit.
54
+
**So:** local == CI (both grok-free). The real-grok tests are a separate, mandatory pre-release gate — always run before a tag/release (no need to ask), never on every commit.
55
55
56
56
### grok CLI version + updating
57
57
@@ -122,7 +122,7 @@ It reads the version from `package.json`, runs the gate, builds the vsix, commit
122
122
What the script encodes, step by step:
123
123
124
124
1. Bump `version` in `package.json` (user-initiated) and add the dated section to `CHANGELOG.md`.
2.`npm test` (414-test floor, all green) + `tsc -p . --noEmit` clean, **and `npm run test:live` against real grok — mandatory, run without asking** (the `release.*` scripts don't run it, so run it by hand before invoking them).
126
126
3. Commit + push to `main` (direct-to-main, no feature branches).
127
127
4.**Annotated git tag**`vX.Y.Z` at the release commit → `git tag -a vX.Y.Z -m "Release vX.Y.Z"` → `git push origin vX.Y.Z`.
128
128
5.**GitHub Release** for that tag → `gh release create vX.Y.Z --title "Release vX.Y.Z" --notes-file <notes> <vsix>` (notes = the new changelog section(s); include any earlier version that was bumped but never released). **Always attach the built `grok-vscode-phuryn-X.Y.Z.vsix` as a release asset** so the exact installable build is downloadable from the release.
0 commit comments