CLAUDE.md — grok-build-vscode

VS Code sidebar extension for xAI's Grok Build CLI, driven by grok agent stdio over the Agent Client Protocol (ACP). Thin client — all session state, MCP servers, subagents, memory, and plan-mode bookkeeping live in the CLI.

Status

v1.4.10 (releasing now; v1.4.8 is the latest GitHub Release — the Marketplace-published build lags behind). 503 tests passing, all grok-free (CI never spawns the binary; grok-dependent probes live separately in research/*.cjs). v1.4.10 makes the session-history popover scale to thousands of sessions — it loads the most recent 100 (newest-first by last activity), pulls in older pages as you scroll, and the search box filters by name across the whole history (server-side, complete — not just the loaded page); under the hood a summary.json-mtime index orders everything with one cheap stat per dir (no reads), windowed reads parse only the visible page, and an mtime-keyed read cache makes steady-state re-opens cost ~zero reads. An opt-in perf simulation (npm run test:perf, out of npm test/CI) proves the op-count drop (first open 5000→100 reads, ~98%). See § History pagination. v1.4.9 adds an adjustable chat font size (grok.chatFontScale, issue #14) — a chat-only zoom (text + icons + spacing together) that leaves the rest of VS Code at its normal size (unlike global Ctrl/Cmd+Shift+=), applies live with no reload, supports User/Workspace scope, and keeps the composer pinned to the bottom at any scale (the body derives both the CSS zoom and a height: calc(100vh / var(--chat-zoom)) compensation from one --chat-zoom variable, since vh ignores zoom); see the grok.chatFontScale config bullet below. 
v1.4.8 ships the multi-session Agent Dashboard: a pool of live sessions with lossless instant re-focus, status dots in the history dropdown, and silent idle/LRU reaping (see § Session pool). It also fixes primer latency. The hidden plan-mode primer now fires eagerly and non-blocking the moment a session goes live (new, restore, and post-/compact) instead of in front of the user's first prompt. The primer text was trimmed to v4: the product-blurb paragraph and repo URL, which tempted agentic grok into a 15–40s pre-turn workspace exploration, are gone, replaced by an explicit do-not-act / reply-with-just-ok constraint. A "Grokking…" waiting indicator (Thinking-block font + animated dots, no chevron) shows on every user turn the instant you send and is replaced in place by the first thought/message/tool card (see § Primer + waiting indicator).
The v1.4.x line added image + video generation rendered inline: /imagine → image_gen (or image_edit for reference-photo edits); /imagine-video → video_gen (image_to_video on older/Linux builds). grok writes the file to the session dir and reports its path as JSON-in-text on the completed tool result, not as an ACP image block. The host therefore parses the path, classifies image-vs-video by extension, and serves it to the webview via webview.asWebviewUri, streamed from disk; a base64 data: URI is only a fallback for files outside the grok home. Inline media is capped at 320px, with Copy-path / Open-in-VS-Code hover actions pinned to the image (see research/image-generation.md). The line also added a Sign out action (grok logout, command + gear menu, issue #13).
A subagent card classifier (isSubagentToolCall/subagentLabel in webview-helpers.js) exists but is research-only / deferred: grok 0.2.x doesn't emit spawn_subagent over ACP (it backgrounds a process and polls get_command_or_subagent_output), so the card rarely fires and isn't advertised as a shipped feature; see research/subagents.md.
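The host-side media step above (parse the JSON-in-text path, then classify by extension) can be sketched roughly as follows. This is an illustration, not the extension's extractGeneratedMediaPaths: the `path` field name, the function names, and the extension sets are all assumptions; the real wire shape is documented in research/image-generation.md.

```typescript
// Hypothetical sketch: pull a file path out of JSON-in-text tool output and
// classify it as image or video by extension. The `path` field name and the
// extension sets below are assumptions for illustration only.
type MediaKind = "image" | "video";

const IMAGE_EXTS = new Set([".png", ".jpg", ".jpeg", ".gif", ".webp"]);
const VIDEO_EXTS = new Set([".mp4", ".webm", ".mov"]);

function classifyMedia(filePath: string): MediaKind | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot < 0) return undefined;
  const ext = filePath.slice(dot).toLowerCase();
  if (IMAGE_EXTS.has(ext)) return "image";
  if (VIDEO_EXTS.has(ext)) return "video";
  return undefined; // unknown extension: don't inline it
}

function extractMediaFromText(text: string): { path: string; kind: MediaKind } | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return undefined; // not JSON-in-text, nothing to inline
  }
  const p = (parsed as { path?: unknown } | null)?.path;
  if (typeof p !== "string") return undefined;
  const kind = classifyMedia(p);
  return kind ? { path: p, kind } : undefined;
}
```

Classifying by extension rather than by tool name keeps one code path for image_gen, image_edit, video_gen, and image_to_video.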
v1.4.3 deferred the plan-mode primer: instead of being sent at session start, it rode the user's first real prompt as its own hidden turn, on new AND restored sessions (re-sent on restore rather than trusted from replayed history, which a /compact can drop). v1.4.8 later moved it to an eager, non-blocking send at go-live (above).
v1.4.4 fixes auto-scroll fighting the user (#16). The chat used to snap to the bottom on every streaming chunk, so scrolling up to re-read history while grok was thinking was undone by the next thought chunk. Now a stickToBottom flag (driven by a scroll listener via the pure shouldStickToBottom in webview-helpers.js) follows streaming output only while the user is already pinned to the bottom. Interactive activity that must be seen (permission cards, ask-user-question cards, and the user's own sent message) re-pins via forceScrollToBottom(), which also addresses #15.
Wire shapes were confirmed against grok 0.2.33 (Linux probes); the native Windows build is grok 0.2.x. The extension was smoke-tested end-to-end against grok v0.1.211 on Linux and Windows-via-WSL, and against the native Windows build grok 0.2.3 (irm https://x.ai/cli/install.ps1 | iex); cli-locator resolves grok.cmd/grok.exe and terminal-manager uses shell:true. The native-Windows smoke test surfaced a handful of webview regressions (a history popover that never closed, session rows clickable only on the label, reasoning traces no longer expandable, a cluttered welcome screen), all fixed in earlier builds. Plan mode is enabled and enforced client-side (see research/plan-mode.md § Resolution).
Voice input (v1.3.0) adds a composer mic button that records via an ffmpeg child process in the extension host and transcribes through xAI's separate Speech-to-Text API. This is deliberately outside ACP, because the CLI advertises promptCapabilities.audio:false and webviews can't reach the mic (see research/voice-input.md).
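The stick-to-bottom rule from the v1.4.4 note can be sketched as a pure check over scroll metrics. This is a hedged illustration only: the real shouldStickToBottom's signature and threshold are not shown here, so the `ScrollMetrics` shape and the 24px tolerance are assumptions.

```typescript
// Hypothetical sketch of a pure "is the user pinned to the bottom?" check.
// The metrics shape and 24px tolerance are assumptions, not the real helper's.
interface ScrollMetrics {
  scrollTop: number;
  scrollHeight: number;
  clientHeight: number;
}

function shouldStickToBottom(m: ScrollMetrics, tolerancePx = 24): boolean {
  // Gap between the bottom of the viewport and the bottom of the content.
  const distance = m.scrollHeight - m.clientHeight - m.scrollTop;
  return distance <= tolerancePx;
}

// Streaming chunks only follow the output while pinned; must-see events
// (permission cards, the user's own message) would force-scroll regardless.
function onStreamChunk(stick: boolean, scrollToBottom: () => void): void {
  if (stick) scrollToBottom();
}
```

A small tolerance matters because sub-pixel rounding can leave a pinned view a pixel or two short of the exact bottom.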
Transcription is live/streaming by default (PCM → wss://api.x.ai/v1/stt, partial events accumulated by start; grok.voiceStreaming:false falls back to the batch REST endpoint). Listening is continuous and hands-free: saying "grok send" submits and restarts a fresh stream so the mic keeps listening (each message = one clean utterance), and messages dictated while Grok is responding are queued and flushed on agentEnd. The phrase is sent as a keyterm bias so STT spells it right, and the trailing phrase is highlighted in the composer via a backdrop overlay (pure trailingSendPhrase in webview-helpers.js). This adds the extension's first runtime dep, ws (bundled into the vsix: package/publish no longer pass --no-dependencies, and .vscodeignore un-ignores node_modules/ws).
v1.4.5 adds LaTeX / math rendering. grok now answers with TeX (inline $…$, display \[…\], incl. \begin{pmatrix} matrices), which the hand-rolled renderMarkdown previously showed raw. The pure splitMath (in webview-helpers.js) pulls math out before HTML-escaping (so backslashes/braces survive the inline-markdown pass) into \x00D/\x00M placeholders, mirroring the code-block/table extraction. renderMath in chat.js renders each span (originally via vendored KaTeX, replaced by MathJax in v1.4.7; see below), with a raw-TeX fallback when the engine isn't loaded (e.g. happy-dom unit tests). Display math is its own block with horizontal scroll; the CSP gained font-src ${webview.cspSource} for the KaTeX fonts. Single $…$ is deliberately not a delimiter (prose-currency false positives). The pure stripUnsupportedTex (in webview-helpers.js) drops \label{…} before rendering: grok emits it inside align/equation blocks for cross-referencing, but KaTeX has no \ref/\eqref system, so it painted the label as a red error. \label produces no visible output in real LaTeX anyway, so stripping it loses nothing.
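The \label stripping described above amounts to a single regex pass. The sketch below is a guess at the idea, not the real stripUnsupportedTex; it assumes label keys never contain nested braces, which holds for typical keys like eq:energy.

```typescript
// Hypothetical sketch of dropping \label{...} before rendering.
// Assumes label keys contain no nested braces (true for typical eq:foo keys).
function stripLabels(tex: string): string {
  return tex.replace(/\\label\{[^{}]*\}/g, "");
}
```

Because \label renders nothing in real LaTeX, removing every occurrence changes no visible output.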
v1.4.6 adds Mermaid diagram rendering. grok answers with ```mermaid fenced blocks (flowcharts, sequence/state diagrams, git graphs, class diagrams, …), which renderMarkdown previously showed as raw source. The fenced block now becomes a .mermaid-block placeholder carrying the source as a fallback code block; a post-render pass, renderMermaidIn (in chat.js), renders it to SVG via vendored Mermaid (media/mermaid/mermaid.min.js, the self-contained 3.3 MB IIFE that sets globalThis.mermaid: all diagram types inlined, zero dynamic import(), and no eval/new Function, so the nonce CSP needs no unsafe-eval). Unlike KaTeX's synchronous string render, mermaid.render is async and DOM-based (it measures text to lay out nodes), so it can't run inline in renderMarkdown; it post-processes the inserted element. The streaming agent bubble re-runs renderMarkdown (rebuilding the DOM) every animation frame, so two module-level caches keyed by the diagram source keep that cheap and flicker-free: mermaidSvgCache (src → svg) re-applies the SVG synchronously on a cache hit (same frame, no flash), and mermaidInFlight (src) stops the same diagram being laid out dozens of times before the first async render resolves. A failed/malformed render caches null and leaves the readable source. initMermaid themes it to VS Code dark/light (document.body.classList), with securityLevel:"strict" + suppressErrorRendering:true. A half-streamed block stays raw text until its closing ``` arrives (the code-block regex requires it). No CSP change is needed (mermaid's inline <style>/style= are covered by the existing style-src 'unsafe-inline'). Limitation: a live theme switch doesn't re-theme already-rendered diagrams (the cache holds the old-theme SVG) until the webview reloads.
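The two source-keyed caches above can be sketched without a DOM. The `render` parameter below is a hypothetical stand-in for the async mermaid.render, and `onDone` stands in for whatever re-renders the bubble; both names are assumptions, not the real renderMermaidIn.

```typescript
// Hypothetical sketch of the src -> svg cache plus in-flight set.
// `render` stands in for async mermaid.render; the real code also touches the DOM.
type Render = (src: string) => Promise<string>;

const svgCache = new Map<string, string | null>(); // src -> svg, or null on failure
const inFlight = new Set<string>();                // sources currently rendering

// Returns the cached SVG synchronously (so a hit can be applied in the same
// frame), null for a known-bad diagram, or undefined while a render is pending.
function getDiagram(src: string, render: Render, onDone: () => void): string | null | undefined {
  if (svgCache.has(src)) return svgCache.get(src);
  if (!inFlight.has(src)) {
    inFlight.add(src); // start at most one render per source
    render(src)
      .then((svg) => svgCache.set(src, svg))
      .catch(() => svgCache.set(src, null)) // keep the readable source on failure
      .finally(() => {
        inFlight.delete(src);
        onDone(); // e.g. schedule a re-render of the streaming bubble
      });
  }
  return undefined;
}
```

Caching null for failures is what stops a malformed diagram from being retried on every animation frame.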
v1.4.7 replaces KaTeX with MathJax for math (media/mathjax/tex-svg-full.js, the self-contained ~2.3 MB IIFE that sets globalThis.MathJax). Its only eval/new Function uses are a globalThis polyfill and a typeof process-guarded Node require, neither reached in the webview, so the nonce CSP is unchanged; font-src is no longer needed either, since SVG output embeds glyph paths. renderMath now calls MathJax.tex2svg(tex,{display}), which is synchronous once startup resolves; before that, it uses the raw-TeX fallback plus an upgradeMathInDom pass on MathJax.startup.promise. Config: svg.fontCache:'local' (self-contained per-equation SVG, required for export), startup.typeset:false (we render manually; MathJax never scans the page), enableMenu:false, and enableAssistiveMml:false. The last one is critical: driving MathJax via tex2svg+outerHTML skips the stylesheet it normally injects to hide the assistive-MathML copy, so Chromium rendered that MathML as a visible second copy of every equation (the "nn" / "E=mc²E=mc²" doubling). For the same missing-stylesheet reason we add mjx-container[display="true"]{display:block;text-align:center} ourselves so display math sits on its own line; inline math keeps MathJax's inline vertical-align. v1.4.7 also adds math/diagram export (webview-only; no research/ write-up): display math and rendered mermaid become an export host (.math-export / .mermaid-block) carrying the source plus a Copy/Download/Open hover overlay (delegated .expr-btn handler, mirroring buildMediaActions).
Both render to a self-contained <svg>, so: Copy → clipboard the TeX/mermaid source; Download → host quick-pick of PNG (rasterized via offscreen canvas with the VS Code theme background — WYSIWYG) or a transparent SVG for a dark or light background (math recolors currentColor: #e8e8e8/#1f1f1f; mermaid is re-rendered per theme via a %%{init:{'theme':…}}%% directive — the two .svg variants share an extension, so a save-dialog filter can't pick between them → quick-pick); Open → writes the PNG to context.globalStorageUri/exports/ and opens VS Code's image preview. The webview pre-renders all variants and posts them; sidebar.ts exportExpr does the quick-pick + save. The video-gen live test is now opt-in (--only=video-gen), excluded from the default gate — grok 0.2.x spins on /imagine-video in the bare harness and never completes (works interactively).

Module map

File	Role
`src/extension.ts`	Entry point — registers commands, keybindings, output channel
`src/sidebar.ts`	Webview provider, message routing, fs handlers, diff editor preview, `logout`, generated-media inlining (`postGeneratedMedia`)
`src/acp.ts`	ACP client — spawns CLI, manages session lifecycle, emits events (incl. `mediaContent` from `emitToolMedia`)
`src/acp-dispatch.ts`	Pure protocol helpers — line parsing, update routing, response builders, generated-media extraction (`isMediaGenToolCall`/`extractGeneratedMediaPaths`)
`src/session.ts`	Per-session state bag (`Session`) — one per live `grok agent stdio` process; the sidebar holds a pool of these + one `focused`
`src/session-pool.ts`	Pure reaping policy (`selectReapable`) — idle-TTL + LRU cap over the live-session pool (never the focused/working/needs-you)
`src/cli-locator.ts`	Locate `grok` binary (configured path → `~/.grok/bin/grok` → PATH); cross-platform
`src/terminal-manager.ts`	Headless shell children for the agent's `terminal/*` ACP calls; cross-platform via `shell:true`
`src/chips.ts`	File-chip CRUD (pure)
`src/prompt-builder.ts`	Chip → prompt-string with `@path` refs and fenced code blocks
`src/slash-filter.ts`	Slash-command autocomplete filter
`src/plan-gate.ts`	Plan-mode policy (pure) — workspace-write containment, read-only command allowlist, permission/plan-file classification
`src/plan-restore.ts`	Plan persist + restore decision (pure) — appendPlanEntry + decideRestoreState
`src/grok-primer.ts`	Hidden plan-mode primer text + version/marker constants + pure `isPrimerText()` (detects the primer when grok replays it on restore, so it's hidden + not counted toward plan positions)
`src/sessions.ts`	Disk-driven session listing/delete + customName overrides (pure)
`src/file-ref.ts`	Open-file `path#L<n>` ref parsing + large-file inline-read guard (pure)
`src/plan-review.ts`	Plan-snapshot Markdown filename generation for the "open plan as editor tab" action (pure)
`src/voice.ts`	Voice-input pure helpers — STT request/response/error, per-platform ffmpeg args, DirectShow device parsing, API-key resolution
`src/voice-recorder.ts`	Batch capture: `VoiceRecorder` (spawns `ffmpeg` → WAV, graceful `q`-stop) + `transcribeAudio` (POST to `api.x.ai/v1/stt`) + `resolveWindowsAudioDevice`
`src/voice-streamer.ts`	Live capture: `VoiceStreamer` (ffmpeg PCM → `ws` → `wss://api.x.ai/v1/stt`, emits partial/final transcript events)
`media/chat.{js,css}`	Webview UI
`media/webview-helpers.js`	Pure webview helpers (file-ref detection, relative-time format, mic-button state machine, trailing send-phrase highlight, subagent classifier `isSubagentToolCall`/`subagentLabel`); shared between webview and tests
`scripts/install.{ps1,sh}`	Auto-detect VS Code CLI, build .vsix, install
`scripts/uninstall.{ps1,sh}`	Uninstall `PawelHuryn.grok-vscode-phuryn`

Pure modules (acp-dispatch, chips, prompt-builder, slash-filter, cli-locator, sessions, plan-gate, plan-restore, grok-primer, file-ref, plan-review, voice, session-pool, webview-helpers) were split out specifically so protocol behavior can be unit-tested without spawning processes. (session.ts is a plain state bag — no vscode/spawn/network either, but it's data, not logic.)

Session pool (Agent Dashboard, v1.4.8)

The sidebar runs a pool of live sessions (one grok agent stdio process each) with one focused (the one rendered). Each session's state is a Session (src/session.ts); the sidebar holds focused + a Set<Session> pool. Re-focus is lossless: every session buffers the webview posts that built its chat (Session.buffer), so focusSession is clearMessages + replay of that buffer, with no grok reload and no kill, even mid-turn/mid-approval. Clicking a cold session (no live process: reaped or pre-window) loads it from disk into a fresh member (openSession); a live one re-focuses instantly. Two guards keep it safe: a per-session gen (bumped only when that session's client is torn down, captured by handlers when wired), so a backgrounded session's in-flight events aren't judged stale when focus moves (this replaces the old global sessionGen); and session-scoped emit(session,…) (buffers, and forwards only when focused) vs UI-wide post(…) (status dots, sessions list).
Status dots. Every history row shows a dot whose color is a derived value (the pure computeDot in src/session-pool.ts), not the raw status: gray at rest (idle / read / cold / loaded-from-disk all collapse to one gray, so the user never learns warm-vs-cold), blue working, yellow needs-you, green finished-and-unread, red errored-and-unread. The green/red is an unread badge, not a live state: a turn that ends while the session is not focused sets a persisted unread/unreadError flag in SessionMetaOverride (globalState); focusing/opening the session clears it (markRead). So the badge survives reaping AND reload (it's meta-backed, decoupled from the live process), and that's the payoff: come back and the green dots are exactly the sessions with results you haven't seen. There is no timer/decay; a never-opened session stays green (it is still unread). computeDot({liveStatus, unread, unreadError}) precedence: working → needs-you → unread(error?) → none.
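The computeDot precedence above (working → needs-you → unread(error?) → none) can be sketched as a pure function. The type names and the string dot values are assumptions for illustration; only the precedence order and the three input fields come from the notes.

```typescript
// Hypothetical sketch of the dot derivation. Type names and dot values are
// assumptions; only the precedence order comes from the notes above.
type LiveStatus = "idle" | "working" | "needs-you" | undefined; // undefined = cold
type Dot = "working" | "needs-you" | "unread" | "error" | "none";

function computeDot(s: { liveStatus: LiveStatus; unread: boolean; unreadError: boolean }): Dot {
  if (s.liveStatus === "working") return "working";     // blue
  if (s.liveStatus === "needs-you") return "needs-you"; // yellow
  if (s.unread || s.unreadError) return s.unreadError ? "error" : "unread"; // red / green badge
  return "none"; // gray: idle, read, cold, and loaded-from-disk all collapse here
}
```

Deriving the color from persisted flags plus live status (rather than storing a color) is why the badge survives reaping: a cold session simply has liveStatus undefined.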
Dots are pushed one at a time via pushDot→sessionDot (cheap, no disk read), plus a full dots map (dotForId over disk entries + pool) on each list refresh. The palette lives in CSS: working=--vscode-charts-blue, needs-you=--vscode-charts-yellow, unread=--vscode-charts-green, error=--vscode-errorForeground, none/default=--vscode-descriptionForeground.
Reaping is the pure selectReapable (src/session-pool.ts): an idle TTL (untouched ≥1h, swept every 5 min) plus an LRU cap (~8 live; the least-recently-used eligible session is evicted past it). It never reaps the focused session or a working/needs-you one, so the cap can be exceeded when everything spare is busy, by design. Reaping is silent: kill the process and recompute the dot (an unread reaped session stays green; a read one goes gray); re-clicking reloads from disk. Recency (Session.lastActiveAt) is stamped on create/focus/made-busy via touch.
CLI-update guard: the explicit Update Grok Build CLI action (updateGrokCliOnDemand) asks for confirmation before tearing the pool down if any session is working/needs-you; the silent startup auto-update has nothing in flight by construction, so it doesn't ask. The pool is per-workspace.
Separately, the impure voice-recorder/voice-streamer modules (ffmpeg spawn + STT fetch/WebSocket) are smoke-tested manually via the research/voice-*.cjs probes (voice-stt-probe, voice-e2e-verify, voice-stream-probe, voice-stream-verify, voice-cost-probe).
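The reaping policy (idle TTL plus LRU cap, never the focused or a busy session) can be sketched as below. The `PoolMember` shape, the parameter order, and the `busy` flag standing in for working/needs-you are assumptions; the 1h TTL and cap of 8 are the defaults quoted above.

```typescript
// Hypothetical sketch of selectReapable: idle TTL plus an LRU cap, never
// reaping the focused session or a busy (working / needs-you) one.
interface PoolMember {
  id: string;
  lastActiveAt: number; // ms since epoch, stamped by touch()
  busy: boolean;        // working or needs-you
}

function selectReapable(
  pool: PoolMember[],
  focusedId: string | undefined,
  now: number,
  idleTtlMs = 60 * 60 * 1000, // untouched >= 1h
  cap = 8,                    // ~8 live sessions
): string[] {
  const eligible = pool
    .filter((m) => m.id !== focusedId && !m.busy)
    .sort((a, b) => a.lastActiveAt - b.lastActiveAt); // least recently used first
  const reap = new Set<string>();
  // 1) Idle TTL: reap anything untouched for at least idleTtlMs.
  for (const m of eligible) if (now - m.lastActiveAt >= idleTtlMs) reap.add(m.id);
  // 2) LRU cap: evict the oldest remaining eligible members until the pool fits.
  //    If only focused/busy members remain, the cap is simply exceeded.
  let live = pool.length - reap.size;
  for (const m of eligible) {
    if (live <= cap) break;
    if (!reap.has(m.id)) {
      reap.add(m.id);
      live--;
    }
  }
  return Array.from(reap);
}
```

Keeping this pure (the caller passes `now` and the pool) is what lets the policy be unit-tested without timers or processes.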

History pagination

The history popover loads sessions one page at a time (SESSION_PAGE_SIZE = 100, newest-first) so it stays responsive with thousands of sessions on disk. The cost of the old behavior (read + JSON.parse every summary.json on every open, then render every row) grew linearly and stalled the popover at scale.

Two pure primitives replace the monolithic full-scan listSessions (which is kept only for callers that genuinely need the whole list):

indexSessions({fs, grokHome, cwd}) — one stat per session dir, no reads/parses, returning every id ordered newest-first by summary.json mtime. mtime is the cheap last-activity proxy: grok rewrites that file (it holds updated_at) on every turn. We sort by mtime here because the id is a UUIDv7 whose timestamp is creation, not last activity — id-sort would be wrong.
readSessionEntries({fs, grokHome, cwd, ids, overrides}) — reads + parses summary.json for exactly the given ids (the visible page) and applies customName overrides. The exact updated_at order is re-applied within the loaded page (mtime is only an approximate sort key).
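The indexSessions idea (one stat per session dir, sort by summary.json mtime, no reads) can be sketched with an injected fs. The `FsLike` interface and the use of encodeURIComponent for the `<urlencoded-cwd>` directory name are assumptions; a real caller would pass node:fs, and grok's exact encoding may differ.

```typescript
// Hypothetical sketch of indexSessions: one stat per session dir, no reads.
// FsLike and the encodeURIComponent(cwd) dir naming are assumptions.
import * as path from "node:path";

interface FsLike {
  readdirSync(dir: string): string[];
  statSync(file: string): { mtimeMs: number };
}

function indexSessions(opts: { fs: FsLike; grokHome: string; cwd: string }): string[] {
  const root = path.join(opts.grokHome, "sessions", encodeURIComponent(opts.cwd));
  let ids: string[];
  try {
    ids = opts.fs.readdirSync(root);
  } catch {
    return []; // no sessions for this workspace yet
  }
  const withMtime: { id: string; mtime: number }[] = [];
  for (const id of ids) {
    try {
      // summary.json is rewritten every turn, so its mtime tracks last activity
      // (unlike the UUIDv7 id, whose timestamp is creation time).
      withMtime.push({ id, mtime: opts.fs.statSync(path.join(root, id, "summary.json")).mtimeMs });
    } catch {
      // no summary.json yet (brand-new session) or unreadable: skip it
    }
  }
  return withMtime.sort((a, b) => b.mtime - a.mtime).map((e) => e.id); // newest first
}
```

Only the ids for the visible page are then handed to a reader such as readSessionEntries.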

Host orchestration (postSessionsList in sidebar.ts): indexSessions orders everything cheaply; an mtimeById map drives an mtime-keyed read cache (sessionCache, readEntriesCached), so a re-open / load-more / search reads only entries whose summary.json actually changed, and steady-state opens cost ~zero reads. It posts {type:"sessions", entries, activeId, dots, offset, total, hasMore, query}. Search is server-side and complete: a non-empty query warms the whole catalog once (cache-backed) and filters by display name across all sessions, not just the loaded page. The cache is invalidated on rename (which changes displayName without touching mtime), delete, and clearAll.

Live-session synthesis (open-too-early fix): a brand-new session has no summary.json yet, so the disk-scan index can't see it. Opening history the instant a session goes live would drop the active row entirely (and the previous top session would masquerade as the whole list) until grok flushes the file. So on the first, unfiltered page, postSessionsList injects a synthesized liveSessionEntry (pinned newest-first) for any live pool member whose activeSessionId isn't on disk, naming it from the best in-memory source (generated/renamed customName → first user message → "New session"); the next refresh replaces it with grok's own summary once the file lands. hasMore is computed from the on-disk page before injection (load-more pages are disk-only), and these ids are never on disk, so they can't duplicate onto a later page. This mirrors the existing dot-patch loop that already covered not-yet-persisted pool members.
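The mtime-keyed read cache can be sketched as a small class. This is a hypothetical stand-in for sessionCache/readEntriesCached: the `Entry` shape, the `reads` counter, and the injected `readEntry` function are assumptions made so the behaviour is observable.

```typescript
// Hypothetical sketch of an mtime-keyed read cache: re-read an entry only when
// its summary.json mtime moved. Entry shape and reads counter are illustrative.
interface Entry {
  id: string;
  name: string;
}

class MtimeCache {
  private cache = new Map<string, { mtime: number; entry: Entry }>();
  private readEntry: (id: string) => Entry;
  reads = 0; // counts real reads, to make the savings visible

  constructor(readEntry: (id: string) => Entry) {
    this.readEntry = readEntry;
  }

  get(ids: string[], mtimeById: Map<string, number>): Entry[] {
    return ids.map((id) => {
      const mtime = mtimeById.get(id) ?? -1;
      const hit = this.cache.get(id);
      if (hit && hit.mtime === mtime) return hit.entry; // unchanged: zero reads
      this.reads++;
      const entry = this.readEntry(id);
      this.cache.set(id, { mtime, entry });
      return entry;
    });
  }

  // A rename changes the display name without touching mtime, so drop that entry;
  // with no id (e.g. clear-all), drop everything.
  invalidate(id?: string): void {
    if (id === undefined) this.cache.clear();
    else this.cache.delete(id);
  }
}
```

The explicit invalidate hook exists because mtime alone cannot see changes that live outside grok's files, such as the customName overrides in globalState.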

Webview (media/chat.js): offset === 0 replaces the list and offset > 0 appends (de-duped by id). A scroll-near-bottom listener requests the next page (hasMore/sessionLoading guards keep it to one request per boundary). The search box posts a debounced (180ms) listSessions with the query; a host-driven refresh that arrives unfiltered while a search is active triggers a sticky-search re-request so the filtered view isn't clobbered. The "Clear all" footer shows when any non-active session exists (loaded or on a later page, via total).

Popover positioning: positionDropdownPopover right-aligns the popover to the panel edge (right: 6px/left: auto) and grows it leftward, rather than left-anchoring with a one-shot width clamp. The clamp measured the popover width in a single requestAnimationFrame, before the async session rows streamed in and widened it from min-width toward max-width: 360px, so the popover spilled off the right edge on the first open and only looked right on reopen (when state.sessions was already populated). Right-aligning is width-independent: no measurement, no reflow jump. It also caps min/max width to the panel (maxWidth/minWidth set inline, overriding the CSS min-width: 280px/max-width: 360px) at min(360|280, parentWidth − 2·EDGE), so in a narrow panel a long session name ellipsizes (the row already has text-overflow: ellipsis + min-width: 0) instead of the popover overflowing the left edge. This covers the common case, not extremes (a panel narrower than the margins still clips).

Resize/hide: a window resize listener re-runs positionDropdownPopover while the popover is open, so dragging the panel border re-fits it live (no close+reopen). A document visibilitychange listener calls closePopovers() when the view is hidden (switching panel tab / extension): a resize that happens while the webview is hidden fires no resize event, so the dropdown closes rather than reappearing stale on refocus (reopen re-measures).
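The page-merge rule and the width cap can be sketched as two pure helpers. Both are hypothetical: the `Row` shape and the function names are illustrative, and treating EDGE as 6 (matching the right: 6px inset) is an assumption, since the notes don't give its value.

```typescript
// Hypothetical sketch of the webview-side page merge and popover width cap.
// Row shape, function names, and EDGE = 6 are assumptions.
interface Row {
  id: string;
  name: string;
}

// offset 0 replaces the list; later pages append, de-duplicated by id.
function mergeSessionPage(current: Row[], page: Row[], offset: number): Row[] {
  if (offset === 0) return page.slice();
  const seen = new Set(current.map((r) => r.id));
  return current.concat(page.filter((r) => !seen.has(r.id)));
}

// min(360|280, parentWidth - 2*EDGE): cap both widths to the panel.
function popoverWidths(parentWidth: number, edge = 6): { maxWidth: number; minWidth: number } {
  const room = parentWidth - 2 * edge;
  return { maxWidth: Math.min(360, room), minWidth: Math.min(280, room) };
}
```

De-duplicating on append guards against a row shifting across a page boundary when a session's mtime changes between requests.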

Perf simulation (npm run test:perf, test/sessions.perf.ts via vitest.perf.config.ts) — opt-in, out of npm test/CI. At N=5000 it asserts the op counts: first open drops file reads from 5000 → 100 (~98%), steady-state re-open is 0 reads (mtime cache), search warms once then 0 reads, with a modeled-latency projection (~3250ms → ~310ms) and a real in-memory parse-cost wall-clock (~2.7x, disk-latency-free lower bound).

Build + test

npm install
npm test         # 503 tests, ~1.5s, vitest — all grok-free (incl. happy-dom DOM tests + fake-CLI ACP integration tests)
npm run test:perf # opt-in session-history perf simulation (NOT in npm test/CI; see § History pagination)
npm run package  # → grok-vscode-phuryn-1.4.10.vsix (clears older *.vsix first)

Test taxonomy — three layers

There are three kinds of tests, and it matters which is which:

npm test — grok-free unit/DOM/integration suite (503 tests). Pure logic, happy-dom tests that drive the real media/chat.js, a real-/bin/sh TerminalManager smoke, and a fake-CLI ACP integration suite (test/fixtures/fake-grok-acp.cjs). Never spawns the real grok binary. Runs in <2s with no network, no login, no subscription. This is the floor: every change keeps it green. (A separate opt-in npm run test:perf runs the session-history perf simulation, test/*.perf.ts via vitest.perf.config.ts, matched only by that config so it stays out of npm test/CI; see § History pagination.)
CI — the same suite. .github/workflows/ci.yml runs npm ci && npm test && npm run package on a clean Ubuntu box. CI ≡ layer 1, verbatim — there is no separate CI-only set. CI has no grok binary, no auth, no SuperGrok subscription, so it cannot run anything that touches the real CLI. That's the whole reason layer 1 is grok-free.
npm run test:live — on-demand pre-release suite against REAL grok (scripts/live-tests.cjs). Spawns the actual grok agent stdio and exercises the surfaces layers 1–2 can't: the real ACP handshake, a prompt round-trip, session restore, plan-mode enforcement, and the v1.4.x generative features (image gen, video gen; the subagent path is exercised opportunistically and SKIPs when grok doesn't delegate — it's deferred/research-only). It reuses the real compiled modules (out/acp-dispatch.js, out/plan-gate.js, media/webview-helpers.js) — it feeds genuine wire output through the same isMediaGenToolCall/extractGeneratedMediaPaths/isSubagentToolCall/shouldBlockWrite the extension uses, not a re-implementation. Always run it before every release-to-main — it's a non-negotiable, standing part of the release gate; run it without asking (it needs a logged-in grok + subscription and burns credits, so it must never be in npm test or CI, but it is mandatory before any tag/release). Flags: --quick (skip the slow generative tests), --only=, --skip=, GROK_BIN=…. A SKIP (no subscription, grok chose not to delegate, etc.) does not fail the gate — only a FAIL does. Real-grok diagnostic probes (research/*.cjs) remain manual one-offs for capturing wire shapes; the live suite is the repeatable gate.

So: local == CI (both grok-free). The real-grok tests are a separate, mandatory pre-release gate — always run before a tag/release (no need to ask), never on every commit.

grok CLI version + updating

The native-Windows build (irm https://x.ai/cli/install.ps1 | iex) is grok 0.2.3 on the stable channel; the Linux probes in the docs were against 0.2.33 (a different release line — note the gap when reconciling wire shapes). grok does not auto-update. Updating is the explicit grok update command: grok update --check [--json] checks without installing, grok update installs the latest on the current channel, --stable (default, weekly) / --alpha switch channels, --version <X.Y.Z> pins a specific build. Re-run npm run test:live after any CLI update — the wire format is the thing that drifts.

Install

macOS / Linux / WSL Ubuntu: ./scripts/install.sh
Windows (native): pwsh scripts\install.ps1 — runs the native Windows grok CLI directly
WSL Ubuntu (alternative): Remote-WSL → install in the WSL-side VS Code server

See README.md § Install for the full per-platform matrix.

ACP surfaces implemented

initialize → session/new / session/load → session/set_model → session/prompt lifecycle
Model switching is agent-aware. Models belong to agent types — grok-build/grok-build-plan vs the cursor agent that owns the Composer models. The CLI binds the agent at spawn and locks it after the first turn (incl. our primer), so a live session/set_model only works within the same agent; a cross-agent switch errors MODEL_SWITCH_INCOMPATIBLE_AGENT ("Start a new session"). switchModel in sidebar.ts tries the live switch, and on that error (detected by the pure isIncompatibleAgentError in acp-dispatch.ts) persists the pick to grok.defaultModel and restarts — newSession re-applies the model before the primer runs, while the agent is still rebindable. No history → transparent restart; with history → the same Summarize/Just-Restart prompt as an effort change (shared pickRestartMode/restartSession helpers). Empty-session cleanup: a restart on a primer-only session (no real conversation — the common case when you flip models/effort a few times right after opening) gets the no-prompt transparent path and discardRestartedEmptySession deletes the abandoned grok session dir afterward, so repeated switches don't pile up identical primer-only sessions in history; the pure carrySessionName (in sessions.ts) moves any user rename (customName) from the discarded id onto the fresh one so the chosen name survives. The same cleanup runs on the effort-change empty-session branch (guarded so a dead client on a session with history keeps its history). The toolbar model label resolves IDs to user-facing names via the pure modelDisplayName helper.
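The try-live-then-restart flow can be sketched as follows. This is not the real switchModel: the callbacks are hypothetical stand-ins for session/set_model and for "persist grok.defaultModel and restart", and the error check assumes the code appears in the error message or a `data.code` field, which may not match the actual error shape.

```typescript
// Hypothetical sketch of the live-switch-then-restart fallback. The error shape
// (code in message or data.code) and the callback names are assumptions.
function isIncompatibleAgentError(err: unknown): boolean {
  if (err === null || typeof err !== "object") return false;
  const e = err as { message?: unknown; data?: { code?: unknown } };
  const text = `${String(e.message ?? "")} ${String(e.data?.code ?? "")}`;
  return text.includes("MODEL_SWITCH_INCOMPATIBLE_AGENT");
}

async function switchModel(
  modelId: string,
  liveSetModel: (id: string) => Promise<void>,
  persistDefaultAndRestart: (id: string) => Promise<void>,
): Promise<"live" | "restarted"> {
  try {
    await liveSetModel(modelId); // same agent: the live switch just works
    return "live";
  } catch (err) {
    if (!isIncompatibleAgentError(err)) throw err; // unrelated failures propagate
    // Cross-agent: persist the pick and start a fresh session, which applies
    // the model before the primer binds the agent.
    await persistDefaultAndRestart(modelId);
    return "restarted";
  }
}
```

Rethrowing unrelated errors keeps the restart path reserved for the one failure it can actually fix.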
Streaming agent_message_chunk + agent_thought_chunk
Sessions: list/resume via session/load (grok stores them at ~/.grok/sessions/<urlencoded-cwd>/<id>/); rename/delete metadata in context.globalState["grok.sessionMeta"]. We never edit grok's own session files. Per-session delete is the trash button on each non-active history row (deleteSession → deleteSessionDir); the active session has no delete (the live CLI re-persists it, so it wouldn't stick). Clear all (#19) is a footer button in the history popover (shown only when ≥1 non-active session exists) → clearAllSessions in sidebar.ts: a modal-confirmed sweep of every session dir for the current workspace except the focused one, via the pure clearSessions({fs, grokHome, cwd, exceptId}) in sessions.ts (best-effort — a locked dir is skipped, not thrown), then purges the removed ids' grok.sessionMeta overrides and tears down any backgrounded live pool members it deleted. Scale (history pagination): the popover loads one page at a time (newest-first by last activity) so it stays fast with thousands of sessions — see § History pagination. Clear-all is the relief valve for an overgrown store; the pagination is the steady-state fix.
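A best-effort clear-all sweep in the spirit of clearSessions can be sketched with an injected fs. The `RmFs` interface and the encodeURIComponent(cwd) directory naming are assumptions; returning the removed ids is one plausible way for the caller to then purge grok.sessionMeta overrides and tear down matching pool members.

```typescript
// Hypothetical sketch of a best-effort sweep: delete every session dir except
// exceptId, skipping (not throwing on) dirs that fail to delete.
import * as path from "node:path";

interface RmFs {
  readdirSync(dir: string): string[];
  rmSync(p: string, opts: { recursive: boolean; force: boolean }): void;
}

function clearSessions(o: { fs: RmFs; grokHome: string; cwd: string; exceptId?: string }): string[] {
  const root = path.join(o.grokHome, "sessions", encodeURIComponent(o.cwd));
  let ids: string[];
  try {
    ids = o.fs.readdirSync(root);
  } catch {
    return []; // nothing stored for this workspace
  }
  const removed: string[] = [];
  for (const id of ids) {
    if (id === o.exceptId) continue; // never delete the focused session
    try {
      o.fs.rmSync(path.join(root, id), { recursive: true, force: true });
      removed.push(id);
    } catch {
      // locked or otherwise undeletable: skip it and keep sweeping
    }
  }
  return removed; // caller purges metadata overrides for these ids
}
```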
Handlers (mandatory or the agent crashes): fs/read_text_file, fs/write_text_file, terminal/{create,output,wait_for_exit,kill,release}
session/request_permission → chat card with allow-always / allow-once / reject-once, diff editor preview for kind:"edit"
x.ai/ask_user_question → inline question card (issue #12: the tool was fully broken before this fix). Previously, the catch-all ACKed unknown server requests with {}, which grok's deserializer rejects with "missing field outcome". Now it is handled: the card renders each question's options (a single question with single-select resolves on one click; otherwise pick-then-Submit; Skip → cancel) and replies { outcome: "accepted", answers, annotations } (or cancelled). answers is keyed by question text → chosen label. On answer the card collapses to the question + a green ✓ <choice> (so it's clear grok received it). On session resume the question replays as a tool_call (questions in rawInput) + a completed tool_call_update (answer text); chat.js suppresses the generic tool chip for ask_user_question and rebuilds a read-only "You answered" card from that replay, with no separate persistence. The full binary-derived wire format is in research/ask-user-question.md. Response builders are pure (makeQuestionResponse in acp-dispatch.ts); the answer map is built by the pure buildQuestionAnswers in webview-helpers.js.
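The answer map and the reply envelope can be sketched as two pure builders. Only the outcome/answers/annotations field names and the "question text → chosen label" keying come from the notes; the `Question` input shape, the index-based `picks` map, and the annotations value type are assumptions (research/ask-user-question.md has the real wire format).

```typescript
// Hypothetical sketch of the ask_user_question reply builders. The Question
// shape, picks-by-index input, and annotations type are assumptions.
interface Question {
  question: string;
  options: { label: string }[];
}

// answers: question text -> chosen option label
function buildQuestionAnswers(questions: Question[], picks: Record<number, number>): Record<string, string> {
  const answers: Record<string, string> = {};
  questions.forEach((q, i) => {
    const choice = picks[i];
    if (choice !== undefined && q.options[choice]) answers[q.question] = q.options[choice].label;
  });
  return answers;
}

type QuestionResponse =
  | { outcome: "accepted"; answers: Record<string, string>; annotations: Record<string, unknown> }
  | { outcome: "cancelled" };

// null answers = the user pressed Skip
function makeQuestionResponse(
  answers: Record<string, string> | null,
  annotations: Record<string, unknown> = {},
): QuestionResponse {
  return answers === null ? { outcome: "cancelled" } : { outcome: "accepted", answers, annotations };
}
```

Always including an explicit outcome is the point: an empty {} is what grok's deserializer rejected.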
session/set_mode wired; the picker exposes Agent, Plan, and YOLO. The CLI's non-plan mode id is "default" (not "agent"), captured as ACT_MODE_ID in sidebar.ts.
Plan mode is enforced client-side (mirror of YOLO). The CLI's x.ai/exit_plan_mode still treats any client response — result or error — as approval (re-verified broken in 0.2.3), so we don't rely on it. Instead src/plan-gate.ts gates the two mandatory server→client choke points: fs/write_text_file (block writes resolving inside the workspace cwd) and terminal/create (block anything not on the read-only allowlist). grok's own ~/.grok/sessions/<…>/plan.md write lands outside the workspace and is allowed (and snooped to recover the plan text — exit_plan_mode arrives with planContent: null). Approve → drop the gate + send an "implement it now" follow-up prompt; Keep planning → gate stays up. Entering plan mode any way (incl. agent-initiated current_mode_update: plan) raises the gate; it's lowered only by explicit user action, never auto-lowered by CLI mode flapping. The verdict protocol is taught by a hidden primer (src/grok-primer.ts) — it tells grok to ignore the bogus exit_plan_mode result and read [Plan approved]/[Plan rejected]/[Plan cancelled] (optionally + a comment) from the follow-up message. The primer fires eagerly + non-blocking as its own hidden turn the moment a session goes live — on new AND restored sessions, and after /compact — not in front of the user's first prompt and not at composer-disabling cost (ensurePrimed in sidebar.ts, kicked off from startSession's success path). It returns a reused session.primingPromise (src/session.ts) so a racing first send awaits the same in-flight primer (grok runs one turn at a time) and is released the instant the silent primer acks; the user can type and send straight away — their bubble shows immediately (a "Grokking…" indicator stands in until grok produces content) while the real prompt is held behind the primer in code. On restore the primer is re-sent on first go-live rather than trusted from replayed history (a /compact can drop it). 
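The fs/write_text_file half of the plan gate (block writes that resolve inside the workspace cwd, allow everything else) can be sketched with node:path. This is a hedged illustration, not the real shouldBlockWrite in src/plan-gate.ts; in particular it does not resolve symlinks, which a hardened gate might need to.

```typescript
// Hypothetical sketch of the plan-mode write gate: while the gate is up, block
// writes resolving inside cwd; writes elsewhere (e.g. grok's plan.md under
// ~/.grok/sessions) pass. Symlinks are not resolved in this sketch.
import * as path from "node:path";

function shouldBlockWrite(planGateUp: boolean, cwd: string, target: string): boolean {
  if (!planGateUp) return false;
  const rel = path.relative(path.resolve(cwd), path.resolve(cwd, target));
  // Inside cwd <=> the relative path doesn't climb out and isn't absolute
  // (an absolute result means a different drive on Windows).
  const climbsOut = rel === ".." || rel.startsWith(".." + path.sep);
  return rel === "" || (!climbsOut && !path.isAbsolute(rel));
}
```

Checking for ".." + path.sep (rather than any leading "..") keeps an in-workspace file such as "..notes.md" gated.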
The primer's silent turn is hidden by a session-level suppressContent flag checked in post/emit against SUPPRESS_TYPES (which deliberately excludes userMessage/agentStart so a user send that races the background primer still paints its own bubble + Grokking indicator). Primer v4 (PRIMER_VERSION/PRIMER_MARKER) is deliberately minimal: grok-build is agentic, and the old v3 primer's "## Grok Build VS Code extension … open source repo, issues: " paragraph + "Acknowledge briefly" line were tempting grok into a 15–40s pre-turn exploration of the workspace (one capture: 17.7s reading files + searching 114 files) before the user's message even ran — the plan-protocol text itself never did. v4 keeps only the protocol, drops the product paragraph + URL, and adds an explicit do not use any tools / do not read files / do not search the workspace / do not take any action constraint ending in "Reply with exactly: ok". The pure isPrimerText() (version-agnostic PRIMER_PATTERN) detects the primer when grok replays it so the bubble is hidden + not counted toward plan positions. For the full pedagogical course with diagrams and hands-on guidance, see research/understanding-plan-mode.md.
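The suppression rule and the replay detector above reduce to two small checks. This sketch assumes message type names other than userMessage/agentStart, and a marker format of `[grok-vscode primer vN]`; both are hypothetical, since the real SUPPRESS_TYPES members and PRIMER_PATTERN are not spelled out here.

```typescript
type WebviewMessage = { type: string; [key: string]: unknown };

interface SessionState {
  suppressContent: boolean;
}

// Hypothetical content types dropped while the hidden primer turn runs.
// userMessage and agentStart are deliberately absent so a user send that races
// the primer still paints its own bubble and the "Grokking…" indicator.
const SUPPRESS_TYPES = new Set(["agentMessageChunk", "thoughtChunk", "toolCall", "toolCallUpdate"]);

function shouldPost(session: SessionState, msg: WebviewMessage): boolean {
  return !(session.suppressContent && SUPPRESS_TYPES.has(msg.type));
}

// Hypothetical version-agnostic marker: matching any vN lets replayed history
// from older primer versions be hidden as well.
const PRIMER_PATTERN = /\[grok-vscode primer v\d+\]/;

function isPrimerText(text: string): boolean {
  return PRIMER_PATTERN.test(text);
}
```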
grok.defaultEffort → forwarded as --reasoning-effort <value> before the stdio subcommand (it's an agent-level flag; after stdio the CLI errors "unexpected argument"). Offered values mirror grok's accepted set (none|minimal|low|medium|high|xhigh); the bogus max we used to expose made grok exit code 2 (#3/#4). Args are built by the pure buildGrokAgentArgs(); changing effort restarts the session (setEffort in sidebar.ts).
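The flag placement rule can be captured in a pure argv builder. A minimal sketch, assuming the process is launched as `grok agent … stdio` and that unknown effort values are simply dropped (falling back to the CLI default); the real buildGrokAgentArgs() signature may differ.

```typescript
// Effort values grok accepts, per the notes above.
const EFFORTS = ["none", "minimal", "low", "medium", "high", "xhigh"] as const;
type Effort = (typeof EFFORTS)[number];

function isEffort(value: string): value is Effort {
  return (EFFORTS as readonly string[]).includes(value);
}

function buildGrokAgentArgs(effort?: string): string[] {
  const args: string[] = ["agent"];
  // --reasoning-effort is an agent-level flag, so it must come before the stdio
  // subcommand; placed after stdio, the CLI rejects it as an unexpected argument.
  if (effort !== undefined && isEffort(effort)) {
    args.push("--reasoning-effort", effort);
  }
  args.push("stdio");
  return args;
}
```

Validating against the accepted set keeps a stale value such as max from reaching grok and producing exit code 2.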
grok.chatFontScale (#14, webview-only — not ACP) → chat-panel zoom as a percent (default 100, clamped 60–300). The host bakes the initial value into <body style="zoom:…"> in getHtml (no flash) and postFontScale re-posts a fontScale message on onDidChangeConfiguration for a live, reload-free update (chat.js sets document.body.style.zoom). zoom scales the whole chat — text, icons, spacing — but not the rest of VS Code (the point vs. global Ctrl/Cmd+Shift+=). Supports User/Workspace scope for free. chatFontScale() in sidebar.ts reads + clamps.
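Reading and clamping the setting is a pure step that can be tested without VS Code. This sketch assumes hypothetical helper names and that non-numeric input falls back to the default; only the default of 100 and the 60–300 range come from the notes.

```typescript
const DEFAULT_SCALE = 100;
const MIN_SCALE = 60;
const MAX_SCALE = 300;

// In the extension, `raw` would come from
// vscode.workspace.getConfiguration("grok").get("chatFontScale").
function clampFontScale(raw: unknown): number {
  const n = typeof raw === "number" ? raw : Number.NaN;
  if (!Number.isFinite(n)) return DEFAULT_SCALE;
  return Math.min(MAX_SCALE, Math.max(MIN_SCALE, n));
}

// Inline <body> style baked into the initial HTML so the first paint is already
// zoomed (no flash); a live update would re-post the same clamped value.
function bodyZoomStyle(percent: unknown): string {
  return `zoom:${clampFontScale(percent) / 100}`;
}
```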
available_commands_update → slash autocomplete
current_mode_update → bottom-toolbar mode button (the top bar was removed in 0.9.0)
_meta.totalTokens → context donut
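The three mappings above amount to routing `session/update` notifications by their `sessionUpdate` discriminator. In this sketch the field names availableCommands and currentModeId follow my reading of the ACP schema, while the placement of `_meta.totalTokens` and the UI message shapes are assumptions.

```typescript
interface AcpSessionUpdate {
  sessionUpdate: string;
  availableCommands?: { name: string; description?: string }[];
  currentModeId?: string;
  _meta?: { totalTokens?: number };
}

// Hypothetical messages posted to the webview.
type UiMessage =
  | { type: "slashCommands"; names: string[] }
  | { type: "mode"; modeId: string }
  | { type: "contextTokens"; totalTokens: number };

function routeUpdate(update: AcpSessionUpdate): UiMessage[] {
  const out: UiMessage[] = [];
  switch (update.sessionUpdate) {
    case "available_commands_update": // feeds slash autocomplete
      out.push({ type: "slashCommands", names: (update.availableCommands ?? []).map((c) => c.name) });
      break;
    case "current_mode_update": // drives the bottom-toolbar mode button
      if (update.currentModeId) out.push({ type: "mode", modeId: update.currentModeId });
      break;
  }
  // Token usage is read from _meta on whichever update carries it (context donut).
  const total = update._meta?.totalTokens;
  if (typeof total === "number") out.push({ type: "contextTokens", totalTokens: total });
  return out;
}
```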
Generated media (v1.4.x). /imagine (image_gen, or image_edit for reference-photo edits) and /imagine-video (video_gen; older/Linux builds image_to_video) are subscription-only and do not return ACP image blocks. Instead, grok writes the file into its session dir (images/*.jpg, videos/*.mp4) and reports the path as a JSON string in the completed tool result's text content. The pure isMediaGenToolCall/extractGeneratedMediaPaths (in acp-dispatch.ts) detect the tool and parse the path (image-vs-video by extension); acp.ts tracks the tool-call id (the completed update has a null title) and emits mediaContent; sidebar.ts postGeneratedMedia serves the file via webview.asWebviewUri (streamed from disk under the grok-home localResourceRoot, which is what made multi-MB /imagine-video clips render; a base64 data: URI is only the fallback for files outside the served roots). CSP grants img-src/media-src ${webview.cspSource} data:. Inline media is capped at 320px with Copy-path / Open-in-VS-Code hover actions pinned to the image. On resume grok replays it as a single collapsed tool_call carrying title + path together, so the same detection and parsing code handles it. Wire format + probes in research/image-generation.md (research/imagine-probe.cjs, research/video-probe.cjs).
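Detection and parsing can be sketched as pure functions. This is not the acp-dispatch.ts code: the exact JSON shape grok emits lives in research/image-generation.md, so this version assumes it may be a bare string or nested in an object and walks the parsed value for anything with a media extension. The extension lists beyond .jpg/.mp4 are assumptions.

```typescript
const MEDIA_TOOLS = new Set(["image_gen", "image_edit", "video_gen", "image_to_video"]);
const IMAGE_EXTS = new Set([".jpg", ".jpeg", ".png", ".webp", ".gif"]);
const VIDEO_EXTS = new Set([".mp4", ".webm", ".mov"]);

type MediaKind = "image" | "video";
interface GeneratedMedia {
  path: string;
  kind: MediaKind;
}

function isMediaGenToolCall(toolName: string | null | undefined): boolean {
  return toolName != null && MEDIA_TOOLS.has(toolName);
}

// Classify by file extension only; anything unrecognized is ignored.
function classify(p: string): MediaKind | null {
  const dot = p.lastIndexOf(".");
  if (dot < 0) return null;
  const ext = p.slice(dot).toLowerCase();
  if (IMAGE_EXTS.has(ext)) return "image";
  if (VIDEO_EXTS.has(ext)) return "video";
  return null;
}

// Parse the tool result's text as JSON, then collect every string value that
// looks like a media path, whether top-level or nested in objects/arrays.
function extractGeneratedMediaPaths(text: string): GeneratedMedia[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return []; // not JSON-in-text, so nothing to render
  }
  const found: GeneratedMedia[] = [];
  const walk = (v: unknown): void => {
    if (typeof v === "string") {
      const kind = classify(v);
      if (kind) found.push({ path: v, kind });
    } else if (Array.isArray(v)) {
      v.forEach(walk);
    } else if (v !== null && typeof v === "object") {
      Object.values(v).forEach(walk);
    }
  };
  walk(parsed);
  return found;
}
```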
Subagent card (deferred / research-only). A pure classifier (isSubagentToolCall/subagentLabel in webview-helpers.js) would give a delegation a distinct Subagent: <type> card, but grok 0.2.x does not expose subagents as a spawn_subagent ACP tool — it backgrounds a process and polls it via get_command_or_subagent_output (which the classifier explicitly excludes), so the card rarely fires. Not advertised as a shipped feature (dropped from the README in 1.4.3); see research/subagents.md.
Logout (v1.4.0, #13). grok.logout command + gear-menu Sign out → sidebar.logout() runs grok logout, disposes the session, shows the auth-required onboarding.

Known limits

Subagent delegation cards are deferred — grok 0.2.x doesn't emit spawn_subagent over ACP (it backgrounds a process + polls), so the classifier rarely fires; even when it does, child tool calls aren't nested (no inspector)
Generated media is served via asWebviewUri (streamed from disk) when it lives under the grok-home localResourceRoot; files outside that fall back to a base64 data: URI
No worktree UI
Diff editor is preview-only; the write happens via fs/write_text_file after approval
The view defaults to the left activity bar; users must drag it to the secondary side bar manually if they want it there

Cross-platform notes

terminal-manager.ts uses spawn(cmd, { shell: true }) so Node picks cmd.exe on Windows, /bin/sh elsewhere. Don't hardcode shell paths.
cli-locator.ts reads HOME / USERPROFILE env vars first (testability), falls back to os.homedir(). Uses where on Windows, command -v elsewhere. Checks .cmd/.exe/.bat extensions on Windows.
Tests use node -e "..." everywhere, so commands are deterministic across platforms — don't add pwd, awk, sleep, true, etc.
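The locator rules above can be kept pure by passing in the environment and platform. A minimal sketch with hypothetical function names; the HOME-before-USERPROFILE order is an assumption, and the real cli-locator.ts may structure this differently.

```typescript
type Env = Record<string, string | undefined>;

// Env vars first so tests can point the locator at a fake home;
// the fallback is injected (os.homedir in the extension).
function resolveHome(env: Env, fallback: () => string): string {
  return env.HOME || env.USERPROFILE || fallback();
}

// Lookup command meant for spawn(cmd, { shell: true }). `command -v` is a
// shell builtin, so it only works when run through the shell.
function lookupCommand(platform: string, name: string): string {
  return platform === "win32" ? `where ${name}` : `command -v ${name}`;
}

// On Windows the CLI may be a .cmd, .exe, or .bat, so try each in that order.
function candidateNames(platform: string, name: string): string[] {
  return platform === "win32" ? [".cmd", ".exe", ".bat"].map((ext) => name + ext) : [name];
}
```

In the extension these would be called as `resolveHome(process.env, os.homedir)` and `lookupCommand(process.platform, "grok")`.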

What's next (priority order)

@vscode/test-electron integration suite (scoped in TESTS.md § v0.2)
Status-bar indicator (current model + effort + token usage)
Subagent support — grok 0.2.x doesn't expose spawn_subagent over ACP, so the existing classifier is dormant; revisit if/when the CLI surfaces delegations as a tool call (then add the nested inspector that groups child calls under the card)
Worktree UI (Grok: New Worktree Session)
Optional: auto-move view to secondary side bar on first activation (workbench.action.moveView)

Publishing

Release procedure — ALWAYS tag + create a GitHub Release (with the .vsix attached) on a release push to main (standing convention; mirrors the v1.0.0… tag history + GitHub Releases):

The whole procedure below (steps 2–5) is scripted — after bumping the version + writing the changelog section (step 1, user-initiated), just run:

pwsh scripts\release.ps1        # Windows (native) — what we use here
./scripts/release.sh            # macOS / Linux / WSL

It reads the version from package.json, runs the gate (npm test + tsc), builds the vsix, commits the working tree (-MessageFile/-Message override the default commit message, Release vX.Y.Z), pushes main, creates the annotated tag, and runs gh release create with the vsix attached, extracting the matching ## X.Y.Z changelog section as the release notes. -DryRun/--dry-run previews; -NoTest/--no-test skips the gate. It refuses to run off main or when the tag already exists (i.e. the version wasn't bumped). The script itself doesn't call vsce/ovsx, but the GitHub Release it creates triggers the publish workflow, so it does kick off Marketplace + Open VSX publish.
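Extracting the release notes comes down to slicing CHANGELOG.md between one `## ` heading and the next. The release scripts are PowerShell and Bash; this TypeScript version and its heading-matching rules (optional brackets or a leading v, anything after the version) are assumptions about the behavior, not a port of them.

```typescript
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the body of the "## X.Y.Z" section, or null if there is none.
// Matches "## 1.4.10" followed by whitespace or end of line (so a trailing date
// is fine) but not "## 1.4.100"; "###" subsections stay inside the section.
function extractChangelogSection(changelog: string, version: string): string | null {
  const lines = changelog.split(/\r?\n/);
  const heading = new RegExp(`^##\\s+\\[?v?${escapeRegExp(version)}\\]?(?:\\s|$)`);
  const start = lines.findIndex((l) => heading.test(l));
  if (start < 0) return null;
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (/^##\s/.test(lines[i])) {
      end = i; // next top-level version heading ends the section
      break;
    }
  }
  return lines.slice(start + 1, end).join("\n").trim();
}
```

Returning null rather than an empty string lets the caller refuse to create a release with no notes.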

What the script encodes, step by step:

1. Bump version in package.json (user-initiated) and add the dated section to CHANGELOG.md.
2. npm test (503-test floor, all green) + tsc -p . --noEmit clean, plus npm run test:live against real grok. test:live is mandatory and runs without asking; the release.* scripts run the npm test + tsc gate but not test:live, so run test:live by hand before invoking them.
3. Commit + push to main (direct-to-main, no feature branches).
4. Annotated git tag vX.Y.Z at the release commit → git tag -a vX.Y.Z -m "Release vX.Y.Z" → git push origin vX.Y.Z.
5. GitHub Release for that tag → gh release create vX.Y.Z --title "Release vX.Y.Z" --notes-file <notes> <vsix> (notes = the new changelog section(s); include any earlier version that was bumped but never released). Always attach the built grok-vscode-phuryn-X.Y.Z.vsix as a release asset so the exact installable build is downloadable from the release.
6. Marketplace publish is separate and explicit: only npm run publish (vsce) when the user asks. The PawelHuryn publisher is registered + authenticated locally; publishing ≠ tagging.

Don't skip the tag/release (or the vsix asset) on a release push. (A pure mid-dev version bump that isn't a release — e.g. the unreleased v1.3.0 voice iteration — is the only exception.)

Repo conventions

Direct-to-main, no feature branches
Commits explain the why, not the what
Don't introduce abstractions speculatively
Don't add comments that explain what well-named code already says
503 tests is the floor — every PR should keep that green. All tests are grok-free (no binary spawn); grok-dependent probes live in research/*.cjs and are run manually, never by npm test or CI
Rebuilding clears older .vsix first — npm run package (and the install/release scripts) delete stale grok-vscode-phuryn-*.vsix before building, so only the current version is on disk. After any doc or code change, rebuild + reinstall so the installed extension's bundled docs are current. Package last: the vsix bundles CLAUDE.md/README.md/docs/ as files, so finish all doc + code edits before the final npm run package + reinstall — otherwise the installed build ships a stale-docs snapshot. If you rebuild mid-task and then touch docs again, rebuild again so the order is always edit-everything → package → reinstall.
Version bumps are user-initiated. Iterate at the current version (rebuild the same vsix and reinstall locally) until the user says to bump and publish. Don't bump package.json on your own.
Sign GitHub comments. Every GitHub issue/PR comment posted on the user's behalf ends with a signature on its own final line (italic). Pick by whether Paweł actually reviewed the text before it was posted: if he reviewed it, use _Written with an agent, reviewed by Paweł_; if the agent posted it without his review, use _Written by Pawel's agent_. Only claim review when it actually happened.
