VS Code sidebar extension for xAI's Grok Build CLI, driven by grok agent stdio over the Agent Client Protocol (ACP). Thin client — all session state, MCP servers, subagents, memory, and plan-mode bookkeeping live in the CLI.
v1.4.10 (releasing now; v1.4.8 is the latest GitHub Release — the Marketplace-published build lags behind). 503 tests passing, all grok-free (CI never spawns the binary; grok-dependent probes live separately in research/*.cjs). v1.4.10 makes the session-history popover scale to thousands of sessions — it loads the most recent 100 (newest-first by last activity), pulls in older pages as you scroll, and the search box filters by name across the whole history (server-side, complete — not just the loaded page); under the hood a summary.json-mtime index orders everything with one cheap stat per dir (no reads), windowed reads parse only the visible page, and an mtime-keyed read cache makes steady-state re-opens cost ~zero reads. An opt-in perf simulation (npm run test:perf, out of npm test/CI) proves the op-count drop (first open 5000→100 reads, ~98%). See § History pagination. v1.4.9 adds an adjustable chat font size (grok.chatFontScale, issue #14) — a chat-only zoom (text + icons + spacing together) that leaves the rest of VS Code at its normal size (unlike global Ctrl/Cmd+Shift+=), applies live with no reload, supports User/Workspace scope, and keeps the composer pinned to the bottom at any scale (the body derives both the CSS zoom and a height: calc(100vh / var(--chat-zoom)) compensation from one --chat-zoom variable, since vh ignores zoom); see the grok.chatFontScale config bullet below. 
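The arithmetic behind the single --chat-zoom variable can be sketched as follows. This is an illustrative helper, not the extension's code: chatZoomVars is a hypothetical name, and the 60 to 300 clamp and default of 100 are taken from the grok.chatFontScale description in this document. It shows why the body height must be divided by the zoom factor: vh ignores zoom, so a body that is 100vh tall at zoom 2 would paint twice the viewport height.

```typescript
// Hypothetical helper: derives the CSS zoom factor and the compensated body
// height (in vh units) from a percent setting such as grok.chatFontScale.
function chatZoomVars(percent: number): { zoom: number; bodyHeightVh: number } {
  // Fall back to 100% for non-numeric input, then clamp to the 60-300 range.
  const safe = Number.isFinite(percent) ? percent : 100;
  const clamped = Math.min(300, Math.max(60, safe));
  const zoom = clamped / 100;
  // Equivalent of `height: calc(100vh / var(--chat-zoom))`.
  return { zoom, bodyHeightVh: 100 / zoom };
}
```

At 200% the body is laid out at 50vh and then zoomed 2x, which fills exactly one viewport and keeps the composer at the bottom edge.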
v1.4.8 ships the multi-session Agent Dashboard — a pool of live sessions with lossless instant re-focus, status dots in the history dropdown, and silent idle/LRU reaping (see § Session pool) — plus a primer-latency fix: the hidden plan-mode primer now fires eagerly + non-blocking the moment a session goes live (new + restore + post-/compact) instead of in front of the user's first prompt, the primer text was trimmed to v4 (the product-blurb paragraph + repo URL that tempted agentic grok into a 15–40s pre-turn workspace exploration are gone, replaced by an explicit do-not-act / reply with just ok constraint), and a "Grokking…" waiting indicator (Thinking-block font + animated dots, no chevron) shows on every user turn the instant you send and is replaced in place by the first thought/message/tool card (see § Primer + waiting indicator). The v1.4.x line added image + video generation rendered inline (/imagine → image_gen, or image_edit for reference-photo edits; /imagine-video → video_gen, older/Linux builds image_to_video; grok writes the file to the session dir and reports its path as JSON-in-text on the completed tool result — not an ACP image block — so the host parses the path, classifies image-vs-video by extension, and serves it to the webview via webview.asWebviewUri — streamed from disk; a base64 data: URI is only a fallback for files outside the grok home. Inline media is capped at 320px with Copy-path / Open-in-VS-Code hover actions pinned to the image; see research/image-generation.md), and a Sign out action (grok logout, command + gear menu, issue #13). A subagent card classifier (isSubagentToolCall/subagentLabel in webview-helpers.js) exists but is research-only / deferred — grok 0.2.x doesn't emit spawn_subagent over ACP (it backgrounds a process and polls get_command_or_subagent_output), so the card rarely fires and isn't advertised as a shipped feature; see research/subagents.md. 
v1.4.3 also defers the plan-mode primer — it's no longer sent at session start; it rides the user's first real prompt as its own hidden turn, on new AND restored sessions (re-sent on restore rather than trusted from replayed history, which a /compact can drop). v1.4.4 fixes auto-scroll fighting the user (#16): the chat used to snap to the bottom on every streaming chunk, so scrolling up to re-read history while grok was thinking was undone on the next thought chunk. Now a stickToBottom flag (driven by a scroll listener via the pure shouldStickToBottom in webview-helpers.js) follows streaming output only while the user is already pinned to the bottom; interactive activity that must be seen — permission cards, ask-user-question cards, and the user's own sent message — re-pins via forceScrollToBottom() (also addresses #15). Wire shapes were confirmed against grok 0.2.33 (Linux probes); the native Windows build is grok 0.2.x. Smoke-tested end-to-end against grok v0.1.211 on Linux and Windows-via-WSL, and against the native Windows build grok 0.2.3 (irm https://x.ai/cli/install.ps1 | iex) — cli-locator resolves grok.cmd/grok.exe and terminal-manager uses shell:true. The native-Windows smoke test surfaced a handful of webview regressions (history popover that never closed, session rows only clickable on the label, reasoning traces no longer expandable, a cluttered welcome screen), all fixed in earlier builds. Plan mode is enabled and enforced client-side (see research/plan-mode.md § Resolution). Voice input (v1.3.0) adds a composer mic button that records via an ffmpeg child process in the extension host and transcribes through xAI's separate Speech-to-Text API — deliberately outside ACP, because the CLI advertises promptCapabilities.audio:false and webviews can't reach the mic (see research/voice-input.md). 
Transcription is live/streaming by default (PCM → wss://api.x.ai/v1/stt, partial events accumulated by start; grok.voiceStreaming:false falls back to the batch REST endpoint). Listening is continuous + hands-free: saying "grok send" submits and restarts a fresh stream so the mic keeps listening (each message = one clean utterance), and messages dictated while Grok is responding are queued and flushed on agentEnd. The phrase is sent as a keyterm bias so STT spells it right, and the trailing phrase is highlighted in the composer via a backdrop overlay (pure trailingSendPhrase in webview-helpers.js). This adds the extension's first runtime dep, ws (bundled into the vsix — package/publish no longer pass --no-dependencies, and .vscodeignore un-ignores node_modules/ws). v1.4.5 adds LaTeX / math rendering: grok now answers with TeX (inline \(…\), display \[…\], incl. \begin{pmatrix} matrices), which the hand-rolled renderMarkdown previously showed raw. The pure splitMath (in webview-helpers.js) pulls math out before HTML-escaping (so backslashes/braces survive the inline-markdown pass) into \x00D/\x00M placeholders — mirroring the code-block/table extraction; renderMath in chat.js renders each span — originally via vendored KaTeX, replaced by MathJax in v1.4.7 (see below) — with a raw-TeX fallback when the engine isn't loaded (e.g. happy-dom unit tests). Display math is its own block with horizontal scroll; CSP gains font-src ${webview.cspSource} for the KaTeX fonts. Single $…$ is deliberately not a delimiter (prose-currency false positives). The pure stripUnsupportedTex (in webview-helpers.js) drops \label{…} before rendering — grok emits it inside align/equation blocks for cross-referencing, but KaTeX has no \ref/\eqref system so it paints the label as a red error; \label produces no visible output in real LaTeX anyway, so stripping it loses nothing. 
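The \label stripping described above can be approximated with a single regular expression. This is a minimal sketch, not the actual stripUnsupportedTex from webview-helpers.js: the function name stripLabelsSketch is hypothetical, and it assumes label keys contain no nested braces.

```typescript
// Hypothetical sketch of dropping \label{...} before handing TeX to the renderer.
// Assumption: label keys never contain nested braces.
function stripLabelsSketch(tex: string): string {
  // Remove each \label{key} together with any spaces or tabs that follow it.
  return tex.replace(/\\label\{[^{}]*\}[ \t]*/g, "");
}
```

Because \label produces no visible output in real LaTeX, the rendered equation is unchanged; only the cross-reference anchor is lost.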
v1.4.6 adds Mermaid diagram rendering: grok answers with ```mermaid fenced blocks (flowcharts, sequence/state diagrams, git graphs, class diagrams, …), which renderMarkdown previously showed as raw source. The fenced block now becomes a .mermaid-block placeholder carrying the source as a fallback code block; a post-render pass renderMermaidIn (in chat.js) renders it to SVG via vendored Mermaid (media/mermaid/mermaid.min.js, the self-contained 3.3 MB IIFE that sets globalThis.mermaid — all diagram types inlined, zero dynamic import(), no eval/new Function so the nonce CSP needs no unsafe-eval). Unlike KaTeX's synchronous string render, mermaid.render is async and DOM-based (it measures text to lay out nodes), so it can't run inline in renderMarkdown — it post-processes the inserted element. The streaming agent bubble re-runs renderMarkdown (rebuilding the DOM) every animation frame, so two module-level caches keyed by the diagram source keep that cheap + flicker-free: mermaidSvgCache (src → svg) re-applies the SVG synchronously on a cache hit (same frame, no flash), and mermaidInFlight (src) stops the same diagram being laid out dozens of times before the first async render resolves; a failed/ malformed render caches null and leaves the readable source. initMermaid themes it to VS Code dark/light (document.body.classList), securityLevel:"strict" + suppressErrorRendering:true. A half-streamed block stays raw text until its closing ``` arrives (the code-block regex requires it). No CSP change needed (mermaid's inline <style>/style= are covered by the existing style-src 'unsafe-inline'). Limitation: a live theme switch doesn't re-theme already-rendered diagrams (cache holds the old-theme SVG) until the webview reloads. 
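The two-cache pattern described above (a result cache plus an in-flight set, both keyed by diagram source) might look roughly like this. It is a sketch under assumptions: createDiagramCache and its request method are hypothetical names, and the injected render function stands in for mermaid.render so the example stays self-contained.

```typescript
// Hypothetical sketch of a source-keyed SVG cache with in-flight de-duplication.
// `render` is injected; in the webview it would wrap the async Mermaid renderer.
function createDiagramCache(render: (src: string) => Promise<string>) {
  const svgBySrc = new Map<string, string | null>(); // null marks a failed render
  const inFlight = new Set<string>();

  // Returns the cached SVG (or null for a known-bad diagram) synchronously.
  // Returns undefined when the result is not ready; onReady fires once it is.
  function request(src: string, onReady: () => void): string | null | undefined {
    if (svgBySrc.has(src)) return svgBySrc.get(src);
    if (inFlight.has(src)) return undefined; // already being laid out
    inFlight.add(src);
    const settle = (svg: string | null): void => {
      svgBySrc.set(src, svg);
      inFlight.delete(src);
      onReady();
    };
    render(src).then(settle, () => settle(null));
    return undefined;
  }

  return { request };
}
```

A streaming bubble that re-renders every frame can call request on each pass: the first call starts one render, later calls before it resolves are no-ops, and calls after it resolves get the SVG back in the same frame.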
v1.4.7 replaces KaTeX with MathJax for math (media/mathjax/tex-svg-full.js, the self-contained ~2.3 MB IIFE that sets globalThis.MathJax; its only eval/new Function are a globalThis polyfill + a typeof process-guarded Node require, neither reached in the webview, so the nonce CSP is unchanged — and font-src is no longer needed since SVG output embeds glyph paths). renderMath now calls MathJax.tex2svg(tex,{display}) (synchronous once startup resolves; before that, raw-TeX fallback + an upgradeMathInDom pass on MathJax.startup.promise). Config: svg.fontCache:'local' (self-contained per-equation SVG — required for export), startup.typeset:false (we render manually, MathJax never scans the page), enableMenu:false, and enableAssistiveMml:false — critical, because driving MathJax via tex2svg+outerHTML skips the stylesheet it normally injects to hide the assistive-MathML copy, so Chromium rendered that MathML as a visible second copy of every equation (the "nn" / "E=mc²E=mc²" doubling). We also add mjx-container[display="true"]{display:block;text-align:center} ourselves (same missing-stylesheet reason) so display math sits on its own line; inline keeps MathJax's inline vertical-align. v1.4.7 also adds math/diagram export (research/ n/a — webview-only): display math and rendered mermaid become an export host (.math-export / .mermaid-block) carrying the source + a Copy/Download/Open hover overlay (delegated .expr-btn handler, mirrors buildMediaActions). 
Both render to a self-contained <svg>, so: Copy → clipboard the TeX/mermaid source; Download → host quick-pick of PNG (rasterized via offscreen canvas with the VS Code theme background — WYSIWYG) or a transparent SVG for a dark or light background (math recolors currentColor: #e8e8e8/#1f1f1f; mermaid is re-rendered per theme via a %%{init:{'theme':…}}%% directive — the two .svg variants share an extension, so a save-dialog filter can't pick between them → quick-pick); Open → writes the PNG to context.globalStorageUri/exports/ and opens VS Code's image preview. The webview pre-renders all variants and posts them; sidebar.ts exportExpr does the quick-pick + save. The video-gen live test is now opt-in (--only=video-gen), excluded from the default gate — grok 0.2.x spins on /imagine-video in the bare harness and never completes (works interactively).
| File | Role |
|---|---|
| src/extension.ts | Entry point: registers commands, keybindings, output channel |
| src/sidebar.ts | Webview provider, message routing, fs handlers, diff editor preview, logout, generated-media inlining (postGeneratedMedia) |
| src/acp.ts | ACP client: spawns CLI, manages session lifecycle, emits events (incl. mediaContent from emitToolMedia) |
| src/acp-dispatch.ts | Pure protocol helpers: line parsing, update routing, response builders, generated-media extraction (isMediaGenToolCall/extractGeneratedMediaPaths) |
| src/session.ts | Per-session state bag (Session), one per live grok agent stdio process; the sidebar holds a pool of these + one focused |
| src/session-pool.ts | Pure reaping policy (selectReapable): idle-TTL + LRU cap over the live-session pool (never the focused/working/needs-you) |
| src/cli-locator.ts | Locate grok binary (configured path → ~/.grok/bin/grok → PATH); cross-platform |
| src/terminal-manager.ts | Headless shell children for the agent's terminal/* ACP calls; cross-platform via shell:true |
| src/chips.ts | File-chip CRUD (pure) |
| src/prompt-builder.ts | Chip → prompt-string with @path refs and fenced code blocks |
| src/slash-filter.ts | Slash-command autocomplete filter |
| src/plan-gate.ts | Plan-mode policy (pure): workspace-write containment, read-only command allowlist, permission/plan-file classification |
| src/plan-restore.ts | Plan persist + restore decision (pure): appendPlanEntry + decideRestoreState |
| src/grok-primer.ts | Hidden plan-mode primer text + version/marker constants + pure isPrimerText() (detects the primer when grok replays it on restore, so it's hidden + not counted toward plan positions) |
| src/sessions.ts | Disk-driven session listing/delete + customName overrides (pure) |
| src/file-ref.ts | Open-file path#L<n> ref parsing + large-file inline-read guard (pure) |
| src/plan-review.ts | Plan-snapshot Markdown filename generation for the "open plan as editor tab" action (pure) |
| src/voice.ts | Voice-input pure helpers: STT request/response/error, per-platform ffmpeg args, DirectShow device parsing, API-key resolution |
| src/voice-recorder.ts | Batch capture: VoiceRecorder (spawns ffmpeg → WAV, graceful q-stop) + transcribeAudio (POST to api.x.ai/v1/stt) + resolveWindowsAudioDevice |
| src/voice-streamer.ts | Live capture: VoiceStreamer (ffmpeg PCM → ws → wss://api.x.ai/v1/stt, emits partial/final transcript events) |
| media/chat.{js,css} | Webview UI |
| media/webview-helpers.js | Pure webview helpers (file-ref detection, relative-time format, mic-button state machine, trailing send-phrase highlight, subagent classifier isSubagentToolCall/subagentLabel); shared between webview and tests |
| scripts/install.{ps1,sh} | Auto-detect VS Code CLI, build .vsix, install |
| scripts/uninstall.{ps1,sh} | Uninstall PawelHuryn.grok-vscode-phuryn |

Pure modules (acp-dispatch, chips, prompt-builder, slash-filter, cli-locator, sessions, plan-gate, plan-restore, grok-primer, file-ref, plan-review, voice, session-pool, webview-helpers) were split out specifically so protocol behavior can be unit-tested without spawning processes. (session.ts is a plain state bag — no vscode/spawn/network either, but it's data, not logic.)
The sidebar runs a pool of live sessions (one grok agent stdio process each) with one focused (the one rendered). Each session's state is a Session (src/session.ts); the sidebar holds focused + a Set<Session> pool. Re-focus is lossless: every session buffers the webview posts that built its chat (Session.buffer), so focusSession is clearMessages + replay of that buffer — no grok reload, no kill, even mid-turn/mid-approval. Clicking a cold session (no live process — reaped or pre-window) loads it from disk into a fresh member (openSession); a live one re-focuses instantly. Two guards keep it safe: a per-session gen (bumped only when that session's client is torn down, captured by handlers when wired) so a backgrounded session's in-flight events aren't judged stale when focus moves (replaces the old global sessionGen); and session-scoped emit(session,…) (buffers + forwards only when focused) vs UI-wide post(…) (status dots, sessions list). Status dots. Every history row shows a dot whose color is a derived value (the pure computeDot in src/session-pool.ts), not the raw status: gray at rest (idle / read / cold / loaded-from-disk all collapse to one gray — the user never learns warm-vs-cold), blue working, yellow needs-you, green finished-and-unread, red errored-and-unread. The green/red is an unread badge, not a live state: a turn that ends while the session is not focused sets a persisted unread/unreadError flag in SessionMetaOverride (globalState); focusing/opening the session clears it (markRead). So the badge survives reaping AND reload (it's meta-backed, decoupled from the live process) — that's the payoff: come back and the green dots are exactly the sessions with results you haven't seen. No timer/decay — a never-opened session stays green (it is still unread). computeDot({liveStatus, unread, unreadError}) precedence: working → needs-you → unread(error?) → none. 
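The dot precedence stated above (working → needs-you → unread, with the error variant → none) can be written as a small pure function. This is a sketch, not the real computeDot from src/session-pool.ts: the name computeDotSketch, the string values for statuses and dots, and the choice to let unreadError win over plain unread are assumptions made for illustration.

```typescript
// Hypothetical status and dot names; the real module may use different values.
type LiveStatus = "idle" | "working" | "needs-you";
type Dot = "working" | "needs-you" | "unread" | "error" | "none";

interface DotInput {
  liveStatus?: LiveStatus; // undefined for a cold (reaped or never-loaded) session
  unread?: boolean;        // persisted: a turn finished while unfocused
  unreadError?: boolean;   // persisted: that turn ended in an error
}

// Live states win over the persisted unread badge; everything else is gray.
function computeDotSketch({ liveStatus, unread, unreadError }: DotInput): Dot {
  if (liveStatus === "working") return "working";
  if (liveStatus === "needs-you") return "needs-you";
  if (unreadError) return "error";
  if (unread) return "unread";
  return "none"; // idle, read, cold, and loaded-from-disk all collapse here
}
```

Because the unread flags are plain inputs rather than live state, a reaped session with unread set still resolves to the green dot.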
Dots are pushed one-at-a-time via pushDot→sessionDot (cheap, no disk read) plus a full dots map (dotForId over disk entries + pool) on each list refresh; palette in CSS — Working=--vscode-charts-blue, Needs-you=--vscode-charts-yellow, unread=--vscode-charts-green, error=--vscode-errorForeground, none/default=--vscode-descriptionForeground. Reaping is the pure selectReapable (src/session-pool.ts): an idle TTL (untouched ≥1h, swept every 5 min) + an LRU cap (~8 live; least-recently-used eligible evicted past it), but it never reaps the focused or a working/needs-you session (so the cap can be exceeded when everything spare is busy, by design). Reaping is silent — kill the process, recompute the dot (an unread reaped session stays green; a read one goes gray); re-click reloads from disk. Recency (Session.lastActiveAt) is stamped on create/focus/made-busy via touch. CLI-update guard: the explicit Update Grok Build CLI action (updateGrokCliOnDemand) confirms before tearing the pool down if any session is working/needs-you; the silent startup auto-update has nothing in flight by construction, so it doesn't ask. Per-workspace scope. The impure voice-recorder/voice-streamer (ffmpeg spawn + STT fetch/WebSocket) are smoke-tested manually via the research/voice-*.cjs probes (voice-stt-probe, voice-e2e-verify, voice-stream-probe, voice-stream-verify, voice-cost-probe).
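The reaping policy described above (an idle TTL plus an LRU cap that never touches the focused or a busy session) might be sketched like this. It is not the actual selectReapable: the PoolMember shape, the function name, and the parameter defaults are illustrative assumptions, with the 1 hour TTL and cap of 8 taken from the figures quoted in the text.

```typescript
// Hypothetical view of a live pool member, reduced to what the policy needs.
interface PoolMember {
  id: string;
  status: "idle" | "working" | "needs-you";
  lastActiveAt: number; // epoch ms, stamped on create/focus/made-busy
  focused: boolean;
}

// Returns the ids to reap. Only unfocused idle members are ever eligible,
// so the cap can be exceeded when every spare session is busy.
function selectReapableSketch(
  pool: PoolMember[],
  now: number,
  ttlMs: number = 60 * 60 * 1000,
  maxLive: number = 8,
): string[] {
  const eligible = pool.filter((m) => !m.focused && m.status === "idle");
  const reap = new Set<string>();

  // Pass 1, idle TTL: anything untouched for at least ttlMs.
  for (const m of eligible) {
    if (now - m.lastActiveAt >= ttlMs) reap.add(m.id);
  }

  // Pass 2, LRU cap: evict the least recently used eligible members until
  // the number of survivors is within the cap (or nothing eligible is left).
  let live = pool.length - reap.size;
  const byOldest = eligible
    .filter((m) => !reap.has(m.id))
    .sort((a, b) => a.lastActiveAt - b.lastActiveAt);
  for (const m of byOldest) {
    if (live <= maxLive) break;
    reap.add(m.id);
    live -= 1;
  }
  return Array.from(reap);
}
```

Keeping the policy a pure function of the pool snapshot and a clock value is what lets it be unit-tested without spawning any process.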
The history popover loads sessions one page at a time (SESSION_PAGE_SIZE = 100, newest-first) so it stays responsive with thousands of sessions on disk. The cost of the old behavior (read + JSON.parse every summary.json on every open, then render every row) grew linearly and stalled the popover at scale.
Two pure primitives replace the monolithic full-scan listSessions (which is kept only for callers that genuinely need the whole list):
- indexSessions({fs, grokHome, cwd}): one stat per session dir, no reads/parses, returning every id ordered newest-first by summary.json mtime. mtime is the cheap last-activity proxy: grok rewrites that file (it holds updated_at) on every turn. We sort by mtime here because the id is a UUIDv7 whose timestamp is creation, not last activity, so an id-sort would be wrong.
- readSessionEntries({fs, grokHome, cwd, ids, overrides}): reads + parses summary.json for exactly the given ids (the visible page) and applies customName overrides. The exact updated_at order is re-applied within the loaded page (mtime is only an approximate sort key).
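The split between a stat-only index and a windowed read can be illustrated with an injected file-system interface. This is a sketch under stated assumptions, not the code in sessions.ts: MiniFs, sessionsRoot, indexSessionsSketch and readPageSketch are hypothetical names, the directory name is assumed to be encodeURIComponent(cwd), updated_at is assumed to be an ISO 8601 string, and customName overrides are left out.

```typescript
// Hypothetical minimal file-system surface, injected so the logic stays pure.
interface MiniFs {
  listDirs(dir: string): string[];
  mtimeMs(path: string): number | undefined; // undefined when the file is missing
  readText(path: string): string;
}

interface SummaryEntry {
  id: string;
  updatedAt: number; // parsed from summary.json's updated_at
}

// Assumption: the per-workspace directory name is the URL-encoded cwd.
function sessionsRoot(grokHome: string, cwd: string): string {
  return `${grokHome}/sessions/${encodeURIComponent(cwd)}`;
}

// One stat per session dir and no reads: ids newest-first by summary.json mtime.
function indexSessionsSketch(fs: MiniFs, grokHome: string, cwd: string): string[] {
  const root = sessionsRoot(grokHome, cwd);
  const stamped: { id: string; mtimeMs: number }[] = [];
  for (const id of fs.listDirs(root)) {
    const mtimeMs = fs.mtimeMs(`${root}/${id}/summary.json`);
    if (mtimeMs !== undefined) stamped.push({ id, mtimeMs });
  }
  stamped.sort((a, b) => b.mtimeMs - a.mtimeMs);
  return stamped.map((s) => s.id);
}

// Read + parse only the requested ids, then re-apply the exact updated_at
// order within that page (mtime was only the approximate sort key).
function readPageSketch(fs: MiniFs, grokHome: string, cwd: string, ids: string[]): SummaryEntry[] {
  const root = sessionsRoot(grokHome, cwd);
  const page = ids.map((id) => {
    const raw = JSON.parse(fs.readText(`${root}/${id}/summary.json`)) as { updated_at: string };
    return { id, updatedAt: Date.parse(raw.updated_at) };
  });
  return page.sort((a, b) => b.updatedAt - a.updatedAt);
}
```

A first open would call indexSessionsSketch once and then readPageSketch on the first 100 ids, which is where the drop from N reads to one page of reads comes from.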
Host orchestration (postSessionsList in sidebar.ts): indexSessions orders everything cheaply; a mtimeById map drives an mtime-keyed read cache (sessionCache, readEntriesCached) so a re-open / load-more / search reads only entries whose summary.json actually changed — steady-state opens cost ~zero reads. It posts {type:"sessions", entries, activeId, dots, offset, total, hasMore, query}. Search is server-side and complete: a non-empty query warms the whole catalog once (cache-backed) and filters by display name across all sessions, not just the loaded page. The cache is invalidated on rename (changes displayName without touching mtime), delete, and clearAll. Live-session synthesis (open-too-early fix): a brand-new session has no summary.json yet, so the disk-scan index can't see it — opening history the instant a session goes live would drop the active row entirely (and the previous top session would masquerade as the whole list) until grok flushes the file. So on the first, unfiltered page postSessionsList injects a synthesized liveSessionEntry (pinned newest-first) for any live pool member whose activeSessionId isn't on disk, naming it from the best in-memory source (generated/renamed customName → first user message → "New session"); the next refresh replaces it with grok's own summary once the file lands. hasMore is computed from the on-disk page before injection (load-more pages are disk-only), and these ids are never on disk so they can't duplicate onto a later page. Mirrors the existing dot-patch loop that already covered not-yet-persisted pool members.
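The mtime-keyed read cache described above can be sketched generically. The names createReadCache, readCached and invalidate are hypothetical (the text names sessionCache and readEntriesCached but does not show their shape), and the reader function is injected so the example runs without a file system.

```typescript
// Hypothetical sketch of an mtime-keyed read cache: an entry is re-read only
// when the index reports a different mtime for its id.
function createReadCache<T>(readEntry: (id: string) => T) {
  const cache = new Map<string, { mtimeMs: number; entry: T }>();

  function readCached(ids: string[], mtimeById: Map<string, number>): T[] {
    const out: T[] = [];
    for (const id of ids) {
      const mtimeMs = mtimeById.get(id);
      if (mtimeMs === undefined) continue; // not in the index, e.g. deleted
      const hit = cache.get(id);
      if (hit !== undefined && hit.mtimeMs === mtimeMs) {
        out.push(hit.entry); // unchanged on disk: zero reads
        continue;
      }
      const entry = readEntry(id);
      cache.set(id, { mtimeMs, entry });
      out.push(entry);
    }
    return out;
  }

  // Needed for changes that do not touch mtime, such as a rename override.
  function invalidate(id?: string): void {
    if (id === undefined) cache.clear();
    else cache.delete(id);
  }

  return { readCached, invalidate };
}
```

The explicit invalidate path matters because a rename changes the display name without touching the file's mtime, so the mtime check alone would keep serving the stale entry.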
Webview (media/chat.js): offset === 0 replaces the list, offset > 0 appends (de-duped by id); a scroll-near-bottom listener requests the next page (hasMore/sessionLoading guards keep it to one request per boundary); the search box posts a debounced (180ms) listSessions with the query; a host-driven refresh that arrives unfiltered while a search is active triggers a sticky-search re-request so the filtered view isn't clobbered. The "Clear all" footer shows when any non-active session exists (loaded or on a later page, via total). Popover positioning: positionDropdownPopover right-aligns the popover to the panel edge (right: 6px/left: auto) and grows it leftward, rather than left-anchoring with a one-shot width clamp. The clamp measured the popover width in a single requestAnimationFrame before the async session rows streamed in and widened it from min-width toward max-width: 360px, so the popover spilled off the right edge on the first open and only looked right on reopen (when state.sessions was already populated). Right-aligning is width-independent — no measurement, no reflow jump. It also caps min/max width to the panel (maxWidth/minWidth inline, overriding the CSS min-width: 280px/max-width: 360px) — min(360|280, parentWidth − 2·EDGE) — so in a narrow panel a long session name ellipsizes (the row already has text-overflow: ellipsis + min-width: 0) instead of the popover overflowing the left edge. Common-case sizing, not extreme (a panel narrower than the margins still clips). Resize/hide: a window resize listener re-runs positionDropdownPopover while the popover is open, so dragging the panel border re-fits it live (no close+reopen); and a document visibilitychange listener closePopovers() when the view is hidden (switching panel tab / extension), since a resize that happens while the webview is hidden fires no resize event — so the dropdown closes rather than reappearing stale on refocus (reopen re-measures).
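The width cap min(360|280, parentWidth − 2·EDGE) from the positioning notes above works out as follows. This is an illustrative calculation only: popoverWidthBounds is a hypothetical name, and EDGE is assumed to be 6px to match the right: 6px offset quoted in the text (the actual constant is not shown).

```typescript
// Assumption: the edge margin equals the 6px right offset mentioned in the text.
const EDGE_PX = 6;

// Hypothetical helper: caps the CSS min-width (280px) and max-width (360px)
// to the space the panel actually has, so long names ellipsize instead of
// the popover overflowing the panel.
function popoverWidthBounds(parentWidth: number, edge: number = EDGE_PX): { minWidth: number; maxWidth: number } {
  const available = Math.max(0, parentWidth - 2 * edge);
  return {
    minWidth: Math.min(280, available),
    maxWidth: Math.min(360, available),
  };
}
```

In a 300px panel this yields a 280px minimum and a 288px maximum, so the popover grows leftward from the right edge but stops before the panel's left margin.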
Perf simulation (npm run test:perf, test/sessions.perf.ts via vitest.perf.config.ts) — opt-in, out of npm test/CI. At N=5000 it asserts the op counts: first open drops file reads from 5000 → 100 (~98%), steady-state re-open is 0 reads (mtime cache), search warms once then 0 reads, with a modeled-latency projection (~3250ms → ~310ms) and a real in-memory parse-cost wall-clock (~2.7x, disk-latency-free lower bound).
npm install
npm test # 503 tests, ~1.5s, vitest — all grok-free (incl. happy-dom DOM tests + fake-CLI ACP integration tests)
npm run test:perf # opt-in session-history perf simulation (NOT in npm test/CI; see § History pagination)
npm run package # → grok-vscode-phuryn-1.4.10.vsix (clears older *.vsix first)

There are three kinds of tests, and it matters which is which:

1. npm test: grok-free unit/DOM/integration suite (503 tests). Pure logic, happy-dom tests that drive the real media/chat.js, a real-/bin/sh TerminalManager smoke, and a fake-CLI ACP integration suite (test/fixtures/fake-grok-acp.cjs). Never spawns the real grok binary. Runs in <2s with no network, no login, no subscription. This is the floor: every change keeps it green. (A separate opt-in npm run test:perf runs the session-history perf simulation, test/*.perf.ts via vitest.perf.config.ts, matched only by that config so it stays out of npm test/CI; see § History pagination.)
2. CI: the same suite. .github/workflows/ci.yml runs npm ci && npm test && npm run package on a clean Ubuntu box. CI ≡ layer 1, verbatim; there is no separate CI-only set. CI has no grok binary, no auth, no SuperGrok subscription, so it cannot run anything that touches the real CLI. That's the whole reason layer 1 is grok-free.
3. npm run test:live: on-demand pre-release suite against REAL grok (scripts/live-tests.cjs). Spawns the actual grok agent stdio and exercises the surfaces layers 1–2 can't: the real ACP handshake, a prompt round-trip, session restore, plan-mode enforcement, and the v1.4.x generative features (image gen, video gen; the subagent path is exercised opportunistically and SKIPs when grok doesn't delegate, since it's deferred/research-only). It reuses the real compiled modules (out/acp-dispatch.js, out/plan-gate.js, media/webview-helpers.js): it feeds genuine wire output through the same isMediaGenToolCall / extractGeneratedMediaPaths / isSubagentToolCall / shouldBlockWrite the extension uses, not a re-implementation. Always run it before every release-to-main. It's a non-negotiable, standing part of the release gate; run it without asking (it needs a logged-in grok + subscription and burns credits, so it must never be in npm test or CI, but it is mandatory before any tag/release). Flags: --quick (skip the slow generative tests), --only=, --skip=, GROK_BIN=…. A SKIP (no subscription, grok chose not to delegate, etc.) does not fail the gate; only a FAIL does. Real-grok diagnostic probes (research/*.cjs) remain manual one-offs for capturing wire shapes; the live suite is the repeatable gate.

So: local == CI (both grok-free). The real-grok tests are a separate, mandatory pre-release gate — always run before a tag/release (no need to ask), never on every commit.
The native-Windows build (irm https://x.ai/cli/install.ps1 | iex) is grok 0.2.3 on the stable channel; the Linux probes in the docs were against 0.2.33 (a different release line — note the gap when reconciling wire shapes). grok does not auto-update. Updating is the explicit grok update command: grok update --check [--json] checks without installing, grok update installs the latest on the current channel, --stable (default, weekly) / --alpha switch channels, --version <X.Y.Z> pins a specific build. Re-run npm run test:live after any CLI update — the wire format is the thing that drifts.
- macOS / Linux / WSL Ubuntu: ./scripts/install.sh
- Windows (native): pwsh scripts\install.ps1 (runs the native Windows grok CLI directly)
- WSL Ubuntu (alternative): Remote-WSL → install in the WSL-side VS Code server

See README.md § Install for the full per-platform matrix.
initialize→session/new/session/load→session/set_model→session/promptlifecycle- Model switching is agent-aware. Models belong to agent types —
grok-build/grok-build-planvs thecursoragent that owns the Composer models. The CLI binds the agent at spawn and locks it after the first turn (incl. our primer), so a livesession/set_modelonly works within the same agent; a cross-agent switch errorsMODEL_SWITCH_INCOMPATIBLE_AGENT("Start a new session").switchModelinsidebar.tstries the live switch, and on that error (detected by the pureisIncompatibleAgentErrorinacp-dispatch.ts) persists the pick togrok.defaultModeland restarts —newSessionre-applies the model before the primer runs, while the agent is still rebindable. No history → transparent restart; with history → the same Summarize/Just-Restart prompt as an effort change (sharedpickRestartMode/restartSessionhelpers). Empty-session cleanup: a restart on a primer-only session (no real conversation — the common case when you flip models/effort a few times right after opening) gets the no-prompt transparent path anddiscardRestartedEmptySessiondeletes the abandoned grok session dir afterward, so repeated switches don't pile up identical primer-only sessions in history; the purecarrySessionName(insessions.ts) moves any user rename (customName) from the discarded id onto the fresh one so the chosen name survives. The same cleanup runs on the effort-change empty-session branch (guarded so a dead client on a session with history keeps its history). The toolbar model label resolves IDs to user-facing names via the puremodelDisplayNamehelper. - Streaming
agent_message_chunk+agent_thought_chunk - Sessions: list/resume via
session/load(grok stores them at~/.grok/sessions/<urlencoded-cwd>/<id>/); rename/delete metadata incontext.globalState["grok.sessionMeta"]. We never edit grok's own session files. Per-session delete is the trash button on each non-active history row (deleteSession→deleteSessionDir); the active session has no delete (the live CLI re-persists it, so it wouldn't stick). Clear all (#19) is a footer button in the history popover (shown only when ≥1 non-active session exists) →clearAllSessionsinsidebar.ts: a modal-confirmed sweep of every session dir for the current workspace except the focused one, via the pureclearSessions({fs, grokHome, cwd, exceptId})insessions.ts(best-effort — a locked dir is skipped, not thrown), then purges the removed ids'grok.sessionMetaoverrides and tears down any backgrounded live pool members it deleted. Scale (history pagination): the popover loads one page at a time (newest-first by last activity) so it stays fast with thousands of sessions — see § History pagination. Clear-all is the relief valve for an overgrown store; the pagination is the steady-state fix. - Handlers (mandatory or the agent crashes):
fs/read_text_file,fs/write_text_file,terminal/{create,output,wait_for_exit,kill,release} session/request_permission→ chat card withallow-always/allow-once/reject-once, diff editor preview forkind:"edit"x.ai/ask_user_question→ inline question card (the tool was fully broken before — #12). The catch-all ACK'd unknown server requests with{}, which grok's deserializer rejects with "missing fieldoutcome". Now handled: the card renders each question's options (single question + single-select resolves on one click; otherwise pick-then-Submit; Skip → cancel) and replies{ outcome: "accepted", answers, annotations }(orcancelled).answersis keyed by question text → chosen label. On answer the card collapses to the question + a green✓ <choice>(so it's clear grok received it). On session resume the question replays as atool_call(questions inrawInput) + completedtool_call_update(answer text); chat.js suppresses the generic tool chip forask_user_questionand rebuilds a read-only "You answered" card from that replay — no separate persistence. The full binary-derived wire format is inresearch/ask-user-question.md. Response builders are pure (makeQuestionResponseinacp-dispatch.ts); the answer map is built by the purebuildQuestionAnswersinwebview-helpers.js.session/set_modewired; the picker exposes Agent, Plan, and YOLO. The CLI's non-plan mode id is"default"(not"agent"), captured asACT_MODE_IDinsidebar.ts.- Plan mode is enforced client-side (mirror of YOLO). The CLI's
x.ai/exit_plan_modestill treats any client response — result or error — as approval (re-verified broken in 0.2.3), so we don't rely on it. Insteadsrc/plan-gate.tsgates the two mandatory server→client choke points:fs/write_text_file(block writes resolving inside the workspace cwd) andterminal/create(block anything not on the read-only allowlist). grok's own~/.grok/sessions/<…>/plan.mdwrite lands outside the workspace and is allowed (and snooped to recover the plan text —exit_plan_modearrives withplanContent: null). Approve → drop the gate + send an "implement it now" follow-up prompt; Keep planning → gate stays up. Entering plan mode any way (incl. agent-initiatedcurrent_mode_update: plan) raises the gate; it's lowered only by explicit user action, never auto-lowered by CLI mode flapping. The verdict protocol is taught by a hidden primer (src/grok-primer.ts) — it tells grok to ignore the bogusexit_plan_moderesult and read[Plan approved]/[Plan rejected]/[Plan cancelled](optionally + a comment) from the follow-up message. The primer fires eagerly + non-blocking as its own hidden turn the moment a session goes live — on new AND restored sessions, and after/compact— not in front of the user's first prompt and not at composer-disabling cost (ensurePrimedinsidebar.ts, kicked off fromstartSession's success path). It returns a reusedsession.primingPromise(src/session.ts) so a racing first send awaits the same in-flight primer (grok runs one turn at a time) and is released the instant the silent primer acks; the user can type and send straight away — their bubble shows immediately (a "Grokking…" indicator stands in until grok produces content) while the real prompt is held behind the primer in code. On restore the primer is re-sent on first go-live rather than trusted from replayed history (a/compactcan drop it). 
The primer's silent turn is hidden by a session-level `suppressContent` flag checked in `post`/`emit` against `SUPPRESS_TYPES` (which deliberately excludes `userMessage`/`agentStart` so a user send that races the background primer still paints its own bubble + Grokking indicator). Primer v4 (`PRIMER_VERSION`/`PRIMER_MARKER`) is deliberately minimal: grok-build is agentic, and the old v3 primer's "## Grok Build VS Code extension … open source repo, issues: " paragraph + "Acknowledge briefly" line were tempting grok into a 15–40s pre-turn exploration of the workspace (one capture: 17.7s reading files + searching 114 files) before the user's message even ran; the plan-protocol text itself never did. v4 keeps only the protocol, drops the product paragraph + URL, and adds an explicit do not use any tools / do not read files / do not search the workspace / do not take any action constraint ending in "Reply with exactly: ok". The pure `isPrimerText()` (version-agnostic `PRIMER_PATTERN`) detects the primer when grok replays it so the bubble is hidden + not counted toward plan positions. For the full pedagogical course with diagrams and hands-on guidance, see `research/understanding-plan-mode.md`.
- `grok.defaultEffort` → forwarded as `--reasoning-effort <value>` before the `stdio` subcommand (it's an agent-level flag; after `stdio` the CLI errors "unexpected argument"). Offered values mirror grok's accepted set (`none|minimal|low|medium|high|xhigh`); the bogus `max` we used to expose made grok exit code 2 (#3/#4). Args are built by the pure `buildGrokAgentArgs()`; changing effort restarts the session (`setEffort` in `sidebar.ts`).
- `grok.chatFontScale` (#14, webview-only, not ACP) → chat-panel zoom as a percent (default 100, clamped 60–300).
The host bakes the initial value into `<body style="zoom:…">` in `getHtml` (no flash) and `postFontScale` re-posts a `fontScale` message on `onDidChangeConfiguration` for a live, reload-free update (chat.js sets `document.body.style.zoom`). `zoom` scales the whole chat (text, icons, spacing) but not the rest of VS Code (the point vs. global `Ctrl/Cmd+Shift+=`). Supports User/Workspace scope for free. `chatFontScale()` in `sidebar.ts` reads + clamps.
- `available_commands_update` → slash autocomplete
- `current_mode_update` → bottom-toolbar mode button (the top bar was removed in 0.9.0)
- `_meta.totalTokens` → context donut
- Generated media (v1.4.x).
`/imagine` (`image_gen`, or `image_edit` for reference-photo edits) and `/imagine-video` (`video_gen`; older/Linux builds `image_to_video`) are subscription-only and do not return ACP image blocks: grok writes the file into its session dir (`images/*.jpg`, `videos/*.mp4`) and reports the path as a JSON string in the completed tool result's `text` content. The pure `isMediaGenToolCall`/`extractGeneratedMediaPaths` (in `acp-dispatch.ts`) detect the tool and parse the path (image-vs-video by extension); `acp.ts` tracks the tool-call id (the completed update has a null title) and emits `mediaContent`; `sidebar.ts` `postGeneratedMedia` serves the file via `webview.asWebviewUri` (streamed from disk under the grok-home `localResourceRoot`, which is what made multi-MB `/imagine-video` clips render; a base64 `data:` URI is only the fallback for files outside the served roots). CSP grants `img-src`/`media-src ${webview.cspSource} data:`. Inline media is capped at 320px with Copy-path / Open-in-VS-Code hover actions pinned to the image. On resume grok replays it as a single collapsed `tool_call` carrying title + path together, so the same path fires. Wire format + probes in `research/image-generation.md` (`research/imagine-probe.cjs`, `research/video-probe.cjs`).
- Subagent card (deferred / research-only). A pure classifier (`isSubagentToolCall`/`subagentLabel` in `webview-helpers.js`) would give a delegation a distinct Subagent: <type> card, but grok 0.2.x does not expose subagents as a `spawn_subagent` ACP tool: it backgrounds a process and polls it via `get_command_or_subagent_output` (which the classifier explicitly excludes), so the card rarely fires. Not advertised as a shipped feature (dropped from the README in 1.4.3); see `research/subagents.md`.
- Logout (v1.4.0, #13).
`grok.logout` command + gear-menu Sign out → `sidebar.logout()` runs `grok logout`, disposes the session, shows the auth-required onboarding.
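
To make the client-side plan gate described above more concrete, here is a minimal sketch of its two checks: writes that resolve inside the workspace cwd, and terminal commands that are not on a read-only allowlist. This is not the code in `src/plan-gate.ts`. The names `PlanGate`, `allowsWrite`, `allowsCommand`, the contents of `READ_ONLY_COMMANDS`, and the shell-metacharacter rule are all hypothetical, and the real allowlist and matching rules may differ.

```typescript
import * as path from "node:path";

// Hypothetical allowlist; the real one in src/plan-gate.ts is not shown here.
const READ_ONLY_COMMANDS = ["ls", "cat", "git status", "git diff", "git log"];

// Assumed deny rule: characters that could chain or redirect a command.
const SHELL_METACHARACTERS = /[;&|><`$]/;

interface PlanGate {
  raised: boolean; // true while plan mode is being enforced
  workspaceCwd: string; // absolute workspace root
}

/** True when `target` resolves to `root` itself or to a path beneath it. */
function isInside(root: string, target: string): boolean {
  const absRoot = path.resolve(root);
  const rel = path.relative(absRoot, path.resolve(absRoot, target));
  if (rel === "") return true;
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(rel);
}

/** fs/write_text_file check: block only writes that land inside the workspace. */
function allowsWrite(gate: PlanGate, filePath: string): boolean {
  return !gate.raised || !isInside(gate.workspaceCwd, filePath);
}

/** terminal/create check: allow only allowlisted commands with no chaining. */
function allowsCommand(gate: PlanGate, command: string): boolean {
  if (!gate.raised) return true;
  const cmd = command.trim();
  if (SHELL_METACHARACTERS.test(cmd)) return false;
  return READ_ONLY_COMMANDS.some((ok) => cmd === ok || cmd.startsWith(ok + " "));
}
```

With the gate raised for `/work/repo`, a write to `/work/repo/src/a.ts` is refused, while a write to a `plan.md` under the grok home passes because it resolves outside the workspace. Flipping `raised` back to `false` is left to the caller, so it only happens on an explicit user action.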
- Subagent delegation cards are deferred: grok 0.2.x doesn't emit `spawn_subagent` over ACP (it backgrounds a process + polls), so the classifier rarely fires; even when it does, child tool calls aren't nested (no inspector)
- Generated media is served via `asWebviewUri` (streamed from disk) when it lives under the grok-home `localResourceRoot`; files outside that fall back to a base64 `data:` URI
- No worktree UI
- Diff editor is preview-only; the write happens via `fs/write_text_file` after approval
- View defaults to left activity bar; user must drag to secondary side bar manually if desired
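
The generated-media item above comes down to two small decisions: classify the file by extension, and choose between a webview URI and a `data:` URI depending on whether the file sits under a served root. The sketch below is an illustration under assumptions, not the code in `sidebar.ts` or `acp-dispatch.ts`. `classifyMedia`, `chooseDelivery`, and the extension lists (anything beyond `.jpg` and `.mp4`) are hypothetical.

```typescript
import * as path from "node:path";

type MediaKind = "image" | "video";
type Delivery = "webview-uri" | "data-uri";

// Only .jpg and .mp4 appear in the notes above; the rest are assumptions.
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png"];
const VIDEO_EXTENSIONS = [".mp4"];

/** Classify a generated file as image or video from its extension alone. */
function classifyMedia(filePath: string): MediaKind | undefined {
  const ext = path.extname(filePath).toLowerCase();
  if (IMAGE_EXTENSIONS.includes(ext)) return "image";
  if (VIDEO_EXTENSIONS.includes(ext)) return "video";
  return undefined;
}

/** True when `filePath` is `root` itself or lives beneath it. */
function isUnder(root: string, filePath: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(filePath));
  if (rel === "") return true;
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(rel);
}

/**
 * Files under a served root can be streamed from disk (in the extension this
 * branch would call webview.asWebviewUri); anything else needs the base64
 * data: URI fallback.
 */
function chooseDelivery(filePath: string, servedRoots: string[]): Delivery {
  return servedRoots.some((root) => isUnder(root, filePath))
    ? "webview-uri"
    : "data-uri";
}
```

Comparing resolved paths rather than string prefixes keeps a sibling directory such as `.grok-other` from being treated as part of the `.grok` root.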
- `terminal-manager.ts` uses `spawn(cmd, { shell: true })` so Node picks `cmd.exe` on Windows, `/bin/sh` elsewhere. Don't hardcode shell paths.
- `cli-locator.ts` reads `HOME`/`USERPROFILE` env vars first (testability), falls back to `os.homedir()`. Uses `where` on Windows, `command -v` elsewhere. Checks `.cmd`/`.exe`/`.bat` extensions on Windows.
- Tests use `node -e "..."` everywhere, so commands are deterministic across platforms; don't add `pwd`, `awk`, `sleep`, `true`, etc.
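
The locator rules above can be written as two pure functions that take the platform and environment as arguments, which is what makes them testable without touching the real machine. This is a sketch under assumptions, not the contents of `cli-locator.ts`. `resolveHome`, `planLookup`, and the `LookupPlan` shape are hypothetical names.

```typescript
interface LookupPlan {
  lookupCommand: string; // shell command that asks the OS where the binary is
  candidates: string[]; // file names worth checking directly
}

/** Prefer HOME, then USERPROFILE, then the injected fallback (e.g. os.homedir). */
function resolveHome(
  env: Record<string, string | undefined>,
  fallback: () => string,
): string {
  return env.HOME || env.USERPROFILE || fallback();
}

/** Choose the lookup command and candidate file names for a platform. */
function planLookup(binary: string, platform: string): LookupPlan {
  if (platform === "win32") {
    return {
      lookupCommand: `where ${binary}`,
      candidates: [".cmd", ".exe", ".bat"].map((ext) => binary + ext),
    };
  }
  return { lookupCommand: `command -v ${binary}`, candidates: [binary] };
}
```

In real use the caller would pass `process.env`, `os.homedir`, and `process.platform`; a test can pass literals instead, which keeps the suite free of platform-specific shell commands.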
- `@vscode/test-electron` integration suite (scoped in `TESTS.md` § v0.2)
- Status-bar indicator (current model + effort + token usage)
- Subagent support: grok 0.2.x doesn't expose `spawn_subagent` over ACP, so the existing classifier is dormant; revisit if/when the CLI surfaces delegations as a tool call (then add the nested inspector that groups child calls under the card)
- Worktree UI (`Grok: New Worktree Session`)
- Optional: auto-move view to secondary side bar on first activation (`workbench.action.moveView`)
Release procedure — ALWAYS tag + create a GitHub Release (with the .vsix attached) on a release push to main (standing convention; mirrors the v1.0.0… tag history + GitHub Releases):
The whole procedure below (steps 2–5) is scripted: after bumping the version + writing the changelog section (step 1, user-initiated), just run:

    pwsh scripts\release.ps1    # Windows (native), what we use here
    ./scripts/release.sh        # macOS / Linux / WSL

It reads the version from `package.json`, runs the gate, builds the vsix, commits the working tree (`-MessageFile`/`-Message` override the default `Release vX.Y.Z`), pushes `main`, creates the annotated tag, and runs `gh release create` with the vsix attached, extracting the matching `## X.Y.Z` changelog section as the release notes. `-DryRun`/`--dry-run` previews; `-NoTest`/`--no-test` skips the gate. It refuses to run off `main` or when the tag already exists (i.e. the version wasn't bumped). The script itself doesn't call `vsce`/`ovsx`, but the GitHub Release it creates triggers the publish workflow (next paragraph), so it does kick off Marketplace + Open VSX publish.
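
The notes-extraction step mentioned above (pulling the matching `## X.Y.Z` section out of the changelog) can be sketched as a pure function. The release scripts themselves are PowerShell and shell, so this TypeScript version is only an illustration. `extractChangelogSection` is a hypothetical name, and it assumes headings of the form `## X.Y.Z`, optionally followed by a date or other text.

```typescript
/**
 * Return the body of the `## <version>` section, or undefined if absent.
 * The section ends at the next `## ` heading or at the end of the file.
 */
function extractChangelogSection(
  changelog: string,
  version: string,
): string | undefined {
  const lines = changelog.split(/\r?\n/);
  const isHeading = (line: string): boolean => line.startsWith("## ");

  // Match the exact version so that "1.4.1" does not match "## 1.4.10".
  const matchesVersion = (line: string): boolean => {
    if (!isHeading(line)) return false;
    const rest = line.slice(3).trim();
    if (!rest.startsWith(version)) return false;
    const next = rest.charAt(version.length);
    return next === "" || !/[0-9.]/.test(next);
  };

  const start = lines.findIndex((line) => matchesVersion(line));
  if (start === -1) return undefined;

  const nextHeading = lines.findIndex((line, i) => i > start && isHeading(line));
  const end = nextHeading === -1 ? lines.length : nextHeading;
  return lines.slice(start + 1, end).join("\n").trim();
}
```

Returning `undefined` for a missing section lets a caller stop before creating a release with empty notes, which fits the script's habit of refusing to run when the version was not bumped.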
What the script encodes, step by step:
1. Bump `version` in `package.json` (user-initiated) and add the dated section to `CHANGELOG.md`.
2. `npm test` (503-test floor, all green) + `tsc -p . --noEmit` clean, and `npm run test:live` against real grok: mandatory, run without asking (the `release.*` scripts don't run it, so run it by hand before invoking them).
3. Commit + push to `main` (direct-to-main, no feature branches).
4. Annotated git tag `vX.Y.Z` at the release commit → `git tag -a vX.Y.Z -m "Release vX.Y.Z"` → `git push origin vX.Y.Z`.
5. GitHub Release for that tag → `gh release create vX.Y.Z --title "Release vX.Y.Z" --notes-file <notes> <vsix>` (notes = the new changelog section(s); include any earlier version that was bumped but never released). Always attach the built `grok-vscode-phuryn-X.Y.Z.vsix` as a release asset so the exact installable build is downloadable from the release.
6. Marketplace publish is separate and explicit: only `npm run publish` (vsce) when the user asks. The `PawelHuryn` publisher is registered + authenticated locally; publishing ≠ tagging.
Don't skip the tag/release (or the vsix asset) on a release push. (A pure mid-dev version bump that isn't a release — e.g. the unreleased v1.3.0 voice iteration — is the only exception.)
- Direct-to-`main`, no feature branches
- Commits explain the why, not the what
- Don't introduce abstractions speculatively
- Don't add comments that explain what well-named code already says
- 503 tests is the floor: every PR should keep that green. All tests are grok-free (no binary spawn); grok-dependent probes live in `research/*.cjs` and are run manually, never by `npm test` or CI
- Rebuilding clears older `.vsix` first: `npm run package` (and the install/release scripts) delete stale `grok-vscode-phuryn-*.vsix` before building, so only the current version is on disk. After any doc or code change, rebuild + reinstall so the installed extension's bundled docs are current. Package last: the vsix bundles `CLAUDE.md`/`README.md`/`docs/` as files, so finish all doc + code edits before the final `npm run package` + reinstall; otherwise the installed build ships a stale-docs snapshot. If you rebuild mid-task and then touch docs again, rebuild again so the order is always edit-everything → package → reinstall.
- Version bumps are user-initiated. Iterate at the current version (rebuild the same vsix and reinstall locally) until the user says to bump and publish. Don't bump `package.json` on your own.
- Sign GitHub comments. Every GitHub issue/PR comment posted on the user's behalf ends with a signature on its own final line (italic). Pick by whether Paweł actually reviewed the text before it was posted: if he reviewed it, use `_Written with an agent, reviewed by Paweł_`; if the agent posted it without his review, use `_Written by Pawel's agent_`. Only claim review when it actually happened.