How the Grok Build VS Code extension is put together, and the one place it deliberately stops being "thin." For day-to-day usage see the README; for the test layers see TESTS.md.
The extension is a UI shell over grok agent stdio. It speaks JSON-RPC over the
Agent Client Protocol (ACP) on the CLI's
stdin/stdout and renders the results. Almost all real state lives in the CLI
process, not the extension.
| Lives in the CLI | Lives in the extension |
|---|---|
| Conversation history, memory, ~/.grok/ | Chips list (active editor + drag-added files) |
| MCP servers, subagents, plugins | YOLO flag (auto-approval) |
| Tool execution, model state | Plan-mode gate + per-plan verdict log |
| Plan text on disk (~/.grok/sessions/<…>/plan.md) | Webview UI state, popovers, slash filter, pending diff per toolCallId |
Kill the extension and the grok child dies with it; kill grok and the
extension surfaces an error and offers a fresh session. Restarting the session
(the + button) kills the CLI child and spawns a new one — memory the CLI
persisted under ~/.grok/ survives.
VS Code webview ──postMessage──► extension host ──JSON-RPC over stdin/stdout──► grok agent stdio
◄── session/update (message chunks, thought chunks, tool calls, mode changes)
◄── fs/read_text_file, fs/write_text_file
◄── terminal/create, terminal/output, terminal/wait_for_exit, terminal/kill, terminal/release
◄── session/request_permission
◄── x.ai/exit_plan_mode, x.ai/ask_user_question
The extension implements every mandatory server→client handler
(fs/read_text_file, fs/write_text_file, terminal/{create,output,wait_for_exit,kill,release})
— miss one and the agent crashes mid-session.
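One way to make the "every mandatory handler" rule hard to violate is to keep handlers in a lookup table and check that table against the required method list. The sketch below is illustrative only: `Handler`, `REQUIRED_METHODS`, `missingHandlers`, and `dispatch` are hypothetical names rather than the extension's actual API, and the message shapes are a simplified reading of JSON-RPC 2.0.

```typescript
// Hypothetical sketch: names and shapes here are assumptions, not the extension's real code.
type Handler = (params: unknown) => unknown;

// The server→client methods the text lists as mandatory.
const REQUIRED_METHODS = [
  "fs/read_text_file",
  "fs/write_text_file",
  "terminal/create",
  "terminal/output",
  "terminal/wait_for_exit",
  "terminal/kill",
  "terminal/release",
] as const;

// Returns the required methods that have no registered handler.
function missingHandlers(table: Record<string, Handler>): string[] {
  return REQUIRED_METHODS.filter((m) => typeof table[m] !== "function");
}

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: unknown;
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: number | string; result: unknown }
  | { jsonrpc: "2.0"; id: number | string; error: { code: number; message: string } };

// Routes one request; unknown methods get the standard JSON-RPC "method not found" error.
function dispatch(table: Record<string, Handler>, req: JsonRpcRequest): JsonRpcResponse {
  const handler = table[req.method];
  if (!handler) {
    return {
      jsonrpc: "2.0",
      id: req.id,
      error: { code: -32601, message: `Method not found: ${req.method}` },
    };
  }
  return { jsonrpc: "2.0", id: req.id, result: handler(req.params) };
}
```

A startup assertion such as `missingHandlers(table).length === 0` would surface a forgotten handler before a session ever starts, instead of mid-session.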
When the panel opens (or you click + for a new session):
- Locate the `grok` binary: `grok.cliPath` setting → `~/.grok/bin/grok` → `PATH`.
- Spawn `grok agent stdio` as a background child: visible in `ps` / Task Manager, never opening a terminal window.
- If `grok.defaultEffort` is set, pass `--reasoning-effort <value>` before the `stdio` subcommand (it's an agent-level flag).
- `initialize` → `session/new` (or `session/load` to resume) → `session/set_model`.
- Stream `session/update` notifications (messages, thoughts, tool calls, permission requests, mode changes) back into the chat.
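The lookup order and the flag placement from the steps above can be sketched as two small pure helpers. `cliCandidates` and `agentArgs` are hypothetical names, the path join is simplified to a forward slash, and the exact argv layout is an assumption drawn from "before the `stdio` subcommand".

```typescript
// Hypothetical helpers; the real extension's function names and signatures may differ.

// Candidate locations in the order the text gives: setting, then ~/.grok/bin/grok, then PATH.
function cliCandidates(cliPathSetting: string | undefined, homeDir: string): string[] {
  const candidates: string[] = [];
  if (cliPathSetting && cliPathSetting.trim() !== "") {
    candidates.push(cliPathSetting.trim());
  }
  // Simplification: a real locator would use the platform's path separator.
  candidates.push(`${homeDir}/.grok/bin/grok`);
  // Bare name: resolution is left to PATH at spawn time.
  candidates.push("grok");
  return candidates;
}

// Builds the arguments for `grok agent [--reasoning-effort <value>] stdio`.
// Assumption: the agent-level flag sits between `agent` and `stdio`.
function agentArgs(defaultEffort: string | undefined): string[] {
  const args = ["agent"];
  if (defaultEffort) args.push("--reasoning-effort", defaultEffort);
  args.push("stdio");
  return args;
}
```

Keeping both helpers free of process and filesystem access means the ordering can be checked in a plain unit test.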
The composer unlocks as soon as the session is live. The extension's hidden
"primer" message (below) fires eagerly and silently the moment the session
goes live — in the background, without blocking the composer — so it's almost
always finished before you send. Your first real prompt simply awaits the same
in-flight primer turn (grok runs one turn at a time) and is released the instant
it acks. While any user turn is waiting on grok (including that brief
held-behind-primer gap), the chat shows an animated Grokking… placeholder,
replaced in place by the first thought / message / tool card.
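The "first prompt awaits the same in-flight primer turn" behavior comes down to memoizing one promise per session. The following is a minimal sketch under assumptions: `PrimableSession`, `sendPrimer`, `sendPrompt`, and `sendUserPrompt` are hypothetical names, and error handling for a failed primer turn is left out.

```typescript
// Hypothetical sketch: the interface and callback names are illustrative, not the real code.
interface PrimableSession {
  primingPromise?: Promise<void>;
}

// Starts the primer turn once and hands every caller the same promise.
function ensurePrimed(
  session: PrimableSession,
  sendPrimer: () => Promise<void>,
): Promise<void> {
  if (!session.primingPromise) {
    session.primingPromise = sendPrimer();
  }
  return session.primingPromise;
}

// A user send waits for the in-flight (or already finished) primer before its own turn.
async function sendUserPrompt(
  session: PrimableSession,
  sendPrimer: () => Promise<void>,
  sendPrompt: (text: string) => Promise<void>,
  text: string,
): Promise<void> {
  await ensurePrimed(session, sendPrimer);
  await sendPrompt(text);
}
```

Calling `ensurePrimed` at go-live and again from the send path costs nothing the second time: an already-resolved promise releases the prompt immediately.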
The sidebar shows one conversation at a time, but it keeps a pool of live
sessions behind it — one spawned grok agent stdio process each, with exactly
one focused (the one you see). All the per-session state lives in a
Session object; the sidebar holds focused plus a Set of
every live Session (pool). The point is lossless re-focus: a backgrounded
session keeps streaming into its own view buffer (every webview post that built
its chat, in order), so re-focusing it is a clearMessages + replay of that
buffer — no grok reload, no process kill, even mid-turn or mid-approval.
Switching focus (focusSession) never touches grok: it swaps this.focused,
replays the target's buffer to the webview, and re-pushes the mode/sessions UI.
Clicking a session that isn't live (cold — it was reaped, or predates this
window) loads it from grok's on-disk history into a fresh pool member instead
(openSession).
Two details make the pool safe:
- Per-session generation guard. Each `Session` owns a `gen` counter, bumped only when its client is torn down. Handlers capture their session's `gen` when wired, so a backgrounded session's in-flight events are never judged "stale" just because focus moved elsewhere (the old global counter would have done exactly that).
- Session-scoped emit. `emit(session, …)` buffers to that session and only forwards to the webview when it's the focused one; `post(…)` is for UI-wide messages (status dots, the sessions list) that aren't tied to one chat.
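The two safeguards, plus the buffer replay that re-focus relies on, can be sketched together. This is a simplified model, not the extension's code: the class shapes, the `guard` helper, and the `WebviewMessage` type are assumptions made for illustration.

```typescript
// Hypothetical sketch; the real Session and sidebar carry far more state.
type WebviewMessage = { type: string; [key: string]: unknown };

class Session {
  // Bumped only when this session's client is torn down.
  gen = 0;
  // Every webview post that built this chat, in order.
  readonly buffer: WebviewMessage[] = [];
}

class Sidebar {
  focused: Session | undefined;
  readonly pool = new Set<Session>();
  private readonly postToWebview: (msg: WebviewMessage) => void;

  constructor(postToWebview: (msg: WebviewMessage) => void) {
    this.postToWebview = postToWebview;
  }

  // Session-scoped: always buffer, forward only when the session is focused.
  emit(session: Session, msg: WebviewMessage): void {
    session.buffer.push(msg);
    if (session === this.focused) this.postToWebview(msg);
  }

  // Wraps a handler so it is dropped only if its own session's client was torn down.
  guard<T>(session: Session, handler: (arg: T) => void): (arg: T) => void {
    const wiredGen = session.gen;
    return (arg: T) => {
      if (session.gen === wiredGen) handler(arg);
    };
  }

  // Re-focus is a clear plus a replay of the target's buffer; grok is not involved.
  focusSession(target: Session): void {
    this.focused = target;
    this.postToWebview({ type: "clearMessages" });
    for (const msg of target.buffer) this.postToWebview(msg);
  }
}
```

Because the guard compares against the session's own counter, moving focus never invalidates a background session's handlers; only tearing down that session's client does.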
Status dots. Every row in the history dropdown shows a dot. It's gray at rest — and "at rest" is deliberately one bucket: idle, already-read, cold, or loaded-from-disk all look the same, because the warm-process-vs-cold distinction is an implementation detail no user should have to reason about. It lights up only when there's something to know: blue working, yellow needs-you (a pending permission / question / plan review), green finished with output you haven't opened yet, red finished with an error you haven't opened.
The green/red dot is an unread badge, not a live state. When a session's turn
ends while you're looking at a different session, a persisted unread flag is set
(in the same globalState session-meta that holds rename overrides); opening that
session clears it. Because the flag lives in metadata rather than the live process,
the badge survives both the idle reaping below and a full VS Code restart — so
you can fire off several agents, walk away, and come back to find the green dots are
exactly the sessions with results waiting. There's no timer: a session you never
open stays green, because it genuinely is still unread. The actual color is a pure
function (computeDot) of (live status, unread, unreadError), so the policy is unit-tested without a process pool. The host pushes
one changed dot at a time (cheap, no disk read) and the full map on each list
refresh.
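The dot policy described above fits in a few lines. The sketch below is one plausible reading, not the shipped function: the `LiveStatus` and `Dot` types are hypothetical, and the rule that live activity outranks an unread badge is an assumption the text does not spell out.

```typescript
// Hypothetical sketch of the policy; the real computeDot signature may differ.
// `undefined` models a session with no live process (cold or reaped).
type LiveStatus = "working" | "needs-you" | "idle" | undefined;
type Dot = "gray" | "blue" | "yellow" | "green" | "red";

function computeDot(live: LiveStatus, unread: boolean, unreadError: boolean): Dot {
  // Assumption: live activity outranks an unread badge.
  if (live === "needs-you") return "yellow";
  if (live === "working") return "blue";
  // Unread badges come from persisted metadata, so they apply to cold sessions too.
  if (unreadError) return "red";
  if (unread) return "green";
  // Idle, already-read, cold, and loaded-from-disk collapse into one "at rest" bucket.
  return "gray";
}
```

Since the function takes plain values, a reaped session is just `computeDot(undefined, unread, unreadError)`, which is how an unread session can stay green with no process behind it.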
Reaping (src/session-pool.ts). A live process per
session isn't free, so the pool is bounded — silently. The pure selectReapable
picks victims under two rules: an idle TTL (a session untouched for an hour is
torn down, swept every 5 min) and an LRU cap (at most ~8 live; the
least-recently-used eligible sessions are evicted past it). It never reaps the
focused session or a working/needs-you one — so the cap can be exceeded when
everything spare is busy, by design. Reaping just kills the process and recomputes
the dot — a reaped session that's still unread stays green, a read one goes
gray — and re-clicking the row reloads the session from disk.
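A reaping policy with these two rules can be written as a pure function over a snapshot of the pool. The sketch below assumes a `PoolEntry` shape and constant names that are not in the source; the real `selectReapable` may take different inputs.

```typescript
// Hypothetical sketch; field names and exact constants are assumptions based on the text.
interface PoolEntry {
  id: string;
  lastTouched: number; // epoch milliseconds of last activity
  status: "idle" | "working" | "needs-you";
  focused: boolean;
}

const IDLE_TTL_MS = 60 * 60 * 1000; // "untouched for an hour"
const MAX_LIVE = 8; // "at most ~8 live"

function selectReapable(
  entries: PoolEntry[],
  now: number,
  ttlMs: number = IDLE_TTL_MS,
  cap: number = MAX_LIVE,
): string[] {
  // Focused and busy sessions are never eligible.
  const eligible = entries
    .filter((e) => !e.focused && e.status === "idle")
    .sort((a, b) => a.lastTouched - b.lastTouched); // least recently used first

  const victims = new Set<string>();

  // Rule 1: idle TTL.
  for (const e of eligible) {
    if (now - e.lastTouched >= ttlMs) victims.add(e.id);
  }

  // Rule 2: LRU cap over whatever would remain live.
  let liveAfter = entries.length - victims.size;
  for (const e of eligible) {
    if (liveAfter <= cap) break;
    if (!victims.has(e.id)) {
      victims.add(e.id);
      liveAfter--;
    }
  }
  return Array.from(victims);
}
```

When every spare session is busy, the eligible list is empty and the function returns nothing, which is the "cap can be exceeded by design" case.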
One safety valve sits next to this: the explicit Update Grok Build CLI action
tears down every live session to swap the binary, so it now confirms first if any
session is working or needs-you (the silent startup auto-update runs before
anything is in flight, so it doesn't ask).
Everything else mirrors the CLI. Plan Mode is enforced client-side, because
the CLI's x.ai/exit_plan_mode is unreliable: it reports "approved" to any
client reply — result or error — regardless of what the user actually chose. So
the extension can't trust the wire verdict. Two mechanisms cover the gap:
- The gate (src/plan-gate.ts). While Plan Mode is active, the two mandatory server→client choke points are policed: an `fs/write_text_file` resolving inside the workspace is blocked, and a `terminal/create` that isn't on a read-only allowlist is blocked. grok's own `~/.grok/sessions/<…>/plan.md` write lands outside the workspace, so it's allowed (and snooped to recover the plan text, since `exit_plan_mode` arrives with `planContent: null`). Entering plan mode any way, including the agent self-initiating it, raises the gate; only an explicit user action lowers it.
- The primer (src/grok-primer.ts). A hidden system message tells grok in plain English to ignore the bogus tool verdict and read the real decision from the next user message instead, as a bracketed marker: `[Plan approved]` / `[Plan rejected]` / `[Plan cancelled]` (optionally followed by a free-form comment). Approve → drop the gate + send "implement it now"; Keep planning → the gate stays up.

  The primer fires eagerly and non-blocking (`ensurePrimed`): it is its own hidden turn, kicked off the moment a session goes live (new and restored, and after `/compact`) rather than in front of the user's first prompt. It returns a reused `session.primingPromise`, so a first send that races the background primer awaits the same turn and is released when it acks. It is re-sent on go-live after a restore: a primer buried in replayed history isn't reliably honored (a `/compact` can drop it from effective context), so the extension re-asserts it rather than trusting the replay. The silent turn is hidden by a session-level `suppressContent` flag, which deliberately lets `userMessage` / `agentStart` through so a racing user send still paints its own bubble + Grokking indicator.

  Primer v4 is kept minimal on purpose: grok-build is agentic, and the old v3 primer's product-blurb paragraph + repo URL + "acknowledge briefly" line were tempting grok into a 15–40s pre-turn exploration of the workspace before the user's message even ran. So v4 keeps only the plan protocol and adds an explicit "do not use tools / read files / search the workspace / take any action; reply with just `ok`" constraint. When grok replays an earlier primer as a user message on restore, the pure `isPrimerText()` helper detects it so the bubble is hidden and not counted toward plan positions, but that detection does not mark the session primed.
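The gate's two checks can be sketched as pure functions. Everything here is illustrative: the function names, the `GateVerdict` type, and especially the allowlist contents are assumptions, since the text does not list which commands the real gate permits.

```typescript
// Hypothetical sketch of the two checks; names and allowlist contents are assumptions.
import * as path from "node:path";

// True when `target` resolves to the workspace root or somewhere beneath it.
function isInsideWorkspace(workspaceRoot: string, target: string): boolean {
  const rel = path.relative(path.resolve(workspaceRoot), path.resolve(target));
  if (rel === "") return true;
  if (path.isAbsolute(rel)) return false; // different drive on Windows
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}

// Illustrative allowlist only; the real gate's list is not given in the text.
const READ_ONLY_COMMANDS = new Set(["ls", "cat", "pwd", "head", "tail", "wc"]);

// Conservative: reject anything with shell metacharacters, then check the first word.
function isReadOnlyCommand(commandLine: string): boolean {
  if (/[;&|<>`$()]/.test(commandLine)) return false;
  const first = commandLine.trim().split(/\s+/)[0] ?? "";
  return READ_ONLY_COMMANDS.has(first);
}

type GateVerdict = { allowed: true } | { allowed: false; reason: string };

// Policy for fs/write_text_file while Plan Mode is active.
function checkWrite(planModeActive: boolean, workspaceRoot: string, filePath: string): GateVerdict {
  if (planModeActive && isInsideWorkspace(workspaceRoot, filePath)) {
    return { allowed: false, reason: "Plan Mode blocks writes inside the workspace" };
  }
  return { allowed: true };
}

// Policy for terminal/create while Plan Mode is active.
function checkTerminalCreate(planModeActive: boolean, commandLine: string): GateVerdict {
  if (planModeActive && !isReadOnlyCommand(commandLine)) {
    return { allowed: false, reason: "Plan Mode allows only read-only commands" };
  }
  return { allowed: true };
}
```

Resolving both paths before comparing them is what lets a plan file under the home directory through while a relative path that climbs back into the workspace is still caught.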
The full pedagogical write-up lives in research/understanding-plan-mode.md.
| File | Role |
|---|---|
| src/extension.ts | Entry point — registers commands, keybindings, output channel |
| src/sidebar.ts | Webview provider, message routing, fs handlers, diff preview, logout, generated-media serving (postGeneratedMedia → asWebviewUri, base64 fallback) |
| src/acp.ts | ACP client — spawns CLI, manages session lifecycle, emits events |
| src/session.ts | Per-session state bag — one Session per live grok agent stdio process (the sidebar holds a pool of these + one focused) |
| src/session-pool.ts | Pure reaping policy (selectReapable) — idle-TTL + LRU cap over the live-session pool |
| src/acp-dispatch.ts | Pure protocol helpers — line parsing, update routing, response + generated-media extraction (isMediaGenToolCall/extractGeneratedMediaPaths) |
| src/cli-locator.ts | Locate the grok binary; cross-platform |
| src/terminal-manager.ts | Headless shells for the agent's terminal/* calls |
| src/plan-gate.ts | Plan-mode policy (pure) — workspace-write containment + read-only command allowlist |
| src/plan-restore.ts | Plan persist + restore decision (pure) |
| src/grok-primer.ts | The hidden primer text + replay-detection helper (pure) |
| src/chips.ts | File-chip CRUD (pure) |
| src/prompt-builder.ts | Chip → prompt-string with @path refs and fenced blocks (pure) |
| src/slash-filter.ts | Slash-command autocomplete filter (pure) |
| src/sessions.ts | Disk-driven session listing/delete + name overrides (pure) — indexSessions (stat-only ordering), readSessionEntries (windowed read), listSessions (whole-list), clearSessions |
| src/file-ref.ts | Open-file ref parsing + large-file inline-read guard (pure) |
| src/plan-review.ts | Plan-snapshot Markdown filename generation (pure) |
| src/voice.ts | Voice-input pure helpers — STT request/response, ffmpeg args, device parsing, key resolution |
| src/voice-recorder.ts | Batch capture (ffmpeg → WAV) + STT REST upload |
| src/voice-streamer.ts | Live capture (ffmpeg PCM → WebSocket STT) |
| media/chat.{js,css} | Webview UI |
| media/webview-helpers.js | Pure webview helpers (file-ref detection, relative-time, mic-button state machine, trailing send-phrase highlight, math extraction splitMath/stripUnsupportedTex, and the deferred subagent classifier isSubagentToolCall/subagentLabel) — shared between webview and tests |
The history dropdown lists every session the CLI saved for this workspace, and that
store can grow into the thousands. The old path read and JSON.parsed every
summary.json on every open, then rendered every row — linear cost that stalled the
popover at scale. It now loads one page at a time (SESSION_PAGE_SIZE = 100,
newest-first), built from two pure primitives in
src/sessions.ts:
- `indexSessions` does one `stat` per session dir, no reads: it orders every id newest-first by `summary.json` mtime. mtime is the cheap last-activity proxy: grok rewrites that file (it holds `updated_at`) on every turn. We sort by mtime because the id is a UUIDv7 whose timestamp is creation, not last activity; an id-sort would order by when the session was first opened, which is wrong.
- `readSessionEntries` reads + parses `summary.json` for exactly the visible page's ids and applies name overrides.
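The stat-only ordering and page slicing can be sketched without touching the filesystem by injecting the stat call. This is an assumption-laden simplification: the real `indexSessions` presumably stats the disk itself, and `IndexedSession`, `statMtime`, and `pageIds` are hypothetical names.

```typescript
// Hypothetical sketch; the real indexSessions may differ in shape and inputs.
interface IndexedSession {
  id: string;
  mtimeMs: number;
}

const SESSION_PAGE_SIZE = 100;

// One stat per session dir, no file reads. `statMtime` returns the mtime of that
// session's summary.json, or undefined when the file does not exist yet.
function indexSessions(
  ids: string[],
  statMtime: (id: string) => number | undefined,
): IndexedSession[] {
  const indexed: IndexedSession[] = [];
  for (const id of ids) {
    const mtimeMs = statMtime(id);
    if (mtimeMs !== undefined) indexed.push({ id, mtimeMs });
  }
  // Newest activity first; ties are broken by id so the order is stable.
  return indexed.sort(
    (a, b) => b.mtimeMs - a.mtimeMs || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
}

// Ids for one zero-based page of the ordered index.
function pageIds(
  index: IndexedSession[],
  page: number,
  pageSize: number = SESSION_PAGE_SIZE,
): string[] {
  return index.slice(page * pageSize, (page + 1) * pageSize).map((s) => s.id);
}
```

Only the ids returned by `pageIds` would then be handed to the read-and-parse step, which is where the per-open cost drops from the whole store to one page.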
The host (postSessionsList in src/sidebar.ts) orders everything
cheaply with indexSessions, then drives an mtime-keyed read cache so a re-open /
load-more / search only re-reads entries whose summary.json actually changed —
steady-state opens cost ~zero reads. Search is server-side and complete: a query
warms the whole catalog once (cache-backed) and filters by display name across all
sessions, not just the loaded page. One wrinkle the disk scan can't cover on its own:
a brand-new session has no summary.json yet, so opening history the instant a
session goes live would drop the active row until grok flushes the file. The host fixes
that by synthesizing a top-pinned row from in-memory state for any live session not yet
on disk (first, unfiltered page only — those ids can't appear on a later page). The
webview appends pages on scroll-near-bottom (de-duped by id, one request per boundary)
and debounces the search box. An opt-in
perf simulation (test/sessions.perf.ts via
npm run test:perf, kept out of npm test/CI) asserts the op counts at N=5000: first
open drops reads 5000→100 (~98%), steady-state re-open is 0 reads, search warms once
then 0. Clear all remains the relief valve for an overgrown store; pagination is
the steady-state fix.
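The mtime-keyed read cache is what makes a steady-state re-open cost zero reads. A minimal sketch follows, assuming a `SessionEntry` shape and a `SummaryCache` class that are not in the source; the `reads` counter mirrors the kind of op count a perf test could assert on.

```typescript
// Hypothetical sketch; `SessionEntry` and the cache shape are assumptions.
interface SessionEntry {
  id: string;
  name: string;
}

class SummaryCache {
  private readonly entries = new Map<string, { mtimeMs: number; entry: SessionEntry }>();
  // Counts real reads so a test can assert on op counts.
  reads = 0;

  // Returns the cached entry when the mtime is unchanged, otherwise re-reads and stores it.
  get(id: string, mtimeMs: number, read: (id: string) => SessionEntry): SessionEntry {
    const hit = this.entries.get(id);
    if (hit && hit.mtimeMs === mtimeMs) return hit.entry;
    const entry = read(id);
    this.reads++;
    this.entries.set(id, { mtimeMs, entry });
    return entry;
  }
}
```

Keying on mtime instead of a time-to-live means an entry is re-read exactly when its `summary.json` was rewritten, and never otherwise.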
- Pure modules split for testability. Everything tagged "(pure)" above has no `vscode` import, no process spawn, no network; it runs under Vitest in a plain Node process. That's why the bulk of protocol behavior can be regression-tested without launching VS Code or the `grok` binary. See TESTS.md.
- YOLO is client-side only. It is a single `autoApprove` flag: toggling Agent ↔ YOLO doesn't restart the CLI or even send a message. When the CLI raises a permission request, the extension just answers "allow always" automatically.
- Cross-platform without per-OS branches. `terminal-manager.ts` uses `spawn(cmd, { shell: true })` so Node picks `cmd.exe` or `/bin/sh`; `cli-locator.ts` prefers `HOME` / `USERPROFILE` env over `os.homedir()` so tests can override paths.
- Streaming is rAF-coalesced. Message and thought chunks buffer into a raw string and re-render at most once per animation frame, so long responses stay smooth under fast chunk rates.
- `available_commands_update` drives slash autocomplete. No hardcoded command list; the CLI tells the extension what's available, so plugin/skill installs surface immediately.
- Model switching is agent-aware. Models belong to agent types (`grok-build` / `grok-build-plan` vs. the `cursor` agent that owns the Composer models). The CLI binds the agent when the process spawns and locks it after the first turn (including our primer), so a live `session/set_model` only works within the same agent; a cross-agent switch errors with `MODEL_SWITCH_INCOMPATIBLE_AGENT`. So `switchModel` tries the live switch and, on that specific error (`isIncompatibleAgentError` in src/acp-dispatch.ts), persists the pick to `grok.defaultModel` and restarts: `newSession` re-applies the model before the primer runs, while the agent is still rebindable. No history → transparent restart; with history → the same Summarize / Just-Restart choice as an effort change. A restart on a primer-only session (no real conversation, which is common when you flip models/effort right after opening) takes the no-prompt path and discards the abandoned grok session dir afterward, so repeated switches don't pile up identical empty sessions in history; the pure `carrySessionName` moves any user rename onto the fresh session so the chosen name survives. The same cleanup runs on the effort-change empty-session branch, guarded so a dead client on a session with history keeps its history.
- Generated media is path-based, not an ACP image block. `/imagine` and `/imagine-video` write a file into the session dir and report its path as JSON-in-text on the completed tool result. The host parses the path, classifies image-vs-video by extension, and serves it to the webview via `asWebviewUri` (streamed from disk) so even a multi-MB video renders. See research/image-generation.md.
- Math renders via vendored MathJax (SVG), extracted before HTML-escaping. Grok answers with TeX (inline `\(…\)`, display `\[…\]`, `\begin{pmatrix}` matrices). The pure `splitMath` pulls math spans out before the markdown pass escapes HTML (so backslashes and braces survive into placeholders, mirroring the code-block/table extraction), and `renderMath` in `chat.js` renders each span with MathJax (`media/mathjax/tex-svg-full.js`, a self-contained ~2.3 MB IIFE, no network) via `MathJax.tex2svg` (synchronous once startup resolves; raw-TeX fallback + an `upgradeMathInDom` pass until then). `enableAssistiveMml:false` stops a hidden MathML copy from rendering as a visible duplicate, and we supply `mjx-container[display="true"]{display:block}` ourselves since manual `tex2svg` skips MathJax's injected stylesheet. Single `$…$` is deliberately not a delimiter because it false-matches prose currency. (v1.4.7 replaced KaTeX with MathJax, mainly so every equation is an exportable self-contained SVG.)
- Display math + Mermaid diagrams export to PNG/SVG. Both end up as a self-contained `<svg>` in an export host (`.math-export` / `.mermaid-block`) carrying the source. A hover overlay (delegated `.expr-btn` handler, mirroring the generated-image `buildMediaActions`) offers Copy (the source), Download, and Open. Download quick-picks a PNG (canvas-rasterized with the VS Code theme background, so WYSIWYG) or a transparent SVG for a dark/light background (math recolors `currentColor`; mermaid re-renders per theme via a `%%{init}%%` directive). The host (`sidebar.ts` `exportExpr`) runs the quick-pick + save dialog; Open writes the PNG to `globalStorageUri/exports/` and previews it.
- Mermaid renders async, as a post-pass over the inserted DOM. Grok answers with `mermaid` code fences (flowcharts, sequence/state diagrams, git graphs, …). Unlike the synchronous math render, `mermaid.render` is async and needs the live DOM (it measures text to lay out nodes), so `renderMarkdown` only turns the fence into a `.mermaid-block` placeholder (carrying the source as a readable fallback code block) and `renderMermaidIn` in `chat.js` swaps in the SVG afterward via vendored Mermaid (`media/mermaid/`, a self-contained ~3.3 MB IIFE, no network). The streaming agent bubble rebuilds its DOM every animation frame, so two source-keyed module caches make that flicker-free: `mermaidSvgCache` re-applies a rendered SVG synchronously on a cache hit, and `mermaidInFlight` stops a diagram being laid out repeatedly before its first render resolves. Themed to VS Code dark/light; `securityLevel:"strict"`; malformed/half-streamed diagrams keep the readable source. No CSP change (the lib has no `eval` / `new Function`; its inline styles are covered by `style-src`).
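Of the notes above, the rAF coalescing is the easiest to show in isolation. The sketch below is an assumption about shape, not the webview's actual code: `ChunkCoalescer` is a hypothetical class, and `schedule` stands in for `requestAnimationFrame` so the logic can run outside a browser.

```typescript
// Hypothetical sketch; `schedule` is injected so the class runs without a DOM.
class ChunkCoalescer {
  private raw = "";
  private scheduled = false;
  private readonly render: (text: string) => void;
  private readonly schedule: (cb: () => void) => void;

  constructor(render: (text: string) => void, schedule: (cb: () => void) => void) {
    this.render = render;
    this.schedule = schedule;
  }

  // Append a chunk; at most one render is queued per frame.
  push(chunk: string): void {
    this.raw += chunk;
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => {
      this.scheduled = false;
      // Renders the whole accumulated string, including chunks that arrived after scheduling.
      this.render(this.raw);
    });
  }
}
```

In a webview the second argument could be `(cb) => requestAnimationFrame(() => cb())`; however many chunks arrive within one frame, the renderer runs once with the full accumulated text.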