Session history that stays fast with thousands of sessions.
- Session history loads in pages and stays fast at scale. The history dropdown used to read and parse every saved session on each open, which got slow once a project had hundreds or thousands of them. It now loads the most recent 100 (newest first by last activity) and pulls in older ones as you scroll to the bottom. The search box filters by name across your entire history — not just the loaded page — so you can still find an old session instantly. Behind the scenes it orders sessions with one cheap directory `stat` each (no file reads), reads only the page you're looking at, and caches by file modification time so re-opening the dropdown costs effectively no disk reads. (src/sessions.ts, src/sidebar.ts, media/chat.js, media/chat.css)
- Switching model or reasoning effort on a fresh session no longer clutters history. Some model and effort changes need the session to restart. If you flip them a few times right after opening a session — before you've actually said anything — each restart used to leave behind an empty, identical session in your history. Now an empty session (one where only the hidden setup has run) restarts cleanly with no "Summarize & Restart vs. Just Restart" prompt, and the throwaway session is removed instead of piling up. If you had renamed that session, the name carries over to the restarted one. (src/sidebar.ts, src/sessions.ts)
- History dropdown no longer opens clipped off the right edge. Opening the session-history popover quickly (before its rows had finished loading) could position it too far right, so it spilled past the panel edge and only looked right after closing and reopening. The popover is now right-aligned to the panel (respecting the edge padding) and grows leftward, so it stays fully on-screen no matter how its contents resize as sessions load in. In a narrow panel it also caps its width to fit, so a long session name truncates with an ellipsis instead of pushing the popover off the left edge. Resizing the panel while the dropdown is open now re-fits it live (no need to close and reopen), and switching to another panel tab or extension closes it so it can't reappear mis-sized when you come back. (media/chat.js)
- Opt-in performance simulation for the history popover. A new `npm run test:perf` suite (kept out of `npm test` and CI) builds a 5000-session in-memory store and asserts the access-count improvement: first open drops file reads from 5000 to 100 (~98%), a repeat open does zero reads (modification-time cache), and search warms the catalog once then stays read-free — with a modeled-latency projection and a real in-memory parse-cost wall-clock. (test/sessions.perf.ts, vitest.perf.config.ts, package.json)
- Documented the pagination design in docs/architecture.md (§ History at scale) and CLAUDE.md (§ History pagination), and updated the Session history feature note in the README.
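
The paging and modification-time cache described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in src/sessions.ts: `PagedHistory`, `FileEntry`, and the `reads` counter are hypothetical names, and a plain array stands in for the session directory.

```typescript
// Hypothetical stand-in for one saved session file (assumption, not the real store).
interface FileEntry { name: string; mtimeMs: number; body: string }
interface SessionMeta { name: string; mtimeMs: number; title: string }

class PagedHistory {
  reads = 0; // counts simulated file reads
  private cache = new Map<string, SessionMeta>();
  private files: FileEntry[];
  private pageSize: number;

  constructor(files: FileEntry[], pageSize = 100) {
    this.files = files;
    this.pageSize = pageSize;
  }

  // Order by modification time only: a stat-like pass that never touches file bodies.
  private ordered(): FileEntry[] {
    return [...this.files].sort((a, b) => b.mtimeMs - a.mtimeMs);
  }

  // Parse a file only when it is missing from the cache or its mtime changed.
  private load(f: FileEntry): SessionMeta {
    const hit = this.cache.get(f.name);
    if (hit !== undefined && hit.mtimeMs === f.mtimeMs) return hit;
    this.reads++;
    const parsed = JSON.parse(f.body) as { title: string };
    const meta: SessionMeta = { name: f.name, mtimeMs: f.mtimeMs, title: parsed.title };
    this.cache.set(f.name, meta);
    return meta;
  }

  // Return one page, newest first; only the files on that page are read.
  page(index: number): SessionMeta[] {
    const start = index * this.pageSize;
    return this.ordered().slice(start, start + this.pageSize).map((f) => this.load(f));
  }
}

const files: FileEntry[] = Array.from({ length: 5000 }, (_, i) => ({
  name: `s${i}.json`,
  mtimeMs: i,
  body: JSON.stringify({ title: `Session ${i}` }),
}));
const sessionHistory = new PagedHistory(files);
const firstPage = sessionHistory.page(0);
const readsAfterFirstOpen = sessionHistory.reads;
sessionHistory.page(0); // repeat open: every entry is served from the cache
const readsAfterRepeatOpen = sessionHistory.reads;
```

With 5000 entries and a page size of 100, the first open performs 100 simulated reads and the repeat open adds none, which is the shape of the improvement the perf suite asserts.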
Make the chat bigger — just the chat.
- Adjustable chat font size (#14). A new `grok.chatFontScale` setting zooms the Grok chat panel only — text, icons, and spacing together — as a percent (e.g. `150`, `200`, or smaller like `70`). Unlike VS Code's global `Ctrl/Cmd+Shift+=`, it leaves the rest of the editor at its normal size, so you can enlarge (or shrink) just the chat for readability. It applies live with no reload, the composer stays pinned to the bottom of the panel at any scale, and it works at both User (global) and Workspace (local) scope. (package.json, src/sidebar.ts, media/chat.css, media/chat.js)
- README polish. Added screenshots for Voice input and the Agent Dashboard, and moved a few wire-level implementation details out of the feature blurbs into docs/architecture.md so the feature list reads less like internals. (README.md, docs/architecture.md)
Run several Grok sessions at once — switch between them instantly, and see at a glance which one needs you.
- Multi-session Agent Dashboard. The sidebar now keeps several sessions alive at once instead of one at a time. Switching between them from the history dropdown is instant and lossless — the conversation you switch away from keeps running in the background (mid-turn, mid-approval, anything), and switching back replays its exact state with no reload. Picking a session that isn't live anymore loads it from history as before. (src/sidebar.ts, src/session.ts)
- Status dots in the history dropdown. Every session shows a dot so you can see what each one is doing without opening it. It's gray at rest, and only lights up when there's something to know: blue = working, yellow = needs you (a permission, question, or plan to review), green = finished with output you haven't opened yet, red = finished with an error you haven't opened. The green/red marker is an unread badge — it clears the moment you open the session, and it's persisted, so it survives the idle cleanup below and even a VS Code restart. Walk away, come back, and the green sessions are exactly the ones with results waiting. (media/chat.js, media/chat.css, src/session-pool.ts)
- Idle sessions are cleaned up automatically. To keep a pile of background sessions from each holding a live process, a session left untouched for an hour — or beyond a cap of ~8 live — is quietly shut down (never one that's working or waiting on you). It reappears in history and reloads on click, so nothing is lost. (src/session-pool.ts)
- Updating the Grok Build CLI warns about sessions in progress. With multiple sessions now able to run at once, the Update Grok Build CLI action confirms before it restarts when any session is mid-turn or waiting on you — so an update doesn't silently interrupt work in a background session. (src/sidebar.ts)
- No more long pause before Grok starts. Sending your first message used to sit silent for 15–40 seconds before anything appeared. Behind the scenes the extension primes each session with a hidden plan-mode instruction, and that primer was running in front of your first message and — because Grok Build is an agentic CLI — was wandering off to read files and search the workspace before your real prompt even ran. The primer now fires the moment a session goes live, silently in the background, so it's almost always finished before you hit send; if you're quick, your message shows immediately and is released the instant the primer settles. The primer text itself was also trimmed to just the protocol it needs to teach (the product blurb and repo link that were tempting Grok to go exploring are gone), so it completes in a beat instead of dozens of seconds. (src/sidebar.ts, src/grok-primer.ts, src/session.ts)
- A "Grokking…" indicator while you wait. Every turn now shows an animated Grokking… placeholder the instant you send, so there's immediate feedback that Grok received your message — it's replaced in place the moment the first thought, reply, or tool action arrives. (media/chat.js, media/chat.css)
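
The status-dot colors and the idle cleanup rule above can be expressed as two small pure functions. This is a sketch under stated assumptions, not the logic in src/session-pool.ts: `SessionState`, `dotFor`, and `pickEvictions` are hypothetical names, the precedence of yellow over blue is a guess, and the cap is taken as exactly 8 although the notes say "~8".

```typescript
type Dot = "gray" | "blue" | "yellow" | "green" | "red";

// Hypothetical per-session snapshot (field names are assumptions).
interface SessionState {
  id: string;
  working: boolean;                     // mid-turn
  needsUser: boolean;                   // permission, question, or plan awaiting review
  unread: "none" | "output" | "error";  // finished result not opened yet
  lastTouchedMs: number;
}

function dotFor(s: SessionState): Dot {
  if (s.needsUser) return "yellow"; // assumed to win over "working"
  if (s.working) return "blue";
  if (s.unread === "error") return "red";
  if (s.unread === "output") return "green";
  return "gray";
}

// Pick sessions to shut down: idle for an hour, or beyond the live cap.
// Sessions that are working or waiting on the user are never chosen.
function pickEvictions(
  sessions: SessionState[],
  nowMs: number,
  maxLive = 8,
  idleMs = 60 * 60 * 1000,
): string[] {
  const evictable = sessions
    .filter((s) => !s.working && !s.needsUser)
    .sort((a, b) => a.lastTouchedMs - b.lastTouchedMs); // oldest first
  const chosen = new Set<string>();
  for (const s of evictable) {
    if (nowMs - s.lastTouchedMs >= idleMs) chosen.add(s.id);
  }
  let live = sessions.length - chosen.size;
  for (const s of evictable) {
    if (live <= maxLive) break;
    if (!chosen.has(s.id)) {
      chosen.add(s.id);
      live--;
    }
  }
  return Array.from(chosen);
}
```

Keeping both decisions pure makes them easy to test without a live CLI process; the unread state is carried as data, which is what lets it be persisted across a cleanup or restart.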
Sharper math, and one-click export for equations and diagrams.
- Math now renders with MathJax (replacing KaTeX). MathJax produces self-contained SVG that's closer to "real LaTeX," renders `\label`/`\ref`-style environments without painting red errors, and — crucially — gives every equation an exportable vector. Inline `\(…\)` sits on the text baseline in your editor's text color; display `\[…\]` gets its own centered, horizontally-scrollable block. The swap also fixed a double-rendering bug where Chromium drew MathJax's hidden accessibility MathML as a second, visible copy of each equation (enableAssistiveMml: false). (media/chat.js, src/sidebar.ts, media/mathjax/)
- Copy / Download / Open actions on display math + Mermaid diagrams. Hover any display equation or rendered diagram for a top-right overlay (mirrors the generated-image actions): Copy the LaTeX/Mermaid source, Download as an image, or Open it in VS Code's image preview. Download offers a quick-pick — PNG (rasterized with your VS Code theme background, i.e. what you see), or a transparent SVG tuned for a dark or for a light background. Math recolors its ink for each; Mermaid is re-rendered in its matching light/dark theme so a "for light background" diagram actually uses the light palette. (media/chat.js, src/sidebar.ts)
- `video-gen` is excluded from the default live-test gate (opt-in via `--only=video-gen`). In the headless test harness grok 0.2.x spins on `/imagine-video` instead of producing a clip, so it never completes — the feature works interactively, so a default-on test only produced noise. (scripts/live-tests.cjs)
Grok's Mermaid diagrams now render as diagrams.
- Mermaid diagram rendering. Grok answers with `` ```mermaid `` fenced blocks — flowcharts, sequence/state diagrams, git graphs, class diagrams, ER, pie, and more — which the chat previously showed as raw diagram source. These now render as real diagrams via the vendored Mermaid library (bundled into the extension, no network — works offline and in the packaged build). The diagram is themed to match VS Code (dark/light) and gets horizontal scroll so a wide flowchart doesn't blow out the narrow sidebar. Rendering is asynchronous and DOM-based (Mermaid measures text to lay out nodes), so unlike the LaTeX path it runs as a post-render pass over the inserted message; an SVG cache keyed by the diagram source keeps the streaming bubble flicker-free (the agent message re-renders every animation frame) and stops the same diagram being laid out dozens of times before the first render resolves. A half-streamed block stays as plain text until its closing `` ``` `` arrives, and if Mermaid can't load or the diagram is malformed the readable source is shown instead of an error. (media/chat.js, src/sidebar.ts, media/mermaid/)
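
Two ideas from the entry above lend themselves to a short sketch: only fully closed fences are treated as diagrams, and renders are cached by source so a message that re-renders every frame triggers one layout per diagram. `closedMermaidBlocks` and `DiagramCache` are hypothetical names, and the render function is a stub standing in for the real Mermaid call, which is not shown here.

```typescript
// Build the fence marker at runtime so this example contains no literal fence.
const FENCE = "`".repeat(3);

// Return the source of every mermaid block whose closing fence has arrived.
function closedMermaidBlocks(markdown: string): string[] {
  const re = new RegExp(FENCE + "mermaid[ \\t]*\\n([\\s\\S]*?)\\n" + FENCE, "g");
  const sources: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) sources.push(m[1] ?? "");
  return sources;
}

// Cache keyed by diagram source. Storing the promise (not the result) means
// repeated requests made before the first render resolves share one render.
class DiagramCache {
  renders = 0;
  private pending = new Map<string, Promise<string>>();
  private renderFn: (source: string) => Promise<string>;

  constructor(renderFn: (source: string) => Promise<string>) {
    this.renderFn = renderFn;
  }

  get(source: string): Promise<string> {
    let p = this.pending.get(source);
    if (p === undefined) {
      this.renders++;
      p = this.renderFn(source);
      this.pending.set(source, p);
    }
    return p;
  }
}
```

A real implementation would likely also evict failed renders so a malformed diagram can fall back to its readable source; that part is omitted from this sketch.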
Grok's math now renders as math.
- LaTeX / math rendering. Grok increasingly answers with TeX — inline `\(…\)` and display `\[…\]` (including `\begin{pmatrix}` matrices, fractions, sums, Greek) — which the chat previously showed as raw backslash-soup. Math is now rendered with KaTeX, vendored into the extension (no network, works offline and in the packaged build). The renderer pulls LaTeX out before HTML-escaping so the backslashes and braces survive intact; inline math flows with the text, display math gets its own block with horizontal scroll so a wide matrix doesn't blow out the narrow sidebar. A malformed expression renders as an inline red error (KaTeX `throwOnError:false`) instead of blanking the message; if KaTeX somehow can't load, the raw TeX is shown rather than swallowed. `\label{…}` (which Grok emits inside `align`/`equation` blocks for cross-referencing) is stripped before rendering — KaTeX has no `\ref`/`\eqref` system so it would otherwise paint the label as a red error, and `\label` produces no visible output in real LaTeX anyway. Single `$…$` is deliberately not a delimiter — too many false positives with prose currency ("$5 and $10"). (media/chat.js, media/webview-helpers.js, src/sidebar.ts, media/katex/)
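
The "pull LaTeX out before HTML-escaping" step above can be sketched as an extract, escape, restore pipeline. This is an illustration only: `extractMath`, `escapeHtml`, and `renderWithMath` are hypothetical names, the private-use placeholder character is an assumption, and the TeX renderer is passed in as a callback rather than calling KaTeX.

```typescript
interface MathItem { tex: string; display: boolean }

// Replace each \[ … \] (display) or \( … \) (inline) span with a placeholder.
// A bare $ is deliberately not treated as a delimiter.
function extractMath(src: string): { text: string; math: MathItem[] } {
  const math: MathItem[] = [];
  const text = src.replace(
    /\\\[([\s\S]+?)\\\]|\\\(([\s\S]+?)\\\)/g,
    (_m: string, disp: string | undefined, inl: string | undefined) => {
      const display = disp !== undefined;
      math.push({ tex: (display ? disp : inl) ?? "", display });
      return `\uE000${math.length - 1}\uE000`;
    },
  );
  return { text, math };
}

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Escape the prose, then put rendered math back where the placeholders sit.
function renderWithMath(
  src: string,
  renderTex: (tex: string, display: boolean) => string,
): string {
  const { text, math } = extractMath(src);
  return escapeHtml(text).replace(/\uE000(\d+)\uE000/g, (_m: string, i: string) => {
    const item = math[Number(i)];
    return item ? renderTex(item.tex, item.display) : "";
  });
}
```

Because the TeX never passes through `escapeHtml`, backslashes, braces, and characters like `<` reach the math renderer untouched, while prose such as "$5 and $10" is left alone.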
You can read history again while Grok is thinking.
- Scrolling up no longer gets yanked back down while Grok is thinking (#16). The chat snapped to the bottom on every streaming update, so any attempt to scroll up and re-read earlier messages (or Grok's own earlier reasoning) was undone on the very next thought chunk. The view now follows streaming output only while you're already pinned to the bottom; the moment you scroll up to read history, auto-scroll pauses and leaves you there. Genuinely interactive activity you need to see — permission cards, ask-user-question cards, and your own sent message — still pulls the view back down and re-pins. This also restores the ability to keep an eye on reasoning while permission cards stack up (#15). (media/chat.js, media/webview-helpers.js)
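
The "follow only while pinned" rule above reduces to two small checks. The sketch below uses hypothetical names (`isPinnedToBottom`, `shouldScrollToBottom`, `UpdateKind`) and an assumed few-pixel slack; the actual threshold in media/chat.js is not stated in these notes.

```typescript
interface ScrollMetrics { scrollTop: number; clientHeight: number; scrollHeight: number }

// The view counts as pinned when it is within a few pixels of the bottom.
function isPinnedToBottom(m: ScrollMetrics, slackPx = 4): boolean {
  return m.scrollHeight - (m.scrollTop + m.clientHeight) <= slackPx;
}

type UpdateKind = "stream" | "permission" | "question" | "userMessage";

// Streaming output follows only if the user was already pinned;
// interactive cards and the user's own message always pull the view down.
function shouldScrollToBottom(kind: UpdateKind, wasPinned: boolean): boolean {
  return kind === "stream" ? wasPinned : true;
}
```

The pinned state has to be sampled before new content is appended, since appending grows `scrollHeight` and would otherwise make a pinned view look scrolled up.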
Docs catch-up and a faster, leaner session start.
- README rewrite. Restructured around three audiences: users get a clean Requirements → Install → Quick start path, then a Features & capabilities section where each feature is its own collapsible — ordered by what actually sells the extension (diff-preview approval, modes, `/imagine` images+videos, voice…) rather than by implementation. Configuration, Commands & keybindings, and Development each collapse into a single `<details>` so the page scans in seconds while staying self-contained for the Marketplace listing. The deep dive — diagram, message flow, module map, design notes, and the Plan-Mode "the one part that isn't thin" explainer — moved to a new docs/architecture.md, linked from a short How it works teaser.
- Removed stale claims. Dropped the Subagents feature section (still research-only — it rarely fires in practice, so it shouldn't read as shipped) and the "generated media is inlined as base64" known-limit (1.4.2 switched media to `asWebviewUri` streaming). Trimmed the opening screenshots to the sidebar + an inline `/imagine` result, with a More screenshots link to the folder; removed a decorative image that carried no information.
- Canonical `README.md`/`CHANGELOG.md` casing. The working-tree files were lowercase on disk (a Windows case-insensitivity slip) while git already tracked them uppercase; the disk now matches. (`vsce` still normalizes the packaged copies to lowercase inside the `.vsix` — that's its own convention, which the Marketplace renders fine.) `scripts/release.*` now reference `CHANGELOG.md` so the release-notes extraction works on case-sensitive filesystems too.
- The hidden plan-mode primer no longer costs a startup round-trip. The extension sends Grok a hidden "primer" that teaches it the Plan-Mode verdict protocol. It used to fire at every session start — new and every restore — locking the composer until Grok acknowledged and burning a turn even on a session you only opened to glance at. It's now sent lazily, as its own hidden turn before your first real prompt — on a new or restored session — so it rides along with work you already triggered. The composer is ready the instant the session connects, and opening/abandoning a session (or restoring just to read history) costs nothing. Re-asserting the primer on the first post-restore send (rather than trusting a copy buried in replayed history, which a `/compact` can drop) keeps Plan Mode reliable across resumes. Best-effort and unchanged in protocol — the plan-gate remains the real enforcement. (src/grok-primer.ts, src/sidebar.ts)
Generated video renders now, and inline media is a tighter thumbnail.
- Generated videos (`/imagine-video`) finally render. Detection, path extraction, MIME, and CSP were all already correct — the failure was the delivery: a multi-MB clip base64-inlined into a single `postMessage` `data:` URI was silently dropped, so the `<video>` got an empty source. Generated files are now served via `webview.asWebviewUri` (the grok home is a `localResourceRoots` entry), so the webview streams the file straight from disk instead of carrying it as a giant string — videos play, and large images load lazily. Files written outside the served roots still fall back to a base64 `data:` URI, so nothing regresses. (src/sidebar.ts, media/chat.js)
- The Copy path / Open in VS Code hover icons now sit on the image. They were anchored to the chat column's right edge, so on a thumbnail they floated in empty space well to the right of the picture. The media block is now sized to the rendered image, so the icons pin to the image's own top-right corner — for videos too. (media/chat.css)
- Inline media is capped at 320px wide (was 640px), so a generation reads as a compact thumbnail in the narrow sidebar instead of dominating the chat. The file is untouched — click an image (or Open in VS Code) for full resolution. (media/chat.css)
A two-part fix for generated images that stopped rendering in 1.4.0.
- Generated images are visible again. 1.4.0 capped inline media at 640px by wrapping it in a `width: fit-content` container. That made the `<img>`'s `max-width: 100%` resolve against an indefinite width, which collapses a replaced element to zero in Chromium — so every generation (including plain `/imagine`) rendered as an invisible, zero-width image. The container is now a normal block (definite width), so the percentage resolves correctly while the 640px cap stays. (media/chat.css)
- Reference-edited images (`image_edit`) now render too. Editing a real photo with `/imagine` runs Grok's `image_edit` tool (title `imagine-edit: …`, variant `ImageEdit`) — a surface 1.4.0's detector didn't know about, so the saved file was never inlined. Confirmed live against grok 0.2.x: the completed result reports the path as the same machine-readable JSON `{path}` the other media tools use (an extended-length `\\?\C:\…` Windows path, stripped to canonical form). `isMediaGenToolCall` now recognizes it. (src/acp-dispatch.ts)
Two new CLI surfaces — generated image/video rendering and a Sign-Out action. The media wire format was confirmed live against grok 0.2.33 (see research/image-generation.md). Available on the VS Code Marketplace.
- Every message you send no longer renders twice (grok 0.2.33 regression). grok ≥0.2.33 echoes the live prompt back as a `user_message_chunk` mid-turn — 0.2.3 did not (the code's own comment read "the agent never echoes them back"). The webview already renders the bubble optimistically from `send()`, so the echo produced a second, duplicate bubble (and double-counted `userMessageCount`, skewing plan positioning). The host now forwards `user_message_chunk` only during a session/load replay (a new `replaying` flag), and the webview's `appendUserChunk` guards the same — so a live echo can never double the bubble. (src/sidebar.ts, media/chat.js)
- Generated images and videos render inline. When Grok generates an image (the subscription-only `/imagine`) or a video (`/imagine-video`), it now shows up as an actual image or a playable `<video>` in the chat instead of a dead tool chip. The real wire format (confirmed live, research/image-generation.md) is not an ACP image block — Grok's `image_gen`/`image_to_video` tools write the file into the session directory (`images/*.jpg`, `videos/*.mp4`) and report the path as a JSON string inside the completed tool result's text. The host recognizes the media-gen call, parses the path out and classifies image-vs-video by extension (`isMediaGenToolCall`/`extractGeneratedMediaPaths`), reads the file and inlines it as a `data:` URI (webviews can't load arbitrary disk paths under the CSP — `media-src data:` was added for video), and the webview renders it. Hovering an image or video reveals two top-right icons (styled like the code-block copy button): Copy path and Open in VS Code — the latter is the only way to open a video's file, since its click drives playback controls (clicking an image still opens its source too). Inline media is capped at 640px on the longer edge so full-resolution generations stay legible in the chat (the file is untouched). ACP-standard `image`/`resource_link` blocks are also handled as a forward-compatible fallback. Both render identically on session resume (Grok replays the generation as a single collapsed `tool_call`). (src/acp-dispatch.ts, src/acp.ts, src/sidebar.ts, media/chat.js, media/chat.css)
- Sign out from the extension (#13). New `Grok: Log Out` command (palette) and a Sign out item in the gear menu run `grok logout` to clear the CLI's cached credentials, tear down the live session, and drop back to the auth-required onboarding screen — no more switching to a terminal to change xAI accounts. (src/sidebar.ts, src/extension.ts, package.json, media/chat.js)
- The Grok Build CLI is updated silently when the extension upgrades. Grok doesn't auto-update, so a user who installs a new extension version could be left on an older CLI whose wire format the new extension no longer matches. Now, the first time a session starts after the extension's own version changes, the host runs `grok update` once before spawning the CLI — so the next handshake reports the freshly-updated version. It fires only on an actual upgrade, never on a fresh install (the "not-first-run" rule — a clean install just records its baseline version), at most once per activation, via `execFile` while no grok process is alive (sidesteps the Windows binary lock), and is best-effort (a failed update logs and continues on the current binary). The gate is the pure, unit-tested `extensionWasUpgraded`. (src/cli-locator.ts, src/sidebar.ts, media/chat.js)
- The welcome status line tracks real readiness. It now follows the true session-start lifecycle — `Updating Grok Build CLI…` (during a silent update) → `Starting…` (through the hidden primer turn, while the composer spinner is up) → `Connected · v<version>`. Previously it flipped to "connected" at the ACP handshake, before the primer had been sent and processed, so it claimed readiness while grok was still being primed; it now stays "Starting…" until the spinner actually clears. (media/chat.js)
- The gear menu gets an "Other" group with About, Config & debug, and Log out. The flat Config / Account / Debug sections collapse into two sub-views (mirroring the Model picker): About shows the This extension + Grok Build CLI versions, checks for a newer CLI (`grok update --check`), and offers an Update Grok Build CLI action; Config & debug holds the config links + extension logs. The on-demand update tears the session down, runs `grok update`, then resumes the same session on the fresh binary (preserving the conversation), showing the `Updating… → Starting… → Connected · v<new>` lifecycle. (media/chat.js, media/chat.css, src/sidebar.ts)
- About shows the real CLI version, even on builds the handshake doesn't tag. The native-Windows build doesn't report a version in the ACP `initialize` response, so About used to read a bare "—" right next to a confident "CLI is up to date". It now adopts the version the update check returns (`grok update --check`'s `currentVersion`), and the action collapses to a grayed "CLI is up to date" (no button) when there's nothing to do. (media/chat.js)
- The Config & debug → MCP servers link works on Windows. It used to type a quoted `"C:\…\grok.exe" mcp list` into the terminal, which PowerShell (the default Windows shell) parses as a string literal and rejects with "Unexpected token". It now launches grok directly as the terminal's own process (`shellPath`/`shellArgs` → `grok mcp list`), sidestepping shell quoting entirely. (src/sidebar.ts)
- Transient status text animates and is capitalized. "Starting", "Updating Grok Build CLI", "Thinking", and "Summarizing" now show an animated trailing ellipsis (a CSS `::after` so the layout doesn't shift), and the welcome line reads "Starting…" / "Connected · v…" (capitalized). (media/chat.css, media/chat.js)
- New grok-free tests for v1.4.0: the `image_gen`/`image_to_video` path-in-JSON result extraction (`isMediaGenToolCall`/`extractGeneratedMediaPaths`, classifying image vs video and covering the collapsed-resume shape) and ACP-standard image fallbacks (`extractImageContent`/`collectToolImages` across inline base64, resource blob, file/remote `resource_link`) plus image-vs-text chunk routing, and happy-dom DOM tests driving the real `media/chat.js` render paths — `addGeneratedMedia` (clickable inline `<img>`, `<video controls>`, remote-link fallback, and the hover Copy path / Open in VS Code actions for both image and video). Plus the silent-update gate (`extensionWasUpgraded` — fresh-install vs upgrade vs unchanged vs downgrade) and a happy-dom suite pinning the welcome version-line lifecycle (`Updating Grok Build CLI…` → `Starting…` at the handshake → `Connected · v<version>` only when the priming spinner clears, and no reversion on later busy toggles). And the 0.2.33 regression fixes: a fake-CLI scenario that echoes a live `user_message_chunk` + a DOM test asserting a single bubble (no duplicate), and a gear-menu suite (the Other group, the About panel's versions + `grokUpdateStatus`-driven update button incl. the version-from-update-check fallback, the Config & debug links). 401 grok-free tests total.
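
The silent-update gate described above (fire on an upgrade, never on a fresh install) is small enough to sketch as a pure function. This is not the shipped `extensionWasUpgraded`: `wasUpgradedSketch` and `compareVersions` are hypothetical names, versions are assumed to be plain dotted numbers, and treating a downgrade as "not upgraded" is an assumption.

```typescript
// Compare dotted numeric versions: negative if a < b, 0 if equal, positive if a > b.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map((n) => parseInt(n, 10) || 0);
  const pb = b.split(".").map((n) => parseInt(n, 10) || 0);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const d = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (d !== 0) return d;
  }
  return 0;
}

// previous: the version recorded on the last run, or undefined on a fresh install.
function wasUpgradedSketch(previous: string | undefined, current: string): boolean {
  if (previous === undefined) return false; // fresh install: just record the baseline
  return compareVersions(current, previous) > 0; // assumption: downgrade does not count
}
```

Comparing components numerically rather than as strings matters for cases like 1.9.0 → 1.10.0, which a plain string comparison would read as a downgrade.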
- Refreshed the Marketplace screenshot. Updated the "alongside VS Code" README image to `v1.3.1_vscode.png` so the listing reflects the current UI. No code changes. (README.md)
- Grok's `ask_user_question` tool works now instead of failing every time (#12). When Grok tried to ask an inline multiple-choice question, the tool errored with `Client returned an invalid response to user question: missing field 'outcome'` and Grok fell back to dumping the question as plain text. The client had no handler for the question request (x.ai/ask_user_question), so it fell through to the catch-all that acknowledges unknown server requests with a bare `{}` — which Grok's deserializer rejects because the response is an internally-tagged enum that requires an `outcome` field. There's now a proper inline question card: each question renders its options (a single single-select question resolves on one click, like a permission card; multiple/multi-select questions let you pick then Submit; Skip dismisses), and the client replies with `{ outcome: "accepted", answers, annotations }` (or `cancelled`). The full wire format was recovered directly from the `grok.exe` binary (method name, the `AskUserQuestionExtResponse` enum, its `accepted`/`cancelled`/`skip_interview`/`chat_about_this` tags, and the `answers`/`annotations` fields) and is documented in research/ask-user-question.md. (src/acp.ts, src/acp-dispatch.ts, src/sidebar.ts, media/chat.js, media/chat.css, media/webview-helpers.js)
- The question card now clearly confirms your answer. The question text under the "Grok is asking" label is prominent, and once you answer the card collapses to the question plus a bright green ✓ <your choice> (for both single- and multi-select) instead of just greying out — so it's obvious Grok received it even when its reasoning continues above. Skip collapses to a "Skipped" state. (media/chat.js, media/chat.css)
- Answered questions survive a session resume. Resuming a session from history rebuilds each answered question as a read-only "You answered" card — previously the question simply vanished on reload. On replay Grok relabels the tool call's title to the display form "Ask: <question>" and emits one `tool_call` per question, so the card is detected by its `rawInput.questions` (robust to the relabelled title) and rendered immediately; the chosen answer is filled in when it arrives in the replay stream. Handles both agent schemas — grok-build (`ask_user_question`, `question`, quoted answer text) and the cursor/composer agent (`AskQuestion`, `prompt`, option-id answer text mapped back to labels). (If Grok's replay omits a particular answer, the question still renders without the green ✓ line.) (media/chat.js)
- Resuming a session whose model belongs to a different agent no longer crashes. A session created with a Composer model (cursor agent) resumed while the default model is a grok-build one — or vice-versa — failed the whole resume with `Cannot switch to model '…': it requires agent '…' but the active agent is '…'` → `Grok exited (code null)`. A resumed session's agent is fixed by its history, so the cross-agent model can't be applied live; the resume now keeps the session's own model instead of crashing. (src/sidebar.ts)
- 18 new grok-free tests (337 total): the pure response builders (`makeQuestionResponse`/`makeQuestionCancelledResponse`) and answer-map helper (`buildQuestionAnswers`), a happy-dom suite driving the real question card through single-click / multi-select / multi-question / Skip / the collapsed answered state / resume-restore (including the replayed "Ask: <question>"-titled and cursor/composer shapes), and a `SCENARIO_ASK_QUESTION` round-trip in the fake-CLI ACP integration suite asserting Grok receives a well-formed `outcome:"accepted"` reply.
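
The failure mode above (a bare `{}` rejected for lacking the `outcome` tag) and the fixed reply shape can be illustrated with a pair of builders. This is a sketch, not the extension's `makeQuestionResponse`: the function names are hypothetical, and the concrete types of `answers` and `annotations` are assumptions, since the notes only name the fields.

```typescript
// Tagged union on "outcome". Only the two tags the client is described as
// sending are modeled; the value types below are assumptions.
type QuestionReply =
  | { outcome: "accepted"; answers: Record<string, string[]>; annotations: Record<string, string> }
  | { outcome: "cancelled" };

function acceptedReply(
  answers: Record<string, string[]>,
  annotations: Record<string, string> = {},
): QuestionReply {
  return { outcome: "accepted", answers, annotations };
}

function cancelledReply(): QuestionReply {
  return { outcome: "cancelled" };
}

// What an internally-tagged deserializer needs at minimum: a string tag field.
function hasOutcomeTag(reply: unknown): boolean {
  return (
    typeof reply === "object" &&
    reply !== null &&
    typeof (reply as { outcome?: unknown }).outcome === "string"
  );
}
```

Routing every reply through builders like these makes it hard to send an untagged object by accident, which is the bug the catch-all `{}` acknowledgement caused.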
- New: dictate prompts with a microphone button. A mic button now sits in the top-right corner of the composer. Click it to record (it turns blue with animated "listening" waves), click again to stop, and the transcription is appended into the input box ready to edit and send. Transcription is powered by xAI's Speech-to-Text API. (media/chat.js, media/chat.css, src/sidebar.ts)
- Live streaming transcription (default). Words now appear in the composer in real time as you speak, over xAI's STT WebSocket (`wss://api.x.ai/v1/stt`) — instead of only after you stop. `ffmpeg` streams raw PCM16 to the socket; the host folds the `transcript.partial` events (keyed by `start` — the trailing `transcript.done` is often empty because smart-turn finalizes mid-stream, a quirk confirmed via `research/voice-stream-probe.cjs`) into the live transcript and relays it to the webview. Falls back to one-shot batch mode via `grok.voiceStreaming: false`. Adds the extension's first runtime dependency, `ws` (tiny, zero sub-deps), bundled into the `.vsix`. (src/voice-streamer.ts, src/voice.ts, src/sidebar.ts, media/chat.js)
- Fully hands-free, continuous listening. Saying "grok send" submits and keeps the mic listening — each command transparently restarts a fresh stream (so every message is one clean utterance), and you can keep dictating the next message while Grok is responding (mid-response messages are queued and sent the moment Grok's turn ends). After the first mic click, no mouse or keyboard is needed until you're done; the mic stops on a manual click or after ~2 minutes of silence (the ffmpeg cap). (src/sidebar.ts, src/voice-streamer.ts, media/chat.js)
- The "grok send" command is highlighted in the composer. As you speak (or type) the phrase, the trailing occurrence is wrapped in a subtle accent pill — visible feedback that it's recognized as a command before it's consumed on send. Implemented as a backdrop overlay behind the transparent textarea (textareas can't style their own text); detection is the pure, unit-tested `trailingSendPhrase()`. Uses the configured `grok.voiceSendPhrase`. (media/webview-helpers.js, media/chat.js, media/chat.css, src/sidebar.ts)
- Hands-free send via "grok send". Ending a dictation with the phrase "grok send" strips the phrase and auto-submits the message. The two-word default is deliberate — it won't trip on a message that merely ends in "send" (verified against real STT output) — and it's passed to the STT model as a `keyterm` bias so it's recognized reliably (fixing the "grok send" → "gronsent" mishearing). Configurable/disablable via `grok.voiceSendPhrase`; detection is a pure, unit-tested `parseVoiceCommand()`. (src/voice.ts, src/sidebar.ts, media/chat.js)
- Why it's built the way it is. Two hard constraints shaped the design, both verified against the real stack (see research/voice-input.md + research/voice-probe.cjs): (1) the Grok CLI advertises `promptCapabilities.audio: false` and rejects audio content blocks over ACP — it's a text/code agent, so audio can't ride the CLI; and (2) VS Code webviews can't access the microphone (`getUserMedia` is blocked with no override). So capture runs in the extension host via an `ffmpeg` child process — the same place the CLI and terminals are spawned — and the recorded clip is POSTed straight to xAI's separate Speech-to-Text product (api.x.ai/v1/stt), bypassing ACP entirely. The full pipeline (DirectShow device auto-detection → mono/16 kHz capture → graceful stop → upload → transcript) was confirmed end-to-end on native Windows with `grok` 0.2.3 and ffmpeg 8.0.1. (src/voice.ts, src/voice-recorder.ts)
- Setup. Voice input needs `ffmpeg` on `PATH` (or `grok.ffmpegPath`) and an xAI API key. The STT API is a separate console.x.ai developer key billed pay-as-you-go (~$0.10/hr) — distinct from the Grok CLI login, which can't authenticate against it, and unaffected by a SuperGrok subscription. Provide it via `grok.voiceApiKey`, or `GROK_VOICE_API_KEY`/`XAI_API_KEY` in the workspace `.env`. New settings: `grok.voiceApiKey`, `grok.ffmpegPath`, `grok.voiceInputDevice`. (package.json)
- Discoverable setup. When no API key is configured, the mic button shows a small "needs setup" dot and a hint tooltip (rather than only failing on click), and clicking it offers an actionable Open Settings / Get a Key prompt. A missing-`ffmpeg` error offers a jump to `grok.ffmpegPath`. The hint updates live when the relevant settings change. (media/chat.js, media/chat.css, src/sidebar.ts)
- Cost, measured. STT is billed by audio duration, not word count: $0.10/hr batch, $0.20/hr streaming. We measured a 510-word passage from this project's design chat → 3.06 min of audio → $0.0051 (~½¢) batch / $0.0102 streaming, i.e. ~1¢ per 1,000 words batch. Method (synth → `POST api.x.ai/v1/stt` → cost from the returned `duration`) and a reusable probe are in research/voice-cost-probe.cjs; see README § Voice input. (README.md)
- Startup feedback (loading state). The mic shows a "connecting…" spinner while the stream spins up (~½–1s); the blue listening waves appear only once it's actually capturing — your "talk now" signal, so the first words aren't clipped. Click during "connecting" to cancel. (media/chat.js, media/chat.css, media/webview-helpers.js)
- Punctuation is preserved and de-duplicated. The command is stripped but the sentence's own punctuation stays: "…what's the weather today grok send?" → "…what's the weather today?". When the message already ended in punctuation, the command's trailing mark is dropped rather than doubled — so "…mate. grok send." → "…mate." (not "…mate.."), and "…not sure. grok send?" → "…not sure." (not "…not sure.?"). At most one trailing mark, the message's own. (src/voice.ts)
- Blocked sends are queued, not dropped. "grok send" spoken while a send is blocked (Grok mid-response, or the hidden session-start primer) is queued and flushed the moment the turn ends or the session is ready. (media/chat.js)
- Voice listens only for the active session. Starting a new session, resuming one from history, or a model/effort restart now hard-stops any in-progress capture and resets the mic to idle (dropping a half-spoken message or a queued "grok send"), so listening never bleeds across a session switch. (src/sidebar.ts, media/chat.js)
- Tests. 85 new grok-free tests (319 total): the pure STT/ffmpeg helpers (incl. the streaming URL builder, the `start`-keyed segment accumulator, streaming ffmpeg args, `trailingSendPhrase`, send/sent tolerance, and punctuation preservation) plus happy-dom coverage of the live-streaming composer, continuous-listening queue, connecting state, and command highlight (request/response/error mapping, per-platform capture args, DirectShow device parsing, key resolution), the mic-button state machine, and a happy-dom DOM suite driving the real mic button through the record → transcribe → insert → error-reset lifecycle. The live STT round-trip stays a manual probe (research/voice-stt-probe.cjs, research/voice-e2e-verify.cjs) per the grok-free CI convention.
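The punctuation rule from the "Punctuation is preserved and de-duplicated" entry can be sketched roughly as below. This is an illustration only: `stripSendPhrase` is a hypothetical name and shape, not the project's `parseVoiceCommand()`, and it assumes the default phrase "grok send".

```typescript
// Hypothetical helper illustrating the punctuation rule; the real
// parseVoiceCommand() in src/voice.ts may be structured differently.
const TRAILING_MARKS = /[.!?…]+$/;

function stripSendPhrase(
  transcript: string,
  phrase = "grok send",
): { text: string; send: boolean } {
  const trimmed = transcript.trim();
  // Punctuation trailing the whole utterance was attached to the command.
  const commandMark = TRAILING_MARKS.exec(trimmed)?.[0] ?? "";
  const spoken = trimmed.slice(0, trimmed.length - commandMark.length).trimEnd();

  // The phrase must end the utterance and start at a word boundary.
  const start = spoken.length - phrase.length;
  const endsWithPhrase =
    start >= 0 &&
    spoken.slice(start).toLowerCase() === phrase.toLowerCase() &&
    (start === 0 || /\s/.test(spoken.charAt(start - 1)));
  if (!endsWithPhrase) return { text: trimmed, send: false };

  const message = spoken.slice(0, start).trimEnd();
  // Keep at most one trailing mark: the message's own if it has one,
  // otherwise the first mark that followed the command.
  const needsMark = message !== "" && !TRAILING_MARKS.test(message);
  return { text: needsMark ? message + commandMark.charAt(0) : message, send: true };
}
```

With this sketch, a transcript without the phrase is returned unchanged with `send: false`, so ordinary dictation is never altered.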
- Switching to a model bound to a different agent now works instead of erroring. Picking a model whose agent type differs from the running session (e.g. the Composer models, which belong to the CLI's `cursor` agent rather than `grok-build`) failed with `Cannot switch to model '…': it requires agent 'cursor' but the active agent is 'grok-build-plan'. Start a new session to use this model.` The CLI binds the agent at spawn and locks it after the first turn (including our hidden primer), so a live `session/set_model` can only stay within the same agent. The fix mirrors the reasoning-effort flow: the chosen model is persisted to `grok.defaultModel` and the session restarts, where `newSession` re-applies it before the primer runs, while the agent is still rebindable (verified against grok 0.2.3 in `research/*.cjs`). With no user history yet the restart is transparent; with history you get the same Summarize & Restart / Just Restart prompt as an effort change. Same-agent switches still happen live with history intact. (src/sidebar.ts, src/acp-dispatch.ts)
- Model/effort changes are locked while the session is starting. A model switch fired during the hidden-primer window raced that turn: a probe showed `session/set_model` sometimes lands before the agent locks (applied live) and sometimes after (rejected → restart), so switching on a fresh-looking empty session would intermittently appear to "do nothing". The model button and effort dots are now disabled while a turn is in flight or the session is priming (the same `busy` signal that disables send/submit), and the host ignores model/effort messages that slip through the start window. The control re-enables the moment the session is ready. (media/chat.js, src/sidebar.ts)
- The model button shows the user-facing name everywhere. The gear popover's model button showed the raw model ID (`grok-build`) while the dropdown showed the friendly name (Grok Build). Both now resolve through a pure `modelDisplayName()` helper, falling back to the ID only when a model has no name. (media/webview-helpers.js, media/chat.js)
- Grok's own `plan.md` write is no longer blocked when the home directory is the workspace. The plan-mode write gate exempts grok's CLI-owned `~/.grok/sessions/.../plan.md` so it can be written and snooped during planning, but the exemption previously relied on that file living outside the workspace: true for project workspaces, false when the user opens their home directory as the workspace root. There the plan file resolved inside the containment root and the workspace block won, so planning stalled (repeated `fs/read_text_file` / `fs/write_text_file` errors, then a `session/prompt` timeout). `shouldBlockWrite` now exempts a plan-file write only when it also resolves under the resolved grok home (`~/.grok`), so home-as-workspace plan writes are allowed while real workspace writes, and an arbitrary project-local `.grok/sessions/.../plan.md` that isn't grok's own, stay blocked. (#10, #11, thanks @shugav)
- Plan-mode gate hardening. Relative workspace write paths are now resolved against the workspace root before containment checks, and common mutating forms of otherwise read-only-looking commands (shell separators/newlines, command-executing heads like `env`/`awk`/`sed`, write/exec flags on `find`/`fd`/`sort`/`tree`, mutating Git forms, and `npm audit --fix`) are blocked before plan approval. Grok's own `.grok/sessions/.../plan.md` write stays allowed and snooped. (#5, #6, thanks @shugav)
- `grok.defaultEffort` no longer crashes startup, and effort forwarding still works. The `Grok exited (code 2)` crash was a value mismatch, not a protocol limitation: the picker offered `max`, which the grok CLI doesn't have (it accepts `none`, `minimal`, `low`, `medium`, `high`, `xhigh`). Fixed by aligning the offered values to grok's real set: dropped the bogus `max`, added `none`/`minimal`. `--reasoning-effort` is still forwarded (before the `stdio` subcommand, where the agent-level flag belongs) and changing effort still restarts the session. A pure `buildGrokAgentArgs()` helper + a fake-CLI startup test pin the arg shape. (#3, #4, thanks @shugav for the report)
- Open a plan as a Markdown editor tab. Live and restored plan cards now show a link that opens the plan text in a normal VS Code editor (an extension-owned snapshot, deliberately not grok's CLI-owned `.grok/sessions/.../plan.md`). Opening it doesn't send a verdict, disable the approval controls, or clear typed feedback. Better for reviewing long plans. (#7, #8, thanks @shugav)
Robustness fixes from a static audit (cross-checked with Codex). The high-impact ones are in the child-process supervision layer; a few low-impact correctness/perf cleanups ride along. Findings judged overstated or cosmetic (e.g. the non-file:// URI drop) were left as-is.
- Responding to the CLI after it exits no longer crashes the extension host. `respondPermission` / `respondExitPlan` / `cancel` / the internal request + response writers all did a bare `this.proc?.stdin.write(...)`. The `exit` handler never cleared `this.proc`, so after the CLI died the optional-chaining check still passed and the write hit a destroyed pipe, throwing `ERR_STREAM_DESTROYED` synchronously or emitting an async `'error'` with no listener, either of which became an uncaught exception in the host. Real trigger: clicking Approve/Reject/Cancel (or a late `terminal/output` ack) after the CLI has crashed. All writes now route through a single `writeLine()` helper that checks `stdin.writable` and try/catches; `start()` registers a `stdin` `'error'` listener; the `exit` handler drops `this.proc` so later writes are skipped; `dispose()`'s `kill()` is wrapped (it can throw `EPERM` on Windows if the process already exited). (src/acp.ts)
- Killed terminal commands are no longer reported as a clean exit. A process terminated by a signal reports `code === null`; the old `code ?? 0` masked that as exit code `0`, so the agent assumed an interrupted command had succeeded. Signal kills now map to the shell convention `128 + signum` (SIGTERM → 143) via a new pure `resolveExitCode()` helper. The same `exit` handler also no longer clobbers an exit code already set by the `spawn` `'error'` handler. (src/terminal-manager.ts)
- Windows: killing a terminal now kills the whole process tree. With `shell: true`, `spawn` wraps the command in `cmd.exe`; `proc.kill("SIGTERM")` only terminated that wrapper, orphaning long-running descendants (`npm`, `node`, …) that held file locks and blocked subsequent grok runs. `kill()` now uses `taskkill /pid <pid> /T /F` (via `execFile`, no shell) on Windows through a new pure `buildKillPlan()` helper; POSIX keeps the direct signal. (src/terminal-manager.ts)
- Terminal output no longer corrupts multi-byte UTF-8 at a buffer boundary. Output was decoded with `Buffer.toString()` per chunk, so a character split across the truncation point (or across two stream chunks) became a replacement char (`�`), visible for any non-ASCII output (emoji, i18n text, localized Windows paths). Each terminal now decodes through a `StringDecoder` that buffers incomplete sequences across boundaries. (src/terminal-manager.ts)
- Per-request ACP timers are cleared on response. Each `request()` armed a `setTimeout` (30 min for prompts) that was never cleared on success, so the resolved request left a live timer and its closure pending until it fired and no-op'd. Timers are now tracked on the pending entry and cleared on response and on process exit. (src/acp.ts)
- `#` in file paths (C#/F# folders) now parses correctly. The "open file" ref parser used `[^#]+`, so a path with a `#` followed by a real `#L<n>` line suffix failed the match and fell through to opening the literal (wrong) path. Parsing moved to a pure `parseFileRef()` that anchors the `#L…` fragment to the end of the string. (src/file-ref.ts)
- Dropping a huge file with Shift no longer freezes the window. Shift-drop read the entire file synchronously just to count lines; a multi-MB log stalled the host (and a 500 MB file could OOM it). Files over 10 MB now skip the line count and fall back to a no-selection chip. (src/file-ref.ts, src/sidebar.ts)
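The `128 + signum` mapping from the killed-terminal fix above could look roughly like this. It is a sketch under assumptions: the signal table is a hand-written subset, the fallback value for unknown signals is my choice, and the real `resolveExitCode()` in src/terminal-manager.ts may have a different signature.

```typescript
// Hand-written subset of POSIX signal numbers (an assumption for this sketch).
const SIGNAL_NUMBERS: Record<string, number> = {
  SIGHUP: 1,
  SIGINT: 2,
  SIGQUIT: 3,
  SIGKILL: 9,
  SIGTERM: 15,
};

// Mirrors the (code, signal) pair that Node's child-process "exit" event reports:
// exactly one of the two is non-null when the process ends.
function resolveExitCode(code: number | null, signal: string | null): number {
  if (code !== null) return code; // real exit codes, including 0, pass through
  if (signal !== null) {
    const signum = SIGNAL_NUMBERS[signal];
    // Unknown signal: still report failure (fallback value is an assumption).
    return signum !== undefined ? 128 + signum : 1;
  }
  return 1; // neither code nor signal: never report success by default
}
```

The key design point is that `null` is never coerced to `0`, which is what the old `code ?? 0` did.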
Surfaced while smoke-testing the rebuilt extension:
- "No session" error when sending before the session finished loading. The composer was interactive during the whole `start()` + `session/new` window; sending then hit `prompt()`'s `sessionId` guard and surfaced a "no session" bubble. The composer is now locked (spinner, disabled) for the entire session-start window, not just the priming step, and cleared on every start outcome (ready, missing-CLI, error). (src/sidebar.ts)
- Plan-verdict protocol markers no longer leak into restored conversations. The host prepends `[Plan approved|rejected|cancelled]` to the wire-level prompt for grok's benefit; the live view hid it, but on resume grok replayed the raw text and the marker showed in the user bubble. Replayed verdict messages now strip the marker; a marker-only verdict (no comment) renders no user bubble at all (matching live), while grok's reply to it still shows. (media/chat.js)
- Restored plan cards land in the right place. A marker-only verdict was counted as a user message on replay but never counted live, desyncing the saved `afterUserMessage` positions so cards drifted to the bottom. Marker-only verdicts are no longer counted on replay, re-aligning positions with what the host persisted. (media/chat.js)
- Live plan card now matches the (nicer) restored look. After picking a verdict, the live card drops its buttons + comment box and shows a single colored verdict label (`Approved` / `Rejected` / `Cancelled`), instead of leaving greyed-out buttons and an uncolored label. (media/chat.js)
- Can't delete the active session from history. Deleting the live session didn't stick (the CLI re-persists it); the delete button is now hidden for the active row (rename still available). (media/chat.js)
- New regression tests cover each fix, written to fail before the fix landed (TDD). Process layer: `writeLine` swallows a throwing/destroyed stdin and skips a non-writable pipe (test/acp-integration.test.ts); `resolveExitCode` maps signals to `128 + signum` and passes real codes (incl. 0) through; a killed process surfaces a non-zero exit; `buildKillPlan` issues `taskkill /T /F` on Windows and `SIGTERM` on POSIX; truncating mid-character emits no `�` (test/terminal-manager.test.ts); a resolved `request()` leaves no armed timer (test/acp.test.ts). Pure path helpers: `parseFileRef` / `shouldReadFileInline` (test/file-ref.test.ts). Webview: marker stripping + marker-only suppression + position alignment on replay, the collapsed live verdict card, and delete hidden for the active session (test/plan-history-restore.dom.test.ts, test/plan-card.dom.test.ts, test/webview-ui.dom.test.ts).
- Flaky CI fix. `test/acp-integration.test.ts` shared one `stderr` array binding across tests, so a prior test's late stderr could bleed into the next (reliably failed `gate blocks fs/write` on Linux). Each test now captures into its own array, listeners are removed in `afterEach`, and the stderr assertion waits for its line (stderr lags the stdout response across pipes). Reproduced on Ubuntu via Docker before fixing.
The headline of this release reverses 1.1.0's "Plan mode stays disabled." The x.ai/exit_plan_mode ACP path is still broken in grok 0.2.3 — it treats any client response (result or error) as approval, so a plan can't be rejected at the protocol layer. Rather than wait on the CLI, this build enforces plan mode client-side, mirroring how YOLO mode is implemented.
- Client-side plan gate (src/plan-gate.ts, pure + unit-tested). While a plan is active, the extension blocks the two mandatory server→client choke points the agent cannot avoid:
  - `fs/write_text_file`: refused when the path resolves inside the workspace (grok's own `~/.grok/sessions/<cwd>/<id>/plan.md` lands outside the workspace, so it's allowed, and snooped to recover the plan text, since `exit_plan_mode` arrives with `planContent: null`).
  - `terminal/create`: refused unless the command is on a conservative read-only allowlist. The classifier is pipe-aware: a pipeline passes only if every `|`-separated stage is independently read-only, and shell metacharacters that chain, redirect, or smuggle code (`>`, `;`, `&&`, `` ` ``, `$(`, `{ }`) block it outright. The allowlist covers read-only PowerShell pipelines (`Get-ChildItem | Select-Object …`, `Get-Content`, `Test-Path`, etc.) for native Windows, while excluding anything that writes or executes (`Out-File`, `Set-Content`, `Invoke-Expression`/`iex`, `ForEach-Object`, `Where-Object`).
- Asymmetric mode sync. Entering plan mode any way (including an agent-initiated `current_mode_update: plan`) raises the gate; it's lowered only by explicit user action, never by the CLI's mode flapping (the false-approve emits `current_mode_update: default`, which is deliberately ignored).
- Mode picker copy updated. Plan mode is no longer marked disabled; its description now reads "Grok explores and proposes a plan; file writes and commands are blocked until you approve it." Matched in the README modes table and command list.
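The pipe-aware classification described for `terminal/create` can be illustrated with a deliberately tiny sketch. The allowlist here is a made-up handful of commands and `isReadOnlyCommand` is a hypothetical name; the real gate in src/plan-gate.ts covers far more cases (PowerShell cmdlets, flag checks, Git forms) that are not modeled here.

```typescript
// Illustrative allowlist only (an assumption, not the project's list).
const READ_ONLY_HEADS = new Set(["ls", "cat", "grep", "head", "tail", "wc", "pwd"]);

// Metacharacters that chain, redirect, or smuggle code block the command outright.
// A single "&" is blocked too, which is stricter than just "&&".
const BLOCKED_META = /[>;&`{}\n]|\$\(/;

function isReadOnlyCommand(command: string): boolean {
  if (BLOCKED_META.test(command)) return false;
  // A pipeline passes only if every "|"-separated stage is read-only on its own.
  const stages = command.split("|").map((stage) => stage.trim());
  return stages.every((stage) => {
    const head = stage.split(/\s+/)[0];
    return head !== "" && READ_ONLY_HEADS.has(head);
  });
}
```

Splitting on every `|` without parsing quotes is intentionally conservative: a quoted pipe such as `grep "a|b" file` is rejected rather than risk letting a mutating stage through, and `a || b` fails because the empty middle stage has no allowed head.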
The plan-review card now offers three distinct outcomes, each mapped to a different ACP verdict and different downstream behavior. (Earlier in the iteration this was a two-button Approve / Keep planning UI; user testing surfaced that "I want to stop planning but not implement" had no clean exit, so we split it.)
- Approve & implement → verdict `approved`. Drops the gate, returns the CLI to act mode, sends "Implement it now" as the next prompt.
- Reject (with optional comment) → verdict `rejected`. Keeps the gate up (you're still in Plan mode). If you wrote a comment, it's sent to Grok as a plain user message (not "revise the plan"); Grok decides whether to re-plan or answer. The chosen button highlights and a Rejected label appears.
- Cancel → verdict `abandoned`. Drops the gate, switches the CLI back to act mode, sends nothing. Use this to back out of planning entirely. A Cancelled label appears.
Because grok 0.2.3 treats any exit_plan_mode response as approval, rejecting a plan would otherwise let the agent keep streaming "OK, the plan is approved, here's what I'll do…" before our follow-up prompt landed. On Reject / Cancel we now:
- Send the verdict to the CLI (it still mis-interprets it, but that's fine — the gate is authoritative).
- Immediately send `session/cancel` to interrupt the in-flight prompt.
- Set a content-only suppression flag (`suppressPlanReject`) that drops `messageChunk` / `thoughtChunk` / `toolCall` events for the rest of the turn, but not `promptComplete` / `agentEnd`, so the webview's `busy` state still clears and the send button re-enables when the cancelled turn ends.
- Post `agentReset` to the webview, which removes the in-flight agent bubble from the DOM so the false-approval text never reaches the screen.
A `finally` in `handleSend` clears the suppression as a safety net so it can't get stuck.
Plan bodies render through the same Markdown pipeline as agent messages now (headings, lists, code fences) instead of monospace <pre> blocks. Applies to both the live review card and the restored history cards. A bug along the way: the .code-block position: relative rule was scoped under .msg.agent .body, so when plan cards contained fenced code their absolutely-positioned copy buttons escaped to the viewport and overlapped the Session-history / New-session header buttons. The scoping was loosened so any .code-block is its own positioning context.
grok overwrites ~/.grok/sessions/<…>/plan.md every time the agent proposes a new plan, so older plans in a session are physically gone from disk. We now persist each resolved plan to VS Code's globalState keyed by session id (SessionMetaOverride.plans), capturing text + verdict + afterUserMessage (the count of user messages already sent at the moment of resolution).
On session resume:
- The host posts a `planHistoryQueue` to the webview before `session/load` starts.
- The webview drains the queue inline as the replay streams: each plan card lands at its saved user-message boundary (right where the plan actually happened), not in a clump at the bottom. Legacy entries without a saved position fall back to the end of replay.
- The plan-gate state is restored from the last verdict via a pure helper (src/plan-restore.ts): `rejected` → re-raise the gate (you were mid-planning); `approved` / `abandoned` / no log → leave the gate down (Cancel-then-restore no longer comes back stuck in Plan mode). Without this, the CLI's replayed `current_mode_update` events would raise the gate even when the user had cancelled.
- A separate `pendingPlanText` field holds the displayed plan from render → verdict-click, since `lastPlanText` is cleared the moment the card renders. (Regression: without this, restored plans showed `"(empty plan)"` despite content being persisted.)
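The persist / restore decision above reduces to two small pure functions. The sketch below reuses the helper names the changelog mentions (`appendPlanEntry`, `decideRestoreState`), but the entry shape and return types are assumptions for illustration, not the actual src/plan-restore.ts signatures.

```typescript
type Verdict = "approved" | "rejected" | "abandoned";

// Assumed entry shape: text + verdict + the user-message count at resolution.
interface PlanEntry {
  text: string;
  verdict: Verdict;
  afterUserMessage?: number; // absent on legacy entries
}

// Append without mutating the stored log (chronological order is preserved).
function appendPlanEntry(plans: readonly PlanEntry[], entry: PlanEntry): PlanEntry[] {
  return [...plans, entry];
}

// Only a trailing "rejected" verdict means the user was still mid-planning.
function decideRestoreState(plans: readonly PlanEntry[]): { gateUp: boolean } {
  const last = plans.length > 0 ? plans[plans.length - 1] : undefined;
  return { gateUp: last !== undefined && last.verdict === "rejected" };
}
```

Because the decision reads only the last entry, a "rejects then cancels" history restores with the gate down, which is the case the changelog calls out.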
The history-popover, whole-row-click, and reasoning-trace-expand fixes from 1.1.0 are now covered by DOM tests so they can't silently regress again (see Testing).
- Two clearly separated tiers. `npm test` (and CI) runs only grok-free tests: pure-logic unit tests plus DOM tests that drive the real `media/chat.js` in a headless `happy-dom` window. The grok-dependent probes live in `research/*.cjs`, require the `grok` binary, are run manually, and are never collected by Vitest or CI.
- New pure module + tests for the persist / restore decision: src/plan-restore.ts extracts `appendPlanEntry` and `decideRestoreState`. 15 tests cover chronological append, immutability, text preservation (the wiped-`lastPlanText` regression), and the verdict-driven restore decision for every verdict including the "rejects then cancels → Agent mode" case that previously came back in Plan mode.
- New DOM tests in `test/plan-history-restore.dom.test.ts` (12 tests) lock in the restore-flow rendering: positioned plans interleave at the right boundary, legacy plans flush at end of replay, multiple plans at the same position drain together, live user message drains queued plans, `clearMessages` resets queue + counter, all three verdict buttons produce matching status labels, `agentReset` removes the in-flight agent bubble and a subsequent `messageChunk` creates a fresh one.
- New ACP integration tests in `test/acp-integration.test.ts` (6 tests) drive the real `AcpClient` over JSON-RPC stdio against a ~150-line fake `grok agent stdio` fixture (test/fixtures/fake-grok-acp.cjs). Covers the wire layer + plan-mode gate end-to-end: plan-snoop, workspace-write gate (on and off), terminal-create gate for mutating vs read-only commands. Encodes only what ACP requires, not grok's version-specific quirks, so it stays stable across CLI bumps.
- 178 tests, ~1.4s, no network, no spawned `grok`. The whole suite runs on a clean Ubuntu CI runner via `.github/workflows/ci.yml`.
- Effort-picker dots are now visually balanced. The "filled / empty" dots used the `●` / `○` Unicode glyphs, which render at different sizes in most fonts (the empty one is visibly larger). Replaced with CSS-shaped spans so active and inactive states are the same diameter.
- Spawning `.cmd` / `.bat` CLI paths now works on Windows. Node 18+ refuses to spawn those without `shell: true` (CVE-2024-27980). `AcpClient.start()` now detects them and sets `shell: true` automatically, so installs that resolve grok to a `.cmd` shim (or the test fake-CLI) start correctly.
- Native Windows is now first-class. xAI shipped a native Windows build of the `grok` CLI (`irm https://x.ai/cli/install.ps1 | iex`), so the extension no longer needs WSL. This reverses the 1.0.3 "Windows isn't supported" onboarding panel.
  - Onboarding now detects Windows and shows the PowerShell install command (`irm https://x.ai/cli/install.ps1 | iex`) with copy-to-clipboard and "Open terminal & run": the same flow macOS/Linux already had, just with the right command per platform.
  - "Open terminal & run" sends the PowerShell installer on Windows and the `curl | bash` installer elsewhere. The CLI locator (`grok.cmd` / `grok.exe`) and headless terminal manager (`shell:true`) already worked cross-platform.
- README + CLAUDE.md updated: platforms now read "macOS, Linux, and Windows"; install steps show both the bash and PowerShell one-liners; build-from-source and uninstall lines note the `scripts\*.ps1` equivalents.
Surfaced by the first native-Windows smoke test (against grok 0.2.3):
- Session-history popover now hides. `.history-popover` set `display:flex`, which beat the UA `[hidden]{display:none}` rule (author styles win), so the dropdown rendered as an empty box on startup and `hidden = true` could never dismiss it. A `.toolbar-popover[hidden] { display:none }` rule restores correct hide behavior: the popover now closes on select, click-outside, and new-session.
- Whole history row is clickable. Resume was wired only to the name label even though the row showed a pointer cursor; the handler moved to the row, so clicking anywhere on it (name, meta line, or padding) resumes. Rename/delete buttons keep their own `stopPropagation`.
- Reasoning traces are expandable again. The "Thinking…/Thought for Ns" line is once more a collapsible header: click it to reveal the full trace (collapsed by default, rAF-coalesced while streaming). This reverses the 1.0.2 change that discarded the trace at the render layer.
- Decluttered welcome screen. Removed the static tips list (Enter to send / slash commands / file chips) from the empty-session screen.
- Restored user prompts when loading a session. `session/load` replays history as session updates, but `user_message_chunk` had no route, so replayed user prompts fell through to the ignored generic-update branch and vanished; loaded sessions showed only the agent's half of the conversation. The chunk is now routed and rendered into a user bubble, with the in-flight agent turn committed at each user boundary. Replayed reasoning headers read "Thought" (no elapsed time, since the original timing isn't in the replay stream); live turns keep "Thought for Ns".
- Inline diffs render as diffs. Fenced `diff` blocks now color added lines green and removed lines red using VS Code's own `diffEditor` line backgrounds (so they match the editor's diff view), dim hunk/metadata lines, and wrap long lines instead of forcing horizontal scroll. Copy still yields plain diff text (the handler reads `innerText`, since each row is now a block-level span).
- Copy-code button no longer fights the text. It fades to 0.95 opacity on code-block hover and full opacity on button hover, so its background stays solid instead of blending into the first line of code.
- Agent-mode description corrected. As of `grok` 0.2.3, Agent mode acts directly and only prompts for changes it judges sensitive; the picker no longer claims it "asks for approval before making each change." Matched in the README modes table.
- Plan-mode note de-emphasized. The "Reject / Abandon not yet supported" note under disabled Plan mode is now muted gray (`descriptionForeground`) instead of warning yellow; it's an explanation, not an alert.
- Plan mode stays disabled. Re-tested the `x.ai/exit_plan_mode` rejection path live against `grok` 0.2.3 over ACP: rejecting a plan with a JSON-RPC error still let the agent exit plan mode and execute the whole plan (it created the target file anyway). The CLI bug from the 0.1.x baseline is unchanged, so the Plan UI remains off.
- In-progress group header now shows only the current action in present-progressive form with three animated dots: Reading CLAUDE.md, Listing root folder, Running command, Searching web, Editing chat.js. The previous behavior accumulated `"X, Y +N"` as new calls streamed in.
- Completed multi-call summaries are now categorical instead of listing the first two calls: Explored N items, searched web, ran N commands. Reads and directory listings roll into "explored"; web search/fetch into "searched web" (no number); everything else into "ran N commands".
- Chevron moved to the right of the label and only appears on hover; rotates 90° when expanded.
- Friendlier detail labels: `web_search` → Web search, `List .` → List root folder.
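The categorical roll-up for completed multi-call groups can be sketched as below. The `ToolKind` names and the `summarizeCalls` function are hypothetical; only the three output categories and their wording come from the entry above.

```typescript
// Hypothetical tool-call kinds; the extension's real identifiers may differ.
type ToolKind = "read" | "list" | "web_search" | "web_fetch" | "other";

function summarizeCalls(kinds: readonly ToolKind[]): string {
  let explored = 0;
  let searchedWeb = false;
  let ran = 0;
  for (const kind of kinds) {
    if (kind === "read" || kind === "list") explored++; // reads + listings
    else if (kind === "web_search" || kind === "web_fetch") searchedWeb = true; // no count shown
    else ran++; // everything else counts as a command
  }
  const parts: string[] = [];
  if (explored > 0) parts.push(`explored ${explored} item${explored === 1 ? "" : "s"}`);
  if (searchedWeb) parts.push("searched web");
  if (ran > 0) parts.push(`ran ${ran} command${ran === 1 ? "" : "s"}`);
  const joined = parts.join(", ");
  // Capitalize only the first category, matching "Explored N items, searched web, ...".
  return joined.charAt(0).toUpperCase() + joined.slice(1);
}
```

Singular/plural handling is an addition of this sketch; the changelog only shows the plural forms.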
- Tighter heading and list spacing. Headings and lists no longer get a phantom `<br><br>` stacked on top of their own CSS margin when preceded by a blank line. Block elements rely on their margins; only paragraph-to-paragraph transitions emit a `<br><br>`.
- User bubble min-width 40%. Short prompts no longer collapse to a text-width sliver against the right edge.
- Show more / Show less hover flips to a full-contrast inverted pair (`foreground` on `editor-background`) instead of the semi-transparent secondary-button hover. Reads as a solid pill.
- Streaming rAF-coalesced. `agent_message_chunk` and `agent_thought_chunk` no longer trigger a full markdown re-render per chunk. Updates batch into one paint per animation frame, with a synchronous flush on `promptComplete` so the final chunks always land. Long responses no longer jank.
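The batching pattern described above can be sketched with the frame scheduler injected, so the logic stays testable outside a browser. `createCoalescedRenderer` is a hypothetical name; in a webview the `schedule` argument would be something like `(cb) => requestAnimationFrame(cb)`.

```typescript
type Schedule = (callback: () => void) => void;

function createCoalescedRenderer(render: (text: string) => void, schedule: Schedule) {
  let buffer = "";
  let dirty = false; // buffer changed since the last render
  let scheduled = false; // a frame callback is already queued

  const paint = (): void => {
    scheduled = false;
    if (!dirty) return; // a synchronous flush already rendered this content
    dirty = false;
    render(buffer);
  };

  return {
    // Called per streamed chunk: accumulate, and queue at most one paint per frame.
    push(chunk: string): void {
      buffer += chunk;
      dirty = true;
      if (!scheduled) {
        scheduled = true;
        schedule(paint);
      }
    },
    // Called when the turn completes, so the final chunks always land.
    flush(): void {
      if (dirty) {
        dirty = false;
        render(buffer);
      }
    },
  };
}
```

However many chunks arrive between frames, `render` runs once per frame with the full accumulated text, and a `flush()` makes the already-queued frame a no-op.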
- Removed dead `grok.defaultPermissionMode` setting that was declared in `package.json` but never read by any code.
- `activationEvents` dropped: modern VS Code auto-generates activation from the view contribution, so the explicit entry is redundant (linter-flagged).
- README restructured for a dev-reading audience. New top-level sections: Why an extension, not the CLI?, Key concepts (where state lives, modes, chips, permission cards), Architecture (diagram + session lifecycle + module map + design choices), Development. Slash-command tables moved to `docs/SLASH-COMMANDS.md`. Marketplace install promoted; stale 1.0.1 VSIX path removed.
- package.json `description` rewritten to lead with the "thin ACP client" framing instead of a feature laundry list. Added keywords: `agent-client-protocol`, `acp-client`, `xai-grok`.
- Shell-set `XAI_API_KEY` now works. Previously the alias to `GROK_CODE_XAI_API_KEY` (which the CLI actually reads) only fired for keys loaded from a workspace `.env`. Keys in the user's shell environment are now mapped too, matching what the README documents.
- Broader auth-error detection. The auth-required onboarding panel now triggers on 401 / 403 / `forbidden` / `api_key` / `credential` errors as well as anything containing `auth`. Reduces the chance of users seeing a generic "Failed to start Grok" toast when the real cause is missing or invalid credentials.
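A detector for the broadened auth-error set might look like the sketch below. The function name, the case-insensitive match, and the word boundaries around the status codes are assumptions; the changelog only lists which substrings now trigger the onboarding panel.

```typescript
// Assumed pattern: status codes matched as whole numbers, keywords as substrings.
const AUTH_ERROR_PATTERN = /\b(?:401|403)\b|forbidden|api_key|credential|auth/i;

function looksLikeAuthError(message: string): boolean {
  return AUTH_ERROR_PATTERN.test(message);
}
```

The word boundaries keep an unrelated number such as a port (`4010`) from being mistaken for a 401 status.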
- Windows shows an honest "not supported" panel. Native Windows users no longer get the macOS/Linux `curl | bash` install command (which can't run in cmd/PowerShell). They get a clear note pointing to the README's WSL workaround.
- "SuperGrok Heavy" labeling. The auth panel now names the Heavy tier explicitly (which is what carries the Grok Build entitlement) instead of the ambiguous "SuperGrok subscription".
- Removed precompiled `.vsix` files from `releases/`. They were drifting from `main` and the README's quick-install line pointed at the stale 1.0.1 build (which lacked the new onboarding UI). The marketplace listing is the canonical install path; build-from-source remains supported for development.
- Header hierarchy: H1 / H2 / H3 now scale visibly above body text (1.4em / 1.25em / 1.1em). Previously every heading rendered at body size, just bold.
- Body rhythm: agent message bodies use `line-height: 1.55` for easier scanning; first-child headings drop their top margin to avoid awkward leading gaps.
- Nested bullet markers: disc → circle → square at three depths (was disc → circle only).
- GFM tables: pipe tables with `|---|---|` separator rows now render as bordered tables with bold tinted header rows and per-column alignment (`:---`, `:---:`, `---:`). Wrapped in an `overflow-x: auto` container so wide tables get a horizontal scrollbar instead of breaking layout.
- Copy code button: fenced code blocks now show a hover-revealed "Copy code" button in the top-right corner. Click writes the code (raw text, no formatting) to the clipboard and flashes a checkmark for 1.5 s.
- User messages as bubbles: right-aligned, capped at 80 % width, no border, lighter `editorWidget-background` tint. Inline `YOU` / `GROK` role labels removed; position alone signals sender.
- Per-message actions: every user and agent message shows a hover-revealed action row at the bottom with a timestamp (`6:47 AM`) and a copy-message button that copies the raw markdown.
- Show more / Show less: restyled to match the secondary-button family (proper padding, button background). Hover-reveal behavior unchanged.
- Reasoning hidden by design: the "Thinking..." indicator stays as a single line at standard text size; on completion it flips to "Thought for Ns". The actual trace text is discarded at the webview rendering layer (never enters the DOM) instead of being collapsed behind a chevron; there's no expansion affordance.
- In-sidebar onboarding: the missing-CLI and authentication-required errors no longer pop modal VS Code dialogs. The welcome panel itself swaps to an onboarding state:
  - Missing CLI: shows the install command (`curl -fsSL https://x.ai/cli/install.sh | bash`) with copy-to-clipboard and an "Open terminal & run" button, plus a "Re-check connection" button.
  - Auth required: explains the two paths, a SuperGrok subscription (run `grok /login` in a terminal) or an API key from console.x.ai with `XAI_API_KEY` in a workspace `.env`. Same "Re-check connection" hand-off.
  - All onboarding is deterministic: no AI calls happen before the CLI is reachable.
- Welcome on every new session: clicking the new-session button now restores the welcome panel (logo, byline, version, tips) instead of leaving an empty pane. Previously the welcome only appeared on first activation.
- README now points to console.x.ai as the place to obtain an API key, alongside the existing `grok /login` flow.
- Mode labels: mode button now shows "Agent mode" / "Plan mode" (YOLO unchanged) in both the button and the picker. The button collapses to icon-only when the sidebar is narrow.
- Context donut: label changed from a percentage to `usedK/maxK` format (e.g. `20K/200K`) so the scale adapts to the model's context window. Tooltip shows exact token counts.
- Settings gear (Model and Effort): added a "Model and Effort" section header above the model+effort row; removed the sparkle icon from the model name button; model name font now matches the rest of the popover (13 px); fixed double-border between the model row and the Session section.
- Effort dots: increased dot size (10 px → 14 px); each dot now shows a descriptive tooltip ("Low — fast, lightweight reasoning", etc.).
- Summarize & Restart: when changing reasoning effort with an active conversation, a VS Code dialog offers Summarize & Restart or Just Restart. The summarize path sends a silent summary request to the current session, starts a fresh session with the new effort level, injects the summary as context (suppressed from the chat UI), and shows a "Context from previous session applied" banner. The original Grok summary response is hidden; only the banner appears.
- Resolved a race condition where changing effort (or clicking New Session) showed "Grok exited (code 143)" errors from the previous session's process being disposed. Each session now carries a generation counter; `exit` events and errors from replaced sessions are suppressed.
- The `--reasoning-effort` flag was never actually passed to the spawned process. Fixed: the flag is now read from `grok.defaultEffort` and forwarded on every session start.
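
The generation-counter fix described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `SessionManager`, `FakeProcess`, and `reportedErrors` are hypothetical names, and the process is a stand-in that only models an exit event.

```typescript
// Hypothetical sketch: none of these names come from the extension itself.
type ExitListener = (code: number) => void;

// Stand-in for a spawned CLI process; it only models the exit event.
class FakeProcess {
  private listeners: ExitListener[] = [];

  onExit(listener: ExitListener): void {
    this.listeners.push(listener);
  }

  emitExit(code: number): void {
    this.listeners.forEach((listener) => listener(code));
  }
}

class SessionManager {
  private generation = 0;
  readonly reportedErrors: string[] = [];

  // Start a session, replacing any previous one. The exit handler captures
  // the generation that was current when this session was created.
  start(proc: FakeProcess): void {
    const myGeneration = ++this.generation;
    proc.onExit((code) => {
      // A newer session has replaced this one, so its exit is expected: stay quiet.
      if (myGeneration !== this.generation) return;
      this.reportedErrors.push(`Grok exited (code ${code})`);
    });
  }
}

const manager = new SessionManager();
const oldProc = new FakeProcess();
const newProc = new FakeProcess();

manager.start(oldProc);
manager.start(newProc); // e.g. the user changed reasoning effort
oldProc.emitExit(143);  // disposal of the replaced session: suppressed
newProc.emitExit(1);    // exit of the current session: reported
```

Comparing a captured number against the current one avoids having to unregister listeners on the old process before disposing it, which is where the original race would otherwise reappear.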
- Bottom toolbar — removed the top bar entirely; model, mode, gear, and new-session controls now live in a responsive row at the bottom of the composer, next to the send button. The row shrinks gracefully to icon-only when the sidebar is narrow (labels disappear, icons stay).
- Mode selector redesign — each mode now has a distinct icon and a one-line description (Claude Code-style popover). Agent uses a shield icon, Plan uses a list-tree icon, YOLO uses a lightning bolt.
- Collapsible user messages — messages taller than ~3 lines collapse automatically with a gradient fade. "Show more" appears on hover; "Show less" collapses back.
- Tool call display — single tool calls render as a flat row with a human-readable label ("Read sidebar.ts", "Edit package.json", "Run npm test"). Multiple calls from one agent step collapse into a grouped header ("Read, Edit +2") that expands on click.
- Welcome screen — xAI Grok mark logo (white), "Grok Build" title, "by Pawel Huryn (The Product Compass)" byline.
- Reasoning effort: configurable from the gear popover (CLI default | Low | Medium | High | XHigh | Max). Changing effort restarts the session so the new flag takes effect. Also exposed as the `grok.defaultEffort` VS Code setting.
- YOLO mode: auto-approves all permission requests in the extension without any CLI restart. Session and memory are fully preserved; switching back to Agent or Plan mode re-enables approval cards immediately.
- Gear / settings popover — single gear icon opens a panel with three sections:
  - Session: Reasoning Effort picker, Compact conversation shortcut
  - Config: Open global config (`~/.grok/config.toml`), Open project config (`.grok/config.toml`), List MCP servers in a terminal
  - Debug: Show extension logs
- MCP server support: the extension passes `mcpServers: []` in `session/new` (the CLI rejects the call without this field), and the CLI loads its own MCP configuration from `~/.grok/config.toml` / `.grok/config.toml` alongside that empty list. Configure servers via `grok mcp add` or by editing the config files directly.
- Removed the `--reasoning-effort high` default that was causing 403 errors on free/SuperGrok accounts (the flag is unsupported in stdio mode on some subscription tiers).
- Removed stale `hint` element references that caused silent JS errors in the webview.
- Popovers now position themselves above their trigger button (correct for a bottom toolbar) and clamp to stay within the panel width.
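
As an illustration of the popover placement described in the last item, the positioning can be expressed as a small pure function. This is a sketch under assumptions: `placePopoverAbove`, the `Rect` shape, and the default `padding` and `gap` values are hypothetical and not taken from `media/chat.js`.

```typescript
// Hypothetical geometry helper; names and default values are illustrative only.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface Size {
  width: number;
  height: number;
}

function placePopoverAbove(
  trigger: Rect,
  popover: Size,
  panelWidth: number,
  padding = 8, // assumed minimum distance from the panel edges
  gap = 4      // assumed space between the popover and its trigger
): { left: number; top: number } {
  // Sit directly above the trigger, which suits a toolbar at the bottom.
  const top = trigger.top - gap - popover.height;

  // Prefer aligning with the trigger's left edge, then clamp horizontally
  // so the popover stays inside the panel.
  const maxLeft = Math.max(padding, panelWidth - padding - popover.width);
  const left = Math.min(Math.max(trigger.left, padding), maxLeft);

  return { left, top };
}

// A trigger near the right edge of a 300 px panel: the popover shifts left.
const nearRightEdge = placePopoverAbove(
  { left: 250, top: 500, width: 30, height: 24 },
  { width: 200, height: 120 },
  300
);

// A trigger flush with the left edge: the popover keeps the edge padding.
const nearLeftEdge = placePopoverAbove(
  { left: 2, top: 500, width: 30, height: 24 },
  { width: 200, height: 120 },
  300
);
```

The sketch clamps only horizontally; a real implementation would also need to handle a popover taller than the space above the trigger.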
Initial preview. ACP client for `grok agent stdio`.
- Sidebar chat webview driven by `grok agent stdio` over ACP
- Streaming agent messages + separate thinking trace (collapsible, shows elapsed time)
- Permission-request cards with diff-editor preview (allow always / allow once / reject)
- Plan-mode toggle (`session/set_mode`) + plan-approval cards (`x.ai/exit_plan_mode`)
- Model picker (live `session/set_model`)
- Slash-command autocomplete sourced from `available_commands_update`
- Context-usage donut from prompt result `_meta.totalTokens`
- File chips with hide-toggle, Explorer drag-and-drop (Shift = embed inline)
- Right-click "Grok: Send File / Selection" in Explorer + editor
- `Ctrl+;` opens sidebar; `Alt+G` inserts @-mention for active file
- Required server→client handlers: `fs/read_text_file`, `fs/write_text_file`, `terminal/{create,output,wait_for_exit,kill,release}`
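
The context donut reads its usage from the prompt result's `_meta.totalTokens`, and a later entry changes its label to the `usedK/maxK` format. A minimal sketch of how such a label might be derived is shown below. The `PromptResult` shape, the `contextLabel` function, and the rounding rule are assumptions for illustration; only the `_meta.totalTokens` field name and the `20K/200K` example come from these notes.

```typescript
// Assumed shape: only the `_meta.totalTokens` field name comes from the notes above.
interface PromptResult {
  _meta?: { totalTokens?: number };
}

// Hypothetical formatter: rounds both values to the nearest thousand tokens.
function contextLabel(result: PromptResult, maxTokens: number): string {
  // Treat a missing token count as zero usage rather than failing.
  const used = result._meta?.totalTokens ?? 0;
  const usedK = Math.round(used / 1000);
  const maxK = Math.round(maxTokens / 1000);
  return `${usedK}K/${maxK}K`;
}

const label = contextLabel({ _meta: { totalTokens: 20000 } }, 200000); // "20K/200K"
const emptyLabel = contextLabel({}, 200000);                            // "0K/200K"
```

Passing the model's context window as `maxTokens` is what lets the label's scale adapt per model, instead of showing a bare percentage.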