A thin VS Code sidebar client for xAI's Grok Build CLI. It spawns grok agent stdio as a headless child and drives it over the Agent Client Protocol (ACP) — session state, MCP servers, memory, and tool execution all stay inside that CLI process. Not a terminal launcher and not a re-implementation. Install the grok CLI first; the extension is a UI shell over it.
Works with a SuperGrok subscription or an xAI API key. Not affiliated with xAI.
Install free from the VS Code Marketplace →
You get the things a terminal can't give you: VS Code's native diff editor on a proposed edit before you approve it, permission cards with Allow always / once / Reject instead of [y/N] prompts, your active editor and selection as first-class @file context, session history you can resume/rename/delete, inline images and video from /imagine, voice dictation, and side-by-side placement next to other AI tools. It's a UI shell — the trade-off is that it's useless without the grok CLI installed.
A short tour of how the extension is wired (and the one place it's deliberately not thin — Plan Mode) lives in docs/architecture.md.
- VS Code 1.90+ (or a compatible editor — Cursor, Windsurf, VSCodium).
- The Grok Build CLI (grok) on macOS, Linux, or Windows. The CLI ships a native Windows build, so the extension runs natively on all three with no WSL required (WSL2 + Remote-WSL still works if you prefer it).
- A login: either a SuperGrok subscription (grok /login) or an xAI API key. With a subscription you get Grok Build; with an API key you also get the grok-4.x models and grok-imagine.
- For voice input only (optional): ffmpeg on PATH, and a separate xAI API key for Speech-to-Text (pay-as-you-go, ~$0.10/hr batch or ~$0.20/hr streaming; your CLI login does not cover it). See Voice input under Features & capabilities.
1. Install the CLI and sign in.
macOS / Linux / WSL:
curl -fsSL https://x.ai/cli/install.sh | bash
grok /login
Windows (PowerShell):
irm https://x.ai/cli/install.ps1 | iex
grok /login
grok /login opens a browser and completes OAuth in one step. Prefer an API key? Get one at console.x.ai and set XAI_API_KEY in your shell or a workspace .env (the extension auto-loads it).
2. Install the extension.
From the Marketplace — search Grok Build by PawelHuryn, or:
code --install-extension PawelHuryn.grok-vscode-phuryn
Or build from source:
git clone https://github.com/phuryn/grok-build-vscode.git
cd grok-build-vscode
npm install
./scripts/install.sh # Windows: pwsh scripts\install.ps1
Reload VS Code (Ctrl+Shift+P → Developer: Reload Window) and click the Grok icon in the activity bar.
Tip: Right-click the Grok icon → Move To → Secondary Side Bar to park Grok on the right, next to other AI tools.
Uninstall: ./scripts/uninstall.sh (Windows: pwsh scripts\uninstall.ps1) or code --uninstall-extension PawelHuryn.grok-vscode-phuryn.
- Open the Grok sidebar (activity bar icon, or Ctrl/Cmd+;).
- Type a prompt and press Enter. Grok streams its answer; a Thinking… line resolves to Thought for Ns. Click it to expand the reasoning.
- Approve actions. When Grok wants to write a file or run a command it may raise a permission card — preview an edit in the native diff editor, then Allow once / always / Reject.
- Pick your mode (Agent / Plan / YOLO), model, and reasoning effort from the bottom toolbar and gear menu.
- Resume anytime — the clock icon lists past sessions for this project.
Click any feature to expand.
Permission cards with diff preview — see every edit in VS Code's native diff before you approve
When Grok proposes an edit, the card shows a path — N → M lines summary and an open diff → button that opens VS Code's native diff editor against the proposed content. Approve with Allow once / always, or Reject. The file is written only after you approve — no surprise changes to your files.
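The N → M lines part of that summary can be sketched as a comparison of the proposed old and new text. This is a minimal illustration, not the extension's code: `countLines` and `lineDelta` are hypothetical names, and treating a trailing newline as a line terminator is an assumption.

```typescript
// Hypothetical helpers for illustration; not taken from the extension source.

// Counts lines in a text. A trailing newline ends the last line
// instead of starting a new, empty one (assumed convention).
function countLines(text: string): number {
  if (text === "") return 0;
  const body = text.endsWith("\n") ? text.slice(0, -1) : text;
  return body.split("\n").length;
}

// Builds the "N → M lines" fragment shown next to the path on a card.
function lineDelta(oldText: string, newText: string): string {
  return `${countLines(oldText)} → ${countLines(newText)} lines`;
}
```

Both inputs come from the proposed edit itself, so the summary can be computed before anything is written to disk.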
Modes — Agent, Plan & YOLO
| Mode | Behaviour |
|---|---|
| Agent (default) | Grok acts directly and may ask permission for a write or shell action it judges sensitive — a card appears in chat. |
| Plan | Grok drafts a plan first and cannot write to the workspace or run anything outside a read-only allowlist until you approve. Approve / Reject / Cancel from the card, each with an optional comment. Plan Mode is enforced by the extension — see How it works. |
| YOLO | The extension auto-approves every permission request. The CLI session is untouched — no restart, just a flag flip. |
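One way to picture the three modes is as a single decision function applied to each incoming permission request. The sketch below is an assumption-heavy illustration: the type names, the `decide` function, and especially the contents of the read-only allowlist are hypothetical, since the real allowlist is not listed here.

```typescript
type Mode = "agent" | "plan" | "yolo";
type ToolRequest = { kind: "read" | "write" | "command"; command?: string };
type Decision = "auto-approve" | "ask-user" | "block";

// Assumed allowlist, for illustration only.
const READ_ONLY_COMMANDS = ["ls", "cat", "grep", "git status", "git diff"];

function isReadOnlyCommand(command: string): boolean {
  const trimmed = command.trim();
  // Reject chaining, redirection and substitution, which could hide a write.
  if (/[;&|<>`$]/.test(trimmed)) return false;
  return READ_ONLY_COMMANDS.some(
    (allowed) => trimmed === allowed || trimmed.indexOf(allowed + " ") === 0
  );
}

function decide(mode: Mode, planApproved: boolean, req: ToolRequest): Decision {
  if (mode === "plan" && !planApproved) {
    // Plan Mode gate: only reads and allowlisted commands pass before approval.
    if (req.kind === "read") return "auto-approve";
    if (req.kind === "command" && req.command !== undefined && isReadOnlyCommand(req.command)) {
      return "auto-approve";
    }
    return "block";
  }
  if (mode === "yolo") return "auto-approve"; // flag flip, no session restart
  return "ask-user"; // Agent mode: show a permission card
}
```

Keeping the decision in one pure function means switching to YOLO only changes the `mode` argument, which matches the "just a flag flip" behaviour described in the table.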
Image & video generation — /imagine renders right in the chat
Type /imagine <prompt> (or /imagine-video <prompt>) and the result renders inline — images as a compact thumbnail (capped at 320px; click to open the source file), videos with native playback controls. Hover either for Copy path / Open in VS Code icons. Both are subscription-only Grok features, both survive a session resume, and even a multi-MB video plays. Editing a reference photo with /imagine works too. Wire-format details, for the curious: research/image-generation.md.
Voice input — hands-free dictation with live transcription
The microphone button in the composer dictates speech, transcribed by xAI's Speech-to-Text API. Click it, wait for the blue listening waves, and speak — words appear live as you talk. Say "grok send" to submit hands-free and keep listening for the next message (dictate while Grok responds; those messages queue and flush when it finishes). Click the mic to stop and keep any in-progress text.
The two-word send phrase is deliberate (it won't fire on a message that merely ends in "send") and is configurable via grok.voiceSendPhrase. Streaming is the default; set grok.voiceStreaming: false for one-shot batch mode.
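The reason a two-word phrase avoids false triggers can be shown with a small matcher that only fires when every word of the phrase ends the transcript. This is a sketch under stated assumptions: `detectSendPhrase` is a hypothetical name, and the normalization (lowercase, ASCII letters and digits only) is a simplification that may differ from what the extension does.

```typescript
// Lowercases a token and strips punctuation so "send." matches "send".
const normalizeWord = (w: string): string => w.toLowerCase().replace(/[^a-z0-9]/g, "");

// Returns the transcript without the trailing send phrase, plus whether to submit.
function detectSendPhrase(transcript: string, phrase: string): { text: string; send: boolean } {
  const phraseWords = phrase.split(/\s+/).map(normalizeWord).filter((w) => w.length > 0);
  const tokens = transcript.trim().split(/\s+/).filter((t) => t.length > 0);
  // An empty phrase disables hands-free sending.
  if (phraseWords.length === 0 || tokens.length < phraseWords.length) {
    return { text: transcript.trim(), send: false };
  }
  const cut = tokens.length - phraseWords.length;
  const tail = tokens.slice(cut).map(normalizeWord);
  const matches = tail.every((w, i) => w === phraseWords[i]);
  if (!matches) return { text: transcript.trim(), send: false };
  return { text: tokens.slice(0, cut).join(" "), send: true };
}
```

With the default phrase, "Fix the login bug. Grok send." submits "Fix the login bug.", while "please press send" stays in the composer because only one of the two words matches.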
Cost: Speech-to-Text is a separate, pay-as-you-go xAI product: $0.10/hr batch, $0.20/hr streaming, billed by audio duration. In practice ~500 words ≈ ½–1¢; a heavy 10,000-word day ≈ 10¢. It needs its own console.x.ai key (grok.voiceApiKey / GROK_VOICE_API_KEY / XAI_API_KEY); a SuperGrok subscription grants no API credit. Why it bypasses the CLI, and how the cost was measured end-to-end: research/voice-input.md.
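The arithmetic behind those estimates can be reproduced with a few lines. The hourly rates are the ones quoted above; the speaking rate of 150 words per minute is an assumption introduced here, and the function counts speech time only, so real bills depend on how long the microphone is actually open.

```typescript
// Rates quoted in this README, in USD per hour of audio.
const RATE_USD_PER_HOUR = { batch: 0.1, streaming: 0.2 } as const;

// Estimated cost in cents for dictating `words` words.
// wordsPerMinute = 150 is an assumed conversational pace, not a measured value.
function sttCostCents(
  words: number,
  mode: "batch" | "streaming",
  wordsPerMinute: number = 150
): number {
  const hours = words / wordsPerMinute / 60;
  return hours * RATE_USD_PER_HOUR[mode] * 100;
}
```

Under that assumption, 500 words is about 3.3 minutes of audio, roughly 0.6¢ in batch mode and 1.1¢ streaming. 10,000 words is about 1.1 hours, roughly 11¢ in batch mode and about twice that when streaming.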
File chips — your editor and selection as @file context
The active editor is added as an implicit chip automatically (toggle with grok.includeActiveFileByDefault). Drag from the Explorer, right-click → Grok: Send File, press Alt+G, or use the + toolbar button to add explicit chips. Chips are sent as @/path/to/file references — the CLI resolves them, so content stays current and doesn't bloat chat history. Hold Shift while dragging to embed the file's contents inline as a fenced code block instead.
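The difference between a reference chip and an embedded chip can be sketched as two renderings of the same file. Everything here is illustrative: the `Chip` type, the function names, and the way chips are joined with the prompt text are assumptions, and only the `@/path/to/file` form is taken from the description above.

```typescript
// Hypothetical shape: a chip is a path, optionally with content to embed inline.
type Chip = { path: string; embed?: { language: string; content: string } };

// Three backticks, built by concatenation so this example nests safely in markdown.
const FENCE = "`" + "`" + "`";

function renderChip(chip: Chip): string {
  // Default: a reference the CLI resolves itself, so content stays current.
  if (chip.embed === undefined) return "@" + chip.path;
  // Shift-drag: the contents travel with the message as a fenced code block.
  return [FENCE + chip.embed.language, chip.embed.content, FENCE].join("\n");
}

// Assumed layout: chips first, then the user's text.
function buildPrompt(text: string, chips: Chip[]): string {
  return chips.map(renderChip).concat(text).join("\n");
}
```

A reference costs a few characters of history no matter how large the file is, which is the reason the reference form is the default.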
Agent Dashboard — run several sessions at once, switch instantly, see which need you
Keep more than one session alive at the same time. Start a new session with + while another is mid-turn, and switch between them from the history dropdown — the one you leave keeps running in the background (mid-turn, mid-approval, anything), and switching back replays its exact state with no reload. Picking a session that isn't live anymore loads it from history as before.
Each row in the dropdown shows a status dot so you can see what every session is doing without opening it. It's gray at rest and only lights up when there's something to know:
| Dot | Meaning |
|---|---|
| 🔵 Blue | Working — a turn is in flight |
| 🟡 Yellow | Needs you — a permission, question, or plan is waiting |
| 🟢 Green | Finished, with output you haven't opened yet |
| 🔴 Red | Finished with an error you haven't opened |
| ⚪ Gray | At rest — idle, already read, or not loaded |
The green/red dot is an unread badge: it appears when a session finishes while you're looking at another one, and clears the moment you open it. It's persisted, so it survives idle cleanup and a VS Code restart — fire off a few agents, walk away, and the green dots are exactly the sessions with results waiting.
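The dot table can be read as a priority order over a small amount of per-session state. The sketch below is one possible encoding, with hypothetical field names. Letting "needs you" win over "working" is an assumption, made because a turn that is waiting on a permission is technically still in flight.

```typescript
// Hypothetical per-session state; field names are illustrative.
type SessionState = {
  turnInFlight: boolean;            // a turn is running
  awaitingUser: boolean;            // permission, question, or plan pending
  unread: "none" | "ok" | "error";  // persisted unread badge
};

type Dot = "blue" | "yellow" | "green" | "red" | "gray";

function statusDot(s: SessionState): Dot {
  if (s.awaitingUser) return "yellow"; // needs you (assumed to outrank blue)
  if (s.turnInFlight) return "blue";   // working
  if (s.unread === "error") return "red";
  if (s.unread === "ok") return "green";
  return "gray";                       // idle, already read, or not loaded
}

// Opening a session clears its unread badge.
function markOpened(s: SessionState): SessionState {
  return { turnInFlight: s.turnInFlight, awaitingUser: s.awaitingUser, unread: "none" };
}
```

Because the unread value is plain data, it can be persisted and restored independently of whether the session's process is still alive.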
To keep a pile of background sessions from each pinning a live process, a session left untouched for an hour (or beyond ~8 live) is quietly shut down — never one that's working or waiting on you — and reloads from history on click, losing nothing.
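The cleanup rule described here (idle for an hour, or more than about 8 live, never a busy session) can be sketched as a pure selection function. The names, the exact thresholds as constants, and the oldest-first order for the over-cap case are assumptions made for illustration.

```typescript
// Hypothetical view of a live session. `busy` means working or waiting on the user.
type LiveSession = { id: string; lastTouchedMs: number; busy: boolean };

const IDLE_LIMIT_MS = 60 * 60 * 1000; // one hour
const MAX_LIVE = 8;                   // the README says "~8"; exact value assumed

// Returns ids of sessions to shut down, oldest first. Busy sessions are never chosen.
function sessionsToShutDown(live: LiveSession[], nowMs: number): string[] {
  const idle = live
    .filter((s) => !s.busy)
    .sort((a, b) => a.lastTouchedMs - b.lastTouchedMs);
  const evict: string[] = [];
  let remaining = live.length;
  for (const s of idle) {
    const stale = nowMs - s.lastTouchedMs >= IDLE_LIMIT_MS;
    if (stale || remaining > MAX_LIVE) {
      evict.push(s.id);
      remaining -= 1;
    }
  }
  return evict;
}
```

Since an evicted session reloads from history on click, the selection only has to be safe, not clever: it can always fall back to shutting down the least recently touched idle session.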
Instant feedback — a Grokking… indicator the moment you send, with no startup pause
Every message you send shows an animated Grokking… placeholder immediately, so there's always feedback that Grok received it — it's replaced in place the instant the first thought, reply, or tool action streams in.
There's also no longer a long silent pause before that first response. Plan Mode needs a little hidden setup per session; it now happens quietly in the background the moment a session opens — instead of in front of your first message — so it's almost always done before you hit send. If you are quick, your message still appears right away. (What that setup is and why it's needed: How it works.)
Session history — resume, rename, delete, or clear past sessions
The clock icon lists the sessions the CLI saved for this project, most recent first. Click a row to resume — Grok replays the conversation, with inline images, plans, and reasoning intact. Hover to rename (pencil) or delete (trash); names default to the first message. The list loads the most recent 100 and pulls in older ones as you scroll, and the search box filters by name across your whole history — so it stays fast even with thousands of sessions. Clear all history at the bottom of the dropdown removes every session for this project in one step (after a confirm), keeping the one you're currently in. Renames are stored by the extension and never touch Grok's own files.
Tool calls — every read, edit & command, inline
Every action Grok takes appears in chat — a single flat row ("Read sidebar.ts lines 1–120", "Edit package.json", "Run npm test"), or a collapsed group ("Read, Edit +2") that expands on click.
Math & LaTeX rendering — equations render as math, not raw TeX
When Grok answers with LaTeX — inline \(…\), display \[…\], and environments like \begin{pmatrix} matrices, cases, integrals, sums, and Greek — the chat renders it as real typeset math via MathJax, bundled into the extension so it works offline with no network. Inline math sits on the text baseline in your editor's text color; display equations get their own centered block with horizontal scroll so a wide matrix doesn't overflow the narrow sidebar. A malformed expression shows a small inline error instead of blanking the message. Hover a display equation for actions: copy its LaTeX source, or export it as a PNG (your theme's background) or a transparent SVG tuned for a light or dark background. Bare $…$ is intentionally not a delimiter — it would mangle prose like "it costs $5 and then $10".
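The delimiter choice can be illustrated with a small scanner that recognizes only `\(…\)` and `\[…\]` and leaves dollar signs alone. This is a simplified sketch, not the extension's parser: it ignores `\begin{…}` environments and code spans, and `splitMath` is a hypothetical name.

```typescript
type Segment = { kind: "text" | "inline" | "display"; value: string };

// Splits a message into plain text and math segments.
// Only \( … \) and \[ … \] are delimiters; a bare $ is ordinary text.
function splitMath(source: string): Segment[] {
  const pattern = /\\\(([\s\S]+?)\\\)|\\\[([\s\S]+?)\\\]/g;
  const out: Segment[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    if (m.index > last) out.push({ kind: "text", value: source.slice(last, m.index) });
    if (m[1] !== undefined) {
      out.push({ kind: "inline", value: m[1] });
    } else {
      out.push({ kind: "display", value: m[2] ?? "" });
    }
    last = m.index + m[0].length;
  }
  if (last < source.length) out.push({ kind: "text", value: source.slice(last) });
  return out;
}
```

Run on "it costs $5 and then $10", the scanner returns a single text segment, so the prose reaches the renderer untouched.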
Mermaid diagrams — flowcharts and sequence diagrams render as diagrams
When Grok answers with a ```mermaid block — flowcharts, sequence and state diagrams, git graphs, class and ER diagrams, and more — the chat renders it as a real diagram via Mermaid, bundled into the extension so it works offline with no network. Diagrams are themed to match your VS Code light/dark mode and scroll horizontally so a wide flowchart doesn't overflow the narrow sidebar. Hover a diagram to copy its source, or export it as a PNG (your theme's background) or a transparent SVG re-themed for a light or dark background. If a diagram is still streaming or turns out to be malformed, the readable diagram source is shown instead — you never lose the content.
Model picker — switch models live, no restart
Click the model name in the gear popover. The model list comes from your CLI; switching is live with no restart in most cases. (A few models belong to a different agent and need a quick session restart — the extension detects that and handles it for you, carrying your context forward.)
Reasoning effort — trade tokens for depth
Gear icon → effort dots pick a level (none → xhigh), forwarded to the CLI as --reasoning-effort. Changing it restarts the session, with an optional Summarize & Restart to carry context forward. (Some subscription tiers may reject effort at the backend.)
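Because the effort level is a launch flag rather than a live setting, changing it means building a new argument list and restarting. The sketch below shows that idea only. The level names come from the settings table, while the position of `--reasoning-effort` after `agent stdio` and the function name `agentArgs` are assumptions.

```typescript
// Levels listed for grok.defaultEffort.
const EFFORT_LEVELS = ["none", "minimal", "low", "medium", "high", "xhigh"];

// Builds the child-process arguments for the headless agent.
// Flag placement is assumed; check `grok --help` for the real layout.
function agentArgs(effort: string): string[] {
  const args = ["agent", "stdio"];
  if (effort === "") return args; // empty = CLI default
  if (EFFORT_LEVELS.indexOf(effort) === -1) {
    throw new Error("Unknown reasoning effort: " + effort);
  }
  return args.concat("--reasoning-effort", effort);
}
```

Validating the level before spawning keeps a typo in settings from surfacing as an opaque CLI failure.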
Cost control — token donut, /compact & effort
Stay on top of spend without leaving the sidebar: the bottom-toolbar context donut shows usedK/maxK tokens after each prompt; /compact (gear → Compact) compresses the conversation when it fills, or + starts fresh. Reasoning effort trades tokens for depth, and voice STT cost is called out above.
MCP servers — whatever the CLI loads
MCP servers are configured in the CLI (~/.grok/config.toml global, .grok/config.toml project) — the extension picks up whatever the CLI loads:
grok mcp add playwright --command npx --args @playwright/mcp@latest
Or edit the config via gear → Open global / project config, then click + to reload.
All grok.* settings (VS Code Settings → search "grok")
| Setting | Default | Notes |
|---|---|---|
| grok.cliPath | "" | Path to the grok binary. Empty = auto-discover (~/.grok/bin/grok → PATH). |
| grok.defaultModel | "" | Model ID for new sessions. Empty = CLI default. |
| grok.defaultEffort | "" | Reasoning effort forwarded as --reasoning-effort (none / minimal / low / medium / high / xhigh). Empty = CLI default. Changing it restarts the session. |
| grok.includeActiveFileByDefault | true | Auto-add the active editor as a context chip. |
| grok.useCtrlEnterToSend | false | When true, Enter inserts a newline and Ctrl/Cmd+Enter sends. |
| grok.chatFontScale | 100 | Zoom for the chat panel only, as a percent (150, 200, …). Scales the whole chat UI without rescaling the rest of VS Code (unlike Ctrl/Cmd+Shift+=). Applies live; supports User (global) and Workspace (local) scope. |
| grok.voiceApiKey | "" | xAI API key for voice Speech-to-Text (a separate console.x.ai developer key, not the CLI login). Empty = fall back to GROK_VOICE_API_KEY / XAI_API_KEY in the workspace .env. |
| grok.ffmpegPath | "" | Path to ffmpeg for microphone recording. Empty = use ffmpeg from PATH. |
| grok.voiceInputDevice | "" | Microphone device override. Empty = system default (Windows auto-detects the first DirectShow audio device). |
| grok.voiceSendPhrase | "grok send" | Spoken phrase that auto-submits when it ends a transcription. Empty = disable hands-free sending. |
| grok.voiceStreaming | true | Stream transcription live as you speak. false = one-shot batch mode. Streaming costs $0.20/hr vs $0.10/hr batch. |
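The auto-discovery order for grok.cliPath (explicit setting, then ~/.grok/bin/grok, then PATH) can be sketched as a lookup with an injected existence check. This is an illustration with hypothetical names. It uses POSIX-style separators only and does not cover Windows details such as a `.exe` suffix.

```typescript
// Resolves the grok binary. `exists` is injected so the logic stays testable
// without touching the real file system.
function resolveCliPath(
  configured: string,
  home: string,
  pathDirs: string[],
  exists: (p: string) => boolean
): string | undefined {
  // 1. An explicit grok.cliPath always wins.
  if (configured !== "") return configured;
  // 2. Then the default install location, 3. then each PATH entry in order.
  const candidates = [home + "/.grok/bin/grok"].concat(pathDirs.map((d) => d + "/grok"));
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }
  return undefined; // caller can show an "install the CLI" hint
}
```

In an extension, `exists` would typically wrap a file-system check and `pathDirs` would come from splitting the PATH environment variable.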
VS Code commands & keys (Ctrl/Cmd+Shift+P → "Grok")
VS Code commands (not Grok slash commands):
| Command | What it does |
|---|---|
| Grok: Open | Open the Grok sidebar |
| Grok: New Session | Start a fresh session |
| Grok: Pick Model | Open the model picker |
| Grok: Toggle Plan / Agent Mode | Open the mode picker (Agent / Plan / YOLO) |
| Grok: Send File | Add the selected file to context |
| Grok: Send Selection | Send the current text selection to Grok |
| Grok: Insert @-Mention | Insert an @-mention for the active file into the composer |
| Grok: Show Logs | Open the Grok output channel (ACP messages, errors) |
| Grok: Log Out | Sign out of the Grok CLI (grok logout) and return to the sign-in screen |
| Key | Action |
|---|---|
| Ctrl+; / Cmd+; | Open Grok sidebar |
| Alt+G | Insert @-mention for the active file (when the editor is focused) |
Grok's own slash commands (/imagine, /compact, …) autocomplete in the composer when you type /, sourced live from your installed CLI version. Reference snapshot: docs/SLASH-COMMANDS.md.
The extension is intentionally thin: it speaks JSON-RPC over grok agent stdio and renders the results. Grok owns sessions, memory, MCP, models, and tool execution; the extension mediates file reads/writes, terminal requests, diff previews, the webview UI — and Plan Mode.
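To make "speaks JSON-RPC over stdio" concrete, here is a minimal framing sketch. It assumes one JSON message per line, which should be checked against the ACP specification. The class and function names are hypothetical, and the `params` shape in the usage note is illustrative rather than the protocol's actual schema.

```typescript
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: unknown;
};

// Serializes one message as a single line for the child's stdin.
function encodeMessage(message: JsonRpcMessage): string {
  return JSON.stringify(message) + "\n";
}

// Reassembles messages from stdout chunks, which may split a line anywhere.
class LineDecoder {
  private buffer = "";

  push(chunk: string): JsonRpcMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece is an incomplete line (or empty); keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    const messages: JsonRpcMessage[] = [];
    for (const line of lines) {
      if (line.trim() === "") continue;
      messages.push(JSON.parse(line) as JsonRpcMessage);
    }
    return messages;
  }
}
```

A request such as `fs/write_text_file` arriving from the CLI would come out of `push` as one parsed object once its closing newline has been read, even if the operating system delivered it in several chunks.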
Plan Mode is the one place the extension is not thin. The CLI's exit_plan_mode is unreliable (it reports "approved" to any reply), so the extension enforces planning itself: a gate blocks workspace writes and non-read-only commands until you approve, and a hidden primer message teaches Grok to read your real verdict ([Plan approved] / [Plan rejected] / [Plan cancelled]) from your next message. The primer is fired eagerly and silently the instant a session goes live (not in front of your first prompt), and is kept lean so it doesn't add a startup pause — your first real message simply waits, in code, for the silent primer turn to finish (Grok runs one turn at a time) and is released the moment it does.
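The verdict tags lend themselves to a small encode and parse pair. The three tags are the ones quoted above. Appending the optional comment after the tag on the same message is an assumption, and the function names are hypothetical.

```typescript
type Verdict = "approved" | "rejected" | "cancelled";

// Builds the message that carries the user's real verdict to Grok.
function verdictMessage(verdict: Verdict, comment?: string): string {
  const tag = "[Plan " + verdict + "]";
  const note = (comment ?? "").trim();
  return note === "" ? tag : tag + " " + note;
}

// Reads a verdict back from a message; returns undefined for ordinary prompts.
function parseVerdict(message: string): { verdict: Verdict; comment: string } | undefined {
  const m = /^\[Plan (approved|rejected|cancelled)\]\s*([\s\S]*)$/.exec(message);
  if (m === null) return undefined;
  return { verdict: m[1] as Verdict, comment: (m[2] ?? "").trim() };
}

// The gate opens only on an explicit approval.
function gateOpens(message: string): boolean {
  const parsed = parseVerdict(message);
  return parsed !== undefined && parsed.verdict === "approved";
}
```

Keying the gate to the tag the extension itself wrote, rather than to anything the CLI reports, is what lets the extension enforce planning even when `exit_plan_mode` answers "approved" to any reply.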
Full diagram, message flow, module map, and design notes: docs/architecture.md.
Build, test & repo conventions
npm install
npm test # grok-free unit/DOM/integration suite — exactly what CI runs
npm run package # → grok-vscode-phuryn-<version>.vsix
npm test is grok-free, so local ≡ CI: it never spawns the real binary. A separate, on-demand npm run test:live drives the actual grok end-to-end (handshake, restore, plan-mode, image/video gen) and is run before a release, not on every commit. Full test taxonomy and what's deferred to a future @vscode/test-electron suite: TESTS.md. Architecture and module map: docs/architecture.md.
Repo conventions: direct-to-main, no feature branches; commits explain the why; no speculative abstractions; the grok-free suite is the floor — every change keeps it green.
- Diff preview semantics. The diff editor compares the proposed old vs. new text against each other, not against the file on disk at preview time. The write happens via fs/write_text_file after approval. This is an ACP constraint: tool_call_update carries the diff before the file is touched.
- No worktree UI. Grok: New Worktree Session is planned but not yet implemented.
- View placement. The view defaults to the left activity bar; drag it to the secondary side bar manually if you want it on the right.
MIT






