
Releases: smithersai/smithers

v0.24.2

17 Jun 11:36



title: "0.24.2"
description: "One-command onboarding via init, a Codex tool-access fix, the new fix-all-issues workflow, an mcp add fallback, and refreshed agent docs."

Smithers 0.24.2 makes onboarding a single command, fixes a Codex regression that stripped tool access from agentic tasks, adds the fix-all-issues workflow, and prints a manual fallback when mcp add fails. The rest is documentation: the home page and human guide were rewritten, ClaudeCodeAgent authentication is documented, and stale model IDs in the examples were refreshed. There are no breaking changes and no public API symbols were removed.


One-command onboarding

bunx smithers-orchestrator init now does the whole setup in one step.

  • init auto-installs the curated smithers skill into the coding agents it detects on your machine, so there is no more manual mkdir or curl.
  • It seeds an editable hello workflow (.smithers/workflows/hello.tsx with its prompt in .smithers/prompts/hello.mdx) as the smallest end-to-end example to run and tweak.
  • The home page and the For Humans guide were rewritten around this one-command flow.

Workflows

  • New fix-all-issues workflow: it decomposes every open issue, drives a Codex test-first implementation, runs dual review, and opens a PR. An excludeNumbers filter lets you skip specific issues.

Agents

  • Fixed a Codex regression where native structured output was always on, which removed tool access from agentic tasks. Native structured output is now opt-in, so agentic Codex tasks keep their tools.
  • Documented ClaudeCodeAgent authentication: subscription billing by default (the agent clears ANTHROPIC_API_KEY so the claude CLI bills your Pro/Max subscription), per-directory subscription login via configDir (CLAUDE_CONFIG_DIR), and API billing via apiKey.
  • Refreshed stale model IDs in the AI SDK examples and integration docs (claude-sonnet-4-5-20250929 to claude-sonnet-4-6, gpt-5.2-codex to gpt-5.5).

CLI

  • bunx smithers-orchestrator mcp add now prints a copy-paste manual fallback command when the underlying agent's mcp add fails, so registration is recoverable without leaving the terminal.

Quality

  • Added an agent-fluency eval suite: a weak-model, one-shot regression suite that exercises every Smithers feature.

Upgrade

bunx smithers-orchestrator@0.24.2

No configuration changes are required.

v0.24.0

16 Jun 16:38



title: "0.24.0"
description: "A minor release and the first since 0.23.0, folding in the unpublished 0.23.1 fixes: the gateway-client and gateway-react sync SDK moves to TanStack DB, new headless gateway and monitor commands, workflow input schemas in inspect and generated skills, and jsx-runtime exports, plus engine dependency and Loop correctness, detached-run event delivery, durable CLI-agent answers, a project-anchored smithers.db, refreshed default model ids, and a broad sweep of Gateway, package-facade, MCP, and Pi-plugin fixes."

0.24.0 is the first release since 0.23.0, so it also carries every fix from the unpublished 0.23.1. The headline change is the Gateway SDK's move to TanStack DB collections. Alongside it are new CLI surfaces, engine correctness work, and a wide set of Gateway and package fixes from dogfooding.


Breaking changes

Gateway sync SDK rebuilt on TanStack DB

@smithers-orchestrator/gateway-client and @smithers-orchestrator/gateway-react now build their declarative sync SDK on TanStack DB collections, replacing the previous SyncClient, SyncCache, and SyncSubscriptionHub classes.

  • gateway-client exports createGatewayCollection, gatewayCollectionDefs, and the run-node helpers (flattenGatewayRunNode, snapshotToGatewayRunNode, reconcileSnapshotNodes), all backed by the gateway RPC and WebSocket transport. The framework-free core stays in gateway-client.
  • gateway-react adds the SyncProvider context, the GatewayCollections registry, and the hooks useSyncQuery, useSyncMutation, and useSyncSubscription, plus the typed shortcuts useGatewayQuery, useGatewayMutation, useGatewayRunStream, useGatewayRunTree, and useGatewayConnectionStatus, backed by @tanstack/react-db.
  • The removed exports have no drop-in replacement. Rebuild custom-UI data access on the collection hooks. apps/smithers now reads connection status from the registry observer rather than a workflows probe.

This is the Phase 1 foundation for the Postgres-of-record and TanStack DB sync work. @tanstack/react-db is a new dependency of gateway-react.

New features

bunx smithers-orchestrator gateway command

A new bunx smithers-orchestrator gateway command starts the multi-run Gateway RPC and WebSocket control plane headlessly, backed by the workspace database. It exposes listRuns, streamRunEvents, and streamDevTools, and prints the workspace and DB paths it serves on startup.

This is distinct from bunx smithers-orchestrator up --serve, which runs a single workflow and adds a lightweight serve layer. Use gateway when you need the full /v1/rpc/* control plane without launching a workflow run.

Fixes #255. PR #268.

Workflow input schemas in inspect and generated skills

bunx smithers-orchestrator inspect now returns the machine-readable JSON schema for each workflow's input alongside the run summary. The schema is also surfaced in generated skill docs: real field names, types, defaults, enums, and descriptions appear in the skill file rather than a generic placeholder.

Fixes #258. PR #272.

bunx smithers-orchestrator monitor command

A new bunx smithers-orchestrator monitor [RUN_ID] command watches a running workflow, diagnoses failures with tool access, optionally applies a gated self-fix (--autofix), and writes an HTML report. It drives the seeded monitor workflow and supports --detach and --input. The CLI also auto-mounts workspace custom UIs.

smithers-orchestrator jsx-runtime and run-control exports

The smithers-orchestrator package now ships ./jsx-runtime and ./jsx-dev-runtime subpath exports with types, so /** @jsxImportSource smithers-orchestrator */ resolves directly from the published package. The public barrel also exposes the run-control and output API: runWorkflow, getRun, listRuns, approveNode, denyNode, and renderFrame.

Driver and graph surfaces

  • The driver exposes resolved worktree paths to workflow code, and jj worktrees resolve as real git worktrees for child tools.
  • A typed run-annotations option is exposed on the driver.
  • Sandbox egress props thread through the graph.

Bug fixes

Engine: dependency and Loop correctness

  • deps resolve across loop boundaries, and unsatisfiable deps fail loudly. A deps key pointing at an upstream id that needed remapping used to silently no-op. It now resolves across loop boundaries and raises DEPENDENCY_DEADLOCK when a dependency can never be satisfied.
  • Parallel <Loop> iterations no longer starve until total run quiescence. All three scheduling paths shared the bug: the legacy engine ladder returned await-trigger or schedule-retry before reaching ready-loop handling, the workflow session gated loop progress behind a global-idle check, and the WorkflowDriver's Promise.all batch barrier held all completions until the batch finished. Ready loops now advance whenever a loop node completes, the driver processes completions incrementally, continue-as-new handoffs stay quiescence-only, and unhandled task failures keep precedence over further iterations. Fixes #267. PR #271.
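The "fail loudly" dependency behavior can be pictured with a small scheduling pass. This is a minimal sketch, not the engine's code: `TaskNode`, `scheduleOrder`, and the `aliases` map (standing in for ids remapped across a loop boundary) are hypothetical. Any task whose deps are unknown or cyclic raises an error tagged DEPENDENCY_DEADLOCK instead of being skipped silently.

```typescript
// Hypothetical shapes for illustration only; not Smithers' internal types.
type TaskNode = { id: string; deps: string[] };

class DependencyDeadlockError extends Error {
  readonly code = "DEPENDENCY_DEADLOCK";
  readonly blocked: string[];
  constructor(blocked: string[]) {
    super(`DEPENDENCY_DEADLOCK: ${blocked.join(", ")} can never run`);
    this.blocked = blocked;
  }
}

// Returns an order in which every dep runs before its dependents.
// `aliases` maps a dep key as written to the id it was remapped to.
function scheduleOrder(nodes: TaskNode[], aliases: Map<string, string> = new Map()): string[] {
  const ids = new Set(nodes.map((n) => n.id));
  const resolve = (dep: string) => aliases.get(dep) ?? dep;
  const remaining = new Map(nodes.map((n): [string, string[]] => [n.id, n.deps.map(resolve)]));

  // An upstream id that does not exist can never be satisfied: fail immediately.
  for (const [id, deps] of remaining) {
    if (deps.some((d) => !ids.has(d))) throw new DependencyDeadlockError([id]);
  }

  const done = new Set<string>();
  const order: string[] = [];
  // Kahn-style passes: take every task whose deps have all completed.
  while (remaining.size > 0) {
    const ready = [...remaining].filter(([, deps]) => deps.every((d) => done.has(d))).map(([id]) => id);
    // Nothing can progress (e.g. a cycle): report the stuck tasks instead of hanging.
    if (ready.length === 0) throw new DependencyDeadlockError([...remaining.keys()]);
    for (const id of ready) {
      remaining.delete(id);
      done.add(id);
      order.push(id);
    }
  }
  return order;
}
```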

Detached runs and approvals

  • Gateway streams persisted events from detached runs. For runs started with bunx smithers-orchestrator up -d, streamRunEvents returned only heartbeats, because the Gateway host had no visibility into events the detached process wrote. A built-in out-of-process event bridge now tails _smithers_events for runs the Gateway host did not execute. It is on by default and configurable via the outOfProcessEventBridge (boolean) and outOfProcessEventBridgePollMs (number) Gateway options. Fixes #254. PR #266.
  • Supervisor auto-resumes detached runs after an approval is recorded. Approving a gate on a detached run recorded the decision but left the run stuck in waiting-event. The supervisor now queries _smithers_approvals joined to pending nodes on every poll and spawns a detached resume for any decided-but-unconsumed approval. bunx smithers-orchestrator approve prepends a resume command to its CTA when the run is still waiting-event, and bunx smithers-orchestrator why emits an approval-decided-resume-required blocker. Fixes #242. PR #250.
  • approveNode no longer strands runs on post-commit bridge failure. Approving through the MCP resolve_approval tool could return an error while consuming the approval row, leaving the run permanently stuck. approveNode commits the decision first, then treats a post-commit bridge failure as non-fatal: it warns and returns success, because the committed approval is the durable source of truth and the engine processes it on resume. Fixes #241. PR #246.
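The commit-first ordering behind the approveNode fix can be sketched as follows. Everything here is hypothetical (`ApprovalStore`, `EngineBridge`, `approveCommitFirst`), and the real approveNode signature may differ. The point is the ordering: a failed durable write propagates as an error, while a failure after the commit only produces a warning.

```typescript
// Hypothetical interfaces; not the real approveNode API.
type Decision = { runId: string; nodeId: string; approved: boolean; note?: string };

interface ApprovalStore {
  commit(decision: Decision): Promise<void>; // durable write: the source of truth
}
interface EngineBridge {
  notify(decision: Decision): Promise<void>; // best-effort wake-up of a live engine
}

type ApproveResult = { ok: true; warning?: string };

async function approveCommitFirst(
  store: ApprovalStore,
  bridge: EngineBridge,
  decision: Decision,
  warn: (msg: string) => void = console.warn,
): Promise<ApproveResult> {
  // 1. Commit first. If this throws, nothing was consumed and the caller sees the error.
  await store.commit(decision);
  // 2. A post-commit failure is non-fatal: the engine reads the committed row on resume.
  try {
    await bridge.notify(decision);
    return { ok: true };
  } catch (err) {
    const warning = `approval committed but bridge failed: ${String(err)}`;
    warn(warning);
    return { ok: true, warning };
  }
}
```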

CLI agents

  • CLI-agent answers survive captured-stdout truncation. Long runs overflow the 200 KB captured-stdout cap; the capture kept the head and dropped the terminal result event, so the engine's context-free JSON repair persisted schema-valid but amnesiac rows that claimed the task "was not present in the available context." The capture now keeps the stream tail and reports truncation, the agent prefers the live interpreter's completed answer parsed before the cap, token usage falls back to the completed event, and a warning fires on truncation. Fixes #277.
  • Completed-event answer is preserved when result.text is empty, and structured-output extraction trusts the CLI output file over stdout fallbacks.
  • CodexAgent honors the <Worktree> cwd and the brittle --cd worktree fallback is removed.
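Keeping the tail of captured stdout, rather than the head, is what lets the terminal result event survive the cap. The sketch below assumes newline-delimited JSON with a `type: "result"` event. `TailCapture` and `lastResultEvent` are hypothetical helpers, not Smithers' implementation.

```typescript
// Bounded capture that keeps the most recent `maxBytes` and records truncation.
class TailCapture {
  private buf = new Uint8Array(0);
  private droppedBytes = 0;
  private readonly maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  write(chunk: Uint8Array | string): void {
    const bytes = typeof chunk === "string" ? new TextEncoder().encode(chunk) : chunk;
    const merged = new Uint8Array(this.buf.length + bytes.length);
    merged.set(this.buf, 0);
    merged.set(bytes, this.buf.length);
    // Drop from the front so the newest output (the final event) is kept.
    const overflow = Math.max(0, merged.length - this.maxBytes);
    this.droppedBytes += overflow;
    this.buf = merged.slice(overflow);
  }

  get truncated(): boolean {
    return this.droppedBytes > 0;
  }

  text(): string {
    // A cut inside a multi-byte UTF-8 sequence decodes to U+FFFD at the start.
    return new TextDecoder().decode(this.buf);
  }
}

// Scans JSONL from the end; the first line is usually partial and is skipped.
function lastResultEvent(jsonl: string): { type: string; [k: string]: unknown } | undefined {
  for (const line of jsonl.split("\n").reverse()) {
    try {
      const ev = JSON.parse(line);
      if (ev && ev.type === "result") return ev;
    } catch {
      // Not a complete JSON line: ignore it.
    }
  }
  return undefined;
}
```

Copying on every write keeps the sketch short; a production capture would more likely use a ring buffer.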

CLI and database

  • smithers.db resolves from the project anchor, not raw CWD. Running commands from inside .smithers/ (or any subdirectory) created a stray smithers.db and silently fragmented run history. Both apps/cli and packages/smithers now walk upward for a .smithers/ anchor (excluding HOME and the filesystem root), stop at an anchor whose DB does not exist yet rather than crossing the project boundary, warn when multiple databases are found, and include the consulted DB path in RUN_NOT_FOUND. Fixes #240. PR #248.
  • bunx smithers-orchestrator output resolves camelCase output tables. A node whose output table schema key is camelCase printed null, because state stores the schema key verbatim while the physical table is snake_case. The lookup now tries the stored name first and falls back to the snake_case translation only when no physical table with the stored name exists. Fixes #276.
  • Large command stdout is flushed synchronously so piped output is no longer truncated. Fixes #235.
  • Default agent scaffolding no longer ships broken providers. Generated agents.ts now leads the smart and smartTool pools with a working Claude subscription provider, drops the opencode-over-Anthropic-API entry from the defaults, and fails with NO_USABLE_AGENTS rather than writing a configuration that cannot run. Fixes #236. PR #270.
  • **bunx smithers-orchestrator ui GATEWAY_UNREACHA...
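The project-anchored smithers.db lookup described in the 0.24.0 notes can be sketched as an upward walk. This assumes the database sits next to the .smithers/ directory (the real location may differ) and injects `exists` so the walk can be tested without disk access. `findProjectAnchor` is a hypothetical name, and the multiple-database warning is not modeled.

```typescript
import * as path from "node:path";

type AnchorResult = { anchorDir: string; dbPath: string; dbExists: boolean };

function findProjectAnchor(
  startDir: string,
  homeDir: string,
  exists: (p: string) => boolean,
): AnchorResult | null {
  const home = path.resolve(homeDir);
  let dir = path.resolve(startDir);
  for (;;) {
    const parent = path.dirname(dir);
    const isRoot = parent === dir;
    // HOME and the filesystem root are never treated as project anchors.
    if (!isRoot && dir !== home && exists(path.join(dir, ".smithers"))) {
      const dbPath = path.join(dir, "smithers.db"); // assumed location
      // Nearest anchor wins even if its DB does not exist yet, so the walk
      // never crosses the project boundary to find some other database.
      return { anchorDir: dir, dbPath, dbExists: exists(dbPath) };
    }
    if (isRoot) return null;
    dir = parent;
  }
}
```

Running from inside .smithers/workflows therefore resolves to the project root rather than creating a stray database in the current directory.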

v0.23.0

08 Jun 22:50



title: "0.23.0"
description: "Upcoming release: a PostgreSQL and PGlite persistence dialect, a deterministic local Plue e2e harness, and a validation slideshow capture pipeline."

**Unreleased / upcoming.** This entry documents work landing in the next release. The current published version is `0.22.0`, and `0.23.0` has not shipped to npm yet, so treat the features below as in progress until the release notes go live.

This is the largest release since Smithers went public. The durable engine moves to a production-grade database, the snapshot store learns to capture an agent's worktree as it works, two new products ship, and the agent, gateway, docs, and benchmark surfaces all grow.

The persistence layer now runs unchanged on PostgreSQL or an embedded PGlite through a SQL dialect seam, so the crash-and-resume guarantees that held on SQLite now hold on Postgres. Durable workspace snapshots (behind a flag) checkpoint a jj worktree as the agent edits it and restore it on resume.

The new Smithers app (apps/smithers) is a streaming Cerebras chat PWA deployable to Cloudflare in one command, now with run inspector, review, and live jjhub surfaces over a real gateway. UltraGrill is an open-ended real-time collaboration workflow with a live custom UI. Custom Workflow UIs get a full guide and two end-to-end examples, the gateway gains a typed extensions and sync backplane, and a create-workflow metaworkflow turns a plain-English ask into a runnable workflow.

Agents can now escalate to a human mid-task and block on the answer (bunx smithers-orchestrator ask-human and the ask_human MCP tool), connect to any stdio MCP server (createMcpToolset), and run against a bundled jj binary with no system install. Alongside these: a bunx smithers-orchestrator usage quota command, four benchmark harnesses plus a defending-code example, a Vibe (Mistral) agent and a corrected Antigravity CLI, an interactive bunx smithers-orchestrator init ceremony, a launch article, and a correctness sweep across JSON extraction, DevTools stream recovery, agent schemas, and the test gate.


PostgreSQL & PGlite Persistence

The headline of 0.23.0 is a PostgreSQL and PGlite dialect for the persistence layer (packages/db/src/dialect.js). Smithers' storage layer is hand-written SQL, and a new dialect seam lets that exact SQL run on SQLite or Postgres without rewriting a query. The dialect handles placeholder translation (? to $1, literal and comment aware), DDL type and autoincrement mapping (INTEGER to BIGINT, REAL to DOUBLE PRECISION, BLOB to BYTEA, INTEGER PRIMARY KEY AUTOINCREMENT to BIGSERIAL PRIMARY KEY), information_schema introspection, BEGIN vs BEGIN IMMEDIATE transaction semantics, and json_extract to ->> rewriting.
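Literal- and comment-aware placeholder translation is the piece of the dialect that is easiest to get wrong, so here is a simplified sketch of the technique. It is not Smithers' dialect.js: `toPostgresPlaceholders` is a hypothetical name, and it ignores Postgres dollar-quoted strings and the other rewrites listed above.

```typescript
// Rewrites SQLite-style `?` placeholders to Postgres `$1, $2, ...`, leaving `?`
// untouched inside string literals, quoted identifiers, and comments.
function toPostgresPlaceholders(sql: string): string {
  let out = "";
  let n = 0;
  let i = 0;
  while (i < sql.length) {
    const ch = sql[i];
    const next = sql[i + 1];
    if (ch === "'" || ch === '"') {
      // Copy a quoted run verbatim; a doubled quote ('' or "") is an escape.
      let j = i + 1;
      while (j < sql.length) {
        if (sql[j] === ch) {
          if (sql[j + 1] === ch) {
            j += 2;
            continue;
          }
          break;
        }
        j++;
      }
      out += sql.slice(i, j + 1);
      i = j + 1;
    } else if (ch === "-" && next === "-") {
      // Line comment: copy through to the end of the line.
      const end = sql.indexOf("\n", i);
      const stop = end === -1 ? sql.length : end;
      out += sql.slice(i, stop);
      i = stop;
    } else if (ch === "/" && next === "*") {
      // Block comment: copy through the closing */.
      const end = sql.indexOf("*/", i + 2);
      const stop = end === -1 ? sql.length : end + 2;
      out += sql.slice(i, stop);
      i = stop;
    } else if (ch === "?") {
      out += `$${++n}`;
      i++;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}
```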

A Smithers run crashing and resuming from its last durable checkpoint

  • createSmithersPostgres boots node-postgres or an embedded PGlite. A new async API stands up the engine against a real Postgres server or an in-process PGlite over a local socket, so you can develop against an embedded database and deploy against managed Postgres with the same code. opts is a discriminated union ({ provider: 'postgres', connectionString?, connection? } or { provider: 'pglite', dataDir? }), the result extends the createSmithers API with a close() teardown, and BIGINT values are parsed back to JS numbers so timestamps and counters match SQLite behavior. pg, @electric-sql/pglite, and @electric-sql/pglite-socket are optional dependencies, and the synchronous bun:sqlite path is untouched, so existing SQLite users see no change.

  • The dialect is exercised end to end on real PGlite. New test suites boot an in-process PGlite and run the db dialect, the engine builder, and the time-travel fork/snapshot paths against it (packages/db/tests/db-postgres-dialect.test.js, packages/engine/tests/create-smithers-postgres.test.jsx, effect-builder-postgres.test.js, time-travel-postgres.test.js), so the snapshot, replay, and branch-listing Effects are verified on the new dialect rather than only on SQLite. The time-travel fork and snapshot effects route their upserts through the dialect-aware internalStorage.upsert on Postgres instead of Drizzle's SQLite-only onConflictDoUpdate.
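The createSmithersPostgres options are described as a discriminated union on `provider`. The type below transcribes that description; typing `connection` as a plain record is an assumption, since the notes do not give its type. `describeBackend` and `parseBigintColumn` are hypothetical helpers. The first shows how the tag narrows the union. The second shows the BIGINT-to-number conversion and its limit: node-postgres returns int8 columns as strings by default, and JS numbers are exact only up to 2^53 - 1.

```typescript
type SmithersPostgresOptions =
  | { provider: "postgres"; connectionString?: string; connection?: Record<string, unknown> }
  | { provider: "pglite"; dataDir?: string };

function describeBackend(opts: SmithersPostgresOptions): string {
  if (opts.provider === "postgres") {
    // Narrowed: only the postgres fields are visible here.
    return opts.connectionString ? `postgres at ${opts.connectionString}` : "postgres via connection config";
  }
  // Narrowed to the pglite branch.
  return opts.dataDir ? `embedded PGlite in ${opts.dataDir}` : "embedded PGlite (no dataDir)";
}

// Converts a BIGINT string to a number, refusing values that would lose precision.
function parseBigintColumn(raw: string): number {
  const n = Number(raw);
  if (!Number.isSafeInteger(n)) throw new RangeError(`BIGINT ${raw} exceeds Number.MAX_SAFE_INTEGER`);
  return n;
}
```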

Durable Workspace Snapshots

A new engine substrate snapshots an agent's worktree as it works, so a crashed or resumed run can restore the files on disk to match the agent's transcript. The whole feature is gated behind SMITHERS_DURABILITY_SNAPSHOTS=1, off by default, and stays inert when a task has no jj worktree.

  • A serial snapshot service per worktree. createSnapshotService (packages/engine/src/snapshotService.js) runs a per-cwd serial queue so jj never races itself. Tier 2 snapshots (source watch) dedup an unchanged commit id, Tier 1 boundaries (hook/wrap) always record a checkpoint, and a monotonic seq is kept per (run, node, iteration, attempt). A failed capture or db write becomes a recorded gap through onGap and never throws into the agent path.

  • A zero-dependency workspace watcher. createWorkspaceWatcher (packages/engine/src/workspaceWatcher.js) is a recursive fs.watch with a trailing-idle debounce (150ms default) that fires onSettle when the tree quiets. It ignores .jj/ and .git/ unconditionally and degrades to a safe no-op when the path cannot be watched. The watch backend is a dependency-injection seam.

  • One handle wires it around an agent attempt. startDurability (packages/engine/src/startDurability.js) composes real jj capture (captureWorkspaceSnapshot) and the db adapter into a SnapshotService and a Tier 2 watcher. It returns a no-op handle when the feature is disabled, there is no cwd, or the worktree is not a jj repo, and on stop() it closes the watcher and does a final flush. The engine wires it around each agent attempt and calls durability.stop() in finally.

  • Resume restores the worktree first. restoreWorkspaceToLatestCheckpoint (packages/engine/src/restoreWorkspace.js) lists a task's checkpoints, picks the chronologically latest (ties broken by attempt then seq), and reverts via jj restore --from <commit_id>. It never throws and returns a structured { restored, reason?, commitId?, seq?, error? }. The engine calls it before startDurability, only when the flag is set and a session is being resumed, so it stays inert for existing runs.

  • Two tables and a granular jj handle back the store. _smithers_workspace_states records deduped jj working-copy handles (the jj_operation_id is the durable restore handle, since the operation log survives gc), and _smithers_workspace_checkpoints records one never-deduped row per snapshot boundary (migrations 0015 and 0016). In the vcs package, captureWorkspaceSnapshot(cwd?) returns { commitId, changeId, operationId } under a 1500ms timeout and returns null on any failure rather than throwing. An end-to-end test drives a file-writing agent through a real jj worktree with the flag on and asserts both tables fill; with the flag off, nothing is recorded.
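The snapshot service's core guarantees (one capture at a time per worktree, Tier 2 dedup of an unchanged commit id, a monotonic seq per task attempt, failures recorded as gaps) fit in a small queue. This is a hypothetical sketch: `SerialSnapshotQueue` and its in-memory `recorded`/`gaps` arrays stand in for the real db adapter and onGap callback.

```typescript
type Tier = 1 | 2;
type Capture = () => Promise<string | null>; // commit id, or null when capture fails

class SerialSnapshotQueue {
  private tails = new Map<string, Promise<unknown>>();
  private lastCommit = new Map<string, string>();
  private seqs = new Map<string, number>();
  readonly recorded: { key: string; seq: number; commitId: string; tier: Tier }[] = [];
  readonly gaps: string[] = [];

  // Chains jobs per cwd so two captures in one worktree never overlap.
  private enqueue<T>(cwd: string, job: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(cwd) ?? Promise.resolve();
    const run = prev.then(job, job);
    this.tails.set(cwd, run.catch(() => undefined));
    return run;
  }

  // `key` identifies (run, node, iteration, attempt).
  snapshot(cwd: string, key: string, tier: Tier, capture: Capture): Promise<void> {
    return this.enqueue(cwd, async () => {
      try {
        const commitId = await capture();
        if (commitId === null) {
          this.gaps.push(key);
          return;
        }
        // Tier 2 (watcher) skips an unchanged commit; Tier 1 boundaries always record.
        if (tier === 2 && this.lastCommit.get(key) === commitId) return;
        this.lastCommit.set(key, commitId);
        const seq = (this.seqs.get(key) ?? 0) + 1;
        this.seqs.set(key, seq);
        this.recorded.push({ key, seq, commitId, tier });
      } catch {
        // Failures become recorded gaps and never throw into the agent path.
        this.gaps.push(key);
      }
    });
  }
}
```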

Smithers: a Cerebras Chat PWA

This release introduces Smithers (apps/smithers), a streaming chat PWA powered by Cerebras gpt-oss-120b through TanStack AI and deployable to Cloudflare via Alchemy infrastructure-as-code. The browser streams replies from a Cloudflare Worker over Server-Sent Events and never holds the API key; the Worker runs Cerebras server-side with the key bound as a Worker secret. bun dev at the repo root now boots this app (the old Studio 2 dev script moved to bun dev:studio).

The Smithers Cerebras chat PWA in dark mode showing the welcome hero and composer

  • Streaming chat with a command menu, voice dictation, and a grill-workflow graph. The frontend renders streaming Markdown replies, a Cmd-K command menu, toast notifications, voice dictation, and an "ask me" grill workflow graph drawn with ReactFlow. The chat backend is a single Cloudflare Worker route (POST /api/chat) that runs Cerebras through TanStack AI's chat() and returns a Server-Sent Events response.

  • Flash-free dark mode tokenized end to end. Every color in the app is a semantic CSS custom property defined in :root, and dark mode ships two ways: the OS preference via @media (prefers-color-scheme: dark) and an explicit [data-theme] override. An inline script resolves the theme before first paint so there is no flash, color-scheme keeps native controls and scrollbars matched, and the workflow graph follows via ReactFlow's colorMode.

  • Renamed and rebranded. The app moved from apps/search-pwa to apps/smithers (package @smithers-orchestrator/smithers), and every remaining "Huey" string was rebranded to "Smithers" across UI text, the manifest, the Ask Me system prompt, the Alchemy app id, and the localStorage key prefix. The Alchemy deploy ships the Worker plus the built PWA as static assets with SPA routing.
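Flash-free theming depends on deciding the theme before first paint. The sketch below keeps that decision pure so an inline head script can call it. `resolveTheme`, `applyTheme`, and the idea of a saved string value are assumptions for illustration, not the app's actual code. An explicit choice beats the OS preference, and setting `colorScheme` keeps native controls and scrollbars matched.

```typescript
type Theme = "light" | "dark";

// An explicit saved choice (the [data-theme] override) wins; otherwise follow the OS.
function resolveTheme(saved: string | null, prefersDark: boolean): Theme {
  if (saved === "light" || saved === "dark") return saved;
  return prefersDark ? "dark" : "light";
}

// Structural type so the function works on document.documentElement or a test double.
type ThemeRoot = { dataset: Record<string, string | undefined>; style: { colorScheme: string } };

function applyTheme(root: ThemeRoot, theme: Theme): void {
  root.dataset.theme = theme; // drives [data-theme] CSS selectors
  root.style.colorScheme = theme; // native controls and scrollbars follow
}
```

In a browser, the inline script might call `applyTheme(document.documentElement, resolveTheme(savedChoice, matchMedia("(prefers-color-scheme: dark)").matches))`, where `savedChoice` is read from whatever storage key the app uses.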

Smithers PWA: Surfaces, Auth, and Live Backends

The chat shell grew into a full operator surface. The app was re-architected onto TanStack Router with a Zustand-only state model (no useState/useEffect for app state, URL as the source of truth), and a Cloudflare Worker now fronts real backends.

  • **A reverse-proxy Worker for auth, th...

v0.22.0

01 Jun 01:14


0.22.0

Major release: Smithers ships Smithers Studio 2, a ground-up dark agent operations console with three-tier progressive-disclosure navigation, a Cmd-K/Cmd-P command palette, a color-equals-run-state design system, and a full suite of live data surfaces — Runs, Workflows, terminal, chat, JJHub, and DevTools — all wired to real Gateway and workspace-API backends. Studio 2 now defaults to a chat-first shell — one conversation that drives issues, runs, PRs, and workflows, with the tabbed shell one toggle away. Alongside Studio 2, this release de-mocks every e2e test onto real seeded backends, adds a live Gateway Run Chronicle plus init-pack-generated per-workflow UIs and a new smithers ui command, grows the CLI with a starter gallery and GEPA-style prompt optimization, and ships a drop-in agent-facing smithers skill so a coding agent can drive Smithers without reading the whole docs site. A focused security pass closes XSS, local-RCE, path-traversal, DoS, and auth-bypass vectors, and a broad correctness sweep hardens the engine, time-travel, DB, graph, and observability stacks.


Smithers Studio 2

The headline of 0.22.0 is Smithers Studio 2 (apps/smithers-studio-2), a ground-up rebuild of the agent operations console. The original "spaceship" studio put all ~25 views on one flat sidebar with equal weight, and color everywhere meant nothing stood out. Studio 2 inverts that: a dark, near-monochrome console where saturated color only ever signals run state, and where the surfaces you reach for daily are the only ones permanently visible. It is built on Vite, React 19, and Zustand, and ships as a web app today with a desktop shell in the wings. The design and information-architecture contracts are written down first, in apps/smithers-studio-2/docs/DESIGN.md and docs/UX.md.

Smithers Studio 2 home screen with recent workspaces and a live operations strip

  • Three-tier progressive-disclosure navigation. The shell exposes exactly four primary surfaces that are always visible — Home, Runs, Workspace, Workflows — backed by a single nav registry in src/shell/navRegistry.tsx. Six secondary surfaces (Issues, Landings, Workspaces, Memory, Scores, Search) live in a collapsed-by-default "More" group, and three developer surfaces (DevTools, SQL Browser, Logs) hide entirely until opted in. Every surface registers exactly one entry in buildNavRegistry, so a new view appears in the right sidebar section and the palette automatically without touching AppShell or Sidebar.

  • Registry-level developer-mode gating. Developer surfaces are gated by conditional construction, not CSS. When the persisted developerMode flag (localStorage key studio.developerMode) is off, the developer items are simply not added to the registry, making them unreachable by sidebar, command palette, and deep-link — the sidebar is byte-for-byte identical to a non-developer session. Toggling the flag off while sitting on a developer surface falls the user back to Home rather than stranding them on an unregistered route.

  • WELCOME → FOCUS → DETAIL altitude model. The app has three navigational altitudes you can always locate yourself in. Home (src/home/Home.tsx) is the WELCOME screen — a calm centered column with two verbs (Open Folder, launch a workflow), a recent-workspaces list, and a live Operations strip; opening a folder routes to Workspace (FOCUS) and launching a workflow routes to Runs. DETAIL always lives in an inspector pane inside a surface, never as a new top-level view, which is how the old runs/snapshots/approvals/scores/logs nodes collapse into a single "what are my agents doing" surface.

  • Cmd-K / Cmd-P command palette. A universal accelerator (src/shell/CommandPalette) reaches every registered surface across all three tiers plus contextual commands, so the rail can stay tiny without trapping power users. Prefix pills switch modes — > commands, / run workflow, @ open file, ? ask AI — parsed by parseQuery.ts, and results are grouped by section ("Go to", "More", "Developer", "Commands"). Developer surfaces only surface in the palette when developer mode is on, since the palette is built from the same gated registry.

Opening the command palette and navigating to a surface

  • Dark design system where color equals state. All visual tokens live as CSS custom properties in src/theme.css and are mirrored as a single TypeScript export in src/theme/themeTokens.ts, so no component hardcodes a hex value. Three stacked surfaces (--bg, --surface-1, --surface-2) carry depth, text is white at three opacities, and the only saturated colors are run-state signals: --accent blue for live/running, --success green for completed/approved, --warning amber for waiting/pending-approval gates, and --danger red for failed/denied. Motion is limited to 120–150ms ease-out feedback on real state changes and collapses to zero under prefers-reduced-motion.

  • Zustand store as the single shell state. src/useStudioStore.ts holds the whole top-level shell state (active view, developer mode, command-palette open/query/selection, and terminal tabs), with detail state colocated in each surface's folder rather than in the global union. The store keeps a backwards-compatible terminal-to-workspace view alias so existing hotkeys and tests that target the old terminal id keep resolving after the terminal moved inside the Workspace surface.

  • Global hotkeys and a runs badge. The shell owns global hotkeys via useHotkey (src/useHotkey.ts): Cmd-P and Cmd-K open the palette, Cmd-T opens a new terminal. The Runs nav row carries an unread approvals badge (runsBadgeStore) so time-sensitive gates surface without digging, consistent with the IA rule that anything that is a state of a run is disclosed inside the Runs surface rather than as a sibling nav node.

  • Electrobun desktop packaging direction. An Electrobun shell (electrobun.config.ts, electrobun/) wraps the already-built Vite output rather than rebuilding it: dist/ is copied verbatim as the mainview view and a Bun main process opens a single 1280×820 window pointed at views://mainview/index.html, with a SMITHERS_STUDIO_DEV_URL override that points the webview at the Vite dev server for HMR. It installs a minimal native macOS menu (app/quit, Edit roles so copy-paste works in the webview, Window controls). The Vite web app stays the source of truth — the desktop shell adds no UI of its own.
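Registry-level gating by conditional construction, with the fall-back-to-Home rule, can be sketched in a few lines. The surface names come from the notes above. `buildRegistrySketch`, `resolveActiveView`, and the `NavEntry` shape are hypothetical and do not reproduce src/shell/navRegistry.tsx.

```typescript
type NavTier = "primary" | "more" | "developer";
type NavEntry = { id: string; label: string; tier: NavTier };

const PRIMARY = ["Home", "Runs", "Workspace", "Workflows"];
const MORE = ["Issues", "Landings", "Workspaces", "Memory", "Scores", "Search"];
const DEVELOPER = ["DevTools", "SQL Browser", "Logs"];

const toEntries = (labels: string[], tier: NavTier): NavEntry[] =>
  labels.map((label) => ({ id: label.toLowerCase().replace(/\s+/g, "-"), label, tier }));

function buildRegistrySketch(developerMode: boolean): NavEntry[] {
  const entries = [...toEntries(PRIMARY, "primary"), ...toEntries(MORE, "more")];
  // Developer entries are never constructed when the flag is off, so a sidebar,
  // palette, or deep-link resolver built from this list cannot reach them.
  if (developerMode) entries.push(...toEntries(DEVELOPER, "developer"));
  return entries;
}

// If the active view is no longer registered, fall back to Home.
function resolveActiveView(active: string, registry: NavEntry[]): string {
  return registry.some((e) => e.id === active) ? active : "home";
}
```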

Chat-First Shell

The default Studio 2 surface is now a chat-first shell (src/chat): one long conversation with the agent is the whole app, replacing tab navigation. Rather than clicking between surfaces, you tell the agent what you want and it manages issues, runs, PRs, workflows, and sandboxes for you, showing rich data inline through a sandboxed-HTML tool or in an overlay that can split beside the chat or sit full-screen over it. Slash commands map to Smithers CLI features and open a default UI as an overlay — surface overlays reuse the existing real Studio surfaces verbatim, and the terminal overlay reuses the real Ghostty PTY.

The switch is non-destructive: the classic tabbed shell (shell/AppShell) is untouched and one toggle away via the /studio command or the project-bar gear, gated by a new shellMode flag in the studio store that App switches on. Concepts the backend does not model yet — projects, per-message tags, the chat feed, the agent HTML tool, and overlays — are fed from typed seams with mock implementations behind real-ready interfaces (grep SEAM:), while everything that already has a backend (gateway, workspace API, the reused surfaces, the PTY) is wired to it directly. Pure helpers (parseSlash, resolveSlashAction, tagColor) are split out and unit-tested, and the shell has real-backend e2e coverage.
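Splitting slash handling into a pure parser and a pure resolver is what makes it unit-testable. This is a hypothetical sketch in that spirit, not the real parseSlash or resolveSlashAction: `/studio` toggles back to the tabbed shell, names found in a caller-supplied set open an overlay, and everything else is sent as a chat message. The overlay names in the usage are illustrative.

```typescript
type SlashCommand = { name: string; args: string[] };
type SlashAction =
  | { kind: "toggleShell" }
  | { kind: "overlay"; surface: string; args: string[] }
  | { kind: "message"; text: string };

function parseSlashCommand(input: string): SlashCommand | null {
  const trimmed = input.trim();
  if (!trimmed.startsWith("/")) return null;
  const body = trimmed.slice(1).trim();
  if (body === "") return null;
  const parts = body.split(/\s+/);
  return { name: (parts[0] ?? "").toLowerCase(), args: parts.slice(1) };
}

function resolveSlash(input: string, overlays: ReadonlySet<string>): SlashAction {
  const cmd = parseSlashCommand(input);
  if (cmd === null) return { kind: "message", text: input };
  if (cmd.name === "studio") return { kind: "toggleShell" };
  if (overlays.has(cmd.name)) return { kind: "overlay", surface: cmd.name, args: cmd.args };
  // Unknown commands fall through to the agent as plain text.
  return { kind: "message", text: input };
}
```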

Studio 2 Data Surfaces and Workspace

Behind the shell, every surface is a real, live view over the Gateway and workspace-API backends. Here is what each one does.

  • Live Runs inspection surface. The Runs view pairs a run-history rail (with an approvals filter and a live nav badge) against a responsive tree-plus-inspector layout that splits with a draggable divider above 800px and falls back to a modal inspector sheet on narrow widths. It streams run.event frames over the Gateway's real WebSocket protocol (a connect + streamRunEvents subscription routed through /v1/rpc), debouncing event bursts into a single getRun + getDevToolsSnapshot refresh of only the selected run so a chatty run never hammers the list RPCs. The RunToolbar exposes the lifecycle actions valid for the current state — cancel a live run, resume a terminal one — plus a frame scrubber for time-travel rewind.

Studio 2 Runs surface streaming live run events into a tree and inspector

  • Inline approval gates. When a selected node has a pending approval, an ApprovalGate renders directly inside the inspector with an optional note and approve/deny buttons that post submitApproval. Approvals are surfaced three ways from one source of truth: the history-list filter, the inline gate, and the s...
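The Runs surface's habit of folding a burst of run.event frames into one refresh of the selected run is a trailing debounce. In this minimal sketch, `createRefreshCoalescer` and the 150 ms default window are assumptions, and `refresh` stands in for the getRun plus getDevToolsSnapshot call.

```typescript
type RunEvent = { runId: string };

function createRefreshCoalescer(
  refresh: (runId: string) => void,
  getSelectedRunId: () => string | null,
  waitMs = 150,
): (event: RunEvent) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (event) => {
    // Events for runs that are not selected never trigger a refresh.
    if (event.runId !== getSelectedRunId()) return;
    if (timer !== undefined) clearTimeout(timer);
    // Trailing debounce: refresh once, after the burst has been quiet for waitMs.
    timer = setTimeout(() => {
      timer = undefined;
      const selected = getSelectedRunId();
      if (selected !== null) refresh(selected);
    }, waitMs);
  };
}
```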

v0.21.0

24 May 17:54


0.21.0

Minor release: Smithers now has tracked database migrations, a hosted
control-plane package, a default Gateway console, stricter CLI input
handling, a process-backed sandbox runner, hardened hot reload, and a
recordable keyboard-driven demo deck.


Database

  • Tracked SQLite schema migrations. Internal schema setup now records
    applied migrations in _smithers_schema_migrations and runs the DB
    migration path idempotently on startup. The legacy startup
    pseudo-migration code was moved into the DB package, including the
    rebuild that restores missing run foreign keys on split schema tables.

  • SqlMessageStorage was split into a lowercase module. The old
    SqlMessageStorage.js entry is now a compatibility shim over
    sql-message-storage.js, keeping imports stable while making the
    implementation easier to maintain.

  • Node diff cache upserts are safer. Cache writes now use a stricter
    upsert path and have regression coverage for repeated writes and schema
    setup.
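Tracked migrations are idempotent because each one is recorded in _smithers_schema_migrations in the same transaction that applies it, and recorded ids are skipped on later startups. This is a hypothetical sketch: the `Db` interface, `migrate`, the table's column list, and the migration ids are illustrative, not the DB package's API.

```typescript
type Migration = { id: string; sql: string };

interface Db {
  exec(sql: string): void;
  appliedIds(): Set<string>; // e.g. SELECT id FROM _smithers_schema_migrations
  recordApplied(id: string): void; // e.g. INSERT INTO _smithers_schema_migrations ...
  transaction(fn: () => void): void;
}

function migrate(db: Db, migrations: Migration[]): string[] {
  // Assumed minimal table shape; the real schema may carry more columns.
  db.exec("CREATE TABLE IF NOT EXISTS _smithers_schema_migrations (id TEXT PRIMARY KEY)");
  const applied = db.appliedIds();
  const ran: string[] = [];
  for (const m of migrations) {
    if (applied.has(m.id)) continue; // already recorded: safe to rerun on every startup
    // Apply and record together so a crash cannot leave a half-recorded migration.
    db.transaction(() => {
      db.exec(m.sql);
      db.recordApplied(m.id);
    });
    ran.push(m.id);
  }
  return ran;
}
```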

Runtime And Engine

  • Durability and operator flows were hardened. Runtime recovery,
    workflow metadata generation, and operator-console paths received
    regression fixes before landing this release.

  • Workflow metadata and skill generation are now first-class. The CLI
    can discover richer workflow metadata and generated workflow skill files
    from the seeded workflow pack.

  • Cache policy logic was extracted and tested. Engine cache scoping,
    TTL behavior, and schema validation now live behind a focused cache policy
    module with unit and integration coverage.

  • Hot watch is more robust. The hot reload watcher now handles rapid
    file changes and rebuild boundaries more defensively.

  • jumpToFrame is hardened. Time-travel frame jumps now preserve the
    expected audit and state invariants more reliably.

CLI

  • Antigravity CLI support landed for Google agent workflows.
    Smithers now includes AntigravityAgent, CLI detection, init templates,
    hijack support, account-provider environment wiring, trace normalization,
    and docs for the agy CLI. GeminiAgent and GeminiAgentOptions remain
    available for legacy and enterprise Gemini CLI setups, but are now marked
    deprecated in favor of Antigravity for new Google CLI integrations.

  • JSON arguments are preflighted before workflow modules load.
    Malformed --input and --annotations values now fail with Smithers
    errors instead of surfacing raw runtime stack traces.

  • --input - and --annotations - read JSON from stdin. Stdin JSON is
    capped at 1 MiB, parsed before detached child processes are spawned, and
    documented in the CLI reference.

  • Raw JSON stdout is preserved. JSON-format command output avoids
    accidental human formatting so automation can parse it reliably.

  • Argument parsing helpers were split out. Shared argv and JSON parsing
    utilities reduce duplicated command handling and make flag behavior easier
    to test.

  • Architecture budgets are enforced. scripts/check-architecture-budget.mjs
    now guards major CLI, engine, and Gateway files from growing past agreed
    line-count budgets.
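A minimal sketch of the JSON-argument preflight described above. It assumes a pure helper that the CLI calls before importing any workflow module. `preflightJsonArgument`, `CliArgumentError`, and the error codes are hypothetical names, not Smithers internals. Reading stdin is left to the caller, so the 1 MiB check and the parse stay easy to test.

```typescript
// Hypothetical sketch: validate a JSON flag value before any workflow module loads.
// Function, class, and error-code names are illustrative assumptions.

const MAX_STDIN_JSON_BYTES = 1024 * 1024; // 1 MiB cap described in the notes

class CliArgumentError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "CliArgumentError";
    this.code = code;
  }
}

// `raw` is the flag value; when it is "-", the caller passes stdin contents as `stdinText`.
function preflightJsonArgument(flag: string, raw: string, stdinText?: string): unknown {
  let source = raw;
  if (raw === "-") {
    if (stdinText === undefined) {
      throw new CliArgumentError("STDIN_MISSING", `${flag} - expects JSON on stdin`);
    }
    // Count UTF-8 bytes, not characters, against the cap.
    if (new TextEncoder().encode(stdinText).length > MAX_STDIN_JSON_BYTES) {
      throw new CliArgumentError("STDIN_TOO_LARGE", `${flag} stdin JSON exceeds 1 MiB`);
    }
    source = stdinText;
  }
  try {
    return JSON.parse(source);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Surface a typed CLI error instead of a raw runtime stack trace.
    throw new CliArgumentError("INVALID_JSON_ARGUMENT", `${flag} is not valid JSON: ${reason}`);
  }
}
```

Running a check like this in the parent process, before any detached child is spawned, means a malformed value fails in the foreground where the user can see it.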

Gateway And Control Plane

  • New @smithers-orchestrator/control-plane package. Hosted
    deployments now have a tested SQLite store for organizations, teams,
    projects, billing records, identity providers, usage events and limits,
    secret manager references, and audit export.

  • Facade export for hosted control-plane APIs. Consumers can import
    ControlPlaneStore through smithers-orchestrator/control-plane or the
    scoped package.

  • Default Gateway console. Gateway can now mount a built-in operator UI
    for workflow inventory, active runs, approvals, and common run actions.
    The UI was extracted into focused auth, bundle, and default-console
    modules so custom Gateway apps have a cleaner integration point.

  • Production hardening docs. Deployment docs now cover durable storage,
    Gateway tokens, sandbox boundaries, cache policy, audit trail retention,
    and release checks.

Sandbox

  • <Sandbox> now supports injectable providers. Workflow authors can
    pass a provider object or registered provider id instead of hardcoding a
    runtime such as Docker. Provider-backed sandboxes run remotely, return a
    validated result bundle, and record the same sandbox lifecycle events as
    built-in transports.

  • Sandbox result bundles can carry diffBundles. Providers may return a
    structured result with output, remote ids, artifacts, logs, and a
    diffBundle; Smithers materializes the bundle, review-gates changes, and
    applies accepted diffs through the engine diff-bundle path.

  • Runtime selection now fails closed. Unknown runtimes are rejected, and
    Docker no longer silently falls back to bubblewrap when Docker is
    unavailable. The legacy local transport path still defaults to bubblewrap
    only when no provider and no runtime are supplied.

  • Nested sandboxes are explicit. Sandbox execution tracks parent sandbox
    context and rejects nested sandboxes unless the nested component opts in
    with allowNested, making diff-base, cleanup, quota, and secret-boundary
    risks visible at the API boundary.

  • Freestyle is documented as a third-party sandbox provider. The new
    examples/freestyle/ adapter shows how a provider can create a Freestyle VM,
    ship request files with additionalFiles, run vm.exec(), read a result
    JSON file, and return a Smithers sandbox bundle. The sandbox docs now use
    this as the provider-extension example.

  • Process runner transport. Sandbox execution can now use a
    process-backed runner with request/result bundle boundaries and persisted
    sandbox metadata.

  • Bundle safety was tightened. Bundle manifests, produced diffs,
    artifact paths, and cleanup behavior now have stronger path containment,
    size-boundary, and review-decision coverage.
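The runtime-selection and nesting rules above can be sketched as one pure function. This illustration rests on several assumptions: `selectSandbox`, the request and context shapes, and the choice to also reject unregistered provider ids are all hypothetical. Real provider objects (as opposed to ids) are left out for brevity.

```typescript
// Hypothetical sketch of fail-closed sandbox selection. Type names, the
// runtime list, and rejecting unregistered provider ids are assumptions.

type SandboxRuntime = "docker" | "bubblewrap";

interface SandboxRequest {
  provider?: string; // registered provider id
  runtime?: string; // explicit runtime name from the workflow
  allowNested?: boolean; // opt-in for running inside another sandbox
}

interface SelectionContext {
  dockerAvailable: boolean;
  insideSandbox: boolean; // true when a parent sandbox context exists
  registeredProviders: ReadonlySet<string>;
}

type Selection =
  | { kind: "provider"; id: string }
  | { kind: "runtime"; runtime: SandboxRuntime };

const KNOWN_RUNTIMES: readonly SandboxRuntime[] = ["docker", "bubblewrap"];

function isKnownRuntime(name: string): name is SandboxRuntime {
  return (KNOWN_RUNTIMES as readonly string[]).includes(name);
}

function selectSandbox(req: SandboxRequest, ctx: SelectionContext): Selection {
  // Nesting is rejected unless the nested component opts in.
  if (ctx.insideSandbox && req.allowNested !== true) {
    throw new Error("nested sandbox rejected; set allowNested to opt in");
  }
  if (req.provider !== undefined) {
    if (!ctx.registeredProviders.has(req.provider)) {
      throw new Error(`unknown sandbox provider: ${req.provider}`);
    }
    return { kind: "provider", id: req.provider };
  }
  const runtime = req.runtime;
  if (runtime !== undefined) {
    // Unknown names fail closed instead of falling back to a default.
    if (!isKnownRuntime(runtime)) throw new Error(`unknown sandbox runtime: ${runtime}`);
    // Docker requested but missing: error out, never silently use bubblewrap.
    if (runtime === "docker" && !ctx.dockerAvailable) {
      throw new Error("docker runtime requested but Docker is unavailable");
    }
    return { kind: "runtime", runtime };
  }
  // Legacy local path: bubblewrap only when neither provider nor runtime is given.
  return { kind: "runtime", runtime: "bubblewrap" };
}
```

Keeping the bubblewrap default as the final branch makes the legacy behavior the only implicit choice. Every explicit request either succeeds exactly as asked or throws.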

Eval Suite And DevTools

  • Workflow eval suites landed in the CLI. smithers eval can run
    workflow cases, write reports, detect duplicate run IDs, dry-run plans,
    and evaluate exact or partial output assertions.

  • Eval assertions are more flexible. outputContains now matches array
    entries outside prefix position, and docs cover the continued status.

  • DevTools tree utilities gained nested ordering coverage. Tests now
    assert depth-first task collection through nested containers.
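To illustrate what "matches array entries outside prefix position" can mean, here is a sketch of a partial-match check. The release notes do not spell out Smithers' exact outputContains rules, so `containsPartial` and its semantics are assumptions. Object keys listed in the expectation must match recursively, and each expected array entry may match an actual entry at any index.

```typescript
// Hypothetical sketch of a partial "contains" check. Smithers' exact
// outputContains rules are not spelled out in these notes; this only shows an
// expected array entry matching at any index, not just the prefix.

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function containsPartial(actual: Json, expected: Json): boolean {
  if (Array.isArray(expected)) {
    if (!Array.isArray(actual)) return false;
    const items: Json[] = actual;
    // Every expected entry must match some actual entry, wherever it sits.
    return expected.every((exp) => items.some((act) => containsPartial(act, exp)));
  }
  if (expected !== null && typeof expected === "object") {
    if (actual === null || typeof actual !== "object" || Array.isArray(actual)) return false;
    const record: { [key: string]: Json } = actual;
    // Keys absent from `expected` are ignored; listed keys must match recursively.
    return Object.entries(expected).every(
      ([key, exp]) =>
        Object.prototype.hasOwnProperty.call(record, key) &&
        containsPartial(record[key] as Json, exp),
    );
  }
  // Primitives compare exactly.
  return actual === expected;
}
```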

Demo, Docs, And CI

  • Keyboard-driven demo deck. .smithers/scripts/run-demo.sh now launches
    a 35-slide terminal deck with keyboard navigation, replay, mute, auto mode,
    and a live durability/time-travel sequence.

  • Dynamic demo workflow. A lightweight dynamic workflow was added for
    smoke-testing task graph behavior without running the full deck.

  • Demo output is cleaner for recording. The live deck no longer inherits
    NO_COLOR into forced-color child commands, avoiding Bun color warnings
    during the durability slide.

  • Workflow catalog and docs were refreshed. The seeded workflow catalog,
    MCP/server docs, caching docs, eval quickstart, and quickstart copy were
    updated alongside the new behavior.

  • CI now runs the test gate on pull requests. The GitHub workflow includes
    the repository test job, and agent timeout tests were hardened so idle
    timeout coverage is less timing-sensitive.

v0.20.4

21 May 11:06

0.20.4

Patch release: the smithers bin now resolves the workspace-local
install through package.json instead of the npm/pnpm .bin/ shell
shim, fixing the bun: Expected ")" but found "$(echo " crash that
hit any project with a local .smithers/ install. Engine prompts for
schema-driven outputs now demand raw JSON without code fences.


CLI

  • Bin delegation now resolves the workspace-local install through
    package.json.
    When you run smithers inside a directory whose
    .smithers/node_modules/ ships its own copy of
    smithers-orchestrator, the global bin re-execs into that local
    copy so the workflow runtime — engine, react-reconciler,
    components, React — all load from a single module tree. Previously
    the global bin handed the local pnpm/npm .bin/smithers shell
    shim
    to process.execPath (bun), which tried to parse the
    #!/bin/sh shim as JavaScript and crashed with
    Expected ")" but found "$(echo ". Resolution now reads
    node_modules/smithers-orchestrator/package.json and re-execs the
    actual JS entry, bypassing the shim. Workflow-path commands
    (up, graph, fork, replay, revert, timetravel) also
    walk upward from the workflow file to find the nearest local
    install, so smithers up examples/foo/workflow.tsx picks up the
    workflow's own node_modules even when invoked from elsewhere.

    Unit tests for the resolution paths live in
    apps/cli/tests/bin-delegation-unit.test.js.
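The resolution described above can be sketched with plain Node APIs, under two assumptions. First, the sketch checks both `.smithers/node_modules` and `node_modules` at each directory level. Second, it takes the JS entry from the manifest's `bin` field. Neither detail is confirmed by these notes, and `resolveLocalEntry` and `delegate` are hypothetical names.

```typescript
// Hypothetical sketch: locate the nearest local smithers-orchestrator install
// by walking upward, then run its JS entry directly instead of the .bin shell shim.
// The candidate directories and the use of the "bin" field are assumptions.
import * as fs from "node:fs";
import * as path from "node:path";
import { spawnSync } from "node:child_process";

const PKG_NAME = "smithers-orchestrator";

function resolveLocalEntry(startDir: string): string | undefined {
  let dir = path.resolve(startDir);
  for (;;) {
    const candidates = [
      path.join(dir, ".smithers", "node_modules", PKG_NAME),
      path.join(dir, "node_modules", PKG_NAME),
    ];
    for (const pkgDir of candidates) {
      const manifestPath = path.join(pkgDir, "package.json");
      if (!fs.existsSync(manifestPath)) continue;
      const manifest = JSON.parse(fs.readFileSync(manifestPath, "utf8")) as {
        bin?: string | Record<string, string>;
      };
      // "bin" is either one path or a map of command name to path.
      const rel = typeof manifest.bin === "string" ? manifest.bin : manifest.bin?.["smithers"];
      if (rel) return path.join(pkgDir, rel);
    }
    const parent = path.dirname(dir);
    if (parent === dir) return undefined; // reached the filesystem root
    dir = parent;
  }
}

// Re-exec the JS entry with the current runtime (bun or node), so a
// #!/bin/sh shim is never handed to the JavaScript parser.
function delegate(entry: string, argv: string[]): number {
  const result = spawnSync(process.execPath, [entry, ...argv], { stdio: "inherit" });
  return result.status ?? 1;
}
```

For a workflow-path command, the walk would start from the workflow file's directory rather than the current working directory. That is how `smithers up examples/foo/workflow.tsx` can find an install next to the workflow.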

Engine

  • Structured outputs now require raw JSON, not a code-fenced
    block.
    When the engine asks an agent to emit a typed output and
    the agent doesn't support native structured outputs, the appended
    prompt now instructs the agent to return ONLY the raw JSON object
    — no markdown, no headings, no ```json fence — and to start
    the response with { and end with }. The validation-retry and
    schema-summary prompts received the same tightening. This matches
    what the downstream parser expects and removes the failure mode
    where an agent emitted a perfectly valid JSON block but the
    workflow rejected it because of the surrounding markdown.
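A sketch of the raw-JSON contract from the engine side, assuming a prompt suffix plus a strict parser. The instruction wording, `withRawJsonInstruction`, and `parseRawJsonObject` are illustrative and not the engine's actual prompt text or parser.

```typescript
// Hypothetical sketch of the raw-JSON contract: the prompt suffix asks for a
// bare object, and the parser rejects anything wrapped in fences or prose.
// The instruction wording and function names are illustrative only.

const RAW_JSON_INSTRUCTION = [
  "Return ONLY the raw JSON object.",
  "No markdown, no headings, no code fences.",
  "Start the response with { and end it with }.",
].join(" ");

function withRawJsonInstruction(prompt: string): string {
  return `${prompt}\n\n${RAW_JSON_INSTRUCTION}`;
}

function parseRawJsonObject(response: string): Record<string, unknown> {
  const text = response.trim();
  // A fenced block starts with backticks, so it fails this check before parsing.
  if (!text.startsWith("{") || !text.endsWith("}")) {
    throw new Error("expected a bare JSON object, got fenced or surrounding text");
  }
  // JSON.parse still throws on malformed bodies, e.g. trailing commas.
  return JSON.parse(text) as Record<string, unknown>;
}
```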

Dependencies

  • Dropped unused workspace dependencies across apps/observability,
    packages/db, packages/engine, packages/graph,
    packages/memory, packages/openapi, packages/protocol,
    packages/sandbox, packages/scorers, packages/smithers, and
    packages/time-travel. Net change: −67 lines from
    pnpm-lock.yaml, no public API impact.

Docs

  • Docs example smoke tests. New
    apps/cli/tests/docs-examples-smoke.test.js exercises the
    workflow snippets shipped in the docs so broken examples surface
    in CI instead of in users' first runs. Several docs files
    (tour.mdx, examples/loop.mdx, examples/multi-agent-review.mdx,
    integrations/cli-agents.mdx, integrations/tools.mdx) received
    small corrections discovered by that pass.

  • Public feature coverage audit. A docs-side sweep verified that
    every public-facing engine and CLI feature has at least one
    example or how-to reference, and patched the gaps it found.

Internal

  • Test coverage additions for the CLI (node-detail,
    semantic-tools, why-diagnosis) and components (Runbook,
    markdown, array control-flow, and composite suites).
  • CLI refactors: inlined single-statement branches, dropped a few
    dead try/catch wrappers, and replaced a handful of duplicate
    implementations with re-export shims to keep the workspace
    graph tidy.

v0.20.3

18 May 22:13

0.20.3

Patch release: two new top-level CLI commands stream the published
docs to stdout, a real-world Bun-internals port workflow lands as an
example, and <Loop> / <Sequence> accept a typed key prop.


CLI

  • smithers docs. Fetches https://smithers.sh/llms.txt (the
    concise docs index for LLMs) and writes it to stdout. With --json,
    returns { url, content } instead.

  • smithers docs-full. Fetches https://smithers.sh/llms-full.txt
    (the full ~8k-line docs bundle for LLMs) and writes it to stdout.
    Same --json behavior as docs.

    Both commands fetch at runtime against the published docs site, so
    output always reflects the latest shipped docs and the CLI bundle
    stays small. Fetch failures surface as DOCS_FETCH_FAILED /
    DOCS_FULL_FETCH_FAILED with a non-zero exit.

Components

  • <Loop> and <Sequence> accept key. LoopProps and
    SequenceProps in @smithers-orchestrator/components now declare
    key?: string explicitly, so TypeScript users can pass a stable
    key when iterating without a TS error. Matches the React contract
    the renderer already honored at runtime.

Examples

  • examples/bun-port-smithers. A new production-grade example
    workflow that drives porting Rust crates to Bun internals using
    Smithers subflows: lifetime classification, Phase-A port, crate
    compile bringup, ungate-and-proper-port, panic-probe swarm, test
    swarm, and audit sweeps. Demonstrates real subflow execution,
    agent wiring, prompt colocation, and zod schemas at scale —
    ~2,000 lines of workflow code in examples/bun-port-smithers/.

Docs

  • Landing + intro copy refresh. docs/index.mdx and
    docs/introduction.mdx lead with "Smithers powers TypeScript
    teams to run long AI coding workflows that survive crashes and
    handle approvals, retries, and restarts."

  • Background-agents post URL fix. Corrected blog link in
    docs/why/background-agents.mdx (#139, thanks @dneljo).

v0.20.1

14 May 12:11

0.20.1

Patch release: agent-trace observability lands, the smithers init agent
flow gets sharper, and pnpm release now opens a GitHub release.


Observability

  • Canonical agent traces. Every node attempt that invokes an agent now
    emits canonical AgentTraceEvent / AgentTraceSummary /
    AgentSessionEvent records, OTEL log records with stable Loki query
    attributes (run.id, workflow.path, node.id, node.attempt,
    agent.family, agent.capture_mode, trace.completeness,
    event.kind), and — when logDir is set — a durable NDJSON artifact
    at .smithers/executions/<runId>/logs/agent-trace/<node>-<iter>-<attempt>.ndjson.
    See packages/engine/src/AgentTraceCollector.js and the new
    apps/observability/src/agentTrace.ts surface.

  • Local OTEL stack adds Loki 3.3.2. observability/docker-compose.otel.yml
    pins otel-collector to 0.148.0, adds a Loki service with a
    persistent /loki volume, wires the otel-collector logs pipeline to
    Loki via otlp_http, and adds healthchecks + service_healthy
    depends_on across the stack. Grafana ships with a Loki datasource
    and Tempo tracesToLogs config out of the box. New helper scripts:
    scripts/obs-reset.sh, scripts/obs-wait-healthy.sh,
    scripts/verify-observability.sh.

  • Run annotations. RunOptions.annotations is a flat
    Record<string, string | number | boolean> plumbed through every run
    for stable Loki/OTEL query attributes.

  • Demo workflow. workflows/agent-trace-otel-demo.tsx shows the
    end-to-end trace surface against the live OTEL stack.
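Two small pieces of the trace surface can be sketched from what the notes state: the flat annotation type and the per-attempt NDJSON path pattern. `assertFlatAnnotations`, `agentTracePath`, and `toNdjsonLine` are hypothetical helpers. Only the `Record<string, string | number | boolean>` shape and the path pattern come from the notes.

```typescript
// Hypothetical sketch: validate flat run annotations and build the agent-trace
// NDJSON path. Helper names are illustrative; the annotation type and path
// pattern follow the release notes.

type Annotations = Record<string, string | number | boolean>;

function assertFlatAnnotations(input: Record<string, unknown>): Annotations {
  const out: Annotations = {};
  for (const [key, value] of Object.entries(input)) {
    if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") {
      out[key] = value;
    } else {
      // Nested objects, arrays, and null would not make stable query attributes.
      throw new Error(`annotation "${key}" must be a string, number, or boolean`);
    }
  }
  return out;
}

function agentTracePath(runId: string, nodeId: string, iteration: number, attempt: number): string {
  return `.smithers/executions/${runId}/logs/agent-trace/${nodeId}-${iteration}-${attempt}.ndjson`;
}

// One event per line: JSON.stringify without indentation escapes newlines inside strings.
function toNdjsonLine(event: Record<string, unknown>): string {
  return `${JSON.stringify(event)}\n`;
}
```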

Agents

  • Token-usage normalization. New
    packages/agents/src/BaseCliAgent/normalizeTokenUsage.js collapses
    provider-specific token-usage fields into a single
    NormalizedTokenUsage shape regardless of source naming.

  • Better JSON text extraction. BaseCliAgent.extractTextFromJsonValue
    now handles array roots, OpenAI-style output_text, and recursive
    item fields — relevant for Codex/Responses-API outputs.
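A rough sketch of both agent-side changes, under stated assumptions. The `NormalizedTokenUsage` fields here are guesses, since the real shape is not listed. The source keys are common provider spellings: Anthropic and OpenAI use snake_case, and Gemini uses camelCase counts. `extractText` is a hypothetical stand-in for `BaseCliAgent.extractTextFromJsonValue`, not its actual implementation.

```typescript
// Hypothetical sketch of token-usage normalization and recursive text
// extraction. Field names on NormalizedTokenUsage are assumptions.

interface NormalizedTokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

type RawUsage = Record<string, unknown>;

function firstNumber(raw: RawUsage, keys: readonly string[]): number | undefined {
  for (const key of keys) {
    const v = raw[key];
    if (typeof v === "number" && Number.isFinite(v)) return v;
  }
  return undefined;
}

function normalizeTokenUsage(raw: RawUsage): NormalizedTokenUsage {
  const inputTokens =
    firstNumber(raw, ["input_tokens", "prompt_tokens", "promptTokenCount", "inputTokens"]) ?? 0;
  const outputTokens =
    firstNumber(raw, ["output_tokens", "completion_tokens", "candidatesTokenCount", "outputTokens"]) ?? 0;
  // Fall back to the sum when the provider reports no total.
  const totalTokens =
    firstNumber(raw, ["total_tokens", "totalTokenCount", "totalTokens"]) ?? inputTokens + outputTokens;
  return { inputTokens, outputTokens, totalTokens };
}

// Collect text from array roots, `output_text` strings, `text` fields, and
// nested `output` / `content` / `item` values.
function extractText(value: unknown): string[] {
  if (typeof value === "string") return [value];
  if (Array.isArray(value)) return value.flatMap((v) => extractText(v));
  if (value === null || typeof value !== "object") return [];
  const obj = value as Record<string, unknown>;
  // Prefer a precomputed output_text to avoid duplicating nested content.
  if (typeof obj.output_text === "string") return [obj.output_text];
  const parts: string[] = [];
  if (typeof obj.text === "string") parts.push(obj.text);
  for (const key of ["output", "content", "item"]) {
    parts.push(...extractText(obj[key]));
  }
  return parts;
}
```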

CLI

  • smithers init agent detection. Reworked
    apps/cli/src/agent-detection.js so the init flow detects installed
    agents correctly across more environments; covered by new tests in
    apps/cli/tests/cli-agent-detection.test.js and
    apps/cli/tests/init-agents.e2e.test.js.

  • Better .gitignore templates. smithers init now writes
    *.db-shm, *.db-wal, *.log, and logs/ into
    .smithers/.gitignore, and adds a new
    .smithers/workflows/.gitignore for *.log / run-*.log. Keeps
    ephemeral SQLite WAL files and run logs out of git by default. (#137)

Gateway

  • Hardened HTTP boundaries. packages/server/src/gateway.js
    tightens request/response handling at the gateway boundary; covered
    by the rewritten gateway-http-boundaries.test.js. (#134)

Internal

  • GitHub releases on pnpm release. scripts/publish.mjs now
    pushes the version tag to origin if missing and runs
    gh release create v<version> --notes-file docs/changelogs/<version>.mdx --latest
    after a successful npm publish. Pass --skip-gh-release to opt out.
    Fills the gap that left releases on GitHub stuck at v0.12.0 while
    npm advanced to v0.20.0; all intermediate tags were backfilled in
    the same pass.

  • Engine index.d.ts cleanup. tsup --dts-only regen of the
    engine type bundle drops orphaned
    ComponentDefinition / ComponentDefinitionBuilder /
    WorkflowDefinitionBuilder aliases left over from the Effect API
    rebuild. No public-surface change.

v0.20.0

14 May 11:55

0.20.0 is a minor release with one breaking change. Since 0.19.0, the
Effect authoring surface has been rebuilt around first-class graph values:
Smithers.workflow and Smithers.fragment replace the previous
Smithers.createWorkflow / Smithers.createComponent callback-builder API.
The runtime, persistence model, scheduler, and JSX surface are unchanged.


Effect API

The 0.19.0 callback-builder API is gone. The new shape is a typed workflow
handle whose constructors return ordinary graph values:

const G = Smithers.workflow({ name: "review", input: inputSchema });

const analyze = G.step("analyze", { output: analysisSchema, run: ... });
const report  = G.step("report",  { needs: { analyze }, ... });

export const reviewWorkflow = G.from(G.sequence(analyze, report));

Highlights:

  • First-class graph values. G.step, G.approval, G.sequence,
    G.parallel, G.match, G.branch, G.loop, G.worktree, and
    G.scope all return values you can export, return from a function,
    or compose with other graph values. No callback scope, no builder token.

  • One reuse mechanism. G.scope(instanceId, fragment) applies a
    durable ID prefix to every step inside fragment, so the same fragment
    can be mounted under multiple scopes without collision. The previous
    createComponent / buildWithPrefix machinery is removed; reuse is
    now plain functions returning graph values plus G.scope for
    multi-mount.

  • Cross-workflow fragments via Smithers.fragment(inputSchema).
    Returns the same constructors as a workflow handle minus .from, so
    fragments can be authored once and mounted into any matching workflow.

  • G.worktree and G.branch are now reachable from the Effect
    surface.
    They were already in the underlying graph union but weren't
    exposed to Effect-API authors; both now have ordinary constructors.

  • .pipe() on every graph value. Left-to-right function application
    matching Effect.pipe. No pipeable combinators ship in this release;
    the method is forward-compat for future data-last forms.

  • Compilation is internal but inspectable. G.from(graph) returns
    { execute, node }, where node is the compiled BuilderNode tree.
    Useful for tooling and tests; the runtime path is unchanged.

  • Memoization is part of the contract. Compilation is keyed on
    (activeScopePrefix, graphValue). The same step value referenced as
    both a child and a needs source compiles to one handle. The same
    fragment mounted under two scopes produces two distinct handle sets.
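A toy model of two of these highlights may help: scope prefixes on durable IDs, and memoization keyed on (activeScopePrefix, graphValue). This is explicitly not the Smithers API. The `step` / `sequence` / `scope` helpers, the `Handle` type, and the dotted ID format are hypothetical. Graph values are plain objects, and memoization uses object identity under each active prefix.

```typescript
// Toy model (not the Smithers API): a scope prefixes the durable ID of every
// step inside it, and compilation is memoized on (activeScopePrefix, graphValue).

type Graph =
  | { kind: "step"; id: string; needs: Graph[] }
  | { kind: "sequence"; children: Graph[] }
  | { kind: "scope"; prefix: string; body: Graph };

interface Handle {
  durableId: string;
}

const step = (id: string, needs: Graph[] = []): Graph => ({ kind: "step", id, needs });
const sequence = (...children: Graph[]): Graph => ({ kind: "sequence", children });
const scope = (prefix: string, body: Graph): Graph => ({ kind: "scope", prefix, body });

function compile(root: Graph): Map<string, Handle> {
  // Memo: active prefix -> (graph value identity -> handle).
  const memo = new Map<string, Map<Graph, Handle>>();
  const handles = new Map<string, Handle>();
  const join = (prefix: string, id: string): string => (prefix === "" ? id : `${prefix}.${id}`);

  const visit = (g: Graph, prefix: string): void => {
    switch (g.kind) {
      case "scope":
        visit(g.body, join(prefix, g.prefix));
        return;
      case "sequence":
        g.children.forEach((child) => visit(child, prefix));
        return;
      case "step": {
        let byValue = memo.get(prefix);
        if (byValue === undefined) {
          byValue = new Map<Graph, Handle>();
          memo.set(prefix, byValue);
        }
        // Same value under the same prefix: already compiled, reuse it.
        if (byValue.has(g)) return;
        g.needs.forEach((dep) => visit(dep, prefix));
        const handle: Handle = { durableId: join(prefix, g.id) };
        byValue.set(g, handle);
        handles.set(handle.durableId, handle);
        return;
      }
    }
  };

  visit(root, "");
  return handles;
}
```

With `analyze` used both as a child and as a `needs` source, `compile(sequence(analyze, report))` yields two handles, not three. Mounting the same fragment under `scope("left", ...)` and `scope("right", ...)` yields four.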

Breaking change

Smithers.createWorkflow, Smithers.createComponent, the .build(($) => …)
callback, and the WorkflowDefinitionBuilder / ComponentDefinition /
ComponentDefinitionBuilder types are removed.

Migration is mechanical:

  1. Lift step and approval declarations out of the .build callback.
  2. Replace $.x with G.x, where G = Smithers.workflow(opts).
  3. Replace the trailing return $.sequence(...) with
    export default G.from(G.sequence(...)).
  4. Replace createComponent(name).build(($, params) => ...) with a
    function returning a graph fragment, mounted with
    G.scope(instanceId, fragment(params)).

A counter-proposal documenting the design decisions and rejected
alternatives lives in research/effect-combinator-api.md.

Docs

  • New "Background agents are here" article in the Learn group. A
    direct quote-and-respond to the recent debate over background agents,
    framing Smithers as a substrate for production-ready agentic workflows
    rather than a wrapper around any single agent runtime. Iterated through
    several rewrites and is linked from the introduction under the "But…
    Why?" section.

  • Effect API page rewritten end-to-end to use the new graph-value
    surface. Covers steps, dependencies, control flow (sequence,
    parallel, match, branch, loop, worktree), G.scope-based
    reuse, Smithers.fragment for cross-workflow fragments, the .pipe()
    hook, and operational notes (idempotency, signals, retry, cache).

  • Why React? page reorganized into the "But… Why?" section of the
    navigation alongside the new background-agents article.

LLM bundles

  • llms-effect.txt is now generated. scripts/generate-llms.ts
    picks up the rewritten Effect overview as its own opt-in fragment and
    bundles it into llms-full.txt. The index in llms.txt points at it.

  • Regenerated llms-*.txt. All bundles refreshed for the new Effect
    API surface and the updated articles.

v0.19.0

14 May 11:55

0.19.0 is a minor release. Since 0.18.0, Smithers has exposed
public Effect-based workflow builders, added Unix socket transport for the
Gateway and its client, hardened the Gateway client's response validation,
and scaffolded a real Kanban UI into the default workflow pack. There are
no intentional breaking changes, but the new public API surface — most
notably Smithers.createWorkflow / Smithers.createComponent and the
path listen option — is larger than a patch release.


Effect API

  • Smithers.createWorkflow and Smithers.createComponent are now public.
    The previously-internal _createWorkflow and _createComponent Effect
    builders are exported from the engine and re-exported through
    smithers-orchestrator. Authors can compose workflows with Effect,
    Layer, and Schema directly, without going through JSX.

  • New TypeScript surface for the Effect builder. The engine type bundle
    now exports BuilderApi, BuiltSmithersWorkflow, ComponentDefinition,
    ComponentDefinitionBuilder, StepOptions, and
    WorkflowDefinitionBuilder, so external callers can type Effect-based
    step bodies and component factories.

  • New Effect API docs section. A new docs page walks through the
    minimal Effect workflow, schemas, step bodies that return Effect
    values, sequence/parallel/loop/match nodes, and the runtime guarantees
    shared with the JSX surface (SQLite-backed steps, no re-runs on resume,
    schema-validated outputs).

Gateway

  • Unix socket transport. Gateway#listen now accepts a path option
    and listens on a Unix domain socket when provided. The gateway client
    understands matching ws+unix: base URLs, so browser-style and
    socket-based deployments can share the same RPC and WebSocket surface.

  • Hardened client response validation. SmithersGatewayClient now
    validates Gateway response frames against an explicit shape check before
    surfacing them, raising INVALID_GATEWAY_RESPONSE on malformed payloads,
    HTTP_ERROR on non-RPC HTTP failures, and a clear error when fetch is
    not available in the current environment. The WebSocketImpl field is
    now optional and populated lazily.

  • Connection-level robustness. SmithersGatewayConnection was tightened
    alongside the client (frame validation, error normalization), and a new
    test suite covers HTTP RPC paths, WebSocket request/response framing,
    and the gateway-react RPC and actions hooks.
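A sketch of what an explicit frame shape check can look like on the client side. The frame layout (`id`, `ok`, `result` / `error`) and `GatewayClientError` are assumptions made for illustration. Only the `INVALID_GATEWAY_RESPONSE` code name comes from these notes.

```typescript
// Hypothetical sketch of validating RPC response frames before surfacing them.
// The frame layout is assumed; only the error code name is from the notes.

type ResponseFrame =
  | { id: string; ok: true; result: unknown }
  | { id: string; ok: false; error: { code: string; message: string } };

class GatewayClientError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "GatewayClientError";
    this.code = code;
  }
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function validateResponseFrame(payload: unknown): ResponseFrame {
  const invalid = (why: string): GatewayClientError =>
    new GatewayClientError("INVALID_GATEWAY_RESPONSE", `malformed gateway response: ${why}`);
  if (!isRecord(payload)) throw invalid("frame is not an object");
  const id = payload.id;
  if (typeof id !== "string") throw invalid("missing string id");
  if (payload.ok === true) {
    if (!("result" in payload)) throw invalid("success frame without result");
    return { id, ok: true, result: payload.result };
  }
  if (payload.ok === false) {
    const error = payload.error;
    if (!isRecord(error)) throw invalid("error frame without error object");
    const code = error.code;
    const message = error.message;
    if (typeof code !== "string" || typeof message !== "string") {
      throw invalid("error frame without string code and message");
    }
    return { id, ok: false, error: { code, message } };
  }
  // Anything else (missing or non-boolean ok) is rejected rather than guessed at.
  throw invalid("ok must be true or false");
}
```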

CLI

  • smithers init scaffolds a Kanban gateway UI. New projects get a
    .smithers/gateway.ts entry, a .smithers/ui/kanban.tsx browser entry,
    and a gateway script so bun ./gateway.ts starts a local Gateway
    with the Kanban workflow mounted at /workflows/kanban.

  • Kanban UI dark theme. The scaffolded Kanban UI ships with a
    Linear-style dark theme: token-driven colors, refreshed metric tiles,
    run/board layout, and approval controls. The light/system color scheme
    is still honored, but the default look is dark.

  • Init E2E coverage. The CLI init test verifies the seeded gateway
    entry, Kanban UI file, dependency wiring, and that the generated
    .smithers/ typechecks.

Examples

  • New parallel-tickets example. examples/parallel-tickets.jsx plus
    five MDX prompts (triage, implement, research, review, merge)
    demonstrate a Triage → wave-by-wave parallel implement/review →
    MergeQueue pipeline. Tickets within a wave run concurrently in their
    own worktrees; waves merge back into main one at a time before the
    next wave starts. Implementer/researcher run on Claude (Sonnet/Haiku);
    the reviewer runs through the Pi/Codex CLI.
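The wave ordering in this example can be reduced to a small async sketch. It leaves out triage, research, and review. `runInWaves`, `implement`, and `merge` are hypothetical stand-ins for the agent tasks and the MergeQueue step, not code from examples/parallel-tickets.jsx.

```typescript
// Hypothetical sketch of the wave pattern: tickets in one wave run concurrently,
// then merge back one at a time before the next wave starts.

interface Ticket {
  id: string;
  wave: number;
}

async function runInWaves(
  tickets: Ticket[],
  implement: (ticket: Ticket) => Promise<string>,
  merge: (ticket: Ticket, result: string) => Promise<void>,
): Promise<void> {
  const waves = Array.from(new Set(tickets.map((t) => t.wave))).sort((a, b) => a - b);
  for (const wave of waves) {
    const batch = tickets.filter((t) => t.wave === wave);
    // Start every ticket in the wave at once (each would get its own worktree).
    const done = await Promise.all(
      batch.map(async (ticket) => ({ ticket, result: await implement(ticket) })),
    );
    // Merge sequentially so each change lands on top of the previous one.
    for (const { ticket, result } of done) {
      await merge(ticket, result);
    }
  }
}
```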

Docs

  • New Why React? page. A Learn-group page explains the agent-experience
    rationale for Smithers being a JSX runtime: agents are disproportionately
    good at writing React, and the runtime's properties (durability, time
    travel, hot reload, composability) fall out of mapping orchestration onto
    that authoring model. Linked from the introduction.

  • Self-contained TOON API reference. research/toon-api.md is a
    briefing for an agent designing a Lisp front-end for Smithers — it
    inlines the spec, schemas, node kinds, compiler source, and fixtures of
    the retired .toon workflow format so the briefing has no external
    dependencies.

  • Regenerated llms-*.txt bundles. llms-full.txt and
    llms-observability.txt were rebuilt to pick up the Custom React UI
    section and the new listWorkflows / listApprovals RPC documentation
    introduced in 0.18.0.

Fixes

  • GUI .dmg link points at the correct host. The installation page
    and regenerated llms-*.txt bundles now use download.smithers.sh for
    the GUI download, replacing the get.smithers.sh URL that 404s.