feat: onnx-community upgrades — Kokoro WebGPU, Moonshine STT, SmolVLM, transformers.js v4 by ijbo · Pull Request #6 · Textagent/textagent.github.io

ijbo · 2026-06-22T10:24:57Z

What

Four upgrades sourced from the onnx-community HuggingFace org, each filtered for browser-runnability (small, permissive, transformers.js-compatible).

Base branch: this is stacked on top of fix/stt-audio-improvements (PR #5) — it extends the STT tier system added there. Merge #5 first, or set this PR's base to fix/stt-audio-improvements.

Changes

🔊 1. Kokoro TTS on WebGPU (tts-worker.js)

Kokoro now loads with device:'webgpu' (fp32) when a WebGPU adapter is available and falls back to WASM (q8) otherwise. This fixes the audit's #1 audio bottleneck, CPU-only synthesis at 5–15 s per chunk; on GPU, Kokoro v1.0 synthesizes ~10 s of speech in ~1 s. The model is already shipped, so this is a runtime-only change.
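A minimal sketch of the device selection described above, assuming the worker can see `navigator.gpu` and loads the model through a `from_pretrained`-style function that is injected here (kokoro-js exposes `KokoroTTS.from_pretrained(modelId, { dtype, device })`). The model ID `onnx-community/Kokoro-82M-v1.0-ONNX` and the extra retry on WASM when a WebGPU load throws are assumptions, not necessarily what `tts-worker.js` does.

```javascript
// Sketch only: device/dtype pairs come from the PR; the model ID and the
// retry-on-load-error behaviour are assumptions.
const KOKORO_MODEL_ID = "onnx-community/Kokoro-82M-v1.0-ONNX"; // assumed repo ID

// True only if WebGPU exists *and* an adapter is available.
// `gpu` is navigator.gpu in a browser or worker (undefined without WebGPU).
async function hasWebGPUAdapter(gpu) {
  if (!gpu || typeof gpu.requestAdapter !== "function") return false;
  try {
    // requestAdapter() resolves to null when no suitable adapter exists.
    return (await gpu.requestAdapter()) != null;
  } catch {
    return false;
  }
}

// WebGPU gets full-precision weights; WASM gets 8-bit quantized ones.
function kokoroLoadOptions(useWebGPU) {
  return useWebGPU
    ? { device: "webgpu", dtype: "fp32" }
    : { device: "wasm", dtype: "q8" };
}

// Try WebGPU first when an adapter exists; if that load throws, retry on WASM.
async function loadKokoro(fromPretrained, gpu) {
  const first = kokoroLoadOptions(await hasWebGPUAdapter(gpu));
  try {
    return { tts: await fromPretrained(KOKORO_MODEL_ID, first), ...first };
  } catch (err) {
    if (first.device !== "webgpu") throw err; // already on WASM: nothing left to try
    const fallback = kokoroLoadOptions(false);
    return { tts: await fromPretrained(KOKORO_MODEL_ID, fallback), ...fallback };
  }
}
```

In a worker this could be called as `loadKokoro((id, opts) => KokoroTTS.from_pretrained(id, opts), self.navigator.gpu)`. Injecting the loader keeps the selection logic testable without downloading the model.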

🌙 2. Moonshine fast-English STT (speech-worker.js, ai-docgen.js)

onnx-community/moonshine-base-ONNX (MIT) is added as a moonshine worker tier and a 4th {{@STT:}} engine option. Because it is English-only and not Whisper-based, it skips the Whisper streamer/language path and the org fallback. It is routed via a new setWhisperTier() API.
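The routing difference can be sketched as a tier table plus a planning function. This is a hypothetical illustration, not the actual `setWhisperTier()` implementation: `STT_TIERS`, `planTranscription`, the placeholder Whisper entry, and the choice to reject non-English languages on the Moonshine tier are all assumptions. Only the Moonshine repo ID comes from this PR; `language` and `task` are standard transformers.js Whisper generation options.

```javascript
// Hypothetical tier table; only the Moonshine repo ID is taken from the PR.
const STT_TIERS = {
  whisper: { modelId: "onnx-community/whisper-base", family: "whisper" }, // placeholder entry
  moonshine: {
    modelId: "onnx-community/moonshine-base-ONNX",
    family: "moonshine",
    englishOnly: true,
  },
};

// Decide which model to load and which per-call options to pass.
function planTranscription(tier, language) {
  const entry = STT_TIERS[tier];
  if (!entry) throw new Error(`Unknown STT tier: ${tier}`);

  if (entry.family === "moonshine") {
    // Assumption: refuse explicit non-English requests instead of silently
    // transcribing them as English.
    if (language && !language.toLowerCase().startsWith("en")) {
      throw new Error("Moonshine tier is English-only");
    }
    // Non-Whisper model: no language/task options, no Whisper streamer,
    // and no fallback to alternative org repos.
    return { modelId: entry.modelId, callOptions: {}, useWhisperStreamer: false, allowOrgFallback: false };
  }

  // Whisper path: forward the language hint when one was given.
  const callOptions = language ? { language, task: "transcribe" } : { task: "transcribe" };
  return { modelId: entry.modelId, callOptions, useWhisperStreamer: true, allowOrgFallback: true };
}
```

Keeping the Whisper-only concerns behind a `family` check means a new non-Whisper engine only needs a table entry, not edits scattered through the streaming code.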

🖼️ 3. SmolVLM lightweight vision (ai-models.js, ai-worker-smolvlm.js)

SmolVLM-256M-Instruct and SmolVLM-500M-Instruct (ONNX exports from the HuggingFaceTB repos) are registered with a new worker (image-text-to-text, WebGPU/WASM, streaming). At ~270–500 MB, they are a lightweight alternative to Gemma 4 Vision (~2–4 GB) and Florence-2 for low-end devices.
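A minimal sketch of the two pieces an image-text-to-text worker with streaming typically needs: the single-turn chat passed to the processor's chat template, and a callback that forwards streamed text to the main thread. The message shape follows the SmolVLM model card's transformers.js example; the worker message protocol (`{ type: "token", delta, text }`) and both function names are hypothetical, not taken from `ai-worker-smolvlm.js`.

```javascript
// Single-turn chat for captioning / visual QA. The {type: "image"} entry is a
// placeholder the processor replaces with image tokens.
function buildSmolVLMMessages(prompt) {
  return [
    {
      role: "user",
      content: [
        { type: "image" },
        { type: "text", text: prompt },
      ],
    },
  ];
}

// Hypothetical worker protocol: accumulate streamed chunks and post both the
// new delta and the running text, so the UI can either append or replace.
function makeStreamForwarder(post) {
  let full = "";
  return (chunk) => {
    full += chunk;
    post({ type: "token", delta: chunk, text: full });
  };
}
```

In transformers.js v3 the forwarder would be passed as `callback_function` to `new TextStreamer(processor.tokenizer, { skip_prompt: true, callback_function: makeStreamForwarder((m) => self.postMessage(m)) })`, and the messages to `processor.apply_chat_template(messages, { add_generation_prompt: true })`; confirm both signatures against v4.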

⚙️ 4. transformers.js v3.8.1 → v4.2.0 (package.json)

The in-browser ML runtime for every local model worker. v4 brings a C++ WebGPU runtime, ~200 architectures, 53% smaller bundles, and 10× faster builds. The APIs the workers use are unchanged across the major version.

Testing

Vite build: clean on v4.
Playwright: 434 passed (full smoke + feature suite) on v4; TTS/STT/speech suites green. Updated STT engine-selector test (3 → 4 options).
Live runtime verification (the important one for v4): drove a real Kokoro synthesis in the browser against v4 — the model downloaded, loaded its 54-voice list, and speakAsync(...) resolved with hasAudio: true. This confirms kokoro-js works on v4 despite declaring a v3 peer dep.
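The manual check above could be automated roughly as follows. This is a sketch assuming `speakAsync` takes the text to speak and returns a promise for an object with a `hasAudio` field (the PR only states that it "resolved with hasAudio: true"); `smokeTestTts` and the 60 s default timeout are hypothetical.

```javascript
// Hypothetical smoke check: fail if synthesis hangs or resolves without audio.
async function smokeTestTts(speakAsync, text, timeoutMs = 60000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`TTS did not resolve within ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first wins: the synthesis or the timeout.
    const result = await Promise.race([speakAsync(text), timeout]);
    if (!result || result.hasAudio !== true) {
      throw new Error("TTS resolved without audio");
    }
    return result;
  } finally {
    clearTimeout(timer); // don't keep the event loop alive after success
  }
}
```

The generous default timeout leaves room for the first-run model download, which dominates the wall-clock time of a cold check.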
ESLint: no new errors.

Notes

kokoro-js@1.2.1 (latest) declares transformers@^3.5.1 as a peer dep — it doesn't officially support v4, but the APIs it calls are unchanged and a live synthesis confirms it works. Flagging for awareness.
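Why the peer dependency is flagged: an npm caret range allows changes that do not modify the left-most non-zero version component, so `^3.5.1` means `>=3.5.1 <4.0.0` and 4.2.0 falls outside it. The sketch below illustrates that rule for plain `x.y.z` versions only (no prerelease tags); real tooling should use the `semver` package instead.

```javascript
// Parse a plain "x.y.z" version into numbers.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v.trim());
  if (!m) throw new Error(`Unsupported version: ${v}`);
  return m.slice(1).map(Number);
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Caret range: >= base, < base with its left-most non-zero component bumped.
function satisfiesCaret(version, range) {
  if (!range.startsWith("^")) throw new Error("Only ^ ranges are handled");
  const v = parseVersion(version);
  const base = parseVersion(range.slice(1));
  let upper;
  if (base[0] > 0) upper = [base[0] + 1, 0, 0];
  else if (base[1] > 0) upper = [0, base[1] + 1, 0];
  else upper = [0, 0, base[2] + 1];
  return compareVersions(v, base) >= 0 && compareVersions(v, upper) < 0;
}
```

With this rule, `satisfiesCaret("3.8.1", "^3.5.1")` holds (the previous version was in range), while `satisfiesCaret("4.2.0", "^3.5.1")` does not, which is exactly the mismatch noted above.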

🤖 Generated with Claude Code


…, transformers.js v4

- Kokoro TTS runs on WebGPU when available (WASM fallback), fixing the CPU-only 5-15s/chunk bottleneck (tts-worker.js)
- Moonshine added as a fast English STT engine (onnx-community/moonshine-base, MIT): new worker tier + {{@stt:}} card option + setWhisperTier() API
- SmolVLM 256M/500M registered with a new public/ai-worker-smolvlm.js worker, a lightweight image-text-to-text alternative to Gemma 4 Vision / Florence-2
- transformers.js bumped 3.8.1 -> 4.2.0 (every local model worker). APIs used are unchanged; verified Kokoro TTS synthesizes live on v4 despite kokoro-js's v3 peer-dep declaration.

Verified: vite build clean; 434 Playwright tests pass on v4; live Kokoro synthesis resolved with audio. STT engine-selector test updated 3 -> 4 options.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e map

Live testing caught wrong repo IDs (onnx-community/SmolVLM-*-ONNX 404'd with "Unauthorized access"). The ONNX exports live under HuggingFaceTB/SmolVLM-256M-Instruct and -500M-Instruct. Also switched to a per-component dtype map (embed_tokens fp16, vision_encoder/decoder_model_merged q4), since SmolVLM ships as three ONNX files. Verified end-to-end: loaded 256M on WebGPU and captioned a test image correctly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
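The corrected repo IDs and the per-component dtype map from this commit can be expressed as a small load-config builder. This is a sketch: `smolvlmLoadConfig` is a hypothetical helper, and reusing the same dtype map on WASM as on WebGPU is an assumption. transformers.js accepts `dtype` either as a single string or as an object keyed by ONNX component name, and the three keys below are the component names given in the commit message.

```javascript
// Repo IDs as corrected in this commit.
const SMOLVLM_REPOS = {
  "256M": "HuggingFaceTB/SmolVLM-256M-Instruct",
  "500M": "HuggingFaceTB/SmolVLM-500M-Instruct",
};

// One dtype per ONNX component: fp16 embeddings, 4-bit vision encoder and decoder.
const SMOLVLM_DTYPES = Object.freeze({
  embed_tokens: "fp16",
  vision_encoder: "q4",
  decoder_model_merged: "q4",
});

function smolvlmLoadConfig(size, useWebGPU) {
  const modelId = SMOLVLM_REPOS[size];
  if (!modelId) throw new Error(`Unknown SmolVLM size: ${size}`);
  return {
    modelId,
    options: {
      device: useWebGPU ? "webgpu" : "wasm", // assumption: same map on both backends
      dtype: { ...SMOLVLM_DTYPES }, // copy so callers cannot mutate the shared map
    },
  };
}
```

The result would be passed as `AutoModelForVision2Seq.from_pretrained(cfg.modelId, cfg.options)`, as in the SmolVLM model card's transformers.js example; check that class name still applies on v4.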

ijbo and others added 2 commits June 22, 2026 19:24

ijbo changed the base branch from fix/stt-audio-improvements to main June 22, 2026 13:08

ijbo merged commit b0d86df into main Jun 22, 2026


ijbo merged 2 commits into main from feat/onnx-community-upgrades-v2

ijbo commented Jun 22, 2026