feat: onnx-community upgrades — Kokoro WebGPU, Moonshine STT, SmolVLM, transformers.js v4#6
Merged
…, transformers.js v4
- Kokoro TTS runs on WebGPU when available (WASM fallback) — fixes the CPU-only
5-15s/chunk bottleneck (tts-worker.js)
- Moonshine added as a fast English STT engine (onnx-community/moonshine-base,
MIT): new worker tier + {{@stt:}} card option + setWhisperTier() API
- SmolVLM 256M/500M registered with a new public/ai-worker-smolvlm.js worker —
a lightweight image-text-to-text alternative to Gemma 4 Vision / Florence-2
- transformers.js bumped 3.8.1 -> 4.2.0 (every local model worker). APIs used
are unchanged; verified Kokoro TTS synthesizes live on v4 despite kokoro-js's
v3 peer-dep declaration.
Verified: vite build clean; 434 Playwright tests pass on v4; live Kokoro
synthesis resolved with audio. STT engine-selector test updated 3 -> 4 options.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e map
Live testing caught wrong repo IDs (onnx-community/SmolVLM-*-ONNX 404'd with "Unauthorized access"). The ONNX exports live under HuggingFaceTB/SmolVLM-256M-Instruct and -500M-Instruct. Also switched to a per-component dtype map (embed_tokens fp16, vision_encoder/decoder_model_merged q4) since SmolVLM ships as three ONNX files.
Verified end-to-end: loaded 256M on WebGPU and captioned a test image correctly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
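For reference, a minimal sketch of what a per-component dtype map like the one in this commit might look like. The repo IDs, component names, and dtypes are taken from the commit message; the helper name `smolvlmLoadOptions`, the size list, and the shape of the returned object are hypothetical and not taken from the actual worker code.

```javascript
// Dtypes per ONNX component, as listed in the commit message.
const SMOLVLM_DTYPES = Object.freeze({
  embed_tokens: 'fp16',
  vision_encoder: 'q4',
  decoder_model_merged: 'q4',
});

// The two sizes named in the commit message.
const SMOLVLM_SIZES = ['256M', '500M'];

// Hypothetical helper (an assumption, not the PR's code): returns the repo ID
// and an options object that a worker could hand to its model loader.
function smolvlmLoadOptions(size, device) {
  if (!SMOLVLM_SIZES.includes(size)) {
    throw new RangeError(`Unknown SmolVLM size: ${size}`);
  }
  return {
    modelId: `HuggingFaceTB/SmolVLM-${size}-Instruct`,
    // Copy the frozen map so callers can adjust one component without
    // affecting other loads.
    options: { device, dtype: { ...SMOLVLM_DTYPES } },
  };
}
```

Keeping the map in one frozen constant means a later dtype change for a single component (for example the vision encoder) touches one line rather than every call site.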
What
Four upgrades sourced from the onnx-community Hugging Face org, each filtered for browser-runnability (small, permissive, transformers.js-compatible).
Changes
🔊 1. Kokoro TTS on WebGPU (tts-worker.js)
Kokoro now loads on device:'webgpu' (fp32) when an adapter is available, falling back to WASM (q8). This fixes the audit's #1 audio bottleneck, CPU-only synthesis at 5–15 s/chunk; on GPU, Kokoro v1.0 does ~10 s of speech in ~1 s. You already ship this model; it's a runtime change.
🌙 2. Moonshine fast-English STT (speech-worker.js, ai-docgen.js)
onnx-community/moonshine-base-ONNX (MIT) added as a moonshine worker tier and a 4th {{@STT:}} engine option. It is English-only and non-Whisper, so it skips the Whisper streamer/language path and the org fallback. Routed via a new setWhisperTier() API.
🖼️ 3. SmolVLM lightweight vision (ai-models.js, ai-worker-smolvlm.js)
SmolVLM-256M-Instruct and SmolVLM-500M-Instruct (ONNX exports hosted under HuggingFaceTB) registered with a new worker (image-text-to-text, WebGPU/WASM, streaming): a ~270–500 MB alternative to Gemma 4 Vision (~2–4 GB) / Florence-2 for low-end devices.
⚙️ 4. transformers.js v3.8.1 → v4.2.0 (package.json)
The in-browser ML runtime for every local model worker. v4 brings a C++ WebGPU runtime, ~200 architectures, 53% smaller bundles, and 10× faster builds. The APIs the workers use are unchanged across the major version bump.
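The WebGPU-first, WASM-fallback choice described for Kokoro in item 1 could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in tts-worker.js: the function name `pickTtsBackend` is hypothetical, and the `gpu` argument is assumed to be `navigator.gpu` in a browser or worker context (undefined where WebGPU is unavailable). The device/dtype pairs are the ones named in the description.

```javascript
// Hypothetical helper (an assumption, not the PR's code): choose the
// device/dtype pair for the TTS model.
// `gpu` should be `navigator.gpu` where available, otherwise undefined.
async function pickTtsBackend(gpu) {
  try {
    // requestAdapter() resolves to null when no suitable adapter exists.
    const adapter = gpu ? await gpu.requestAdapter() : null;
    if (adapter) {
      return { device: 'webgpu', dtype: 'fp32' };
    }
  } catch (err) {
    // Any failure while probing is treated as "no WebGPU"; fall through.
  }
  // WASM fallback with the quantized weights.
  return { device: 'wasm', dtype: 'q8' };
}
```

In a worker this might be called as `pickTtsBackend(globalThis.navigator?.gpu)`, with the result merged into whatever options object the worker passes to its model loader. Taking `gpu` as a parameter rather than reading the global directly keeps the fallback path easy to exercise in tests.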
Testing
speakAsync(...) resolved with hasAudio: true. This confirms kokoro-js works on v4 despite declaring a v3 peer dep.
Notes
transformers@^3.5.1 as a peer dep; it doesn't officially support v4, but the APIs it calls are unchanged and a live synthesis confirms it works. Flagging for awareness.
🤖 Generated with Claude Code