Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
-
Updated
Jun 19, 2026 - Python
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.
Terminal voice-to-text TUI — Qwen3-ASR-1.7B on the Apple GPU via MLX (mlx-speech). Fully local, no PyTorch, transcribes in ~1s. macOS Apple Silicon.
Talk. Ink. — Push-to-talk dictation for macOS, 100% on-device. Pick your model: Qwen3-ASR, NVIDIA Nemotron or Voxtral, all via Apple MLX.
Local voice-to-text for macOS and iOS. Multilingual (EN/ZH/JP) with Traditional Chinese output. Runs Qwen3-ASR on Apple Silicon via MLX. No cloud, no subscription.
Production pipeline around OpenAI gpt-4o-transcribe-diarize for long-form 2-speaker interviews. Cross-chunk speaker consistency · diarization hallucination fix · async GPT-5.5 domain-term correction. WER 6.05% / DER 4.28% on 2h26m benchmark. Beats raw OpenAI API by +11.5 Q.
Live speech-to-text streaming on Apple Silicon — Qwen3-ASR + Silero VAD + MLX
🎬 AI subtitle generator: convert video to SRT subtitles locally with NVIDIA NeMo Parakeet-TDT speech-to-text. GPU-accelerated, word-level timestamps, VAD, LLM correction — a fast offline Whisper alternative.
Free, private, on-device dictation for macOS (Apple Silicon). Push-to-talk speech-to-text with on-device LLM cleanup — an offline, local alternative to cloud & Whisper dictation. Parakeet TDT v3 ASR + Qwen 2.5 1.5B cleanup + macOS Accessibility injection. Pure Rust, ~300–400 ms per utterance, nothing leaves your Mac.
OpenAI-compatible speech-to-text server for nvidia/nemotron-3.5-asr-streaming-0.6b (NeMo). Runs on the DGX Spark / GB10.
Private, local-first meeting recorder + transcription, diarization, AI notes, voice dictation & read-aloud for Windows — runs on your own GPU.
Skill de Claude Code que transcribe audios y videos a Markdown estructurado con timestamps y diarización, usando Google Gemini. Reemplazo gratuito de ElevenLabs Scribe / Whisper para quien ya paga Gemini.
Local FastAPI transcription studio: AssemblyAI Universal-2 (99 lang), FFmpeg, yt-dlp, Word/PDF/ZIP export
Reconnaissance vocale 100% locale sur Mac Apple Silicon (NVIDIA Nemotron 3.5 ASR + parakeet.cpp) : transcription micro live ou fichier, sortie .txt/.srt.
Transcribe your voice directly to text using local MLX models on macOS. Press a key to dictate and paste text instantly while keeping your data private.
Офлайн-расшифровка русского видео/аудио в текст и субтитры (GigaAM v3 + sherpa-onnx), нативный Windows x64/ARM64
Voice-to-text desktop app that captures speech, refines transcripts with AI, and auto-pastes at your cursor
Add a description, image, and links to the whisper-alternative topic page so that developers can more easily learn about it.
To associate your repository with the whisper-alternative topic, visit your repo's landing page and select "manage topics."