Set of 📝 with 🔗 to help those building Voice AI agents 🎙️🤖
Updated Jun 17, 2026
🎤💬 Full example of implementing ChatGPT's realtime voice from scratch with a VAD + STT + LLM + TTS technology stack, in almost a single file!
Real-time voice agents with parallel async background sub-agents — conversations continue naturally while tasks run • Join the builders → https://discord.gg/mqxKaN3UKC
Open-source realtime voice agent server in Go with WebRTC (WHIP), barge-in, streaming STT/LLM/TTS pipelines, plugin system, multi-language SDKs, SIP telephony, ESP32 support & fully local mode.
An AI-powered object detection system using YOLOv8 to identify and locate graffiti across various contexts including walls, buildings, over-bridges, vehicles, and other surfaces.
LiveKit voice app validation skill. Use when building, debugging, or declaring as working any LiveKit voice agent, Agents UI app, or React/Next.js LiveKit project. Enforces evidence-based validation before reporting a session, token endpoint, worker, transcript, or end-to-end voice interaction as complete.
Gemini Live API voice tutor for K-12 NCERT math — Hindi/English, hand-drawn whiteboard, open source
Production-grade speech data infrastructure platform for multilingual ASR/TTS training, featuring distributed processing, speaker diarization, emotion/style annotation, active learning, dataset optimization, observability and AWS-native deployment.
Voice agent prototype for structured clinical interviewing, with VAD-based interruption handling, modular ASR/LLM/TTS backends, and dialogue workflow control.
LiveKit Agents UI demo showing a voice AI assistant that schedules roof inspections using real-time voice interaction, visualizers, and booking workflow.
Realtime Voice AI platform featuring streaming speech-to-text, multi-agent conversational orchestration, Redis Streams event processing, WebSockets, OpenAI Whisper/TTS, Prometheus-Grafana observability, session-aware memory management, and cloud deployment.
Developer-facing interface for discovering and calling the Livepeer network.
Real-time hand sign recognition using LSTM-based models for sequence detection from video frames.
Real-time voice interface for OpenClaw. Stream speech-to-text, LLM reasoning, and text-to-speech into a low-latency conversational agent you can talk to—locally or in the cloud.
A real-time (<500ms) voice AI concierge built with Next.js, FastAPI, and Gemini 2.5 Flash Lite. Features local RAG (ChromaDB) for policy retrieval, Tool Calling for live booking, and event-driven CRM logging to Google Sheets.
howeverpipecat: engineering-focused Pipecat distribution
Realtime multimodal AI agent with voice streaming, RAG memory, and autonomous workflows
Traffyx-AI: Traffic Forecasting & Urban Mobility Intelligence System. Applied machine learning system for traffic prediction, congestion analysis, and real-world spatiotemporal data modeling.
Browser voice assistant with WebRTC realtime mode, STT/LLM/TTS pipeline, VAD, web search, and cost tracking.
Example apps showcasing what can be built with the Livepeer BYOC workflow.