#python #agentic_ai #agents #ai #ai_agents #realtime #stt #tts #video_agents #video_ai #vision_ai #voice_ai
Vision Agents is an open-source Python framework by Stream for building real-time AI agents that watch video, listen to audio, and respond with low latency (stated as under 30 ms). It integrates YOLO, Roboflow, OpenAI, Gemini, and 25+ tools, for apps such as golf coaching, security cameras that detect theft, or phone assistants. Install it with `uv add vision-agents`, use free Stream credits, and deploy on any video network. You can build video AI for gaming, safety, or coaching without vendor lock-in and without the time and cost of a custom build.
https://github.com/GetStream/Vision-Agents
GitHub - GetStream/Vision-Agents: Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider.…
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency. - GetStream/Vision-Agents
#typescript #ai #cuda #mlx #qwen3_tts #qwen3_tts_ui #voice_ai #voice_clone #whisper
Voicebox is a free, open-source voice synthesis studio that lets you clone voices, generate speech in 23 languages, and apply audio effects—all running privately on your computer. You can create realistic voice clones from just seconds of audio, use five different text-to-speech engines for different needs, add effects like reverb and pitch shift, and build multi-voice projects with a timeline editor. The key benefit is complete privacy: your voice data and AI models never leave your machine, unlike cloud-based alternatives. It also includes an API for building voice-powered applications and works across Mac, Windows, and Linux with GPU acceleration support.
https://github.com/jamiepine/voicebox
GitHub - jamiepine/voicebox: The open-source AI voice studio. Clone, dictate, create.
#python #ai #ai_agents #conversational_ai #fastapi #llm #nextjs #open_source #outbound_calls #pipecat #self_hosted #speech_to_text #telephony #text_to_speech #voice #voice_agents #voice_ai #voice_assistant #voip #webrtc
Dograh AI is an open-source, self-hostable tool for building voice agents with a drag-and-drop workflow. You can run it on your own server, plug in your own LLM, TTS, and STT services, and avoid vendor lock-in. The result is more control, more privacy, and a working voice bot in minutes, with no API keys needed to get started.
https://github.com/dograh-hq/dograh
GitHub - dograh-hq/dograh: Open Source Voice Agent Platform