huggingface/distil-whisper
#audio #speech_recognition #whisper
Stars: 261 Issues: 2 Forks: 9
https://github.com/huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Language: Python
#ai #deep_learning #emotion #emotivoice #multi_speaker #prompt #python #pytorch #speech #speech_synthesis #style #text_to_speech #tts
Stars: 432 Issues: 3 Forks: 38
https://github.com/netease-youdao/EmotiVoice
alesaccoia/VoiceStreamAI
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
Language: Python
#ai #speech_recognition #speech_to_text #websocket
Stars: 139 Issues: 2 Forks: 13
https://github.com/alesaccoia/VoiceStreamAI
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Language: Python
#acoustic #audio_representation #codec #dac #encodec #gpt4o #music_representation_learning #semantic #soundstream #speech_language_model #speech_representation #text_to_speech
Stars: 332 Issues: 6 Forks: 20
https://github.com/jishengpeng/WavTokenizer
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language: Python
#large_language_models #multimodal_large_language_models #speech_interaction #speech_language_model #speech_to_speech #speech_to_text
Stars: 274 Issues: 1 Forks: 16
https://github.com/ictnlp/LLaMA-Omni
amanvirparhar/chaplin
A real-time silent speech recognition tool.
Language: Python
#auto_avsr #avsr #llm #ollama #speech_recognition #speech_to_text #vsr
Stars: 279 Issues: 2 Forks: 22
https://github.com/amanvirparhar/chaplin