AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔥 Fish Audio S2 Technical Report

💡 The paper introduces Fish Audio S2, an open-source text-to-speech system with multi-speaker support, multi-turn generation, and instruction-following control through natural-language descriptions. Training is multi-stage and is backed by a staged data pipeline covering video captioning, speech captioning, voice-quality assessment, and reward modeling, which the authors credit for scalable training and improved overall performance. The authors release the model weights, fine-tuning code, and a production-ready streaming inference engine. The engine achieves a real-time factor of 0.195 and a time to first audio below 100 milliseconds. Code and weights are available on GitHub and Hugging Face, and custom voices can be tried on the project website. Overall, the work advances open-source text-to-speech with an efficient system for high-quality speech generation.
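
To make the reported real-time factor concrete, here is a minimal sketch. It assumes the common definition of real-time factor (synthesis time divided by the duration of the generated audio); the post does not spell out which definition the paper uses. The function names are hypothetical and are not part of the Fish Audio codebase.

```python
# Assumption: RTF = synthesis wall-clock time / duration of generated audio.
# The value 0.195 is the figure quoted in the post; everything else is illustrative.

REPORTED_RTF = 0.195


def synthesis_seconds(audio_seconds: float, rtf: float = REPORTED_RTF) -> float:
    """Estimate the wall-clock time needed to generate `audio_seconds` of speech."""
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


def speedup_over_real_time(rtf: float = REPORTED_RTF) -> float:
    """How many seconds of audio are produced per second of compute."""
    if rtf <= 0:
        raise ValueError("rtf must be > 0")
    return 1.0 / rtf


if __name__ == "__main__":
    print(f"10 s of audio: about {synthesis_seconds(10.0):.2f} s to synthesize")
    print(f"Throughput: about {speedup_over_real_time():.2f}x real time")
```

Under that definition, an RTF below 1 means audio is generated faster than it plays back, which is the precondition for uninterrupted streaming.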


📅 Published on Mar 9

🔗 Links:
• arXiv: https://arxiv.org/abs/2603.08823
• PDF: https://arxiv.org/pdf/2603.08823
• Project Page: https://fish.audio/
• GitHub: https://github.com/fishaudio/fish-speech (30.2k stars)

🤖 Models citing this paper:
https://huggingface.co/fishaudio/s2-pro
https://huggingface.co/drbaph/s2-pro-fp8
https://huggingface.co/mlx-community/fish-audio-s2-pro-bf16

📊 Datasets citing this paper:
https://huggingface.co/datasets/Izzyzlin/CFSDD

🚀 Spaces citing this paper:
https://huggingface.co/spaces/artificialguybr/fish-s2-pro-zero
https://huggingface.co/spaces/fguilleme/fish-s2-pro-zero
https://huggingface.co/spaces/MAYA-AI/fish-s2-pro-zero

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#TextToSpeechSystems #MultispeakerSynthesis #NaturalLanguageProcessing #SpeechGenerationModels #RealTimeAudioProcessing