Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math, and the applications of the former. To reach the editors, contact @malev
⚡️ A new model has been released in the Llama3-Speech family that natively understands both audio and text input.

This multimodal checkpoint, with improved speech understanding, listens to human speech and responds in text.

Llama3s v0.2 performs consistently across multiple speech understanding benchmarks.

The developers adapted Llama 3.1 using early fusion with semantic tokens.

It uses WhisperVQ to get semantic tokens. The encoder is frozen during training; only the Llama 3 base model is trained.
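
To make the early-fusion idea concrete, here is a minimal sketch of how discrete audio codes could be rendered as extra vocabulary entries that sit next to ordinary text tokens. The token names (`<|sound_start|>`, `<|sound_0001|>`, ...), the codebook size of 512, and both helper functions are assumptions for illustration, not the project's confirmed tokenizer layout.

```python
# Hypothetical token naming; the real tokenizer's special tokens may differ.
SOUND_START = "<|sound_start|>"
SOUND_END = "<|sound_end|>"


def codes_to_sound_tokens(codes, codebook_size=512):
    """Render discrete audio codes as text-like special tokens.

    codebook_size=512 is an assumed value, not taken from the post.
    """
    for c in codes:
        if not 0 <= c < codebook_size:
            raise ValueError(f"code {c} is outside a codebook of size {codebook_size}")
    body = "".join(f"<|sound_{c:04d}|>" for c in codes)
    return f"{SOUND_START}{body}{SOUND_END}"


def extend_vocab(text_vocab, codebook_size=512):
    """Return a copy of the text vocab with sound tokens appended after it."""
    vocab = dict(text_vocab)
    next_id = max(vocab.values()) + 1 if vocab else 0
    sound_tokens = [SOUND_START, SOUND_END]
    sound_tokens += [f"<|sound_{i:04d}|>" for i in range(codebook_size)]
    for tok in sound_tokens:
        vocab[tok] = next_id
        next_id += 1
    return vocab


prompt = codes_to_sound_tokens([1, 42])
vocab = extend_vocab({"hello": 0, "world": 1}, codebook_size=4)
```

With a layout like this, the language model sees audio as just more tokens in the same sequence, which is why only its weights (including embeddings for the new tokens) would need training while the audio encoder stays frozen.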

The developers used a synthetically generated speech dataset. The speech data was then semantically encoded with WhisperVQ from WhisperSpeech.

This dataset was then interleaved to have 70% speech instruction prompts and 30% speech transcription prompts.
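
A 70/30 mix like this can be sketched in a few lines. This is only an illustration of the ratio: the function name `mix_prompts`, the fixed seed, and the toy example records are assumptions, and the developers' actual data pipeline is not described in the post.

```python
import random


def mix_prompts(instruction, transcription, total, instruction_share=0.7, seed=0):
    """Sample a shuffled mix with the requested share of instruction prompts."""
    rng = random.Random(seed)
    n_instr = round(total * instruction_share)
    n_trans = total - n_instr
    if n_instr > len(instruction) or n_trans > len(transcription):
        raise ValueError("not enough examples to build the requested mix")
    mixed = rng.sample(instruction, n_instr) + rng.sample(transcription, n_trans)
    rng.shuffle(mixed)  # interleave the two prompt types
    return mixed


# Toy records standing in for real (hypothetical) dataset rows.
instruction = [("instruction", i) for i in range(100)]
transcription = [("transcription", i) for i in range(100)]
mixed = mix_prompts(instruction, transcription, total=10)
```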

You can try the demo: ask questions in English and keep them under 10 seconds long. This limit exists because the model was trained on audio prompts with fewer than 500 tokens, which the developers plan to address in a future update.
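
If you script against the demo, a small pre-check can reject clips that exceed the stated limits before they are sent. The helper name `fits_demo_limits` and its parameters are hypothetical; only the 10-second and 500-token figures come from the post.

```python
def fits_demo_limits(num_samples, sample_rate_hz, num_audio_tokens=None,
                     max_seconds=10.0, max_tokens=500):
    """Return True if a clip is under 10 s and, if known, under 500 audio tokens."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    duration_s = num_samples / sample_rate_hz
    if duration_s >= max_seconds:
        return False
    if num_audio_tokens is not None and num_audio_tokens >= max_tokens:
        return False
    return True


ok = fits_demo_limits(num_samples=8 * 16000, sample_rate_hz=16000)        # 8 s clip
too_long = fits_demo_limits(num_samples=12 * 16000, sample_rate_hz=16000)  # 12 s clip
```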

https://huggingface.co/homebrewltd/llama3.1-s-instruct-v0.2

homebrew.ltd/blog/llama3-just-got-ears

@opendatascience

#llama