Notes from captain obvious:
Comparing two GPUs with Tensor Cores, one of the single best indicators of each GPU's performance is its memory bandwidth;
Most GPU computation time is spent on memory access;
The A100 compared to the V100 is 1.70x faster for NLP and 1.45x faster for computer vision;
The 3-slot design of the RTX 3090 makes 4x GPU builds problematic. Possible solutions are 2-slot variants or the use of PCIe extenders;
4x RTX 3090 will need more power than any standard power supply unit on the market can provide right now (this is BS - I have a 2000W PSU - but power connectors may be an issue);
With BF16 precision, training might be more stable than with FP16 precision while providing the same speedups (see the sketch after this list);
The new fan design of the RTX 30 series features both a blower fan and a push/pull fan;
350W TDP;
Compared to an RTX 2080 Ti, the RTX 3090 yields a speedup of 1.57x for convolutional networks and 1.5x for transformers while having a 15% higher release price. Thus the Ampere RTX 30s delivers a pretty substantial improvement over the Turing RTX 20s series;
PCIe 4.0 and PCIe lanes do not matter in 2x GPU setups. For 4x GPU setups, they still do not matter much;
NVLink is not useful; it only matters for GPU clusters;
No info about the power connector yet, but I believe the first gaming GPUs use 2x 6-pin, plus maybe some adapter;
Despite heroic software engineering efforts, AMD GPUs + ROCm will probably not be able to compete with NVIDIA for at least 1-2 years due to the lack of a community and of a Tensor Core equivalent;
You will need 50+ Gbit/s network cards to gain speedups if you want to parallelize across machines;
So if you expect to run deep learning models for more than 300 days, it is better to buy a desktop than to use AWS spot instances (also, fuck AWS and Nvidia with their data-center SLA terms);
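On the BF16 bullet above - a minimal sketch of what bf16 mixed precision looks like in recent PyTorch (the model and data are placeholders; requires a GPU with bf16 support):

```python
import torch

# Placeholder model and data
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(32, 128, device='cuda')
y = torch.randint(0, 10, (32,), device='cuda')

for _ in range(10):
    optimizer.zero_grad()
    # bf16 keeps fp32's exponent range, so unlike fp16 no GradScaler /
    # loss scaling is needed - one reason training tends to be more stable
    with torch.autocast(device_type='cuda', dtype=torch.bfloat16):
        loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
```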
Data Science by ODS.ai 🦜
Nvidia announced the new RTX 3090. The RTX 3090 is roughly 2 times more powerful than the 2080. There is probably no point in getting the 3080 because its RAM volume is only 10 GB. But what really matters is how it was presented. A purely technological product for mostly…
YouTube
Does the RTX 3080 heat up the RAM and the CPU cooler? Modeling the airflow of the reference RTX 3080.
RTX 3000 series - https://www.e-katalog.ru/u/SCQd7w/a
PC components - https://www.e-katalog.ru/u/znDHaL/a
In this video we look at how the RTX 3080 with its flow-through fan behaves in ordinary and unusual cases, and measure the temperatures inside the case at different…
Silero Speech-To-Text Models V1 Released
We are proud to announce that we have released our high-quality (i.e. on par with premium Google models) speech-to-text models for the following languages:
- English
- German
- Spanish
Why this is a big deal:
- STT Research is typically focused on huge compute budgets
- Pre-trained models and recipes did not generalize well, were difficult to use even as-is, and relied on obsolete tech
- Until now the STT community lacked easy-to-use, high-quality, production-grade STT models
How we solve it:
- We publish a set of pre-trained high-quality models for popular languages
- Our models are embarrassingly easy to use (see the sketch below)
- Our models are fast and can be run on commodity hardware
Even if you do not work with STT, please give us a star / share!
Links
- https://github.com/snakers4/silero-models
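A quick illustration of the "embarrassingly easy" claim - loading a model via torch.hub, following the repo's README at the time (entry point names may change; 'speech.wav' is a placeholder audio file):

```python
import torch

device = torch.device('cpu')  # CPU is enough for a taste
model, decoder, utils = torch.hub.load(
    repo_or_dir='snakers4/silero-models',
    model='silero_stt',
    language='en',
    device=device)
read_batch, split_into_batches, read_audio, prepare_model_input = utils

# 'speech.wav' is a placeholder - any mono 16 kHz wav file
batches = split_into_batches(['speech.wav'], batch_size=1)
inp = prepare_model_input(read_batch(batches[0]), device=device)

for example in model(inp):
    print(decoder(example.cpu()))
```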
GitHub
GitHub - snakers4/silero-models: Silero Models: pre-trained text-to-speech models made embarrassingly simple
Silero Models: pre-trained text-to-speech models made embarrassingly simple - snakers4/silero-models
Reposts on Habr
https://habr.com/ru/post/519564/
https://habr.com/ru/post/519562/
If you have an account, please give us a like
Habr
We have published modern STT models comparable in quality to Google's
We have finally published our set of high-quality pre-trained speech recognition models (i.e. comparable in quality to Google's premium models) for the following languages: English;...
Repost on Medium
https://medium.com/@aveysov/modern-google-level-stt-models-released-c6491019e30c?sk=0d51c5301da830c31dcd9d2de7171c17
Medium
Modern Google-level STT Models Released
Our models are on par with premium Google models and also really simple to use
Silero Models on Torch Hub
TLDR - https://pytorch.org/hub/snakers4_silero-models_stt/
Also Soumith Chintala himself commented on this release.
PS
Upvote on HackerNews
https://news.ycombinator.com/item?id=24565831
2020 DS / ML Digest 10
Highlights:
- Silero STT models release on Torch hub
- Oculus Quest 2, now $100 cheaper at $300
- Nvidia 30?0 close to release (see the detailed commentary on Tim's post above)
- Fast.ai book
- Benchmarking Deep Learning Optimizers
- Language-Agnostic BERT Sentence Embedding + new Transformer LASER
- Are we done with ImageNet?
Please like / share / repost!
https://spark-in.me/post/2020_ds_ml_digest_11
#digest
Microsoft ... Stepping up its Game in ML?
Wow, wow, do not close yet! I am a big MS-hater myself.
Long an outsider / legacy player in cutting-edge tech / ML, it looks like, with a series of well-placed decisions, Microsoft may earn its place under the ML sun.
Yes, you heard me right. I am a big Microsoft hater, but just check this out:
- https://github.com/microsoft/onnxjs
- https://github.com/microsoft/onnxruntime#binaries
- https://github.com/microsoft/DeepSpeed
- ... OpenAI deal is just for hype I guess, no-one takes OpenAI seriously, right? ( ͡° ͜ʖ ͡°)
- Also, I recently used Azure datasets ... it was clunky compared to S3, but beggars cannot be choosers. Download speeds were slow, and their VSCode-like desktop app was OK ... though some features just did not work
It used to be a standard narrative that "TF = production". But I guess a more correct one would be "Google has invested billions in marketing and has a huge captive audience".
Lately I spent some time reading TF tutorials ... and they are so needlessly difficult - they fucking invent a protocol for everything! For what PyTorch hub achieves in 4 steps, TF hub requires you to read 10 markdown docs ... written in corporate language.
So, why is this important? Because proper competition makes everything shine brighter.
Why TF 2.0? Because PyTorch 1.0 ( ͡° ͜ʖ ͡°). Now it looks like Google and Nvidia have real new competitors in the ML inference market, together with Intel (which afaik is losing in general, but that is another story).
Nice!
#deep_learning
#machine_learning
#rant
GitHub
GitHub - microsoft/onnxjs: ONNX.js: run ONNX models using JavaScript
ONNX.js: run ONNX models using JavaScript. Contribute to microsoft/onnxjs development by creating an account on GitHub.
Also about competition.
.... why does Microsoft of all people want to train an effing TRILLION-parameter transformer?
... ( ͡° ͜ʖ ͡°) because they license a 100bn one from another company.
PS
I may be wrong in the exact figures.
Which ML hub have you used?
Anonymous Poll
- TF hub: 35%
- Torch hub: 45%
- ONNX models: 15%
- cadene / rwightman: 10%
- Other: 24%
Core DL Framework?
Anonymous Poll
- PyTorch: 75%
- TensorFlow / Keras: 44%
- MXNet: 1%
- CNTK: 0%
- Chainer: 0%
- Caffe (2): 1%
- Matlab: 4%
- Theano: 2%
- Other: 4%
- I use wrappers built on top: 7%
Our Model Featured on TF Hub
https://github.com/snakers4/silero-models
So far I have added only English as a start:
- https://tfhub.dev/silero
- https://tfhub.dev/silero/silero-stt/en/1
- https://tfhub.dev/silero/collections/silero-stt/1
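And a sketch of pulling the same model from TF Hub (the input signature below is an assumption - check the model page on tfhub.dev for the actual expected inputs):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Downloads and caches the SavedModel from tfhub.dev
model = hub.load('https://tfhub.dev/silero/silero-stt/en/1')

# Assumed input: a batch of float32 PCM samples at 16 kHz
waveform = tf.random.normal([1, 16000])
logits = model(waveform)
```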
GitHub
GitHub - snakers4/silero-models: Silero Models: pre-trained text-to-speech models made embarrassingly simple
Silero Models: pre-trained text-to-speech models made embarrassingly simple - snakers4/silero-models
Forwarded from NVIDIA Inception
NVIDIA TALK "Fast training with AMP/TF32 using TensorCores on NVIDIA GPU" at Data Fest + Q&A SESSION
Denis Timonin, AI Solutions Architect at NVIDIA, will talk about one of the most effective methods for speeding up neural network training and inference: mixed precision. In his talk Denis will walk through the "Mixed Precision Training" paper by NVIDIA and Baidu Research and cover the details of working with the TensorFloat32 precision format. We will also discuss the algorithms used in mixed-precision training and talk about the hardware that delivers high performance for these data formats in neural networks.
In the first part of the talk we will go over floating-point numbers, the motivation behind mixed-precision training, and Tensor Cores, and will train a complex neural network, StarGAN V2 (CVPR 2020), in Automatic Mixed Precision (AMP) mode.
In the second part we will dive into optimizing work with Tensor Cores: tricks for fast training in high-level frameworks and in the C++ API, and how to pick the right data and layer sizes in a network for the fastest training.
The talk is in English.
The talk is already available on the ODS YouTube channel: https://bit.ly/3kPAvPA
The Q&A session will take place on Saturday, September 26, from 12:00 to 14:00 here: https://spatial.chat/s/ods. The entry password can be obtained here: https://bit.ly/2GbDB1j
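For context, a minimal PyTorch sketch of the two techniques the talk covers - TF32 (a couple of global flags on Ampere GPUs) and classic fp16 AMP with loss scaling; the model and data are placeholders:

```python
import torch

# TF32: run fp32 matmuls/convolutions on Tensor Cores on Ampere GPUs
# (the defaults for these flags have changed across PyTorch versions)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Classic fp16 AMP recipe from the "Mixed Precision Training" paper:
# autocast for the forward pass, GradScaler for loss scaling
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()
x = torch.randn(64, 512, device='cuda')

for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)         # unscales grads, skips the step on inf/nan
    scaler.update()
```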
YouTube
Optimization Track. Denis Timonin: Fast training with AMP/TF32 using TensorCores on NVIDIA GPU
Increasing the size of a neural network typically improves accuracy but also increases the memory and compute requirements for training the model. At the same time the amount of data is constantly growing (exponentially in recent years). So we will talk about…
Forwarded from Profunctor Jobs
Frontend Junior
Stack: TensorFlow, JavaScript, Docker
Pay: $700-1200
Remote
silero.ai - we build ML products in the speech domain. We are looking for someone to build minimalistic web interfaces that work with our algorithms
The State of JS Inference in Browsers
Last time I looked at ONNX was 2+ years ago. It was basically nowhere. Now I decided to export our models from silero-models to ONNX and TensorFlow.
After fixing numerous issues, tldr - vanilla ONNX (via onnx-runtime) and TF models just work without looking under the hood. Ofc, I have not tested all of the teased back-ends for onnx-runtime, but it works at least on x86 w/o any issues out-of-the-box.
As for running in browsers ... my bet was on onnx.js (PyTorch => ONNX), but it looks like it is still early. Some issues I created - 1, 2. Interestingly, similar issues were addressed in onnx-tensorrt.
As for TF, I do not believe it is a go-to option just because you need to transform your model 3 times (PyTorch => ONNX => TF => TF js).
Such things look like Sisyphus' tasks, but in the long run the ecosystem evolves, which is cool.
#deep_learning
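To make the PyTorch => ONNX => onnx-runtime path concrete, a minimal sketch with a stand-in model (not the actual silero-models export):

```python
import torch
import onnxruntime as ort

# Stand-in model; the real export uses the actual STT network
model = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 64)

# Dynamic axes keep the batch dimension flexible in the exported graph
torch.onnx.export(
    model, dummy_input, 'model.onnx',
    input_names=['input'], output_names=['output'],
    dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}})

# Vanilla onnx-runtime inference on CPU - the "just works" path
session = ort.InferenceSession('model.onnx')
outputs = session.run(None, {'input': dummy_input.numpy()})
print(outputs[0].shape)
```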
GitHub
GitHub - snakers4/silero-models: Silero Models: pre-trained text-to-speech models made embarrassingly simple
Silero Models: pre-trained text-to-speech models made embarrassingly simple - snakers4/silero-models
Comparing Popular Model Hubs
The contenders are:
- PyTorch Hub
- TF Hub (latest PR not merged yet)
- ONNX models
- OpenVino models
So, I read the docs for all of them and created PRs for 3 of them; in the end our models were accepted to PyTorch Hub and TF Hub.
PyTorch Hub
- Instructions are literally 4 steps long
- The way the models are packaged and distributed is beautiful, clean, minimalistic, and easy to use and maintain. Some great mind really designed it
- Essentially you just host your model and add a special file to your repo; torch hub pulls your repo behind the scenes and loads the model (see the sketch after this list). It is embarrassingly easy
- I was greeted by Soumith Chintala himself
- After publishing, our models got a decent surge of traffic - cool!
- No focus on licensing or any similar sort of bs - you just share your models
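The "special file" is a hubconf.py at the root of your repo. A hypothetical minimal one (the entry point name, architecture, and weights URL below are illustrative, not the actual silero-models hubconf):

```python
# hubconf.py at the repo root is all torch.hub needs
dependencies = ['torch']  # pip packages required by the entry points

import torch

def my_stt_model(pretrained=True, **kwargs):
    """Loadable via torch.hub.load('user/repo', 'my_stt_model')."""
    model = torch.nn.Sequential(torch.nn.Linear(64, 32))  # stand-in architecture
    if pretrained:
        # Hypothetical weights URL - host the checkpoint anywhere
        state = torch.hub.load_state_dict_from_url(
            'https://example.com/my_stt_model.pt', progress=True)
        model.load_state_dict(state)
    return model
```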
TF Hub
- A lot of confusion and legacy between old TF / Keras / Hub models / SavedModels
- Instructions are like 10 separate documents, but after looking at examples it is kind of easy ... but the docs are very ... non-intuitive
- ~1000 models published, most of them by Google, most of them just very niche research artefacts
- Overall structure consists of models and collections. Nice tagging, nice web UI to look through the models; now I see why there is so much hassle with the format documents
- Zero traffic from there though =(
- Core models and community models are separated ... so I believe this is the reason. Also PyTorch hosts its hub on their main domain, whereas TF created a separate one ... you see what I mean =)
- I could fit PyTorch logic there as well, so it is nice after all!
- No focus on licensing - you can just add any license you want