Silero Models on Torch Hub
TLDR - https://pytorch.org/hub/snakers4_silero-models_stt/
Also Soumith Chintala himself commented on this release.
PS
Upvote on HackerNews
https://news.ycombinator.com/item?id=24565831
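For reference, loading the models from Torch Hub boils down to something like this - a sketch along the lines of the hub page above; check it for the authoritative snippet and the bundled utils:

import torch

device = torch.device('cpu')  # CPU is enough for trying it out
# entry point name and kwargs as documented on the hub page above
model, decoder, utils = torch.hub.load('snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',
                                       device=device)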
2020 DS / ML Digest 11
Highlights:
- Silero STT models release on Torch hub
- Oculus Quest 2, now $100 cheaper at $300
- Nvidia 30x0 close to release (see above a detailed commentary on Tim's post)
- Fast.ai book
- Benchmarking Deep Learning Optimizers
- Language-Agnostic BERT Sentence Embedding + new Transformer LASER
- Are we done with ImageNet?
Please like / share / repost!
https://spark-in.me/post/2020_ds_ml_digest_11
#digest
Notes from captain obvious:
Comparing two GPUs with Tensor Cores, one of the single best indicators of each GPU's performance is their memory bandwidth;
Most computation time on GPUs is memory access;
A100 compared to the V100 is 1.70x faster for NLP…
Microsoft ... Stepping up its Game in ML?
Wow, wow, do not close yet! I am a big MS-hater myself.
Well, it looks like an outsider / legacy player in cutting-edge tech / ML may, with a series of well-placed decisions, earn its place under the ML sun?
Yes, you heard me right. I am a big Microsoft hater, but just check this out:
- https://github.com/microsoft/onnxjs
- https://github.com/microsoft/onnxruntime#binaries
- https://github.com/microsoft/DeepSpeed
- ... OpenAI deal is just for hype I guess, no-one takes OpenAI seriously, right? ( ͡° ͜ʖ ͡°)
- Also I recently used Azure datasets ... it was clunky compared to S3, but beggars cannot be choosers. Download speeds were slow, and their VSCode-like desktop app was ok ... though some features just did not work
It used to be a standard narrative that "TF = production". But I guess a more correct one would be "Google has invested billions in marketing and it has a huge captive audience".
Lately I spent some time reading TF tutorials ... and they are so needlessly difficult - they fucking invent a protocol for everything! For what PyTorch hub achieves in 4 steps, TF hub requires you to read 10 markdown docs ... written in corporate language.
So, why is this important? Because proper competition makes everything shine brighter.
Why TF 2.0? Because PyTorch 1.0 ( ͡° ͜ʖ ͡°). Now it looks like Google and Nvidia have real new competitors in the ML inference market, together with Intel (which afaik is losing in general, but that is another story).
Nice!
#deep_learning
#machine_learning
#rant
Also about competition.
.... why does Microsoft of all people want to train an effing TRILLION parameter transformer?
... ( ͡° ͜ʖ ͡°) because they license a 100bn one from another company.
PS
I may be wrong in the exact figures.
Which ML hub have you used?
Anonymous poll results:
- TF hub - 35%
- Torch hub - 45%
- ONNX models - 15%
- cadene / rwightman - 10%
- Other - 24%
Core DL Framework?
Anonymous poll results:
- PyTorch - 75%
- TensorFlow / Keras - 44%
- MXNet - 1%
- CNTK - 0%
- Chainer - 0%
- Caffe (2) - 1%
- Matlab - 4%
- Theano - 2%
- Other - 4%
- I use wrappers built on top - 7%
Our Model Featured on TF Hub
https://github.com/snakers4/silero-models
So far I have added only the English model as a start:
- https://tfhub.dev/silero
- https://tfhub.dev/silero/silero-stt/en/1
- https://tfhub.dev/silero/collections/silero-stt/1
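Loading it from TF Hub should look roughly like this - a hedged sketch, assuming the published artefact is a standard SavedModel; check the model page above for the exact usage:

import tensorflow_hub as hub

# handle from the links above; hub.load restores the SavedModel
model = hub.load('https://tfhub.dev/silero/silero-stt/en/1')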
Forwarded from NVIDIA Inception
NVIDIA TALK "Fast training with AMP/TF32 using TensorCores on NVIDIA GPU" at Data Fest + Q&A SESSION
Denis Timonin, AI Solutions Architect at NVIDIA, will talk about one of the most effective ways to speed up neural network training and inference - mixed precision. In his talk Denis will walk through the "Mixed Precision Training" paper by NVIDIA and Baidu Research and cover the details of working with the TensorFloat32 format. We will also discuss the algorithms used in mixed-precision training and the hardware that makes these data formats fast in neural networks.
In the first part of the talk we will cover floating-point numbers, the motivation behind mixed-precision training and tensor cores, and will train a complex neural network, StarGAN v2 (CVPR 2020), in Automatic Mixed Precision (AMP) mode.
In the second part we will dive into optimizing tensor core usage: tricks for fast training in high-level frameworks and the C++ API, and how to pick the right data and layer sizes for the fastest training.
The talk is in English.
The talk is already available on the ODS YouTube channel: https://bit.ly/3kPAvPA
The Q&A session will take place on Saturday, September 26, from 12:00 to 14:00 here: https://spatial.chat/s/ods The entry password can be obtained here: https://bit.ly/2GbDB1j
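For a taste of what AMP means in practice, here is a minimal PyTorch sketch (my own illustration, not material from the talk; the model and sizes are arbitrary):

import torch

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 gradient underflow

for _ in range(100):
    x = torch.randn(32, 512, device='cuda')
    opt.zero_grad()
    with torch.cuda.amp.autocast():  # runs ops in fp16/fp32 as appropriate, using tensor cores
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(opt)   # unscales gradients, skips the step on inf/nan
    scaler.update()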
Forwarded from Profunctor Jobs
Frontend Junior
Stack: Tensorflow, JavaScript, Docker
Pay: $700-1200
Remote
silero.ai - we build ML products in the speech domain. We are looking for someone to build minimalistic web interfaces that work with our algorithms.
The State of JS Inference in Browsers
Last time I looked at ONNX was 2+ years ago. It was basically nowhere. Now I decided to export our models from silero-models to ONNX and TensorFlow.
After fixing numerous issues, tldr - vanilla ONNX (via onnx-runtime) and TF models just work without looking under the hood. Ofc, I have not tested all of the teased back-ends for onnx-runtime, but it works at least on x86 w/o any issues out-of-the-box.
As for running in browsers ... my bet was on onnx.js (PyTorch => ONNX), but it looks like it is still early. Some issues I created - 1, 2. Interestingly, similar issues were addressed in onnx-tensorrt.
As for TF, I do not believe it is a go-to option just because you need to transform your model 3 times (PyTorch => ONNX => TF => TF.js).
Such things look like Sisyphean tasks, but in the long run the ecosystem evolves, which is cool.
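For reference, the basic PyTorch => ONNX => onnx-runtime path looks roughly like this - a minimal sketch with a toy model, not the actual silero-models export:

import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)).eval()
dummy = torch.randn(1, 16)  # example input used to trace the graph

torch.onnx.export(model, dummy, 'toy.onnx',
                  input_names=['input'], output_names=['output'],
                  dynamic_axes={'input': {0: 'batch'}})  # allow variable batch size

sess = ort.InferenceSession('toy.onnx')
out = sess.run(None, {'input': dummy.numpy()})[0]
print(out.shape)  # (1, 4)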
#deep_learning
Comparing Popular Model Hubs
The contenders are:
- PyTorch Hub
- TF Hub (latest PR not merged yet)
- ONNX models
- OpenVino models
So, I read the docs for all of them and created PRs for 3 of them; in the end our models were accepted to PyTorch Hub and TF Hub.
PyTorch Hub
- Instructions are literally 4 steps long
- The way the models are packaged and distributed is beautiful, clean, minimalistic and easy to use and maintain. Some great mind really designed it
- Essentially you just host your model and add a special file to your repo; Torch Hub pulls your repo behind the scenes and loads the model. It is embarrassingly easy (see the sketch after this list)
- I was greeted by Soumith Chintala himself
- After publishing, our models got a decent surge of traffic - cool!
- No focus on licensing or any similar sort of bs - you just share your models
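Here is a rough sketch of what that special file (hubconf.py, in the repo root) looks like; the entry point name, the stand-in model and the checkpoint URL are placeholders, not the actual silero-models entry points:

dependencies = ['torch']  # pip packages your entry points need

import torch

def my_model(pretrained=True, **kwargs):
    # hypothetical entry point; users call torch.hub.load('user/repo', 'my_model')
    model = torch.nn.Linear(10, 2)  # stand-in for a real architecture
    if pretrained:
        state = torch.hub.load_state_dict_from_url(
            'https://example.com/my_model.pth', map_location='cpu')  # placeholder URL
        model.load_state_dict(state)
    return model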
TF Hub
- A lot of confusion and legacy between old TF / Keras / Hub models / SavedModels
- Instructions are like 10 separate documents; after looking at examples it is kind of easy, still the docs are very ... non-intuitive
- ~1000 models published, most of them by Google, most of them just very niche research artefacts
- The overall structure consists of models and collections. Nice tagging, nice web UI to look through the models; I see why there is so much hassle with the format docs
- Zero traffic from there though =(
- Core models and community models are separated ... so I believe this is the reason. Also PyTorch hosts its hub on their main domain, whereas TF created a separate one ... you see what I mean =)
- I could fit PyTorch logic there as well, so it is nice after all!
- No focus on licensing - you can just add any license you want
ONNX Models
- Looks like it is mostly community-driven
- Their submission process is mostly biased towards community uploading third-party research artefacts, though
- The description format is plain md, though it is mostly geared towards people packaging third-party research artefacts
- You MUST upload all of your models / artefacts / datasets to their git-lfs storage
- All of this mostly implies, between the lines, that your models will not be updated / will be hard to maintain
- What is most sinful - if you follow their format entirely, there is no connection between you and your model's users - they will just see ONNX hub repo folders and will never care about you
- No legal problems per se, but I cannot see any motivation for independent companies to share models there - you essentially upload whole packages there and they are gone for good (I believe that links inside of GitHub are not properly indexed, and git-lfs is NOT indexed by web crawlers for sure)
- In the end, even if you do not care about this sort of thing and you are 100% altruistic, git-lfs basically implies that you have to re-upload your updated models on each update, create a PR, etc. etc. - I am just too lazy to do this for each model
OpenVino Models
- Well, their docs format is decent
- A bit too much focus, again, on hosting third-party research artefacts and too much belief that all datasets are free and public (and that leaderboards matter)
- The big problem is that they essentially also want you to package something and relinquish all rights to it
- This is corroborated by the fact that GPL licenses are NOT allowed
- So, what is my motivation to contribute as an independent company? Well, there is very little
Comparing Popular Model Hubs - Conclusion
ONNX and OpenVino hubs are designed to make models built by other people load / work using "1 line" of code with a semblance of everything being done properly in legal terms, but there is very little incentive for model creators to share models there.
TF and Torch hubs may actually help with discovery and they are low-maintenance.
All of this is kind of obvious if you compare my conclusions with the strategies of the companies maintaining these hubs, especially the no-GPL mumbo-jumbo by Intel.
Wonderful things I learned about onnx.js
- It does not support 1D or 3D convolutions (one possible workaround is sketched after this list)
- Operation support is in fact really patchy for more involved models
- What is really funny - some of these op / backend combinations really lock you in - for example the absence of the Shape op on some backends
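Since the Conv1d gap is the blocker here, one possible workaround (my own sketch, not something onnx.js suggests) is to express a 1D convolution as a 2D convolution with a (1, k) kernel, so the exported graph only contains ops onnx.js does support:

import torch
import torch.nn as nn

class Conv1dAsConv2d(nn.Module):
    # emulate nn.Conv1d with nn.Conv2d so the ONNX graph has no 1D convs
    def __init__(self, c_in, c_out, k, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=(1, k), padding=(0, padding))

    def forward(self, x):          # x: (N, C, L)
        x = x.unsqueeze(2)         # -> (N, C, 1, L)
        x = self.conv(x)           # 2D conv over the (1, L) plane
        return x.squeeze(2)        # -> (N, C_out, L_out)

# sanity check against a real Conv1d with the same weights
c1 = nn.Conv1d(1, 16, 5, padding=2)
c2 = Conv1dAsConv2d(1, 16, 5, padding=2)
c2.conv.weight.data = c1.weight.data.unsqueeze(2)  # (16, 1, 5) -> (16, 1, 1, 5)
c2.conv.bias.data = c1.bias.data
x = torch.randn(4, 1, 100)
print(torch.allclose(c1(x), c2(x), atol=1e-6))     # True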
2020 DS / ML Digest 12
Highlights:
- Neural network visualization tool
- Russian large GPT by Sber
- Some tests of 3090
- Large radiology dataset
- New wave of space-tech
- Containerization landscape
Please like / share / repost!
https://spark-in.me/post/2020_ds_ml_digest_12
#digest
Trying PyTorch DDP
DDP = DistributedDataParallel
DP = DataParallel
I am a bit late to the party (PyTorch now even has its own "redis"-like key-value store, its own RPC framework and numerous bells and whistles ... likely targeted at enterprises with over 9000 GPUs), but let me write down my first impressions here.
I have usually been able to optimize my models and code not to require 4+ GPUs (DDP becomes essential after 4-5 GPUs; for 2-3 it does not really matter and DP just works; for 4 it is arguable). First impressions:
- Docs are detailed, simple and clean
- Examples in the docs ... are just too plain, but there are guides now, which are also a bit simplistic
- The best way to start is to find some high-quality boilerplate. There is a lot of shitty boilerplate written in 2018 - PyTorch has evolved and polished its interfaces since, so look out for fresh boilerplate (check the last update and cross-reference the API invocations)
- Looks like DDP is not the most popular feature, but I did not really face the issues everyone claimed to face (hangs and freezes, failure to kill the processes gracefully)
Turning Your DP Script into a DDP One
- Your code has to be properly structured and refactored - then migrating to DDP becomes a weekend project tops
- You need to understand the concepts of rank, world size, communication backend, gradient synchronization
- They finally included it in the docs - use the NCCL backend for distributed GPU training, the Gloo backend for distributed CPU training
- You need to pass an is_leader param to your logging functions to suppress some logging and checkpoints for non-master processes (rank > 0). Each process has an almost exactly identical model copy anyway
- Do not forget to use barrier() to avoid hangs and for more transparent syncing
- You need to rewrite your main function to accept rank and args
- You need to spawn several processes using the provided utils and set up the process communication, i.e. something like:

import torch
import torch.distributed as dist

def setup_distributed(rank, args):
    dist.init_process_group(backend=args.ddp.dist_backend,
                            rank=rank,
                            init_method=args.ddp.dist_url,
                            world_size=args.ddp.world_size)

def spawn_main(main, args):
    if args.ddp.enabled:
        torch.multiprocessing.spawn(
            main, args=(args,), nprocs=args.ddp.world_size, join=True
        )
    else:
        main(0, args)

- I am still not exactly sure why, but the best boilerplate does .to(device, non_blocking=True) instead of .to(device)
Is It Faster?
In my case technically yes (but it has nothing to do with the reasons why people usually use DDP). In the general case it just solves the bottleneck issues that arise out of having 6-8+ GPUs.
So you should optimize, refactor and profile your code first, and only then, if you see some unsolvable issues or you need over 9000 GPUs, switch to DDP.
Is It Worth It?
100% for 6-8 GPUs.
It depends for 2-5 GPUs.
If your code is properly written, there is little difference for 2-4 GPUs.
Major Design Drawbacks
DDP implies (at least) 1 GPU per process.
You can have 1+ GPUs per process.
You cannot share 1 GPU between 2 processes.
To do so, you would need an Ampere GPU with Multi-Instance GPU, but it is still not clear whether the 3090 or Quadro GPUs will have it.
(I hope team Red will catch up here as well soon!)
Going Deeper
For now I opted for just splitting my train datasets into N parts as easily as dataset[rank::world_size], but you can use the provided key-value stores for more advanced syncing; in that case you would really have to take care of the seeds for your random number generators (and also mind the doubled memory footprint).
Also trying their RPC framework would be nice, but that is too much work for me.
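To tie the pieces above together, here is a minimal sketch of what a main(rank, args) entry point could look like - my illustration, with a stand-in model and made-up checkpoint name:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main(rank, args):
    setup_distributed(rank, args)      # from the snippet above
    is_leader = rank == 0              # only rank 0 logs and checkpoints
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 2).cuda(rank)  # stand-in for a real model
    model = DDP(model, device_ids=[rank])      # gradients sync on backward()

    # ... the usual training loop goes here ...

    if is_leader:
        torch.save(model.module.state_dict(), 'ckpt.pth')
    dist.barrier()                     # make sure all ranks finish together
    dist.destroy_process_group()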
#deep_learning
#pytorch
Torch Dataloader With Workers Leaking RAM
Everyone has faced this issue with HUGE datasets. It happens just because of Python itself. If you have faced it, you know what I am talking about.
I do not claim this to be a definitive solution, but it worked for me: keep the dataset's index dict in a multiprocessing Manager, so the DataLoader workers share one copy instead of each one duplicating it via copy-on-write.

import time
import torch
import random
import string

from multiprocessing import Manager
from torch.utils.data import Dataset, DataLoader


def id_gen(size=6, chars=string.ascii_uppercase):
    # random fixed-length uppercase string, stands in for a file path
    return ''.join(random.choice(chars) for _ in range(size))


class DataIter(Dataset):
    def __init__(self):
        # the Manager dict lives in a separate server process,
        # so forked workers do not copy the whole dict
        m = Manager()
        self.data = m.dict({i: {'key': random.random(),
                                'path': id_gen(size=10)}
                            for i in range(1000000)})

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]
        return torch.tensor(data['key']), data['path']


train_data = DataIter()
train_loader = DataLoader(train_data,
                          batch_size=60,
                          shuffle=False,
                          drop_last=False,
                          pin_memory=False,
                          num_workers=10)

tic = time.time()
for i, item in enumerate(train_loader):
    if (i + 1) % 1000 == 0:
        toc = time.time()
        print(f"Time for 1000 batches: {toc - tic} s")
        tic = time.time()

Be careful with the Manager dict though. Although it behaves like a dict, iterating over its keys is slow, because every access goes through inter-process communication.
If you just need the whole dict, it has methods to access the whole dict as one big object, which is fast.
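For instance, with the DataIter above (copy() on the managed dict returns a plain dict in a single round trip):

local_data = train_data.data.copy()  # one IPC call instead of one per key
for idx, record in local_data.items():
    pass  # fast local iteration, no inter-process chatter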
#pytorch
#deep_learning