A Recipe for Training Neural Networks by Andrej Karpathy
New article written by Andrej Karpathy distilling a bunch of useful heuristics for training neural nets. The post is full of real-world knowledge and how-to details that are not taught in books and often take endless hours to learn the hard way.
Link: https://karpathy.github.io/2019/04/25/recipe/
#tipsandtricks #karpathy #tutorial #nn #ml #dl
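One of the recipe's core sanity checks is to overfit a single small batch before scaling anything up. A minimal PyTorch sketch of that idea (the model, data and hyperparameters below are illustrative placeholders, not taken from the post):
```python
# Sanity check in the spirit of the recipe: a tiny model should be able to drive
# the loss on one fixed batch to ~0. Model and data here are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 10)          # one fixed batch
y = torch.randint(0, 2, (8,))   # fixed labels

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(f"final loss on the single batch: {loss.item():.4f}")  # should be near zero
```
If the loss refuses to go to zero here, something in the model, data pipeline or loss is broken, and no amount of full-dataset training will fix it.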
A tiny autograd engine
Andrej Karpathy recently released a library called micrograd that lets you build and train a neural net through a simple and intuitive interface.
In fact, the whole library is roughly 150 lines of code, which he claims makes it the tiniest autograd engine there is. Libraries of this kind are ideal for educational purposes.
github: https://github.com/karpathy/micrograd
#karpathy #autograd
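For a feel of the interface, here is a short sketch along the lines of the usage examples in the micrograd repo (treat the exact names as a best-effort reading of the repo, not authoritative documentation):
```python
# Scalar autograd with micrograd's Value objects; the graph is built as you compute
# and backward() backpropagates through the whole expression.
from micrograd.engine import Value

a = Value(-4.0)
b = Value(2.0)
c = a + b
d = a * b + b**3
e = c.relu() + d
e.backward()           # backprop through the expression graph

print(a.grad, b.grad)  # d(e)/d(a), d(e)/d(b)
```
The repo also includes a small nn module (neurons, layers, an MLP) built on top of these scalar Values, which is what makes training a tiny neural net possible.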
minGPT – a minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
by Karpathy
Small, clean, interpretable and educational, as most of the currently available implementations are a bit sprawling. This implementation is appropriately about 300 lines of code, including boilerplate and a totally unnecessary custom causal self-attention module. All that's going on is that a sequence of indices goes into a stack of transformer blocks, and a probability distribution over the next index comes out.
With a BPE encoder, distributed training and maybe fp16, this implementation may be able to reproduce GPT-1/GPT-2 results, though he hasn't tried $$$. GPT-3 is likely out of reach: his understanding is that it does not fit into GPU memory and requires a more careful model-parallel treatment.
https://twitter.com/karpathy/status/1295410274095095810?s=20
#nlp #karpathy #gpt #torch
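To make the "sequence of indices in, next-index distribution out" description concrete, here is a hypothetical PyTorch sketch of that data flow. This is not minGPT's actual code (its modules and configuration differ); it only illustrates the shape of the computation:
```python
# Token indices go in, a probability distribution over the next token comes out.
# Illustrative sketch only, not minGPT's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    def __init__(self, vocab_size=100, block_size=32, d_model=64, n_layer=2, n_head=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)                       # (B, T, d_model)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)  # causal mask
        x = self.blocks(x, mask=mask)                                   # transformer blocks
        logits = self.head(x)                                           # (B, T, vocab_size)
        return F.softmax(logits[:, -1, :], dim=-1)                      # next-index distribution

model = TinyGPT()
idx = torch.randint(0, 100, (1, 16))   # one sequence of 16 token indices
probs = model(idx)
print(probs.shape)                      # torch.Size([1, 100])
```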
Tweet from Andrej Karpathy:
I wrote a minimal/educational GPT training library in PyTorch, am calling it minGPT as it is only around ~300 lines of code: https://t.co/79S9lShJRN +demos for addition and character-level language model. (quick weekend project, may contain sharp edges)
Deep Neural Nets: 33 years ago and 33 years from now
Great post by Andrej Karpathy on the progress #CV has made in 33 years.
The author's ideas on what a time traveler from 2055 would think about the performance of today's networks:
* 2055 neural nets are basically the same as 2022 neural nets on the macro level, except bigger.
* Our datasets and models today look like a joke. Both are somewhere around 10,000,000X larger.
* One can train 2022 state-of-the-art models in ~1 minute by training naively on one's personal computing device, as a weekend fun project.
* Today's models are not optimally formulated: just by changing some details of the model, loss function, augmentation or optimizer, the error can roughly be halved (a rough sketch of such tweaks follows after this post).
* Our datasets are too small, and modest gains would come from scaling up the dataset alone.
* Further gains are actually not possible without expanding the computing infrastructure and investing into some R&D on effectively training models on that scale.
Website: https://karpathy.github.io/2022/03/14/lecun1989/
OG Paper link: http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
#karpathy #archeology #cv #nn
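As referenced in the list above, here is a hedged sketch of the kind of "modernization" tweaks meant there: a modern loss, optimizer and simple data augmentation applied to a small convnet. The architecture, data and numbers are placeholders, not the post's actual setup or results:
```python
# "Change a few details, roughly halve the error": small convnet with a modern loss,
# optimizer and simple augmentation. Everything below is an illustrative placeholder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallConvNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, 5)
        self.conv2 = nn.Conv2d(8, 16, 5)
        self.fc = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(x.flatten(1))

def augment(x):
    # simple augmentation: random 1-pixel horizontal shift of the batch
    shift = int(torch.randint(-1, 2, (1,)))
    return torch.roll(x, shifts=shift, dims=-1)

model = SmallConvNet()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)  # modern optimizer
x = torch.randn(32, 1, 28, 28)          # placeholder batch of 28x28 images
y = torch.randint(0, 10, (32,))

for step in range(10):
    opt.zero_grad()
    logits = model(augment(x))
    loss = F.cross_entropy(logits, y)    # cross-entropy instead of a 1989-style MSE loss
    loss.backward()
    opt.step()
print(f"loss after a few steps: {loss.item():.3f}")
```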