A Recipe for Training Neural Networks by Andrej Karpathy
New article written by Andrej Karpathy distilling a bunch of useful heuristics for training neural nets. The post is full of real-world knowledge and how-to details that are not taught in books and often take endless hours to learn the hard way.
Link: https://karpathy.github.io/2019/04/25/recipe/
#tipsandtricks #karpathy #tutorial #nn #ml #dl
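One of the recipe's core sanity checks is to overfit a single small batch before scaling anything up. A minimal PyTorch sketch of that idea (the model, data and hyperparameters below are illustrative placeholders, not taken from the post):
```python
# Sanity check in the spirit of the recipe: a tiny model should be able to drive
# the loss on one fixed batch to ~0. Model and data here are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 10)          # one fixed batch
y = torch.randint(0, 2, (8,))   # fixed labels

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(f"final loss on the single batch: {loss.item():.4f}")  # should be near zero
```
If the loss refuses to go to zero here, something in the model, data pipeline or loss is broken, and no amount of full-dataset training will fix it.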
A tiny autograd engine
Andrej Karpathy recently released a library called micrograd that lets you build and train a neural net through a simple and intuitive interface.
In fact, the whole library is roughly 150 lines of code, which he claims makes it the tiniest autograd engine there is. Libraries of this kind are ideal for educational purposes.
github: https://github.com/karpathy/micrograd
#karpathy #autograd
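For a feel of the interface, here is a short sketch along the lines of the usage examples in the micrograd repo (treat the exact names as a best-effort reading of the repo, not authoritative documentation):
```python
# Scalar autograd with micrograd's Value objects; the graph is built as you compute
# and backward() backpropagates through the whole expression.
from micrograd.engine import Value

a = Value(-4.0)
b = Value(2.0)
c = a + b
d = a * b + b**3
e = c.relu() + d
e.backward()           # backprop through the expression graph

print(a.grad, b.grad)  # d(e)/d(a), d(e)/d(b)
```
The repo also includes a small nn module (neurons, layers, an MLP) built on top of these scalar Values, which is what makes training a tiny neural net possible.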
minGPT – a minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
by Karpathy
Small, clean, interpretable and educational, as most of the currently available implementations are a bit sprawling. This implementation is appropriately about 300 lines of code, including boilerplate and a totally unnecessary custom causal self-attention module. All that's going on is that a sequence of indices goes into a stack of transformer blocks, and a probability distribution over the next index comes out.
With a BPE encoder, distributed training and maybe fp16, this implementation may be able to reproduce GPT-1/GPT-2 results, though he hasn't tried $$$. GPT-3 is likely out of reach: his understanding is that it does not fit into GPU memory and requires a more careful model-parallel treatment.
https://twitter.com/karpathy/status/1295410274095095810?s=20
#nlp #karpathy #gpt #torch
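To make the "sequence of indices in, next-index distribution out" description concrete, here is a hypothetical PyTorch sketch of that data flow. This is not minGPT's actual code (its modules and configuration differ); it only illustrates the shape of the computation:
```python
# Token indices go in, a probability distribution over the next token comes out.
# Illustrative sketch only, not minGPT's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    def __init__(self, vocab_size=100, block_size=32, d_model=64, n_layer=2, n_head=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)                       # (B, T, d_model)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)  # causal mask
        x = self.blocks(x, mask=mask)                                   # transformer blocks
        logits = self.head(x)                                           # (B, T, vocab_size)
        return F.softmax(logits[:, -1, :], dim=-1)                      # next-index distribution

model = TinyGPT()
idx = torch.randint(0, 100, (1, 16))   # one sequence of 16 token indices
probs = model(idx)
print(probs.shape)                      # torch.Size([1, 100])
```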
Tweet from Andrej Karpathy:
I wrote a minimal/educational GPT training library in PyTorch, am calling it minGPT as it is only around ~300 lines of code: https://t.co/79S9lShJRN +demos for addition and character-level language model. (quick weekend project, may contain sharp edges)
Deep Neural Nets: 33 years ago and 33 years from now
Great post by Andrej Karpathy on the progress #CV has made in 33 years.
The author's ideas on what a time traveler from 2055 would think about the performance of today's networks:
* 2055 neural nets are basically the same as 2022 neural nets on the macro level, except bigger.
* Our datasets and models today look like a joke. Both are somewhere around 10,000,000X larger.
* One can train 2022 state-of-the-art models in ~1 minute by training naively on one's personal computing device, as a weekend fun project.
* Today's models are not optimally formulated: just by changing some details of the model, loss function, augmentation or optimizer, the error can roughly be halved (a rough sketch of such tweaks follows after this post).
* Our datasets are too small, and modest gains would come from scaling up the dataset alone.
* Further gains are actually not possible without expanding the computing infrastructure and investing into some R&D on effectively training models on that scale.
Website: https://karpathy.github.io/2022/03/14/lecun1989/
OG Paper link: http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
#karpathy #archeology #cv #nn
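As referenced in the list above, here is a hedged sketch of the kind of "modernization" tweaks meant there: a modern loss, optimizer and simple data augmentation applied to a small convnet. The architecture, data and numbers are placeholders, not the post's actual setup or results:
```python
# "Change a few details, roughly halve the error": small convnet with a modern loss,
# optimizer and simple augmentation. Everything below is an illustrative placeholder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallConvNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, 5)
        self.conv2 = nn.Conv2d(8, 16, 5)
        self.fc = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(x.flatten(1))

def augment(x):
    # simple augmentation: random 1-pixel horizontal shift of the batch
    shift = int(torch.randint(-1, 2, (1,)))
    return torch.roll(x, shifts=shift, dims=-1)

model = SmallConvNet()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)  # modern optimizer
x = torch.randn(32, 1, 28, 28)          # placeholder batch of 28x28 images
y = torch.randint(0, 10, (32,))

for step in range(10):
    opt.zero_grad()
    logits = model(augment(x))
    loss = F.cross_entropy(logits, y)    # cross-entropy instead of a 1989-style MSE loss
    loss.backward()
    opt.step()
print(f"loss after a few steps: {loss.item():.3f}")
```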