Just links

Pre-trained BERT in PyTorch

https://github.com/huggingface/pytorch-pretrained-BERT

(1)
Model code here is just awesome.
The integrated DataParallel / DDP and FP16 wrappers are also awesome.
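Conceptually, data-parallel training works like this: each replica computes gradients on its own shard of the batch, and those gradients are averaged (an all-reduce in DDP), so every replica applies the same update. The pure-Python toy below shows only that idea. It assumes a hypothetical 1-D linear model `y = w * x` with mean squared error, and the helper names are made up. This is not PyTorch's implementation.

```python
"""Toy data-parallel gradient averaging (hypothetical helpers, not PyTorch)."""
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def mse_grad(w: float, batch: Batch) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(batch: Batch, n_replicas: int) -> List[Batch]:
    """Split the batch into equal contiguous shards (assumes len % n_replicas == 0)."""
    size = len(batch) // n_replicas
    return [batch[i * size:(i + 1) * size] for i in range(n_replicas)]


def data_parallel_grad(w: float, batch: Batch, n_replicas: int) -> float:
    """Each replica computes a local gradient; the results are averaged (all-reduce)."""
    local_grads = [mse_grad(w, s) for s in shard(batch, n_replicas)]
    return sum(local_grads) / n_replicas


if __name__ == "__main__":
    data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
    # With equal-sized shards the averaged gradient equals the full-batch gradient.
    print(mse_grad(0.5, data), data_parallel_grad(0.5, data, n_replicas=4))
```

The equality holds only because the shards are the same size. With uneven shards, a plain average of the per-shard means would weight the samples unequally.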

FP16 training via APEX just works (no idea about convergence yet, though).
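FP16 training needs some care because small gradient values underflow to zero in half precision. Loss scaling is one technique that APEX can apply to handle this. The sketch below emulates an FP16 round trip with the standard library's `struct` half-precision format `'e'`. It assumes a fixed scale of 2**16 purely for illustration, whereas APEX can choose the scale dynamically.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_fp16_roundtrip(grad: float, scale: float) -> float:
    """Scale up before the FP16 cast, then unscale in full precision."""
    return to_fp16(grad * scale) / scale


if __name__ == "__main__":
    tiny_grad = 1e-8            # below the smallest FP16 subnormal (~6e-8)
    print(to_fp16(tiny_grad))   # 0.0: the gradient is lost without scaling
    print(scaled_fp16_roundtrip(tiny_grad, scale=2.0 ** 16))  # close to 1e-8 again
```

The scale cannot be arbitrarily large, because values beyond the FP16 range (max 65504) overflow. Here `struct.pack` would raise `OverflowError`. Dynamic schemes therefore lower the scale when overflows appear.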

(2)
As for the model weights, I cannot really tell yet: there is no dedicated Russian model.
The only problem I am facing now is that with large embedding bags the batch size is literally 1-4, even for smaller models.
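A back-of-the-envelope sketch shows why large embedding tables eat GPU memory. It rests on assumptions that are not from the post. Adam-style training is assumed to keep FP32 weights, FP32 gradients and two FP32 moment buffers, which comes to 16 bytes per parameter. The vocabulary sizes are hypothetical, and the hidden size of 768 is BERT-Base's.

```python
def embedding_train_bytes(vocab_size: int, hidden: int, bytes_per_param: int = 16) -> int:
    """Training memory for one embedding table.

    bytes_per_param=16 assumes FP32 weights + FP32 grads + two FP32 Adam moments.
    """
    return vocab_size * hidden * bytes_per_param


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2 ** 30


if __name__ == "__main__":
    hidden = 768  # BERT-Base hidden size
    for vocab in (30_000, 120_000, 500_000):  # hypothetical vocabulary sizes
        print(f"vocab={vocab:>7}: {gib(embedding_train_bytes(vocab, hidden)):.2f} GiB")
```

The table is only part of the budget. Activations grow with batch size times sequence length, so a bigger table leaves less room for the batch.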

Training models with SentencePiece is kind of feasible for morphologically rich languages, but you will always worry about generalization.
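SentencePiece learns a subword vocabulary directly from raw text, using either BPE or unigram-LM segmentation. The toy below shows only the core BPE merge loop in pure Python. It illustrates how frequent stems and endings become single units while rare word forms stay split. It is not SentencePiece's API or its exact algorithm, and the corpus and merge count are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

Word = Tuple[str, ...]


def pair_counts(vocab: Dict[Word, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Counter = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += freq
    return counts


def merge(vocab: Dict[Word, int], pair: Tuple[str, str]) -> Dict[Word, int]:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: Dict[Word, int] = {}
    for word, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged


def learn_bpe(words: List[str], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` merges; '_' marks the end of a word."""
    vocab: Dict[Word, int] = dict(Counter(tuple(w) + ("_",) for w in words))
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)  # most frequent adjacent pair
        vocab = merge(vocab, best)
        merges.append(best)
    return merges


if __name__ == "__main__":
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    print(learn_bpe(corpus, num_merges=5))
```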

(3)
I have not tried the masked-LM pre-training (or the next-sentence-prediction pre-training) yet. I hope that properly initializing the embeddings will also work for a closed domain with a smaller model (they pre-train for 4 days on 4+ TPUs, lol).
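In BERT's masked-LM objective, as described in the BERT paper, about 15% of token positions are chosen as prediction targets. Of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The pure-Python sketch below shows that corruption step. It simplifies by sampling each position independently rather than picking an exact count. The `mask_tokens` name, the vocabulary and the seed are illustrative assumptions, not the pytorch-pretrained-BERT code.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(tokens: List[str], vocab: List[str], rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted tokens, labels).

    Labels hold the original token at selected positions and None elsewhere
    (positions the loss ignores).
    """
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                          # not selected as a prediction target
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: random token (may equal the original)
        # remaining 10%: keep the original token unchanged
    return corrupted, labels


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the model learns to predict hidden words from context".split()
    print(mask_tokens(sentence, sentence, rng))
```

Keeping some selected tokens unchanged or randomized means the model cannot rely on [MASK] always marking the positions it is scored on.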

(4)
Why even tackle such models?
Chat, dialogue and machine comprehension models are complex and require one-off feature engineering.
Being able to fine-tune something like BERT first on publicly available benchmarks and then on your own domain can provide a good way to embed complex situations (such as questions in dialogues).

#nlp
#deep_learning

GitHub

GitHub - huggingface/transformers: 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models…

12 views, 08:13

https://www.reddit.com/r/Coq/comments/aj87a5/so_i_translated_part_of_the_1st_chapter_of_adam/

From the Coq community on Reddit: So I translated part of the 1st chapter of Adam Chlipala's book code from Coq into C++ template…

486 views, 08:44

https://twitter.com/aureliengeron/status/1088358749561999360

Twitter

Aurélien Geron

Just added code examples and exercises on #autodiff in @TensorFlow 2.0. Covers autodiff basics, computing 2nd order derivatives, how to write a custom training loop to train a Keras model, and how to use running metrics. See notebook #2 in https://t.co/j77gSiJDVt…
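The tweet refers to autodiff in TensorFlow 2.0 (`tf.GradientTape`, which works in reverse mode). The basic idea can be shown without TensorFlow using forward-mode autodiff with dual numbers, sketched below. Forward mode differs from the reverse mode TensorFlow uses for training. The `Dual` class and the `sin`/`derivative` helpers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative; each operation applies the chain rule."""
    val: float
    dot: float = 0.0

    @staticmethod
    def lift(x: Union["Dual", float]) -> "Dual":
        """Treat plain numbers as constants (derivative 0)."""
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual.lift(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__


def sin(x: Dual) -> Dual:
    # chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def derivative(f: Callable[[Dual], Dual], x: float) -> float:
    """Seed the input tangent with 1.0 and read off the output tangent."""
    return f(Dual(x, 1.0)).dot


if __name__ == "__main__":
    f = lambda x: 3 * x * x + sin(x)   # f'(x) = 6x + cos(x)
    print(derivative(f, 2.0))          # equals 12 + cos(2)
```

Forward mode needs one pass per input variable. That is why training code, with millions of parameters and one scalar loss, uses reverse mode instead.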

427 views, 17:39

Forwarded from Hacker News

AlphaStar: Mastering the Real-Time Strategy Game StarCraft II (Score: 112+ in 1 hour)

Link: https://readhacker.news/s/3Wamk
Comments: https://readhacker.news/c/3Wamk

8 views, 07:04

https://www.technologyreview.com/s/612768/we-analyzed-16625-papers-to-figure-out-where-ai-is-headed-next/

MIT Technology Review

We analyzed 16,625 papers to figure out where AI is headed next

Our study of 25 years of artificial-intelligence research suggests the era of deep learning may come to an end.

429 views, 14:54

https://www.reddit.com/r/MachineLearning/comments/ajgzoc/we_are_oriol_vinyals_and_david_silver_from/

We are Oriol Vinyals and David Silver from DeepMind’s AlphaStar...

Hi there! We are Oriol Vinyals (/u/OriolVinyals) and David Silver (/u/David_Silver), lead researchers on DeepMind’s AlphaStar team, joined by...

424 views, 15:22