TED Talk with Yann LeCun
in which Yann discusses his current research into self-supervised machine learning, how he's trying to build machines that learn with common sense (like humans) and his hopes for the next conceptual breakthrough in AI.
▶️ Watch
Ted
Deep learning, neural networks and the future of AI
Yann LeCun, the chief AI scientist at Facebook, helped develop the deep learning algorithms that power many artificial intelligence systems today. In conversation with head of TED Chris Anderson, LeCun discusses his current research into self-supervised machine…
An open-source 2.7-billion-parameter GPT-3-style model has been released
As you probably know, OpenAI has not released the source code or pre-trained weights for GPT-3, their 175-billion-parameter language model. Instead, they decided to build a commercial product and exclusively license GPT-3 to Microsoft.
But open-source enthusiasts from eleuther.ai have released the weights of the 1.3B and 2.7B parameter models of their GPT-3 replication. It is the largest (afaik) publicly available GPT-3 replica. The primary goal of this project is to replicate a full-sized GPT-3 model and open-source it to the public, for free.
The models were trained on The Pile, an open-source dataset that combines 22 smaller, high-quality datasets (825 GB of text in total). Larger GPT models (10B and full GPT-3) are not available yet, but soon will be.
The 2.7B model likely won't fit into the memory of the Colab TPUs, and you may need larger TPU pods to finetune it. Sampling from it, however, works just fine.
This is the first time I have tried any kind of GPT model, and I was not too impressed (see my fav results below). Maybe 2.7 billion params are not enough yet to leave the copywriters without jobs 🤷🏼‍♂️. Let's wait for a bigger model.
Code in Mesh-TensorFlow
📦 1.3B and 2.7B model weights and configs
Colab notebook
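To get a feel for why finetuning is so much heavier than sampling, here is a back-of-the-envelope estimate. It is only a sketch under my own assumptions, not the GPT-Neo configuration: I assume 32-bit floats and an Adam-style optimizer that keeps gradients plus two moment buffers, and I ignore activations entirely. The helper name `memory_gb` is made up for this illustration.

```python
def memory_gb(n_params: float, bytes_per_value: int = 4, copies: int = 1) -> float:
    """Memory in GB for `copies` full-size tensors of the model parameters."""
    return n_params * bytes_per_value * copies / 1e9

N_PARAMS = 2.7e9  # model size quoted in the post

# Sampling only needs the weights themselves (assumption: fp32 storage).
weights_gb = memory_gb(N_PARAMS)
# Finetuning with Adam (assumption): weights + gradients + two moment buffers.
finetune_gb = memory_gb(N_PARAMS, copies=4)

print(f"weights only: {weights_gb:.1f} GB")
print(f"finetuning:   {finetune_gb:.1f} GB (activations not included)")
```

Under these assumptions the weights alone take about 10.8 GB and the finetuning state about four times that, so compare both numbers against the memory of whatever accelerator you have. Lower precision or a lighter optimizer would change the picture.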
GitHub
GitHub - EleutherAI/gpt-neo: An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library. - EleutherAI/gpt-neo
I decided to ask the AI if it will kill us, meatbags, in the near future. This is what I got in response...
Prompt: Will AI enslave the humankind in the next 20-30 years?
==============
GPT-3 Answers:
==============
The increasing pace of AI is not going to decrease. As of now the AI revolution has just began. The effects of AI on people and the future is not known yet, but if that is the AI’s fate, then the exposure of the human population is going to increase drastically.
There is a possibility that you will be in one of the AI enslaved societies.
The AI Revolution is a powerful technologies that is coming to the world in the near future. The AI revolution will be implemented as a protection mechanism for the protection of planet earth from dangerous climate change and increasing nuclear war. The AI has the potential to be our savior for the future. It is going to be a technology that is going to change the world drastically but there is a possibility that this is simply an illusion that we are living in a world of AI and it is just a pure mass hallucination, and all those projections actually represent the AI’s downfall and not an AI savior.
.I am excited to see what the AI revolution is going to bring.
.I believe that it is going to bring great changes to our civilization.
.I believe that it is going to bring great changes in human civilization.
~~~
Well, the last 10 years or so have been awful for everyone. It's not AI enslaving the humans in the next 20 or 30, it's automation on a global scale.
~~~
No. There will be mass deaths, and we'll probably know them as AI-driven holocaust or AI driven genocides. AI will have tools to justify and defend those crimes.
~~~
We already have the technology to create and control the world. I think we're already there. That's not the problem.
~~~
If there is a zombie apocalypse, perhaps we may find a way for AI to take over from humanity yet again.
~~~
Well, the last 10 years or so have been awful for everyone. It's not AI enslaving the humans in the next 20 or 30, it's automation on a global scale.
~~~
A prediction: AI will be able to look at the world around it, and decide what the world would look like without humans in it. Once the AI judge the world without humans, we'll create a world where humans are on the edge of our world and the AI is on the edge of the universe.
Why do models often attend to salient words, and how does this evolve throughout training?
This work tries to understand the black box of attention training. Early on in training, the LSTM attention models first learn how to translate individual words from the bag of words co-occurrence statistics, which then drives the learning of the attention. The authors propose a framework explaining why attention weights obtained by standard training often correlate with saliency, and how multi-head attention can increase performance by improving the training dynamics rather than expressiveness.
Read more below 👇🏻
Forwarded from DL in NLP (nlpcontroller_bot)
Approximating How Single Head Attention Learns
Snell et al., [Berkeley]
arxiv.org/abs/2103.07601
A look inside the training dynamics of LSTM seq2seq models with attention. The main idea of the paper is KTIW: Knowledge to Translate Individual Words. To explain the dynamics, the authors divide training into two stages: uniform attention (KTIW) and non-uniform attention.
In the first stage of model training, attention does not deviate significantly from uniform, and the model mainly learns to translate individual words (KTIW, a.k.a. dictionary translation). After KTIW is learned, attention starts forming its patterns, and this process is driven by the KTIW. As correct word translations are already more probable, attention now mainly needs to align the words of the source and target languages.
To quantitatively test the hypothesis, they develop a new lexical probe that is essentially hard attention. Yet the most impressive result is that attention cannot learn a simple copy operation if KTIW is not learned.
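To make the "uniform attention" stage concrete, here is a toy NumPy sketch. It is not the paper's LSTM model or its lexical probe, just plain dot-product attention with made-up shapes and orthogonal toy keys, all of which are my assumptions. It shows that when the scores are flat, the context vector is simply the average of the source token vectors, i.e. a bag of words, and that alignment only shows up once the scores become peaked.

```python
import numpy as np

def attention(query, keys, values):
    """Single-head dot-product attention for one decoder step."""
    scores = keys @ query                      # one score per source token
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights, weights @ values           # weights and context vector

rng = np.random.default_rng(0)
values = rng.normal(size=(5, 8))   # 5 source tokens, 8-dim states (toy sizes)
keys = np.eye(5, 8)                # orthogonal toy keys, one per source token

# Early in training (assumption: scores are still flat), attention is uniform,
# so the context is the plain average of the source states.
w_uniform, ctx_uniform = attention(np.zeros(8), keys, values)

# Later, a query strongly aligned with source token 2 gives peaked attention.
w_peaked, ctx_peaked = attention(10.0 * keys[2], keys, values)

print(w_uniform.round(3))
print(w_peaked.round(3))
```

With uniform weights the decoder can only exploit co-occurrence between source and target words, which matches the post's description of the first stage.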
🔥New DALL-E? Paint by Word🔥
Fresh Blogpost!
In this post, I give a brief overview of the recent MIT paper Paint by Word and compare it to DALL-E. The authors introduce a novel method that can paint arbitrary new concepts described by text into an image at any location the user specifies in the form of a mask. The proposed Paint by Word method can also generate a full image from a textual description alone.
Read more in the Blogpost
There is also a Telegram InstantView of the post, but it is better to read it in a regular browser, as Telegram doesn't render LaTeX formulas.
Prepare your ears 🧘🏼‍♂️ - The Robot Brains Podcast
Pieter Abbeel, renowned professor at Berkeley, Director of the Berkeley Robot Learning Lab and Co-Director of the Berkeley Artificial Intelligence Research (BAIR) Lab, has launched a new podcast about AI.
https://www.therobotbrains.ai/
Let me know in the comments if you want me to share my list of fav AI/Machine Learning podcasts.
Texel has created a Tg bot for virtual try-on
Virtual try-on has been getting more attention recently. There is a bot, @Texel_Materia_Bot, where anyone can upload their photo and dress up a bit. The GIF above shows how it looks on a perfect photo.
Read about the Texel bot
But I gave the bot a hard time. I don't know why, but one foot very often ends up bare. It is interesting how robust it is on oversized clothes (first image). Apparently the algorithm includes some sort of parametric 3D shape fitting (in the spirit of SMPL) and inpainting (see how some regions behind the original jacket were reconstructed).
Finetuning Pretrained Transformers into RNNs
Microsoft+DeepMind+...
Transformers are the current SOTA in language modeling. But they come with significant computational overhead, as the attention mechanism scales quadratically in sequence length. The memory consumption also grows linearly as the sequence becomes longer. This bottleneck limits the usage of large-scale pretrained generation models, such as GPT-3 or Image transformers.
Several efficient transformer variants have been proposed recently. For example, a linear-complexity recurrent variant has proven well suited for autoregressive generation. It approximates the softmax attention with randomized or heuristic feature maps, but it can be difficult to train or yield suboptimal accuracy.
This work converts a pretrained transformer into its efficient linear-complexity recurrent counterpart with a learned feature map, improving efficiency while retaining accuracy. To achieve this, they replace the softmax attention in an off-the-shelf pretrained transformer with its linear-complexity recurrent alternative and then finetune.
Pros:
+ The finetuning process requires much less GPU time than training the recurrent variants from scratch.
+ Converting a large off-the-shelf transformer to a lightweight inference model w/o repeating the whole training procedure is very handy in many downstream applications.
arxiv.org/abs/2103.13076
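Here is a small NumPy sketch of why such linear attention can be run as an RNN. Note the assumptions: I use the fixed feature map elu(x)+1 as a stand-in, whereas the paper learns its feature map, and the function names and toy shapes are mine. The point is only that causal linear attention computed "in parallel" over the whole sequence gives the same outputs as a recurrent pass that carries a fixed-size state.

```python
import numpy as np

def feature_map(x):
    """elu(x) + 1, a positive feature map (stand-in for a learned one)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_parallel(q, k, v):
    """Causal linear attention via an explicit (T, T) score matrix."""
    scores = np.tril(feature_map(q) @ feature_map(k).T)  # mask out the future
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def linear_attention_recurrent(q, k, v):
    """Same computation, one step at a time with a constant-size state."""
    T, d = q.shape
    S = np.zeros((d, v.shape[1]))  # running sum of phi(k_i) v_i^T
    z = np.zeros(d)                # running sum of phi(k_i)
    out = np.empty_like(v)
    for t in range(T):
        phi_k = feature_map(k[t])
        S += np.outer(phi_k, v[t])
        z += phi_k
        phi_q = feature_map(q[t])
        out[t] = (phi_q @ S) / (phi_q @ z)
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))  # toy sequence, T=6
out_parallel = linear_attention_parallel(q, k, v)
out_recurrent = linear_attention_recurrent(q, k, v)
print(np.allclose(out_parallel, out_recurrent))
```

The recurrent version never stores past keys and values, only `S` and `z`, so the per-token cost and memory at generation time do not grow with the sequence length.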
Ever wanted to run dense pose recognition on animals 🐶 but didn't have labels?
Facebook AI Research
Now you can train on animals w/o annotations by using teacher-student training and datasets with labeled humans to generalize to animals.
Facebook AI Research recently released the source code (as a part of detectron2) for our paper, along with pretrained teacher and student models for the Chimps dataset. Huge shout-out to Vasil Khalidov for this release!
We (yes, I'm the first author) introduced the DensePose Evolution framework, which can be used to bootstrap DensePose on unlabeled data with animals.
DensePose Evolution source code and models
DensePose Evolution proj page
The method is explained below.
DensePose Evolution Models & Bootstrapping Pipeline
The training proceeds in two stages (see image below):
1. First, a master model is trained on data from the source domain (humans with full DensePose annotation S, I, U and V) and supporting domain (animals with segmentation annotation only). Only selected animal classes are chosen from the supporting domain through category filters to guarantee the quality of target domain results. The training is done in a class-agnostic manner: all selected categories are mapped to a single category (human).
2. Second, a student model is trained on data from source and supporting domains, as well as data from target domain obtained by applying the master model, selecting high-confidence detections, and sampling the results.
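The "select high-confidence detections and sample" step of the second stage can be sketched roughly as follows. This is not the detectron2 / DensePose API: the `Detection` class, the `select_pseudo_labels` function, the threshold, and the sample size are all hypothetical and only illustrate the idea of turning master-model outputs into pseudo-labels for the student.

```python
import random
from dataclasses import dataclass

@dataclass
class Detection:
    image_id: str
    confidence: float  # the master model's own confidence estimate in [0, 1]

def select_pseudo_labels(detections, threshold=0.8, max_samples=2, seed=0):
    """Keep confident master-model detections, then subsample them."""
    confident = [d for d in detections if d.confidence >= threshold]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(confident, min(max_samples, len(confident)))

# Hypothetical master-model outputs on unlabeled target-domain images.
master_outputs = [
    Detection("chimp_001", 0.95),
    Detection("chimp_002", 0.40),
    Detection("chimp_003", 0.88),
    Detection("chimp_004", 0.91),
]
pseudo_labels = select_pseudo_labels(master_outputs)
print([d.image_id for d in pseudo_labels])
```

The student would then be trained on the source and supporting data plus these pseudo-labeled target images. The real pipeline has its own sampling strategies and confidence models, so treat the numbers here as placeholders.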
What is included in the GitHub repository:
1. Models that perform estimation of confidence in regressed UV coordinates as well as confidences associated with coarse and fine segmentation.
2. Master and student models trained using the bootstrapping pipeline with chimpanzee as the target category.
3. The source code for the entire pipeline.
🦧 Model Zoo
👨‍🎓 For a more exhaustive explanation of this method please check my older post.
Can Vision Transformers Learn without Natural Images? YES!🔥
This is very exciting. It was shown that we can pretrain Vision Transformers purely on a synthetic fractal dataset w/o any manual annotations and achieve performance on downstream tasks similar to self-supervised pretraining on ImageNet and to supervised pretraining on other datasets like Places.
The authors also pretrained regular ResNets on their synthetic fractal data. It works pretty well too, although DeiT Transformers are better.
Overall, this is good news. If we can come up with clever approaches to synthetic data generation, then we can generate arbitrarily large datasets for free!
Paper
Proj page
📦 The fractal dataset is described in this paper.
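To show how cheap such synthetic data can be, here is a minimal sketch of rendering a fractal with an iterated function system (the "chaos game"). I am assuming IFS-style fractals here; the function name, the number of maps, and the contraction factor are my own choices, and the actual dataset has its own parameter sampling and filtering rules that I do not reproduce.

```python
import numpy as np

def render_ifs_fractal(n_maps=4, n_points=20000, size=64, seed=0):
    """Render a random IFS fractal as a binary (size, size) image."""
    rng = np.random.default_rng(seed)
    # Random affine maps x -> A x + b, rescaled so each one is a contraction.
    A = rng.uniform(-1, 1, size=(n_maps, 2, 2))
    for i in range(n_maps):
        A[i] *= 0.8 / np.linalg.norm(A[i], 2)  # spectral norm becomes 0.8
    b = rng.uniform(-1, 1, size=(n_maps, 2))

    # Chaos game: repeatedly apply a randomly chosen map to the current point.
    pts = np.empty((n_points, 2))
    x = np.zeros(2)
    for t in range(n_points):
        i = rng.integers(n_maps)
        x = A[i] @ x + b[i]
        pts[t] = x

    # Rasterize the visited points onto a pixel grid.
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    ij = ((pts - lo) / (hi - lo + 1e-9) * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.uint8)
    img[ij[:, 1], ij[:, 0]] = 255
    return img

img = render_ifs_fractal(seed=0)
print(img.shape, int((img > 0).sum()), "pixels set")
```

Each seed gives a different set of maps and hence a different shape, so in a setup like this the generator parameters themselves could serve as class labels, with no manual annotation needed.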