Forwarded from Self Supervised Boy
Yet another simple approach to unsupervised segmentation. Mostly useful as pre-training, though.
The proposed pipeline first mines salient object regions (with any available framework, possibly a supervised one) and then applies contrastive learning to pixel embeddings inside those regions. In the second step, each pixel embedding is attracted to the mean embedding of its own object and pushed away from the mean embeddings of other objects. This detail sets it apart from some previously proposed pipelines and lets training scale further, because the number of pairs in the loss grows more slowly.
A longer write-up with some external links is here.
Source here.
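A rough sketch of the pixel-to-mean contrast described in this post, just to show why the number of pairs stays small: each of the N pixels is compared with K object means (N*K pairs) rather than with every other pixel (N*N pairs). The function name, the temperature value, and the toy data are my own assumptions, not taken from the paper.

```python
import numpy as np

def mask_contrastive_loss(pixel_emb, object_ids, temperature=0.5):
    """pixel_emb: (N, D) embeddings of pixels inside salient masks.
    object_ids: (N,) integer id of the object (mask) each pixel belongs to.
    Returns the mean cross-entropy of each pixel against the object means."""
    # L2-normalise so that dot products are cosine similarities.
    pixel_emb = pixel_emb / np.linalg.norm(pixel_emb, axis=1, keepdims=True)
    ids, inverse = np.unique(object_ids, return_inverse=True)
    # Mean embedding per object, shape (K, D).
    means = np.stack([pixel_emb[inverse == k].mean(axis=0) for k in range(len(ids))])
    means = means / np.linalg.norm(means, axis=1, keepdims=True)
    # (N, K) similarities: pull towards own mean, push from the other means.
    logits = pixel_emb @ means.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(inverse)), inverse].mean()

rng = np.random.default_rng(0)
# Two toy "objects" whose pixel embeddings cluster around different directions.
emb = np.concatenate([rng.normal(size=(50, 8)) * 0.05 + np.eye(8)[0],
                      rng.normal(size=(50, 8)) * 0.05 + np.eye(8)[1]])
ids = np.array([0] * 50 + [1] * 50)
loss_true = mask_contrastive_loss(emb, ids)
loss_shuffled = mask_contrastive_loss(emb, rng.permutation(ids))
print(loss_true, loss_shuffled)
```

With correct object ids the loss should come out clearly lower than with shuffled ids, since shuffled masks give nearly identical object means.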
Ярослав's Notion on Notion
Unsupervised Semantic Segmentation by Contrasting Objects Mask Proposals
Paper proposes versatile two-step approach of pixel-level embeddings training which could be used both for unsupervised segmentation, or as pre-training for semi-supervised segmentation. Authors argue, that the mid-range prior for training embeddings is better…
Forwarded from Self Supervised Boy
A spotlight at ICLR 2021 by Schmidhuber. It proposes an unsupervised keypoint localization method with an RL application on Atari.
Very clear and simple idea:
1. Compress the image with a VAE and use features from some intermediate encoder layer later on.
2. Try to predict each feature vector from its surrounding vectors. If the prediction error is high, we have found an important object.
3. Compress the error map of the image into a mixture of Gaussians with fixed covariance, each center representing one keypoint.
SoTA on Atari games, and more robust to input noise.
It could probably also be used outside the simple Atari setting if you have enough data to train on and take later encoder layers.
With colorful images here: https://www.notion.so/Unsupervised-Object-Keypoint-Learning-Using-Local-Spatial-Predictability-ddcf36a856ff4e389050b3089cd710bc
Source here: https://openreview.net/pdf?id=GJwMHetHc73
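A toy sketch of steps 2 and 3 from this post, under heavy simplifications that are mine and not the paper's: the learned predictor is replaced by a plain average of the 8 neighbouring feature vectors, and the mixture-of-Gaussians fit is replaced by greedy peak picking on the error map. All function names are hypothetical.

```python
import numpy as np

def local_prediction_error(feat):
    """feat: (H, W, C) feature map. Predict each vector as the mean of its
    8 neighbours (stand-in for a learned predictor); return the (H, W) error."""
    H, W, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neigh_sum = np.zeros_like(feat, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # the centre vector is the prediction target
            neigh_sum += padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    pred = neigh_sum / 8.0
    return ((feat - pred) ** 2).sum(axis=-1)

def top_keypoints(error_map, k=3, radius=2):
    """Greedy peak picking: take the largest error, suppress its
    neighbourhood, repeat (stand-in for the Gaussian-mixture compression)."""
    err = error_map.astype(float).copy()
    points = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(err), err.shape)
        points.append((int(y), int(x)))
        err[max(0, y - radius):y + radius + 1,
            max(0, x - radius):x + radius + 1] = -np.inf
    return points

# Flat background with one "object" feature that its neighbours cannot predict.
feat = np.zeros((16, 16, 4))
feat[5, 7] = 1.0
error_map = local_prediction_error(feat)
keypoints = top_keypoints(error_map, k=1)
print(keypoints)
```

On this toy map the single unpredictable location is the one picked as the keypoint; uniform background gets zero error.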
Ярослав's Notion on Notion
Unsupervised Object Keypoint Learning Using Local Spatial Predictability
In this paper authors proposed the new approach to the unsupervised keypoint learning. Previous SoTA approach, Transporter, was guided by the movement between slices to learn keypoints. In current paper authors shown possible flaws of such training procedure…
Involution: Inverting the Inherence of Convolution for Visual Recognition
ByteDance AI Lab
Convolution has been the core ingredient of modern neural networks. Now the authors propose a novel atomic operation for deep neural networks by inverting the design principles of convolution.
Proposed Involution-based models improve over the conv-based baselines using ResNet-50:
- by up to 1.6% top-1 accuracy on ImageNet classification,
- by 2.5% detection AP on COCO,
- by 2.4% on COCO segmentation,
- by 4.7% mean IoU on Cityscapes segmentation.
Moreover, the computational cost is compressed to roughly 57-72% of the baseline, depending on the benchmark.
To really understand Involution, it's better to read the paper, though.
I don't know, but maybe it will turn out to be something as universal as GroupNorm and will improve performance on almost any task?
📝 Paper
🛠 Code
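For intuition, here is a minimal NumPy sketch of the operation as I understand it from the paper: the K×K kernel at every pixel is generated from that pixel's own feature vector (so it differs across positions) and is shared by all channels within a group, which is the reverse of convolution's "same kernel everywhere, different per channel". The weight names and shapes are my assumptions, batch norm in the kernel-generation branch is omitted, and this is not the official implementation.

```python
import numpy as np

def involution(x, w_reduce, w_span, kernel_size=3, groups=1):
    """x: (H, W, C); w_reduce: (C, C_r); w_span: (C_r, K*K*groups).
    Returns an (H, W, C) output with stride 1 and zero padding."""
    H, W, C = x.shape
    K, G = kernel_size, groups
    # Kernel generation per pixel: linear -> ReLU -> linear.
    kernels = np.maximum(x @ w_reduce, 0) @ w_span  # (H, W, K*K*G)
    kernels = kernels.reshape(H, W, K * K, G)
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((H, W, G, C // G))
    for idx in range(K * K):
        dy, dx = divmod(idx, K)
        # Neighbour at offset (dy - pad, dx - pad), channels split into groups.
        patch = xp[dy:dy + H, dx:dx + W].reshape(H, W, G, C // G)
        # One scalar weight per pixel and group, broadcast over group channels.
        out += kernels[:, :, idx, :, None] * patch
    return out.reshape(H, W, C)

# Toy check: constant input and weights chosen so every kernel entry equals 1.
x = np.ones((5, 5, 4))
w_reduce = np.ones((4, 2))
w_span = np.full((2, 9 * 2), 1 / 8)
y = involution(x, w_reduce, w_span, kernel_size=3, groups=2)
print(y.shape, y[2, 2], y[0, 0])
```

In the toy check an interior pixel sums 9 neighbours and a corner pixel only 4, because of the zero padding.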
It has been less than a week since Mark Zuckerberg promised face tracking in Oculus devices, and HTC has already announced the VIVE Facial Tracker, which seamlessly tracks 38 facial movements across the lips, jaw, teeth, tongue, chin, and cheeks.
Amazing how this seemingly simple technology significantly improves the virtual experience.
With VR becoming more profitable, companies like Valve and Facebook continue to invest in the technology. And now rumors are swirling that Apple is working on a mixed-reality headset as well.
This is my approximate interpretation of the Russian post from @ai_newz
TechRadar
HTC Vive has a new VR trick – full facial tracking
It'll require a new accessory though, with improved body tracking incoming too via a new add-on.
Example of HTC VIVE Face tracking in action.
Some psychedelic neural art. The first one is pretty awesome and indeed worth printing on a t-shirt. Thanks @krasniy_doshik.
MIT 6.S192: Deep Learning for Art, Aesthetics, and Creativity
Privet guys!
As you may have noticed, I'm fond of neural art and artistic style transfer and have even published some papers on this topic (ECCV18, CVPR19, ICCV19). That's why today I'm very happy to share an awesome mini-course from MIT on Neural Art and Creativity👩🏼🎨. This course has a lineup of great invited speakers like Phillip Isola (MIT), Alyosha Efros (UC Berkeley), Jeff Clune (OpenAI), etc. The video lectures are free and available online.
🌀 http://deepcreativity.csail.mit.edu
Transformers Comprise the Fourth Pillar of Deep Learning
ARK Invest is one of the biggest asset-management companies, and it focuses on disruptive technologies. They are convinced that Transformers are the next big thing, and since recent language models with billions of parameters are very computationally demanding, ARK Invest is betting heavily on the growth of the AI chip market 🦾.
According to their research, Deep Learning added a mind-blowing $1 trillion in equity market capitalization to companies like Alphabet, Amazon, Nvidia, and TSMC as of year-end 2019, and perhaps another $250-500 billion in 2020. They predict that AI will contribute roughly $30 trillion to global equity market cap creation over the next 20 years.
🗣 Source post
Google and Facebook datacenter AI workloads as of 2018 (before the rise of Transformers 😀). Multi-layer perceptrons (MLPs) here are responsible for ranking and recommendations for search and content feeds like Instagram, Netflix, and YouTube.
—
Have you seen anywhere any recent stats on this matter? Would be very interesting to see and compare.
Self-training Improves Pre-training for Natural Language Understanding
Facebook AI & Stanford
Most semi-supervised NLP approaches specifically require in-domain unlabeled data. This means that, for the best results, the unlabeled portion of the data used for semi-supervised training must come from the same domain as the annotated dataset.
This paper proposes SentAugment, a method that constructs task-specific in-domain unannotated datasets on the fly from a large external bank of sentences. So for any new NLP task where we have only a small dataset, we no longer need to bother collecting a very similar unannotated dataset if we want to use semi-supervised training.
Now we can sort of cheat to improve the performance of an NLP model on almost any downstream task using Self-training (which is also called Teacher-Student training):
1. We retrieve the most relevant sentences (a few million of them) for the current downstream task from the external bank. For retrieval we use the embedding space of a sentence encoder: a Transformer pre-trained with masked language modeling and finetuned to maximize cosine similarity between similar sentences.
2. We train the teacher model: a RoBERTa-Large model finetuned on the downstream task.
3. Then we use the teacher model to annotate the retrieved unlabeled in-domain sentences. We perform additional filtering by keeping only the ones with high-confidence predictions.
4. As our student model, we then finetune a new RoBERTa-Large using KL-divergence on the synthetic data, treating the post-softmax class probabilities as labels (i.e., not only the most confident class but the entire class distribution is used as a label for every sentence).
Such a self-training procedure significantly boosts the performance compared to the baseline. And the positive effect is higher when fewer GT annotated sentences are available.
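Steps 3 and 4 above can be sketched in a few lines of NumPy. This is only an illustration: the fixed confidence threshold, the function names, and the toy probabilities are my assumptions (the exact filtering rule is in the paper), and real training would of course run on RoBERTa outputs.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def filter_confident(teacher_probs, threshold=0.9):
    """Indices of sentences whose top teacher probability is >= threshold."""
    return np.flatnonzero(teacher_probs.max(axis=1) >= threshold)

def kl_soft_label_loss(teacher_probs, student_logits, eps=1e-12):
    """Mean KL(teacher || student), using the full teacher distribution
    as the label instead of only the argmax class."""
    student_log_probs = np.log(softmax(student_logits) + eps)
    teacher_log_probs = np.log(teacher_probs + eps)
    return (teacher_probs * (teacher_log_probs - student_log_probs)).sum(axis=1).mean()

# Toy teacher predictions for three retrieved sentences, two classes.
teacher_probs = np.array([[0.95, 0.05],
                          [0.60, 0.40],   # not confident enough, gets dropped
                          [0.02, 0.98]])
keep = filter_confident(teacher_probs)
soft_labels = teacher_probs[keep]
# A student that reproduces the teacher exactly vs. one that predicts uniformly.
loss_match = kl_soft_label_loss(soft_labels, np.log(soft_labels))
loss_uniform = kl_soft_label_loss(soft_labels, np.zeros_like(soft_labels))
print(keep, loss_match, loss_uniform)
```

The loss is (numerically) zero when the student matches the teacher's distribution and positive otherwise, which is what the student is trained to minimize.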
As a large-scale external bank of unannotated sentences, the authors use CommonCrawl. In particular, they use a corpus with 5 billion sentences (100B words). Because of its scale and diversity, the sentence bank contains data from various domains and in different styles, which makes it possible to retrieve relevant data for many downstream tasks. To retrieve the most relevant sentences for a specific downstream task, we need to obtain an embedding for the task. Several options exist: (1) average the embeddings of all sentences in the training set; (2) average the embeddings per class; (3) keep the original sentence embeddings.
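A small sketch of the three task-embedding options and cosine-similarity retrieval from the bank. The mode names, function names, and 2-D toy embeddings are made up for illustration; in practice the embeddings would come from the sentence encoder and the bank would need an approximate nearest-neighbour index rather than a dense matrix product.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def task_queries(train_emb, labels, mode="label-average"):
    """Build query vectors for a task from its training-sentence embeddings."""
    if mode == "all-average":      # option (1): one query for the whole task
        return train_emb.mean(axis=0, keepdims=True)
    if mode == "label-average":    # option (2): one query per class
        return np.stack([train_emb[labels == c].mean(axis=0)
                         for c in np.unique(labels)])
    if mode == "per-sentence":     # option (3): every sentence is a query
        return train_emb
    raise ValueError(f"unknown mode: {mode}")

def retrieve(bank_emb, queries, top_k=1):
    """Indices of bank sentences among the top_k most similar to any query."""
    sims = normalize(queries) @ normalize(bank_emb).T  # cosine, shape (Q, B)
    top = np.argsort(-sims, axis=1)[:, :top_k]
    return np.unique(top)

# Toy bank of four sentence embeddings and a two-sentence training set.
bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
train_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
per_class = retrieve(bank, task_queries(train_emb, labels, "label-average"))
averaged = retrieve(bank, task_queries(train_emb, labels, "all-average"))
print(per_class, averaged)
```

Even on this toy bank the choice matters: per-class queries pull one neighbour for each class, while the single averaged query lands between the classes and retrieves a different sentence.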
📝 Paper
🛠 Code
#paper_explained #nlp
What happens if you augment your training dataset with a load of stylized images as well?
Someone trained a StyleGAN2-ada on the images augmented with style transfer and synced the output with audio 🎶.
YouTube
StyleGAN2-ada-pytorch audio reactive weirdness
So what happens if you augment your dataset with a load of style-transfer images as well? Well, I guess it sort of seems to work. Now I think I need to up my dataset size from 3000 to over 9000! I should probably test with 256x256 images first, right? Think…