How Tesla Truck design affects the design of all autonomous vehicles
#design #selfdriving #autonomousvehicle #rl #scania
Data Science by ODS.ai
Thorough analysis of the recent Tesla Model 3 accident and a warning to autopilot users. Olga Uskova shared insights from her #CognitivePilot team members on the #Tesla accident. Highlights: - Please don't use autopilot on highways. They are still buggy and in development…
If you want to explore #selfdriving problems further and haven't seen it yet, you are welcome to check out this post.
VirTex: Learning Visual Representations from Textual Annotations
The authors offer an alternative approach to pre-training backbones for CV tasks: using semantically dense captions to learn visual representations.
Recent methods have explored unsupervised pretraining to scale to vast quantities of unlabeled images. In contrast, the authors aim to learn high-quality visual representations from fewer images. They revisit supervised pretraining and seek data-efficient alternatives to classification-based pretraining.
VirTex (CNN + Transformer) is pre-trained on COCO captions. On downstream tasks it can reach performance similar to pre-training on ImageNet, but with 10x fewer images!
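A minimal PyTorch sketch of the caption-pretraining idea (my illustration with assumed shapes and hyperparameters, not the authors' code): a CNN backbone produces spatial features, a Transformer decoder learns to generate the caption from them, and the captioning loss is what trains the backbone that later transfers downstream.

```python
import torch
import torch.nn as nn
import torchvision

class CaptionPretrainer(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # keep the 7x7 feature grid
        self.proj = nn.Linear(2048, d_model)
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        feats = self.backbone(images)                         # (B, 2048, 7, 7)
        memory = self.proj(feats.flatten(2).transpose(1, 2))  # (B, 49, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(self.embed(tokens), memory, tgt_mask=mask)
        return self.head(out)                                 # next-token logits

model = CaptionPretrainer()
images, tokens = torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 16))
logits = model(images, tokens[:, :-1])                        # teacher forcing
loss = nn.functional.cross_entropy(logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
loss.backward()  # gradients flow into the backbone, which is what gets transferred
```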
Paper: https://arxiv.org/abs/2006.06666
Code: https://github.com/kdexd/virtex
Site: https://kdexd.github.io/virtex/
#imagecaptioning #cv #visual #annotation #transformer #pretraining #transferlearning #deeplearning #paper
Forwarded from Graph Machine Learning
ICML 2020. Comprehensive analysis of authors, organizations, and countries.
Finally, here is my post on the analysis of ICML 2020. There are several things I learned from it. For example, the USA participates in 3/4 of the papers. Or that DeepMind produces approximately half of all UK papers. Or that Google does not collaborate with other companies. Or that, apart from the USA, only China can boast several companies that publish regularly. Or that one Japanese professor published 12 papers. And much more.
The code and data are on GitHub, but the cool part is that you can make your own interactive plots in a Colab notebook (no installation required), including a collaboration graph between universities and companies.
PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
The authors suggest an approach to single-image 3D shape reconstruction of a human body. The approach leverages the representation from (1), which the authors argue is capable of processing high-resolution input images up to 1024x1024 px.
The main idea of PIFu (1) is to represent a 3D shape as a function that defines the surface and the texture (both parametrized by an MLP). This avoids storing the whole 3D volume as in voxel-based methods, and the representation can easily be converted to a mesh via the marching cubes algorithm. Despite being more memory-efficient, (1) still could not operate at resolutions higher than 512x512 px.
The authors push (1) even further, allowing it to process images up to 1024x1024 px. They design a two-level pipeline with two PIFu modules: one for coarse shape estimation that operates on a 512x512 px image, and another for fine-grained prediction that takes the 1024x1024 px image as input along with the features from the coarse level.
The model needs ground-truth 3D geometry, so the authors use the RenderPeople (2) dataset of 500 3D human models.
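A minimal sketch of the implicit-surface machinery (simplified and assumed: the real model also conditions the MLP on pixel-aligned image features): an MLP maps a 3D point to an occupancy value, and marching cubes extracts a mesh, so no voxel volume is ever stored.

```python
import numpy as np
import torch
import torch.nn as nn
from skimage import measure

# Stand-in for the pixel-aligned MLP from the paper (untrained, illustrative only).
occupancy = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),
)

# Evaluate the function on a dense grid instead of storing a voxel volume.
res = 64
grid = np.stack(np.meshgrid(*[np.linspace(-1, 1, res)] * 3, indexing="ij"), axis=-1)
with torch.no_grad():
    values = occupancy(torch.from_numpy(grid.reshape(-1, 3)).float())
field = values.numpy().reshape(res, res, res)

# Extract an iso-surface as a triangle mesh (level picked inside the field's range,
# since this network is untrained).
verts, faces, normals, _ = measure.marching_cubes(field, level=float(field.mean()))
print(verts.shape, faces.shape)
```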
Paper: https://arxiv.org/pdf/2004.00452.pdf
Code: https://github.com/facebookresearch/pifuhd
Project: https://shunsukesaito.github.io/PIFuHD/
Colab: https://colab.research.google.com/drive/11z58bl3meSzo6kFqkahMa35G5jmh2Wgt
(1) Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., & Li, H. (2019). Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2304-2314).
(2) https://renderpeople.com/3d-people/
#3d #reconstruction #humandigitalization #singleimage
Image GPT
by OpenAI
The authors have shown that by trading off 2-D knowledge for scale and by choosing predictive features from the middle of the network, a sequence transformer can be competitive with top convolutional nets for unsupervised image classification.
Notably, they achieved their results by directly applying the GPT-2 language model to image generation. Their results suggest that due to its simplicity and generality, a sequence transformer given sufficient compute might ultimately be an effective way to learn excellent features in many domains.
There are two methods they use to assess model performance:
[0] linear probe: use the trained model to extract features from the images in the downstream dataset, then fit a logistic regression to the labels (sketched below)
[1] fine-tune: train the entire model on the downstream dataset
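A minimal sketch of the linear-probe protocol (with a placeholder feature extractor standing in for the pre-trained transformer's middle-layer activations):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def extract_features(images):
    # Placeholder: in the paper this is the frozen sequence transformer's
    # middle-layer activations, pooled over positions.
    return images.reshape(len(images), -1)

X_train, y_train = np.random.rand(100, 32 * 32), np.random.randint(0, 10, 100)
X_test, y_test = np.random.rand(20, 32 * 32), np.random.randint(0, 10, 20)

probe = LogisticRegression(max_iter=1000)   # only this classifier is trained
probe.fit(extract_features(X_train), y_train)
print(accuracy_score(y_test, probe.predict(extract_features(X_test))))
```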
blog: https://openai.com/blog/image-gpt/
papers: icml 2020 (v1), (v2)
github (code is provided as-is, no updates expected): https://github.com/openai/image-gpt
#openai #gpt2 #language #image #icml2020
A fellow DS channel editor reached out to us, and their work definitely deserves your attention: @machinelearning24x7. The channel covers an area similar to ours and is led by the 'Machine Learning India' community.
Hope you find their work interesting!
Forwarded from Age Of Geeks
Analytics India Magazine: AI-Based System Can Now Turn Brainwaves Into Text.
https://analyticsindiamag.com/ai-based-system-can-now-turn-brainwaves-into-text/
Live U-Net implementation online session today
The famous Abhishek Thakur (first 4x Grandmaster on Kaggle) is going to show you how to implement the original U-Net with #PyTorch.
The session starts 4 hours from now (at 6 PM CET / 9:30 PM IST); make sure you have turned notifications on if you are interested.
YouTube Link: https://www.youtube.com/watch?v=u1loyDCoGbE
#Livecoding #Unet
Memory Transformer
Burtsev & Sapunov
The authors propose and study two memory-augmented architectures, MemTransformer and MemBottleneck Transformer. Qualitative analysis of attention patterns produced by the transformer heads trained to solve a machine translation task suggests that both models successfully discover basic operations for memory control. Attention maps show evidence of memory read/write as well as some in-memory processing operations such as copying and summation.
A comparison of machine translation quality shows that adding general-purpose memory in MemTransformer improves performance over the baseline. Moreover, the speed of training and the final quality positively correlate with the memory size. On the other hand, MemBottleneck Transformer, with all self-attention restricted to the memory only, has significantly lower scores after training.
Memory lesion tests demonstrate that the performance of the pre-trained MemTransformer model critically depends on the presence of memory. Still, the memory controller learned by the model degrades only gradually when the memory size is changed during inference. This indicates that the controller has some robustness and ability to generalize.
More interesting figures can be found in the attachment.
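A minimal sketch of the MemTransformer idea (assumed details such as sizes and initialization, not the authors' code): learnable memory tokens are prepended to the input sequence, so self-attention can read from and write to them alongside the regular tokens.

```python
import torch
import torch.nn as nn

class MemTransformerEncoder(nn.Module):
    def __init__(self, d_model=256, n_mem=10, nhead=8, num_layers=4):
        super().__init__()
        self.n_mem = n_mem
        self.memory = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)  # [mem] tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):                              # x: (batch, seq, d_model)
        mem = self.memory.expand(x.size(0), -1, -1)    # same memory for every sample
        h = self.encoder(torch.cat([mem, x], dim=1))   # tokens attend to memory and back
        return h[:, self.n_mem:]                       # drop the memory slots

enc = MemTransformerEncoder()
print(enc(torch.randn(2, 20, 256)).shape)              # torch.Size([2, 20, 256])
```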
paper: https://arxiv.org/abs/2006.11527
#nlp #transformer #attention #machine #translation
Logo generation autonomous system revealed to have been used in production for almost a year
Leading Russia-based design studio Artlebedev revealed that they have been experimenting with neural networks and a set of algorithmic systems to design logotypes for real customers. They named the system Nikolay Ironov (in Russian, N. Ironov sounds close to Neuronov). The system has released 17 commercial projects, which were welcomed by the audience.
Mischief managed!
Link: https://www.artlebedev.com/ironov/
Project portfolio: https://www.artlebedev.ru/nikolay-ironov/
#GAN #design #logotypes #logo #generation #generative #artlebedev
Data Science by ODS.ai
The NetHack Learning Environment. #Facebook launched a new Reinforcement Learning environment for training agents, based on the #NetHack game. NetHack has nothing to do with what is now considered common cybersecurity; it is an early terminal-based Minecraft…
An update from #Facebook on the #NetHack Learning Environment.
Link: https://ai.facebook.com/blog/nethack-learning-environment-to-advance-deep-reinforcement-learning
Publication: https://arxiv.org/abs/2006.13760
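If you want to poke at it yourself, a quick sketch of a random-agent rollout (assuming the `nle` package and its Gym registration, as shown in the project README):

```python
import gym
import nle  # noqa: F401  -- importing registers the NetHack environments

env = gym.make("NetHackScore-v0")
obs = env.reset()
done = False
while not done:
    # A random policy; a real agent would map `obs` (terminal glyphs, stats) to actions.
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```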
#RL
Pre-training via Paraphrasing
Mike Lewis, Marjan Ghazvininejad, et al., Facebook AI
The authors introduce MARGE, a pre-trained seq2seq model learned with an unsupervised multi-lingual multi-document paraphrasing objective.
MARGE provides an alternative to the dominant masked language modeling paradigm: the model self-supervises the reconstruction of a target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of generating the original. The authors show it is possible to jointly learn retrieval and reconstruction, given only a random initialization.
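A toy sketch of that retrieve-and-reconstruct objective (a heavy simplification with made-up shapes; the paper instead uses the relevance scores to bias cross-attention): relevance between the target and each retrieved document weights a per-document reconstruction likelihood, so the same loss trains retrieval and generation jointly.

```python
import torch
import torch.nn.functional as F

K, T, V, d = 4, 12, 1000, 64             # evidence docs, target length, vocab, dim
target_ids = torch.randint(0, V, (T,))
evidence_embs = torch.randn(K, d, requires_grad=True)  # pooled encoder embeddings
target_emb = torch.randn(d)
logits = torch.randn(K, T, V, requires_grad=True)      # decoder logits per evidence doc

relevance = F.log_softmax(evidence_embs @ target_emb, dim=0)   # (K,) retrieval scores
token_lp = F.log_softmax(logits, dim=-1).gather(
    -1, target_ids.view(1, T, 1).expand(K, T, 1)).squeeze(-1)  # (K, T) token log-probs
loss = -torch.logsumexp(token_lp.sum(-1) + relevance, dim=0)   # relevance-weighted mixture
loss.backward()  # gradients reach both the retriever and the generator
```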
The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks. For example, with no additional task-specific training they achieve BLEU scores of up to 35.8 for document translation.
They further show that fine-tuning gives strong performance on a range of discriminative and generative tasks in many languages, making MARGE the most generally applicable pre-training method to date.
Future work should scale MARGE to more domains and languages, and study how to more closely align pre-training objectives with different end tasks.
paper: https://arxiv.org/abs/2006.15020
#nlp #paraphrasing #unsupervised
(Re)Discovering Protein Structure and Function Through Language Modeling
Trained solely on unsupervised language modeling, the Transformer's attention mechanism recovers high-level structural (folding) and functional properties of proteins!
Why this is important: traditional protein modelling requires lots of computational power. This might be a key to more efficient structure modelling. Protein structure => function. Function => faster drug research and understanding of disease mechanisms.
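As a rough illustration (a generic sketch with assumed dimensions, not the paper's pipeline): run residue embeddings through a self-attention layer and read the attention matrix as a candidate residue-residue contact map. The paper does this with pre-trained Transformer protein language models; a random layer here only demonstrates the shapes.

```python
import torch
import torch.nn as nn

seq_len, d_model = 50, 64                      # a 50-residue protein
embeddings = torch.randn(1, seq_len, d_model)  # stand-in for learned residue embeddings
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
_, weights = attn(embeddings, embeddings, embeddings)  # weights averaged over heads
contact_map = weights[0]                       # (seq_len, seq_len): who attends to whom
print(contact_map.shape)
```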
Blog: https://blog.einstein.ai/provis/
Paper: https://arxiv.org/abs/2006.15222
Code: https://github.com/salesforce/provis
#DL #NLU #proteinmodelling #bio #biolearning #insilico
Mozilla's Common Voice project
Mozilla launched a project to make digitalization of human voice more open and accessible. Anyone is eligible to download the dataset and use it for building #voicerecognition or #voicegeneration ML systems.
Most importantly, anyone can take part in the project and make sure that their voice, with all its accents and personal speech features such as pitch, speed, clarity, and timbre, is accounted for in the models to be built.
Why that is important: if you have a speech impediment and you are not happy with how machine speech recognition works for you, or how well #Alexa or #Siri gets you, you should spend some time recording your voice for Common Voice to increase the probability that upcoming voice recognition models work great for you.
Project: https://voice.mozilla.org
Venturebeat article: https://venturebeat.com/2020/07/01/mozilla-common-voice-updates-will-help-train-the-hey-firefox-wakeword-for-voice-based-web-browsing/
#open #SpeechToText #TextToSpeech #DL #mozilla #audiolearning #voicerecognition
Data Science by ODS.ai
Mozilla's Common Voice project. Mozilla launched a project to make digitalization of human voice more open and accessible. Anyone is eligible to download the dataset to use it for building #voicerecognition or #voicegeneration ML systems. Most importantly…
Please share this message with your friends, especially those whose speech is unusual. If you have a friend whom you sometimes can't understand when they are anxious or excited, you will help them a lot.
And if you have ever heard from someone that they can't get you, that you speak too fast or too slow, or that you drop sounds, you should definitely record some pieces for this project.
ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
The authors propose a set of design principles that significantly improves model performance, based on an analysis of representational bottlenecks.
The authors argue that commonly used architectures suffer from a representational bottleneck and address it by expanding the channel size, using more expand layers, and using better activation functions. This improves performance on ImageNet and gives good transfer-learning results on classification and object detection.
The authors hope that their design ideas can be used by NAS to create even better models.
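For flavor, a rough sketch of the kind of block these principles point at (my illustration, not the official ReXNet code): an inverted bottleneck whose expansion widens the channels and which uses the smoother SiLU/Swish activation instead of ReLU6.

```python
import torch
import torch.nn as nn

class ExpandedBlock(nn.Module):
    def __init__(self, c_in, c_out, expand_ratio=6):
        super().__init__()
        c_mid = c_in * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False),   # 1x1 expand: widen channels
            nn.BatchNorm2d(c_mid),
            nn.SiLU(),                               # smoother nonlinearity than ReLU6
            nn.Conv2d(c_mid, c_mid, 3, padding=1, groups=c_mid, bias=False),  # depthwise
            nn.BatchNorm2d(c_mid),
            nn.SiLU(),
            nn.Conv2d(c_mid, c_out, 1, bias=False),  # 1x1 project, no activation
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x):
        return self.block(x)

print(ExpandedBlock(16, 32)(torch.randn(1, 16, 56, 56)).shape)  # (1, 32, 56, 56)
```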
Paper: https://arxiv.org/abs/2007.00992
Code: https://github.com/clovaai/rexnet
#deeplearning #pretraining #transferlearning #computervision #pytorch