Gradient Dude
TL;DR for DL/CV/ML/AI papers from an author of publications at top-tier AI conferences (CVPR, NIPS, ICCV, ECCV).

Most ML feeds go for fluff; we go for the real meat.

YouTube: youtube.com/c/gradientdude
Instagram: instagram.com/gradientdude
Chinese researchers are very fond of doing extensive surveys of a particular sub-field of machine learning, listing the main works and the major breakthrough ideas. There are so many articles published every day, and it is impossible to read everything. Therefore, such reviews are valuable (if they are well written, of course, which is quite rare).

Recently, a very good paper came out that reviews the many variants of Transformers, with a focus on language modeling (NLP). It is a must-read for anyone getting into NLP and interested in Transformers. The paper covers the basic principles of self-attention as well as the details of modern Transformer variants, such as architectural modifications, pre-training, and various applications.
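For readers new to the topic, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the basic building block such surveys start from. It is illustrative only: the projection matrices are random stand-ins for learned weights, and masking, multiple heads, and the output projection are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token representations.
    W_q, W_k: (d_model, d_k) and W_v: (d_model, d_v) projection matrices.
    Returns the (seq_len, d_v) outputs and the (seq_len, seq_len) attention weights.
    """
    Q = X @ W_q
    K = X @ W_k
    V = X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V, weights          # each output is a weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))              # 4 tokens, d_model = 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
print(out.shape, attn.shape)             # (4, 8) (4, 4)
```

Dividing by the square root of d_k keeps the dot products from growing with the dimension, which would otherwise push the softmax toward near one-hot weights.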

📝Paper: A Survey of Transformers.
Facebook AI has built a system called TextStyleBrush that can replace text both in scene images and in handwriting, in one shot, using only a single example word.
The model was made self-supervised because it is extremely hard to collect labeled pairs of text images under different conditions and to annotate segmentation masks for text (although I think this could be done with synthetic data generation).

The model is trained to capture an unlimited range of text styles: not just different typography and calligraphy, but also transformations such as rotations, curved text, and the deformations that occur between pen and paper in handwriting, as well as background clutter and image noise. The main idea is to disentangle the content of a text image from all aspects of the appearance of the entire word box. The representation of the overall appearance can then be applied as a one-shot transfer, without retraining on samples of the new source style.

The model consists of a style encoder, content encoder, and stylized text generator (plus a bunch of losses).
The generator architecture is based on StyleGAN2. However, StyleGAN2 has an important limitation: it is an unconditional model, meaning it generates images from a randomly sampled latent vector. Generating photo-realistic text images, in contrast, requires controlling the output with two separate inputs: the desired text content and the style. This is solved by extracting layer-specific style information and injecting it at each layer of the generator (a sort of conditional instance normalization).
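The conditioning idea can be illustrated with a minimal NumPy sketch of adaptive instance normalization (AdaIN), a common form of conditional instance normalization: each channel of a layer's feature map is normalized and then re-scaled and shifted by parameters predicted from the style vector. This is an assumption-laden illustration, not TextStyleBrush's actual layer. StyleGAN2 itself replaces AdaIN with weight modulation/demodulation, and the per-layer affine mapping (W_gamma, b_gamma, W_beta, b_beta) here is hypothetical.

```python
import numpy as np

def adain(x, style, W_gamma, b_gamma, W_beta, b_beta, eps=1e-5):
    """AdaIN-style conditional instance normalization (illustrative sketch).

    x:     (C, H, W) feature map of one generator layer.
    style: (S,) style vector, e.g. produced by a style encoder.
    W_gamma, W_beta: (S, C); b_gamma, b_beta: (C,). Hypothetical per-layer
    affine mapping from the style vector to channel-wise scale and shift.
    """
    # 1) Instance normalization: zero mean, unit variance per channel.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / np.sqrt(var + eps)
    # 2) Layer-specific style parameters predicted from the style vector.
    gamma = style @ W_gamma + b_gamma    # (C,)
    beta = style @ W_beta + b_beta       # (C,)
    # 3) Inject the style as a channel-wise affine transform.
    return gamma[:, None, None] * x_norm + beta[:, None, None]

rng = np.random.default_rng(0)
C, S = 16, 32
x = rng.normal(size=(C, 8, 8))
style = rng.normal(size=S)
W_gamma = 0.01 * rng.normal(size=(S, C))
W_beta = 0.01 * rng.normal(size=(S, C))
y = adain(x, style, W_gamma, np.ones(C), W_beta, np.zeros(C))
print(y.shape)  # (16, 8, 8)
```

Because every generator layer gets its own mapping, the same style vector can control coarse attributes in early layers and fine texture in later ones.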

The losses are the following:
1) reconstruction and cycle losses;
2) a discriminator that classifies images as real or fake;
3) a recognizer: a network that reads the text in the stylized image and makes sure that no content is lost;
4) a typeface classifier: a pretrained network that measures how well the generator captures the style of the input.
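A rough sketch of how such terms could be combined into a single generator objective, assuming generic stand-ins: L1 distance for reconstruction, cycle, and typeface-feature matching, a non-saturating softplus loss for the discriminator term, and per-character cross-entropy for the recognizer. The function names, shapes, feature vectors, and weights are all hypothetical and not taken from the paper.

```python
import numpy as np

def l1(a, b):
    # Mean absolute error; used here for reconstruction, cycle, and feature matching.
    return float(np.mean(np.abs(a - b)))

def adversarial_g(fake_logits):
    # Non-saturating generator loss: mean softplus(-D(fake)).
    return float(np.mean(np.logaddexp(0.0, -fake_logits)))

def recognition_ce(char_logits, char_ids):
    # Per-position softmax cross-entropy of a frozen text recognizer:
    # penalizes generated images whose content no longer reads as the target text.
    z = char_logits - char_logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-np.mean(log_p[np.arange(len(char_ids)), char_ids]))

def total_loss(terms, weights):
    # Weighted sum of all loss terms (weights are hypothetical hyperparameters).
    return sum(weights[k] * terms[k] for k in terms)

rng = np.random.default_rng(0)
target = rng.random((64, 256))                        # hypothetical 64x256 word image
generated = target + 0.05 * rng.normal(size=target.shape)
cycled = target + 0.02 * rng.normal(size=target.shape)
feat_gen, feat_style = rng.normal(size=512), rng.normal(size=512)  # stand-in typeface features
terms = {
    "rec": l1(generated, target),
    "cyc": l1(cycled, target),
    "adv": adversarial_g(rng.normal(size=8)),
    "ocr": recognition_ce(rng.normal(size=(5, 37)), np.array([1, 2, 3, 4, 5])),
    "font": l1(feat_gen, feat_style),
}
weights = {"rec": 1.0, "cyc": 1.0, "adv": 1.0, "ocr": 1.0, "font": 1.0}
print(total_loss(terms, weights))
```

In practice the relative weights matter a lot: too much adversarial weight can sacrifice legibility, while too much recognizer weight can wash out the style.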

Results are quite striking!
Now imagine driving through the busy streets of Hong Kong and seeing street signs translated in real time and projected onto your car's windshield. Or imagine that one day we will send personalized messages by generating creative images with the text embedded in them (instead of stickers).

🌀 Blogpost
📝 Paper
This is the architecture: the content encoder encodes the text, the style encoder extracts the style, and the generator produces stylized text conditioned on a style vector.
Just a small announcement 🔥
Our new #CVPR21 paper (with Facebook AI Research) is out!

Discovering Relationships between Object Categories via Universal Canonical Maps

TL;DR: DensePose for animals on steroids, which, as a byproduct, can automatically discover correspondences between the 3D shapes of different animals using novel cycle losses.
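The general idea behind a cycle loss can be sketched generically: map each pixel to a point on a 3D template surface and back again, and penalize how far the round trip lands from the starting pixel. The NumPy sketch below is a hypothetical, simplified pixel-to-surface-to-pixel soft cycle loss, not the paper's exact formulation (which relates canonical maps across different object categories); all names and the temperature value are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pixel_surface_cycle_loss(pix_emb, pix_xy, vert_emb, temperature=0.1):
    """Soft cycle-consistency loss: pixel -> 3D surface -> pixel (illustrative).

    pix_emb:  (N, D) embeddings predicted for N foreground pixels.
    pix_xy:   (N, 2) image coordinates of those pixels.
    vert_emb: (M, D) embeddings of the M vertices of a 3D template mesh.
    """
    # Pixel -> surface: soft assignment of every pixel to the mesh vertices.
    p_pix_to_vert = softmax(pix_emb @ vert_emb.T / temperature, axis=1)  # (N, M)
    # Surface -> pixel: soft assignment of every vertex back to the pixels.
    p_vert_to_pix = softmax(vert_emb @ pix_emb.T / temperature, axis=1)  # (M, N)
    # Composing both gives, for each pixel, a distribution over pixels.
    p_cycle = p_pix_to_vert @ p_vert_to_pix                              # (N, N)
    # The expected location after the round trip should match the start.
    xy_back = p_cycle @ pix_xy                                           # (N, 2)
    return float(np.mean(np.sum((xy_back - pix_xy) ** 2, axis=1)))

# A perfect one-to-one match between 3 pixels and 3 vertices.
pix_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(pixel_surface_cycle_loss(np.eye(3), pix_xy, np.eye(3), temperature=0.01))  # essentially 0
```

Such a loss needs no ground-truth correspondences: it only asks that the mapping be consistent with itself, which is what makes it attractive for discovering correspondences automatically.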

I will present the paper today (21.06) at 11 AM EDT / 5 PM CEST. Feel free to join the live Q&A session and ask me questions 😉.

🌐 Project page
▶️ Video explanation
📝 Paper
🛠 Source code
(1) High-level scheme of our method and (2) some more results.
I'm happy to announce that our team (me, Stepan Konev, Kirill Brodt) was awarded 3rd place 🏅 in the Waymo Motion Prediction Challenge 2021.

To plan a safe and efficient route, an autonomous vehicle should anticipate the future motion of other agents around it. Motion prediction is an extremely challenging task that has recently gained significant attention from the research community. We present a simple yet very strong baseline for multimodal motion prediction based purely on convolutional neural networks.

The task is the following: given the agents' tracks over the past 1 second and the corresponding map, we had to predict the agents' positions on the road for 8 seconds into the future.

Our model takes as input a raster image centered on a target agent and directly predicts a set of possible trajectories along with their confidences. The raster image is obtained by rasterising the scene and the history of all agents. While easy to implement, the proposed approach achieves competitive performance compared to state-of-the-art methods on the Waymo Open Dataset Motion Prediction Challenge 2021: our model ranks 1st by minimum average displacement error (minADE) and 3rd by mAP.
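To make the input and the evaluation metric concrete, here is a hedged NumPy sketch of two pieces: drawing the agents' past positions into a raster centered on the target agent (one channel per timestep), and computing minimum average displacement error (minADE) over K predicted trajectories. It is not the MotionCNN code: the real rasters also encode the road map, and the raster size, resolution, and 10 Hz sampling (11 history steps including the current one, 80 future steps) are assumptions for illustration.

```python
import numpy as np

def rasterize_history(history_xy, center_xy, size=64, meters_per_pixel=0.5):
    """Draw agents' past positions into a (T, size, size) binary raster.

    history_xy: (A, T, 2) positions of A agents over T past timesteps (meters).
    center_xy:  (2,) position of the target agent; the raster is centered on it.
    One channel per timestep, so the CNN can see how the agents moved.
    """
    A, T, _ = history_xy.shape
    raster = np.zeros((T, size, size), dtype=np.float32)
    # World coordinates -> pixel indices relative to the target agent.
    pix = np.floor((history_xy - center_xy) / meters_per_pixel + size / 2).astype(int)
    inside = (pix >= 0).all(axis=-1) & (pix < size).all(axis=-1)  # (A, T)
    for a, t in zip(*np.nonzero(inside)):
        col, row = pix[a, t]
        raster[t, row, col] = 1.0
    return raster

def min_ade(pred, gt):
    """Minimum average displacement error over K predicted modes.

    pred: (K, T_future, 2) candidate trajectories; gt: (T_future, 2) ground truth.
    """
    ade_per_mode = np.linalg.norm(pred - gt[None], axis=-1).mean(axis=1)  # (K,)
    return float(ade_per_mode.min())

rng = np.random.default_rng(0)
history = rng.normal(scale=5.0, size=(3, 11, 2))       # 3 agents, 11 past steps
img = rasterize_history(history, center_xy=history[0, -1])
print(img.shape)                                       # (11, 64, 64)

gt = np.stack([np.linspace(0, 40, 80), np.zeros(80)], axis=1)  # 80 future steps
preds = np.stack([gt + 1.0, gt + 3.0, gt - 0.5])               # K = 3 modes
print(min_ade(preds, gt))                              # ≈ 0.707
```

minADE only rewards the best of the K modes, which is why predicting several diverse trajectories helps; mAP additionally takes the predicted confidences into account.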

We wrote a short paper and released our code!

📜Technical report
Code
Pipeline of our motion prediction approach (MotionCNN) and the results.
Forwarded from Denis Sexy IT 🇬🇧
Recently I found the Instagram of an artist from Tomsk, Evgeny Schwenk, who redraws characters from Soviet cartoons as if they were real people. I ran his drawings through the neural.love neural network, which made them even more realistic. Just a bit of Photoshop (mainly for the hats) and here we go.

I guess Karlsson-on-the-Roof is my best result.
Aloha guys!
I'm very excited to announce that I have joined Facebook Reality Labs (FRL) as a Research Scientist! Before that, I interned twice at Facebook AI Research, and now I will work in the FRL division, which focuses on virtual and augmented reality. Moving from academia to industry, I hope I will still have enough freedom in choosing research directions 😉.
Experimented with generating images from text prompts with VQGAN and CLIP. Some cool results:

1. "Minecraft Starcraft"
2. "Polygonal fast food"
3. "Holy war against capitalism"
4. "Modern cubist painting"

🤙🏼 Colab notebook