Gradient Dude
TL;DR for DL/CV/ML/AI papers from an author of publications at top-tier AI conferences (CVPR, NIPS, ICCV, ECCV).

Most ML feeds go for fluff; we go for the real meat.

YouTube: youtube.com/c/gradientdude
IG: instagram.com/gradientdude
April 24, 2021
May 4, 2021
May 7, 2021
May 12, 2021
May 12, 2021
June 11, 2021
Facebook AI has built a system called TextStyleBrush that can replace text in both scene images and handwriting, in one shot, using only a single example word.
The model is trained in a self-supervised way because it is extremely hard to collect labeled pairs of the same text in different conditions and to annotate segmentation masks for text (although I think this could be done with synthetic generation).

The model is trained to handle unlimited text styles: not just different typography and calligraphy, but also transformations such as rotations, curved text, and the deformations that happen between paper and pen when handwriting, as well as background clutter and image noise. The main idea is to disentangle the content of a text image from all aspects of the appearance of the entire word box. The representation of the overall appearance can then be applied as a one-shot transfer, without retraining on the novel source style samples.

The model consists of a style encoder, content encoder, and stylized text generator (plus a bunch of losses).
The generator architecture is based on the StyleGAN2 model. However, the design of StyleGAN2 has an important limitation: StyleGAN2 is an unconditional model, meaning it generates images by sampling a random latent vector. For generating photo-realistic text images, however, one needs to control the output based on two separate sources: the desired text content and style. This is solved by extracting layer-specific style information and injecting it at each layer of the generator (it is some sort of conditional instance normalization).
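To make the "conditional instance normalization" remark concrete, here is a minimal pure-Python sketch of that general idea: each feature channel is normalized, then rescaled and shifted by values derived from the style. This is not the TextStyleBrush implementation. The function names are hypothetical, and the `gammas`/`betas` values are hard-coded here, whereas in a real model they would be predicted from the style encoding for each generator layer.

```python
import math

def instance_norm(channel, eps=1e-5):
    """Normalize one feature channel (a flat list of activations)
    to zero mean and (approximately) unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return [(x - mean) / math.sqrt(var + eps) for x in channel]

def conditional_instance_norm(features, gammas, betas):
    """Normalize every channel, then apply a style-dependent scale and shift.

    features: list of channels, each a flat list of activations
    gammas, betas: one scale and one shift per channel. Assumption: in a
    real generator these come from a learned, layer-specific projection
    of the style vector; here they are passed in directly.
    """
    if not (len(features) == len(gammas) == len(betas)):
        raise ValueError("need exactly one gamma and one beta per channel")
    return [
        [g * x + b for x in instance_norm(channel)]
        for channel, g, b in zip(features, gammas, betas)
    ]

# Toy content features: 2 channels with 4 activations each.
content_features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 30.0, 30.0]]
styled = conditional_instance_norm(
    content_features, gammas=[2.0, 0.5], betas=[1.0, -1.0]
)
```

After this step, each channel's mean and spread are set by the style parameters rather than by the content features, which is how the style can steer the output at every layer while the content comes from a separate branch.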

The losses are the following: 1) reconstruction and cycle losses; 2) a real/fake discriminator loss; 3) a recognizer, a network that reads the text on the stylized image and makes sure that no content is lost; 4) a typeface classifier, a pretrained network that measures how well the generator captures the style of the input.

Results are quite striking!
Now imagine driving through the busy streets of Hong Kong and seeing street signs translated on the fly and projected onto the windshield of your car. Or one day we will send personalized messages by generating creative images with the text embedded in them (instead of stickers).

🌀 Blogpost
📝 Paper
June 15, 2021
June 15, 2021
June 21, 2021
June 21, 2021
I'm happy to announce that our team (me, Stepan Konev, Kirill Brodt) was awarded 🏅 3rd place in the Waymo Motion Prediction Challenge 2021.

To plan a safe and efficient route, an autonomous vehicle should anticipate the future motions of other agents around it. Motion prediction is an extremely challenging task that has recently gained significant attention from the research community. We present a simple yet very strong baseline for multimodal motion prediction based purely on Convolutional Neural Networks.

The task is the following: Given agents' tracks for the past 1 second on a corresponding map, we had to predict the positions of the agents on the road for 8 seconds into the future.

Our model takes a raster image centered around a target agent as input and directly predicts a set of possible trajectories along with their confidences. The raster image is obtained by rasterising the scene and the history of all the agents. While being easy to implement, the proposed approach achieves competitive performance compared to state-of-the-art methods on the Waymo Open Dataset Motion Prediction Challenge (2021): our model ranks 1st by minimum average displacement error (minADE) and 3rd by mAP score.
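For readers unfamiliar with the metric, here is a minimal sketch of minimum average displacement error for a single agent: compute the average point-wise distance between each predicted trajectory and the ground truth, then keep the best one. The function names and the toy trajectories are made up for illustration, and the official challenge metric has additional details that this sketch does not reproduce.

```python
import math

def average_displacement_error(pred, truth):
    """Mean Euclidean distance between matching (x, y) waypoints."""
    if len(pred) != len(truth):
        raise ValueError("trajectories must have the same number of waypoints")
    total = sum(
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, truth)
    )
    return total / len(truth)

def min_ade(pred_trajectories, truth):
    """Best (lowest) ADE over all predicted modes."""
    return min(average_displacement_error(p, truth) for p in pred_trajectories)

# Toy example: the agent actually drives straight along the x axis.
ground_truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
predicted_modes = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.3)],  # "go straight" hypothesis
    [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],  # "turn left" hypothesis
]
score = min_ade(predicted_modes, ground_truth)
```

Because only the best mode counts, minADE rewards a model for covering the true future with at least one of its hypotheses and does not penalize the other, wrong hypotheses. That is why confidence-aware metrics such as mAP are reported alongside it.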

We wrote a short paper and released our code!

📜 Technical report
⚒ Code
June 27, 2021
June 27, 2021
June 28, 2021
Aloha guys!
I'm very excited to announce that I have joined Facebook Reality Labs (FRL) as a Research Scientist! Before that, I interned twice at Facebook AI Research, and now I will work in the FRL division, which focuses on virtual and augmented reality. Moving from academia to industry, I hope that I will still have enough freedom in choosing research directions 😉.
July 2, 2021
July 7, 2021
July 13, 2021
OpenAI disbands its robotics research team. This is the same team that, for example, taught a robotic hand to solve a Rubik's cube using Reinforcement Learning. The decision was made because the company sees more promise in research areas where no physical equipment is required (except for servers, of course) and where a lot of data is already available. There are also economic reasons, since Software as a Service is a business with a much higher margin. Yes, the joke is that the non-profit organization OpenAI is more and more concerned with profit. This is understandable, because it takes a lot of money to create artificial general intelligence (AGI) that can learn all the tasks that a person can do and even more.

It's no secret that research in the field of robotics is also a very costly activity that requires a lot of investment. Therefore, not many companies are involved in it. Among the large and successful ones, only Boston Dynamics comes to mind, and it has already changed owners several times. Did you know that Google acquired Boston Dynamics in 2013, then scaled down its robotics research program and in 2017 sold Boston Dynamics to the Japanese firm SoftBank? The adventures of Boston Dynamics did not end there: in December 2020 SoftBank resold 80% of the shares (a controlling stake) to the automaker Hyundai. This looks somewhat fishy, as if every company realizes after a few years that it is still difficult to make a profit from Boston Dynamics and sells it to another patsy.

In any case, it is very interesting to observe which focus areas are chosen by the titans of AI research. But I'm a bit sad that robots are still lagging behind.
July 17, 2021
July 25, 2021
German startup aims to become "Europe's OpenAI"

The German startup Aleph Alpha, which is based in Heidelberg (the city where I did my PhD), recently raised $27M in a Series A round. The goal they set themselves is ambitious (perhaps too ambitious): they want to create another breakthrough in AI, something similar to OpenAI's GPT-3.

The company was founded in 2019, and, strangely, I discovered it only today. I looked at their ML team and did not find a single person with major scientific achievements (say, at the level of a professor). I was disappointed. Their ML team includes 3 recent PhD students and Connor Leahy, who is known for co-founding EleutherAI. EleutherAI is a non-profit organization that was created to reproduce and open-source the GPT-3 model. Perhaps they are betting on Connor, but, frankly speaking, Connor is not a researcher, he has no scientific publications, and EleutherAI is simply reproducing the results of OpenAI. When OpenAI was founded, it was immediately clear that they had a stellar team, which would certainly produce something cool.

My impressions are mixed. Aleph Alpha has partnerships with German government agencies. They promote themselves in the style of "we are Europe's last chance to claim a place in the field of AI", "we will be based purely in Europe and will be pushing European values and ethical standards." They also promise to be more open than OpenAI (lol) and commit to open source. Although, perhaps, they will just create some kind of large platform with AI solutions and sit on government funding. It will be a kind of AI consulting; they even have a job posted on their website for this purpose: AI Delivery & Consulting. The whole affair smacks of a government cover-up like in the case of Palantir (at least partially).

I'm not a startup expert, but it seems like Europe is very hungry for innovation. They want to keep up with the United States and China. Therefore, they give out the bucks at the first opportunity, especially if the company promises to work closely with the government. What do you think about this, gentlemen?
August 7, 2021
August 10, 2021
August 10, 2021
August 11, 2021
August 11, 2021
October 11, 2021
On Neural Rendering

What is Neural Rendering? In a nutshell, neural rendering is when we take classic algorithms for image rendering from computer graphics and replace a part of the pipeline with neural networks (stupid, but effective). Neural rendering learns to render and represent a scene from one or more input photos by simulating the physical process of a camera that captures the scene. A key property of 3D neural rendering is the disentanglement of the camera capturing process (i.e., the projection and image formation) and the representation of a 3D scene during training. That is, we learn an explicit (voxels, point clouds, parametric surfaces) or an implicit (signed distance function) representation of a 3D scene. For training, we use observations of the scene from several camera viewpoints. The network is trained on these observations by rendering the estimated 3D scene from the training viewpoints, and minimizing the difference between the rendered and observed images. This learned scene representation can be rendered from any virtual camera in order to synthesize novel views. It is important for learning that the entire rendering pipeline is differentiable.
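As one concrete example of a differentiable rendering step, here is a minimal sketch of the emission-absorption compositing used by NeRF-style volume rendering, together with the photometric loss that compares rendered and observed pixels. It is a simplification: colors are single grayscale values, the samples along the ray are evenly spaced, and the function names are hypothetical. In a real method, the densities and colors would be produced by a neural network, and the loss would be minimized with automatic differentiation.

```python
import math

def composite_ray(densities, colors, step):
    """Render one pixel from samples along a camera ray, front to back.

    densities: non-negative volume density at each sample
    colors:    grayscale color at each sample (assumption: 1 channel)
    step:      distance between consecutive samples
    """
    transmittance = 1.0  # fraction of light that still reaches the camera
    pixel = 0.0
    for sigma, color in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this segment
        pixel += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return pixel

def photometric_loss(rendered, observed):
    """Mean squared difference between rendered and observed pixels."""
    if len(rendered) != len(observed):
        raise ValueError("pixel lists must have the same length")
    return sum((r - o) ** 2 for r, o in zip(rendered, observed)) / len(observed)

# Two rays through a toy scene: one hits a dense bright region, one mostly misses.
rendered_pixels = [
    composite_ray([0.0, 5.0, 5.0], [0.0, 0.9, 0.9], step=0.5),
    composite_ray([0.0, 0.1, 0.0], [0.0, 0.9, 0.0], step=0.5),
]
loss = photometric_loss(rendered_pixels, observed=[0.8, 0.0])
```

Every operation here is a smooth function of the densities and colors, so the gradient of the loss can flow back through the renderer into the scene representation. This is the property the post refers to when it says the whole pipeline must be differentiable.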

You may have noticed that the topic of neural rendering, including all sorts of nerfs-schmerfs, is now a big hype in computer vision. You might say that neural rendering is very slow, and you'd be right. A typical training session on a small scene with ~50 input photos takes about 5.5 hours for the fastest method on a single GPU, but neural rendering methods have made significant progress in the last year, improving both fidelity and efficiency. To catch up on all the recent developments in this direction, I highly recommend reading the SOTA report "Advances in Neural Rendering".

The gif is from the "Volume Rendering of Neural Implicit Surfaces" paper.
November 12, 2021
May 1, 2022
May 1, 2022