Gradient Dude
TL;DR for DL/CV/ML/AI papers from an author with publications at top-tier AI conferences (CVPR, NIPS, ICCV, ECCV).

Most ML feeds go for fluff; we go for the real meat.

YouTube: youtube.com/c/gradientdude
IG instagram.com/gradientdude
A bunch of contrastive representation learning methods already exist, e.g. MoCo, SimCLR, BYOL, etc.
Here is another one - CLIM: Center-wise Local Image Mixture for contrastive representation learning (ICLR 2021).

The main idea is to consider the semantic similarity between different images and incorporate it into the learning procedure, in contrast to many contrastive learning methods which usually use only augmentations of the query image as positives. The main contribution is two-fold:
a) partition the data into 10k clusters and use nearest neighbors from the same cluster which are closer to the centroid than the anchor as positive samples (see the sketch below);
b) use more complex augmentations, i.e. CutMix and multi-resolution training. The proposed method achieves state-of-the-art results for unsupervised learning on ImageNet and on the transfer learning tasks Pascal VOC, COCO, and LVIS.
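
To make the positive-sampling rule concrete, here is a minimal NumPy sketch of how I read contribution (a); the function and variable names are mine, not from the paper:

```python
import numpy as np

def sample_positives(anchor_idx, features, cluster_ids, centroids, k=5):
    """Pick same-cluster samples that lie closer to the cluster centroid
    than the anchor does, then keep the anchor's k nearest among them.
    features: (N, D), cluster_ids: (N,), centroids: (n_clusters, D)."""
    c = cluster_ids[anchor_idx]
    anchor_dist = np.linalg.norm(features[anchor_idx] - centroids[c])
    # candidates: all other members of the anchor's cluster
    mask = cluster_ids == c
    mask[anchor_idx] = False
    cand = np.where(mask)[0]
    # keep only candidates closer to the centroid than the anchor
    cand = cand[np.linalg.norm(features[cand] - centroids[c], axis=1) < anchor_dist]
    # among those, return the k nearest neighbors of the anchor
    dists = np.linalg.norm(features[cand] - features[anchor_idx], axis=1)
    return cand[np.argsort(dists)[:k]]
```
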
It seems like they achieve much better results with just a single FC layer on top, harder augmentations, and smarter positive sampling. The method is simple and does not require other tricks like the multiple FC layers of SimCLRv2 or the memory bank of MoCo.
​​Let's talk a bit about object detectors.
But we want to have a single bounding box per object (not hundreds of them), right?
You probably know that in most detection pipelines there is a step called Non-Maximum suppression (NMS) which is responsible for this.
Its purpose is to take the many tentative detections proposed by the network and drop all spurious, highly overlapping ones, retaining only the single most confident bounding box per object.

The de facto standard approach for NMS is Greedy NMS. At each step, we select the most confident box and drop all others whose IoU with it is greater than some fixed threshold (often 0.5). We repeat this process until no proposals are left. But this approach is brittle and requires manual tweaking (which I personally hate). For example, if you set the threshold too low, you may lose recall, since detections of very close objects would be considered duplicates and dropped. On the other hand, a higher threshold may leave you with too many spurious detections.
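
For reference, greedy NMS itself fits in a dozen lines; here is a standard NumPy sketch, not tied to any particular detector:

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # most confident first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # IoU of the kept box with all remaining candidates
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        # drop everything that overlaps the kept box too much
        order = rest[iou <= iou_thresh]
    return keep
```
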

Therefore researchers from the Max Planck Institute for Informatics in Saarbrücken proposed a method for end-to-end learnable NMS: a neural network is trained to do NMS instead of the greedy algorithm. Details are in the paper: 📰 "Learning non-maximum suppression"

I briefly summarize the proposed algorithm below; for more details, refer to the paper.
If an object is already assigned to one detection, all other detections with high overlap (neighboring detections) should be notified about it and should decrease their scores. To do this, the paper proposes to compute pairwise features between overlapping proposals (these features are handcrafted and include IoU, normalized distances in the X and Y directions, width and height differences, aspect-ratio difference, detection scores, etc.). These pairwise features are concatenated with the original detection features produced by the CNN backbone and passed through a series of residual blocks with FC layers (see Figure). Next, to assign only a single detection per object, the authors run the Hungarian matching algorithm between ground-truth boxes and detections and force all non-matched detections to decrease their scores. After training, the proposed network, called GNet, produces only a few boxes with very strong scores, while all other boxes get very teeny ones (see examples below).
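
As a rough illustration of what such pairwise features might look like, here is a PyTorch sketch; the exact feature set and normalization in the paper differ, so treat this as my approximation:

```python
import torch

def pairwise_features(boxes, scores):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,). Returns (N, N, 8)."""
    n = boxes.shape[0]
    x1, y1, x2, y2 = boxes.unbind(-1)
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    area = w * h
    # pairwise IoU, (N, N)
    iw = (torch.min(x2[:, None], x2[None]) - torch.max(x1[:, None], x1[None])).clamp(min=0)
    ih = (torch.min(y2[:, None], y2[None]) - torch.max(y1[:, None], y1[None])).clamp(min=0)
    inter = iw * ih
    iou = inter / (area[:, None] + area[None] - inter)
    return torch.stack([
        iou,
        (cx[None] - cx[:, None]) / w[:, None],  # normalized x offset
        (cy[None] - cy[:, None]) / h[:, None],  # normalized y offset
        (w[None] - w[:, None]) / w[:, None],    # width difference
        (h[None] - h[:, None]) / h[:, None],    # height difference
        torch.log((w[None] * h[:, None]) / (w[:, None] * h[None])),  # aspect-ratio diff
        scores[:, None].expand(n, n),           # score of detection i
        scores[None, :].expand(n, n),           # score of detection j
    ], dim=-1)
```
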
​​This is not a paper, but it is awesome!
The app Polycam uses the built-in LiDAR sensor of the latest iPad Pro to scan the surroundings and build a textured 3D mesh. The mesh is "generally accurate down to about one inch". The process is also near-real-time: processing is done locally on the tablet, with single-room captures taking "only seconds to process", making it possible to see the mesh building up as you walk around. Looks like it's one of the best 3D scanning apps out there (for arbitrary objects, see the 3D sofa example here).

However, it relies on LiDAR, which we can find only in the latest iPad Pro and the upcoming iPhone 12 Pro. It would be much more exciting if they used pure RGB-based techniques, e.g. SLAM, which does not require a LiDAR or a depth camera. I will come back to this and briefly discuss some techniques for building 3D shapes from images in future posts.
There is another cool app, in3D, created by my mates, that can build your 3D avatar from a 360° video capturing you from different angles. They achieve compelling results: their avatars capture fine shape details and come automatically rigged (see avatar example). The app is currently available only for iPhones as well, but at least it does not require a LiDAR sensor 😅.
​​Scientists from the University of Washington broke the longstanding record in solving the notorious NP-hard problem — Travelling Salesman Problem (TSP). This optimization problem, which seeks the shortest (or least expensive) round trip through a collection of cities, has applications ranging from DNA sequencing to ride-sharing logistics.

There had been no advancement in this field since 1976, when Nicos Christofides came up with an algorithm that efficiently finds approximate solutions: round trips that are at most 50% longer than the best round trip.

Funny enough, the novel algorithm improves on the previous approximation algorithm by a whopping margin of 2.0 x 10^-36 !!! (Yes, that is 0.2 billionth of a trillionth of a trillionth of a percent.) But please don't be too disappointed (although I was). This result breaks a theoretical and psychological barrier that persisted for more than forty years. And hopefully it will spark the interest of the broader community in this problem and lead to further advancements in the coming years. Moreover, it is likely (although not proven yet) that the proposed algorithm is much more efficient than its predecessor in most cases and improves at least by that tiny margin in the worst case.

As a Deep Learning evangelist, my first impression after reading the headline was that it was another victory for Neural Networks. However, I was wrong: not all the cool stuff is done with the help of NNs. The method is based on machinery called the geometry of polynomials, a little-known discipline in the theoretical computer science world.
We are living in incredible times! Maybe somebody will finally prove P = NP?
[GIF: JuergenSchmidhuber.gif, 18.9 MB]
New paper on neural painting: "Stylized Neural Painting".
The main idea is to train a neural network to render individual brushstrokes, parametrized by color, shape, and transparency.
An input image is approximated by a fixed number of brushstrokes which are blended based on their transparency values.
To find optimal parameters for each brushstroke, the authors propose to run an iterative optimization procedure, in the same way as it was done in the pioneering work of Gatys et al.
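
In pseudocode, the search for brushstroke parameters could look like this minimal PyTorch sketch; `renderer` stands in for the pretrained neural stroke renderer from the paper, and everything else (names, dimensions, the plain photometric loss) is my simplification:

```python
import torch

def fit_strokes(target, renderer, n_strokes=300, param_dim=12, steps=500, lr=0.01):
    """target: (3, H, W) image in [0, 1].
    renderer: assumed to map (n, param_dim) stroke parameters in [0, 1]
    to per-stroke color maps (n, 3, H, W) and alpha maps (n, 1, H, W)."""
    params = torch.randn(n_strokes, param_dim, requires_grad=True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        rgb, alpha = renderer(torch.sigmoid(params))
        canvas = torch.zeros_like(target)
        for c, a in zip(rgb, alpha):            # back-to-front alpha blending
            canvas = a * c + (1 - a) * canvas
        loss = ((canvas - target) ** 2).mean()  # simple photometric loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return params.detach()
```
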

Another novelty of this paper is the Optimal Transport loss, which has more meaningful gradients than a photometric loss in the case of sparse brushstrokes.
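
The paper's exact formulation aside, the usual way to make such a loss differentiable is entropy-regularized OT computed with Sinkhorn iterations; a generic sketch (my own, not the authors' code):

```python
import torch

def sinkhorn_loss(a, b, cost, eps=0.01, iters=100):
    """a: (N,), b: (M,) nonnegative pixel masses summing to 1;
    cost: (N, M) pairwise pixel-distance matrix.
    Plain Sinkhorn; very small eps may need log-domain stabilization."""
    K = torch.exp(-cost / eps)                # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.t() @ u)                   # column scaling
        u = a / (K @ v)                       # row scaling
    transport = u[:, None] * K * v[None, :]   # approximate OT plan
    return (transport * cost).sum()
```
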

The authors even created a Google Colab notebook where you can play around with the method.

📃 https://arxiv.org/pdf/2011.08114.pdf
🌐 https://jiupinjia.github.io/neuralpainter/
💾 https://github.com/jiupinjia/stylized-neural-painting
[GIF: vangogh_night.gif, 16.3 MB]
And one more example 🎨
Keynote of the Turing Award winners at AAAI 2020 (Geoff Hinton, Yann LeCun, Yoshua Bengio). I especially liked Yann LeCun's talk on self-supervised learning.

🎥 Video
📃 LeCun's slides
It is very exciting when the technology we are developing helps us appreciate historical moments.
I stumbled upon an amazing video of an interview with Yuri Gagarin, the first cosmonaut, from July 1961. The interview was done by the BBC during Gagarin's 4-day visit to Great Britain as part of a Soviet Exhibition at Earl's Court in London.

And now we can appreciate the interview in 4K(!!!), enhanced by neural networks, courtesy of @denissexy. Gagarin has never been so alive for millennials!

Enjoy!
Forwarded from Karim Iskakov - channel (Karim Iskakov)
Turning a selfie video into a Deformable NeRF for high-fidelity renderings from novel viewpoints.

The work smashes previous methods (Neural Volumes, NeRF) in terms of quality by a wide margin. Just look at these curls at 0:46 (timecode is clickable)!

🌐 nerfies.github.io
📝 arxiv.org/abs/2011.12948
📉 @loss_function_porn
That feeling when a dumb robot dances better than you. Boston Dynamics keeps surprising us with the amazing manual control of those robots.
A Swiss village was completely reproduced in virtual reality (VR) by filming from drone and handheld cameras. Now you can walk into the heart of the old settlement and feel its history while sitting in your chair. Just amazing!
https://twitter.com/i/status/1343112828069113856
My first YouTube video is out!
In this video I explain how we earned $6000 by placing in the top 3 of a Kaggle autonomous driving competition organized by Lyft.

It is crucial for an autonomous vehicle to anticipate what will happen next on the road in order to plan its actions accordingly.
The goal of this competition was to predict the future motion of all the cars (and any other agents) around the autonomous vehicle. In the video, I present our CNN + Set Transformer solution, which placed in the top 3 on the private leaderboard.

Video: https://youtu.be/3Yz8_x38qbc
Solution source code: https://github.com/asanakoy/kaggle-lyft-motion-prediction-av
Solution write-up: https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/205376

Please let me know in the comments what you think about this format.
Set Transformer original paper: https://arxiv.org/abs/1810.00825. I will write about it specifically later.
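
Until then, here is a minimal PyTorch sketch of the paper's Set Attention Block (self-attention over an unordered set), just to give the flavor; this is not our competition code:

```python
import torch
import torch.nn as nn

class SAB(nn.Module):
    """Set Attention Block: permutation-equivariant self-attention,
    roughly MAB(X, X) from the Set Transformer paper."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):                 # x: (batch, set_size, dim)
        h = self.norm1(x + self.attn(x, x, x)[0])
        return self.norm2(h + self.ff(h))

# e.g. encoding a variable-size set of agents around the ego vehicle
agents = torch.randn(8, 30, 128)        # 8 scenes, 30 agents, 128-dim features
encoded = SAB(128)(agents)              # (8, 30, 128), permutation-equivariant
```
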