Gradient Dude
2.54K subscribers
180 photos
50 videos
2 files
169 links
TL;DR for DL/CV/ML/AI papers from an author of publications at top-tier AI conferences (CVPR, NIPS, ICCV,ECCV).

Most ML feeds go for fluff, we go for the real meat.

YouTube: youtube.com/c/gradientdude
IG instagram.com/gradientdude
Download Telegram
JuergenSchmidhuber.gif
18.9 MB
New paper on neural painting "Stylized Neural Painting".
The main idea is to train a neural network to render individual brushstrokes, parametrized by color, shape, and transparency.
An input image is approximated by a fixed number of brushstrokes which are blended based on their transparency values.
To find optimal parameters for each brushstroke authors propose to run an iterative optimization procedure in the same way as it was done in the pioneering work of Gatys et al.

Another novelty of this paper is Optimal Transport loss, which has more meaningful gradients compared to the photometric loss in case of sparse brushstrokes.

Authors even created a google Colab notebook, where you can play around with the method.

📃 https://arxiv.org/pdf/2011.08114.pdf
🌐 https://jiupinjia.github.io/neuralpainter/
💾 https://github.com/jiupinjia/stylized-neural-painting
vangogh_night.gif
16.3 MB
And one more example 🎨
​​Keynote of Turing Award Winners at AAAI 2020 (Geoff Hinton, Yann LeCunn, Yoshua Bengio). I especially liked the Yann LeCunn's talk on Self-supervised learning

🎥 Video
📃 LeCunn's slides
​​This is very exciting when the technology which we are developing helps us to appreciate some historical moments.
I have stumbled upon an amazing video of the interview with Yuri Gagarin, the first cosmonaut, from July 1961. The interview was done by the BBC during Gargarin’s 4-day visit to Great Britain as part of a Soviet Exhibition at Earl's Court in London.

And now we can appreciate the interview in 4K(!!!) enhanced by neural networks, as a courtesy of @denissexy. Gagarin was never so alive for Millenials!

Enjoy!
Forwarded from Karim Iskakov - канал (Karim Iskakov)
This media is not supported in your browser
VIEW IN TELEGRAM
Turning selfie video into Deformable NeRF for high-fidelity renderings from novel viewpoints.

The work smashes previous methods (Neural Volumes, NeRF) in terms of quality by a wide margin. Just look at these curls at 0:46 (timecode is clickable)!

🌐 nerfies.github.io
📝 arxiv.org/abs/2011.12948
📉 @loss_function_porn
​​That feeling when a dumb robot dances better than you. Boston Dynamics is still surprising with an amazing manual control of those robots.
A Swiss village was completely reproduced in virtual reality (VR) by filming from drone and handheld cameras. Now you can make it to the heart of the old settlement and feel the history while sitting in your chair. Just amazing!
https://twitter.com/i/status/1343112828069113856
My first youtube video is out!
I this video I explain how we earned $6000 by getting in Top3 on a Kaggle autonomous driving competition organized by LYFT.

It is crucial for an autonomous vehicle to anticipate what will happen next on the road to plan its actions accordingly.
The goal of this competition was to predict the future motions of all the cars (or any other agents) around the autonomous vehicle. In the video, I present our CNN + Set Transformer solution which is placed in TOP 3 on the private leaderboard.

Video: https://youtu.be/3Yz8_x38qbc
Solution source code: https://github.com/asanakoy/kaggle-lyft-motion-prediction-av
Solution write-up: https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/205376

Please let me know in the comments what you think about such a format.
Set Transformer original paper: https://arxiv.org/abs/1810.00825. I will write about it specifically later.
I use the podcasts of Lex Fridman as an opportunity to talk to very intelligent and clever people while having breakfast. These conversations always give me the motivation to keep up with my research work as well.

I have just finished listening to Lex's conversation with Prof. Sergey Levine. Very insightful!
Sergey is a brilliant researcher in the field of Deep RL and Computer Vision and a very humble and genuine person. I was lucky to meet him in person and to talk to him a little bit at my first big scientific conference NeurIPS 2016.

A piece of advice for students from Sergey Levine:

"It is important to not be afraid to spend time imagining the kind of outcome that you might like to see. If someone who is a student considering a career in AI takes a little while, sits down and thinks like "What do I really want to see a machine do? What do I want to see a robot do? What do I want to see a natural language system do?". Imagine it almost like a commercial for a future product or something that you'd like to see in the world. And then actually sit down and think about the steps that are necessary to get there. And hopefully, that thing is not a better number on ImageNet classification, it's probably like an actual thing that we can't do today. That would be really AWESOME.

Whether it's a robot butler or an awesome healthcare decision-making support system. Whatever it is that you find inspiring. And I think that thinking about that and then backtracking from there and imagining the steps needed to get there will actually do much better research, it will lead to rethinking the assumptions, it will lead to working on the bottlenecks other people aren't working on."
Hi guys! New video on my YouTube channel!

Computer Vision for animals is a fast-growing and very promising sub-field.
I this video I will explain how to reconstruct a 3D model of an animal with a single photo.
The method is based on cycle consistency loss between image pixels and vertices on a 3D mesh.

Reference papers:
1) "Articulation-aware Canonical Surface Mapping", Kulkarni et al., CVPR 2020
2) "Canonical Surface Mapping via Geometric Cycle Consistency", Kulkarni et al., ICCV 2019

Method source code: GitHub repo.
High-level approach overview.
​​Hi guys! A productive Sunday is when you feel like you have learned something new.
To learn more details about our 3rd place solution for the Kaggle competition "Lyft Prediction for Autonomous Vehicles competition" you can check out my Medium blogpost.
Self-Attention models are gaining popularity in Computer Vision.
DETR applied transformers for end-to-end detection, VideoBERT learns a joint visual-linguistic representation for videos, ViT uses self-attention to achieve SOTA classification results on ImageNet, etc.

PapersWithCode created a taxonomy of modern self-attention models for vision and discusses Recent Progress. You can read it here.
I'm planning to delve deeper into this topic and it looks like it is a perfect place to start 🤓!
New full-frame video stabilization method. Looking forward to having it on my Google Pixel phone! There is hope as one of the authors is at Google.

The core idea is a learning-based fusion approach to aggregate warped contents from multiple neighboring frames (see pipeline figure below).
This method is several magnitudes slower than the built-in Adobe Premiere Pro 2020 warp stabilizer. However, this method does not aggressively crop the frame borders and hence better preserves the original content, in contrast to the warp stabilizer in Adobe Premiere Pro.

✏️ Paper
🧾 Project page