Gradient Dude
TL;DR for DL/CV/ML/AI papers from an author of publications at top-tier AI conferences (CVPR, NIPS, ICCV, ECCV).

Most ML feeds go for fluff; we go for the real meat.

YouTube: youtube.com/c/gradientdude
IG: instagram.com/gradientdude
🙈 Interesting findings
For both human annotators and the DeepLabCut baseline, the hardest keypoints to predict are the shoulders and hips. Another failure point for neural networks is self-occlusion.

📝 Paper
🛠 Dataset
⚙️ Pretrained models in the DeepLabCut Model Zoo
📓 Colab
I totally need glasses that move with my eyebrows. (c) Yann LeCun

The quality is awful because of the pesky Twitter compression.
CLIP + StyleGAN: searching the StyleGAN latent space with a text description embedded by CLIP.

Queries: "A pony that looks like Beyonce", "... like Billie Eilish", "... like Rihanna"

📝 The basic idea
Generate an image with StyleGAN, pass it through CLIP's image encoder, and compute a loss between the resulting image embedding and CLIP's embedding of the text query. Then backpropagate through both networks and optimize the StyleGAN latent code.
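The loop in this idea can be sketched with toy stand-ins. Everything here is an assumption for illustration: the "generator" and "image encoder" are random linear maps (hypothetical `gen` and `enc`), `text_emb` is a made-up text embedding, and the gradients are written out by hand. A real setup would use pretrained StyleGAN and CLIP inside an autograd framework, but the structure is the same: forward through both networks, a cosine-similarity loss against the text embedding, backprop through both, and an update of the latent only.

```python
import math
import random

# Toy stand-ins (hypothetical, NOT the real models): a linear "generator"
# maps a latent z to an "image" vector, and a linear "image encoder" maps
# that image to an embedding. Linear maps keep the chain rule easy to follow.


def matvec(m, v):
    """Compute m @ v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matvec_t(m, v):
    """Compute m.T @ v, i.e. push a gradient backwards through m."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def cosine_grad(e, t):
    """Gradient of cos(e, t) with respect to e."""
    ne, nt = math.sqrt(dot(e, e)), math.sqrt(dot(t, t))
    c = dot(e, t)
    return [ti / (ne * nt) - c * ei / (ne ** 3 * nt) for ei, ti in zip(e, t)]


def optimize_latent(gen, enc, text_emb, z, lr=0.05, steps=300):
    """Gradient descent on z to minimise loss = 1 - cos(enc(gen(z)), text_emb)."""
    for _ in range(steps):
        image = matvec(gen, z)                   # generator forward pass
        emb = matvec(enc, image)                 # image-encoder forward pass
        g_emb = [-g for g in cosine_grad(emb, text_emb)]  # dLoss/d(emb)
        g_image = matvec_t(enc, g_emb)           # backprop through the encoder
        g_z = matvec_t(gen, g_image)             # backprop through the generator
        z = [zi - lr * gi for zi, gi in zip(z, g_z)]  # only z is updated
    return z


random.seed(0)
gen = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]  # latent dim 4 -> image dim 6
enc = [[random.gauss(0, 1) for _ in range(6)] for _ in range(3)]  # image dim 6 -> embedding dim 3
text_emb = [1.0, 0.5, -0.2]  # hypothetical embedding of a text query
z0 = [random.gauss(0, 1) for _ in range(4)]

before = cosine(matvec(enc, matvec(gen, z0)), text_emb)
z = optimize_latent(gen, enc, text_emb, z0)
after = cosine(matvec(enc, matvec(gen, z)), text_emb)
print(f"cosine similarity: {before:.3f} -> {after:.3f}")
```

In this sketch only `z` changes during the loop; `gen` and `enc` stay frozen, so the search happens purely in latent space.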

🤬 Drawbacks: 1) it only works on text that CLIP knows; 2) it needs some cherry-picking, since only about 1 in 5 results are really good.

Source: tweet.
A cute RoboCat 🐈 has learned to track objects. A fun application of computer vision. Is anyone among my subscribers working in robotics?

Source: IG @bio.makers
Robotics & Biomechanics lab at our uni (Heidelberg, Germany).
GANs are making their way into production

Adobe has rolled out a super-resolution feature for Photoshop. Now you can upscale an image 2x along each side (4x the total pixel count).
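Upscaling 2x along each side means three out of every four output pixels have to be filled in. For reference, here is the naive nearest-neighbour baseline that simply copies each pixel into a 2x2 block; it is only a point of comparison (not Adobe's method, whose internals the post does not describe), and the function name `upscale_nearest_2x` is hypothetical. Learned super-resolution models replace this copying with predicted detail.

```python
def upscale_nearest_2x(image):
    """Naive 2x upscaling: every pixel becomes a 2x2 block (nearest neighbour)."""
    upscaled = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(2)]  # duplicate horizontally
        upscaled.append(wide_row)
        upscaled.append(list(wide_row))  # duplicate vertically (separate copy)
    return upscaled


image = [[1, 2],
         [3, 4]]
up = upscale_nearest_2x(image)
print(up)
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```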

💎 For the curious, here are several SOTA super-resolution methods:
1. Structure-Preserving Super Resolution with Gradient Guidance (SPSR), CVPR2020
2. Learned Image Downscaling for Upscaling using Content Adaptive Resampler (CAR), ECCV2020
3. Single Image Super-Resolution via a Holistic Attention Network (HAN), ECCV2020
4. ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic, CVPR2021

---
Let me know in the comments if there is a better super-res paper.
Facebook AI has built TimeSformer, a new architecture for video understanding. It is the first video architecture based exclusively on the self-attention mechanism used in Transformers. It outperforms the state of the art while being more efficient than 3D ConvNets for video.

❓ Why it matters
To train video-understanding models, the best 3D CNNs today can only use video segments that are a few seconds long. With TimeSformer, we are able to train on far longer video clips, up to several minutes long. This may dramatically advance research to teach machines to understand complex long-form actions in videos, which is an important step for many AI applications geared toward human behavior understanding (e.g., an AI assistant).

Furthermore, the low inference cost of TimeSformer is an important step toward supporting future real-time video processing applications, such as AR/VR, or intelligent assistants that provide services based on video taken from wearable cameras.
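Why does plain self-attention struggle with long clips, and how does TimeSformer cope? One ingredient described in the TimeSformer paper is divided space-time attention: within each block, temporal attention (each patch attends to the same patch location in the other frames) is applied separately from spatial attention (each patch attends to the other patches in its own frame). The back-of-the-envelope sketch below counts query-key pairs per attention layer for joint versus divided attention. The 224x224 input, 16x16 patches and frame counts are illustrative assumptions, and the count ignores the classification token, attention heads and MLP cost.

```python
def attention_pairs(num_patches, num_frames):
    """Count query-key pairs per attention layer (classification token ignored)."""
    tokens = num_patches * num_frames
    # Joint space-time attention: every token attends to every token in the clip.
    joint = tokens * tokens
    # Divided attention: each token attends to the num_frames tokens at its
    # spatial location (time) and to the num_patches tokens in its frame (space).
    divided = tokens * (num_frames + num_patches)
    return joint, divided


patches_per_frame = (224 // 16) ** 2  # 14 x 14 = 196 patches (assumed input and patch size)
joint, divided = attention_pairs(patches_per_frame, num_frames=8)
print(joint, divided)  # 2458624 319872

for frames in (8, 32, 96):
    j, d = attention_pairs(patches_per_frame, frames)
    print(f"{frames:3d} frames: joint/divided = {j / d:.1f}x")
```

Joint attention grows quadratically with the number of frames, while the divided variant grows roughly linearly once the frame count is small relative to the patch count, so the gap widens as clips get longer.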

🌐 FAIR Blog
πŸ“ Paper
The well-known implementation freak lucidrains has already released ⚙️ TimeSformer code.
You don't need EfficientNets. Simple tricks make ResNets better and faster than EfficientNets
Google Brain

The authors introduce a new family of ResNet architectures: ResNet-RS.

🔥 Main Results
- ResNet-RS models are 1.7x-2.7x faster than EfficientNets on TPUs while achieving similar or better accuracy on ImageNet.
- In a semi-supervised learning scenario (with 130M pseudo-labeled images), ResNet-RS achieves 86.2% top-1 ImageNet accuracy while being 4.7x faster than EfficientNet-NoisyStudent.
- SoTA results for transfer learning.

Continued below 👇