Gradient Dude – Telegram

Gradient Dude

2.66K subscribers

180 photos

50 videos

2 files

169 links

TL;DR for DL/CV/ML/AI papers from an author of publications at top-tier AI conferences (CVPR, NIPS, ICCV, ECCV).

Most ML feeds go for fluff, we go for the real meat.

YouTube: youtube.com/c/gradientdude
IG: instagram.com/gradientdude


Neural 3D Video Synthesis
Facebook Reality Labs

These guys created a novel time-conditioned Neural Radiance Field. The results are impressive. Once it gets faster, it will enable mind-blowing applications!

It is an extension of the NeRF model to videos. The model learns to generate video frames conditioned on position, view direction, and a time-variant latent code.
The temporal latent codes are optimized jointly with the other network parameters.
NeRF is notoriously slow and requires a long training time. Training a separate NeRF model for every frame requires ~15K GPU hours, while the proposed model needs only 1.3K GPU hours.

📝 Paper: https://arxiv.org/abs/2103.02597
🌐 Project page: https://neural-3d-video.github.io

560 views06:01

⚙️ Model architecture:
z_t is a time-variant, learnable 1024-dimensional latent code. The rest is almost the same as in NeRF.
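
A minimal sketch of what this conditioning could look like, assuming the latent code is simply concatenated with the position and view direction. The function names are hypothetical, the single linear layer is only a stand-in for the real MLP, and NeRF's positional encoding is left out; only the 1024-dimensional z_t comes from the post.

```python
import math
import random

LATENT_DIM = 1024  # size of z_t as stated in the post


def make_latent_codes(num_frames, dim=LATENT_DIM, seed=0):
    """One code per video frame; in training these would be optimized, here they are random."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_frames)]


def build_input(position, view_dir, z_t):
    """Assumed conditioning: concatenate 3D position, view direction and the frame's code."""
    return list(position) + list(view_dir) + list(z_t)


def toy_field(features, weights):
    """Stand-in for the MLP: one linear layer producing (r, g, b, sigma)."""
    raw = [sum(w * f for w, f in zip(row, features)) for row in weights]
    rgb = [1.0 / (1.0 + math.exp(-v)) for v in raw[:3]]  # colors squashed into (0, 1)
    sigma = max(0.0, raw[3])                             # density must be non-negative
    return rgb, sigma


codes = make_latent_codes(num_frames=4)
x = build_input((0.1, 0.2, 0.3), (0.0, 0.0, 1.0), codes[2])  # query the scene at frame 2

rng = random.Random(1)
W = [[rng.gauss(0.0, 0.05) for _ in range(len(x))] for _ in range(4)]
rgb, sigma = toy_field(x, W)
print(len(x), rgb, sigma)
```

Querying the same point with a different code (say codes[0] instead of codes[2]) is what lets one network describe the scene at different moments.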

🔪 Limitations:
- Training requires time-synchronized input videos from multiple static cameras with known intrinsic and extrinsic parameters.
- Training on a single 10-second video is still too slow for any real-life application: it takes about a week on 8 x V100 GPUs (~1300 GPU hours).
- Moving regions are blurry in highly dynamic scenes with large and fast motions.
- Apparent artifacts when rendering from viewpoints beyond the bounds of the training views (the baseline NeRF model has the same problem).
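
A quick sanity check of the GPU-hour numbers quoted for this paper. The figures are the ones from the posts; the helper name is made up, and "about a week" is taken as exactly 7 days.

```python
HOURS_PER_WEEK = 7 * 24


def gpu_hours(num_gpus, wall_clock_hours):
    """Total GPU hours = number of GPUs x wall-clock hours."""
    return num_gpus * wall_clock_hours


proposed = gpu_hours(8, HOURS_PER_WEEK)  # 8 x V100 for one week
print(proposed)                          # 1344, i.e. roughly the quoted ~1300

# Quoted cost of training a separate NeRF per frame vs. the proposed model.
speedup = 15_000 / 1_300
print(round(speedup, 1))                 # 11.5
```

So the proposed model is roughly an order of magnitude cheaper to train than the per-frame baseline, but a week on 8 GPUs is still far from practical.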

515 views06:01

VQGAN: Taming Transformers for High-Resolution Image Synthesis
from my lab at Heidelberg University

Paper explained on my YouTube channel!

The authors introduce VQGAN, which combines the efficiency of convolutional approaches with the expressivity of transformers. VQGAN is essentially a GAN that learns a codebook of context-rich visual parts and uses it to quantize the bottleneck representation at every forward pass.
A self-attention model (transformer) is used to learn a prior distribution over the codewords.
Sampling from this model then produces plausible constellations of codewords, which are fed through a decoder to generate realistic images at arbitrary resolution.
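
The quantization step can be sketched roughly as a nearest-neighbor lookup: each bottleneck vector is replaced by its closest codebook entry. This is a toy illustration with a hand-made 2D codebook, not the authors' implementation; all names and values are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Map every vector to the index (and value) of its nearest codebook entry."""
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]
    quantized = [codebook[i] for i in indices]
    return indices, quantized


# Toy codebook with 4 entries; a real one is learned and much larger.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Toy "bottleneck" features, one vector per spatial location.
bottleneck = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]

indices, quantized = quantize(bottleneck, codebook)
print(indices)    # [0, 3, 2]
print(quantized)  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

The resulting index sequence is what the transformer models as a prior: it predicts codeword indices, and the decoder turns the looked-up codewords back into an image.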

📝 Paper
⚙️ Code (with pretrained models)
📓 Colab notebook
📓 Colab notebook to compare the first stage models in VQGAN and in DALL-E

💪🏻🦾🤙🏼
▶️ YouTube Video explanation

573 views16:09

Visual results. Bellissimo! 👌🏻

479 viewsedited 16:09

497 views16:09

Facebook open-sourced a library for state-of-the-art self-supervised learning: VISSL.

+ It contains reproducible reference implementations of SOTA self-supervision approaches (like SimCLR, MoCo, PIRL, SwAV, etc.) and their reusable components. It also supports supervised training.
+ Easy to train models on a single GPU, multiple GPUs, or multiple nodes. Seamless scaling to large data and model sizes with FP16, LARC, etc.

Finally somebody unified all recent works in one modular framework. I don't know about you, but I'm very happy 😌!

VISSL: https://vissl.ai/
Blogpost: https://ai.facebook.com/blog/seer-the-start-of-a-more-powerful-flexible-and-accessible-era-for-computer-vision
Tutorials in Google Colab: https://vissl.ai/tutorials/

922 viewsedited 16:31

Self-supervised Pretraining of Visual Features in the Wild

Facebook also published its ultimate SElf-supERvised (SEER) model.

- They pretrained it on 1B random, unlabeled, and uncurated Instagram images 👀.
- SEER outperformed SOTA self-supervised systems, reaching 84.2% top-1 accuracy on ImageNet.
- SEER also outperformed SOTA supervised models on downstream tasks, including low-shot, object detection, segmentation, and image classification.
- When trained with just 10% of ImageNet, SEER still achieved 77.9% top-1 accuracy on the full data set. When trained with just 1% of the annotated ImageNet examples, SEER achieved 60.5% top-1 accuracy.
- SEER is based on the recent RegNet architecture. Under comparable training settings and FLOPs, RegNet models outperform the popular EfficientNet models while being up to 5x faster on GPUs.

📝 Paper
📖 Blogpost
⚙️ I guess the source code will be published as a part of VISSL soon.

SEER: The start of a more powerful, flexible, and accessible era for computer vision

The future of AI is in creating systems that can learn directly from whatever information they’re given — whether it’s text, images, or another type of data — without relying on carefully curated and labeled data sets to teach them how to recognize objects…

753 views16:46

84.2% top-1 accuracy on ImageNet! 👀

582 views16:46

A synthesized StyleGAN2 portrait was tuned to match a textual description using the CLIP encoder. A man was transformed into a vampire by navigating the latent space with the query "an image of a man resembling a vampire, with the face of Count Dracula". Video attached.

For me this looks like sorcery ✨.

➖ Link to the source tweet
📓 Colab notebook

644 views17:24

Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Yann LeCun, FAIR

New self-supervised learning loss: compute the cross-correlation matrix between the features of two distorted versions of a sample and make it as close to the identity matrix as possible.

+ This naturally avoids representation collapse and causes the representation vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors.
+ It is also robust to the training batch size.
+ Comparable to SOTA self-supervised methods (similar results to BYOL), but the method is conceptually simpler.
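
A rough sketch of the loss described above, in plain Python for readability. It assumes features are standardized along the batch dimension before correlating; the function names and the weight lam=5e-3 on the off-diagonal term are illustrative choices, not taken from the post.

```python
import math


def standardize(batch):
    """Normalize each feature to zero mean and unit std across the batch; returns d x n."""
    n, d = len(batch), len(batch[0])
    cols = []
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0  # guard against zero std
        cols.append([(v - mean) / std for v in col])
    return cols


def cross_correlation(za, zb):
    """d x d cross-correlation matrix between the features of the two views."""
    a, b = standardize(za), standardize(zb)
    n = len(za)
    return [[sum(x * y for x, y in zip(ai, bj)) / n for bj in b] for ai in a]


def barlow_twins_loss(za, zb, lam=5e-3):
    """Push the diagonal toward 1 (invariance) and the off-diagonal toward 0 (no redundancy)."""
    c = cross_correlation(za, zb)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lam * off_diag


# Two identical "views" with decorrelated features: correlation matrix is the identity.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
# Two identical "views" whose features duplicate each other: fully redundant.
redundant = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]

loss_good = barlow_twins_loss(decorrelated, decorrelated)
loss_redundant = barlow_twins_loss(redundant, redundant)
print(loss_good, loss_redundant)
```

The first case gives zero loss, while the second is penalized only through the off-diagonal term, which is the "redundancy reduction" part of the name.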

⚙️ My favorite part, training resources: 32x V100 GPUs, approx. 124 hours

📝 Paper
🛠 Code (will be released soon)

1.44K views17:54

Self-supervised learning: The dark matter of intelligence

Blog post by Yann LeCun and Ishan Misra - well-known experts in self-supervised learning at FAIR.

They talk about:
- Self-supervised learning as a paradigm in general
- Self-supervised learning as predictive learning
- Self-supervised learning for language versus vision
- Modeling the uncertainty in prediction
- A unified view of self-supervised methods
- Self-supervised learning at Facebook

Some excerpts:

As babies, we learn how the world works largely by observation. We form generalized predictive models about objects in the world by learning concepts such as object permanence and gravity. Later in life, we observe the world, act on it, observe again, and build hypotheses to explain how our actions change our environment by trial and error.

We believe that self-supervised learning (SSL) is one of the most promising ways to build such background knowledge and approximate a form of common sense in AI systems.

📎 Read more here.

632 views18:22

“Long term, progress in AI will come from programs that just watch videos all day and learn like a baby. ... Children learn by watching the spectacle of the world.
But when the spectacle of the world is captured by a camera, it's a video.”
- Yann LeCun

I can only add here that AI might also learn from interacting with its environment (at least a simulated one).

Blogpost with high-level reflection on self-supervised learning at wired.com.

Facebook’s New AI Teaches Itself to See With Less Human Help

Most image recognition algorithms require lots of labeled pictures. This new approach eliminates the need for most of the labeling.

543 views07:01

Visualising Neurons in Artificial Neural Networks

What a surprise, OpenAI discovered yet again that neurons can be interpretable 😂 Now they showed neurons for their recently hyped CLIP model.

https://openai.com/blog/multimodal-neurons/

521 viewsedited 18:16

But to be honest, this time they look much better and crisper.

514 views18:16

However, CLIP tends to over-abstract images in many ways, and this leads to a new type of attack on such neural networks: the typographic attack.

Yes, nowadays it is that easy to trick this powerful AI.

541 views18:16

Regarding the typographic attack in the previous post: apparently, it can be avoided if you provide a proper query text string. For example, “wait a second, this is just an apple with a label saying iPod” will get a higher confidence than just “iPod”.

This was discovered by Yannic.

562 views01:53

It is Sunday, pancake time 👌🏻. So I could not resist sharing this spectacular deepfake with you.

733 views15:06

Neural Funk: AI generates endless breakbeats

Enthusiasts from Skoltech have trained a WaveGAN on 7500 vintage drum loops, then used the resulting model to generate thousands of new drum loops.
I have attached my favorite 6-minute sample (147 bpm). Love it!

The result was obtained by moving a point slowly through a random trajectory in the model’s latent space. Each point in the latent space corresponds to either an existing or non-existing break. Linear movement between two points results in a smooth transition between two corresponding breaks.
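
A small sketch of such a latent walk: pick a few random waypoints and move linearly between them. The latent size of 100, the uniform range, and all function names are assumptions for illustration; each point on the path would then be passed to the trained generator (not shown here) to produce one drum loop.

```python
import random


def lerp(a, b, t):
    """Linear interpolation between latent vectors a and b for t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def random_walk_path(dim, num_waypoints, steps_per_segment, seed=0):
    """Piecewise-linear trajectory through randomly sampled latent waypoints."""
    rng = random.Random(seed)
    waypoints = [
        [rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_waypoints)
    ]
    path = []
    for start, end in zip(waypoints, waypoints[1:]):
        for s in range(steps_per_segment):
            path.append(lerp(start, end, s / steps_per_segment))
    path.append(waypoints[-1])  # finish exactly on the last waypoint
    return path


path = random_walk_path(dim=100, num_waypoints=4, steps_per_segment=50)
print(len(path))  # 151 latent points: 3 segments x 50 steps + the final waypoint
```

Smaller steps between neighboring points give smoother transitions between consecutive breaks, at the cost of a longer sequence.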

The pace of progress in synthetic audio and image generation is mind-blowing. Will we be able to generate infinite movies? Imagine an infinite Harry Potter story or an endless New Year's speech by Putin 😅

▶️ A 6-hour Neural Funk on YouTube
🎧 A 6-hour sequence in wav format

📓 Colab notebook with pretrained models

823 viewsedited 16:15

570 views16:15