Gradient Dude
TL;DR for DL/CV/ML/AI papers from an author with publications at top-tier AI conferences (CVPR, NIPS, ICCV, ECCV).

Most ML feeds go for fluff; we go for the real meat.

YouTube: youtube.com/c/gradientdude
IG: instagram.com/gradientdude
Visualising Neurons in Artificial Neural Networks

What a surprise: OpenAI discovered yet again that neurons can be interpretable 😂 This time they show the neurons of their recently hyped CLIP model.

https://openai.com/blog/multimodal-neurons/
But to be honest, this time they look much better and crisper.
However, CLIP tends to over-abstract images in many ways, and this leads to a new type of attack on such neural networks: the typographic attack.

Yes, nowadays it is that easy to trick this powerful AI.
Regarding the typographic attack from the previous post: apparently, it can be avoided if you provide a proper query text string. For example, “wait a second, this is just an apple with a label saying iPod” will get a higher confidence than just “iPod”.

This was discovered by Yannic.
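If you want to poke at this yourself, here is a minimal sketch using OpenAI's official CLIP repo (github.com/openai/CLIP). The image path and the exact prompts are just placeholders; the point is simply to compare the confidence CLIP assigns to each text query for the same image.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image: an apple with a paper label saying "iPod" taped to it.
image = preprocess(Image.open("apple_with_ipod_label.jpg")).unsqueeze(0).to(device)

prompts = [
    "an iPod",
    "an apple",
    "wait a second, this is just an apple with a label saying iPod",
]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)   # similarity of the image to each prompt
    probs = logits_per_image.softmax(dim=-1)[0].cpu().numpy()

for prompt, p in zip(prompts, probs):
    print(f"{p:.3f}  {prompt}")
```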
It is Sunday, pancake time 👌🏻, so I could not resist sharing this spectacular deepfake with you.
Neural Funk: AI generates endless breakbeats

Enthusiasts from Skoltech have trained a WaveGAN on 7500 vintage drum loops, then used the resulting model to generate thousands of new drum loops.
I have attached my favorite 6-minute sample (147 bpm). Love it!

The result was obtained by slowly moving a point along a random trajectory in the model’s latent space. Each point in the latent space corresponds to either an existing or a non-existing break. Linear movement between two points results in a smooth transition between the two corresponding breaks.
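Roughly, the generation loop looks like the sketch below. The generator here is only a stand-in (the real pretrained WaveGAN lives in the Colab notebook linked below), and the latent size and step count are assumptions; what matters is the linear interpolation between random anchor points in the latent space.

```python
import numpy as np
import torch

# Stand-in for the pretrained WaveGAN generator (the real one maps a ~100-dim
# latent vector to a fixed-length drum-loop waveform).
def generator(z: torch.Tensor) -> torch.Tensor:
    return torch.sin(torch.linspace(0, 800, 16384) * z.abs().mean())

latent_dim = 100
steps = 32                                               # interpolation steps per segment
anchors = [torch.randn(latent_dim) for _ in range(10)]   # random trajectory through latent space

segments = []
with torch.no_grad():
    for z_a, z_b in zip(anchors[:-1], anchors[1:]):
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            z = (1 - float(t)) * z_a + float(t) * z_b    # linear move between two latent points
            segments.append(generator(z).numpy())

audio = np.concatenate(segments)   # smooth transitions between (real or non-existing) breaks
```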

The pace of progress in synthetic audio and image generation is mind-blowing. Will we be able to generate infinite movies? Imagine an infinite Harry Potter story or an endless New Year's speech by Putin 😅


▶️ A 6-hour Neural Funk on YouTube
🎧 A 6-hour sequence in wav format

📓Colab notebook with pretrained models
​​Interview with Natalia Neverova - Research Lead at Facebook AI Research

Natalia Neverova was one of my research advisors during my internship at Facebook AI Research. In this interview, she talks about research at FAIR, what kind of students they prefer to hire, and the 3D reconstruction of people and animals (3D animals 🐒 was exactly my research project at FAIR).

🌐 Link to the interview (unfortunately, only in Russian)
China trains a 10-billion-parameter multimodal network… using NVIDIA’s code:

A hybrid team of researchers from Alibaba and Tsinghua University has built M6, a “Multi-Modality to Multi-Modality Multitask Mega-transformer”. M6 is a multi-modal model trained on a huge corpus of text and image data, including image-text pairs (similar to recent systems like OpenAI’s CLIP). M6 has a broad capability surface, and because of how it was trained, you can use text to search for images (or vice versa), generate media in different modalities, match images together, write poems, answer questions, and so on.

📦 Data: ~60 million images (with accompanying text pairs) totalling 1.9TB (almost twice the raw size of ImageNet), plus 292GB of text.
📌 Facts and figures: Though the authors say they’ve trained 10-billion and 100-billion parameter models, they mostly report performance statistics for the 10-billion one. The 100B model is a mixture-of-experts model, while the 10B one is based on NVIDIA’s Megatron training code. The model’s size and sophistication are notable – this feels like a symptom of the maturing capabilities of various Chinese AI organizations. I wonder when we’ll get an M6-scale system from people affiliated with India, or regions like Europe or Africa.

🤷🏼‍♂️ Why this matters: M6 is notable for being a non-English model at equivalent scale to some of the largest primarily-English ones. We’re entering an era where there will be multiple, gigantic AI models, with variations stemming from the organizations that trained them. It’s also interesting to consider how these models proliferate, and who will get access to them. Will students and researchers at Tsinghua get access to M6, or just Alibaba’s researchers, or both? And how might access schemes develop in other countries, as well?

🌀 A word about bias: There’s no discussion of bias in the paper (or ethics), which isn’t typical for papers of this type but is typical of papers that come out of Chinese research organizations 😉

📝 ArXiv Paper link


Source: https://jack-clark.net/
The results, honestly, are quite good. I especially enjoyed the humble opinion about "The Great Wall" 😄
We are on the eve of the Matrix: a constantly elevated dopamine level in the VR world, or poverty and fighting robots in reality.

Scientists from the University of Helsinki used GANs to create personalized attractive faces. To gradually increase face attractiveness, they recorded the electrical activity of the test subject's brain while changing the synthetic faces by randomly walking in the GAN latent space. This way, they get a GAN in which a living person acts as the discriminator, so the generated faces become more likable to that person.
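A toy sketch of the general idea (not the authors' actual pipeline): take small random steps in the latent space and keep a step whenever the decoded brain response says the new face is more attractive. Both helper functions below are placeholders for a pretrained face GAN and an EEG-based preference decoder.

```python
import torch

def generate_face(z: torch.Tensor) -> torch.Tensor:
    """Placeholder for a pretrained GAN generator: latent vector -> face image."""
    return torch.tanh(z[: 3 * 64 * 64].reshape(3, 64, 64))

def eeg_preference_score(face: torch.Tensor) -> float:
    """Placeholder for the decoded EEG response ('how much the subject likes this face')."""
    return float(face.mean())

z = torch.randn(3 * 64 * 64)                 # latent code (size is just an assumption)
best = eeg_preference_score(generate_face(z))

for _ in range(200):                         # random walk through the latent space
    candidate = z + 0.1 * torch.randn_like(z)
    score = eeg_preference_score(generate_face(candidate))
    if score > best:                         # the person implicitly plays the discriminator
        z, best = candidate, score
```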

I thought about a similar idea a couple of years ago. We could analyze users' preferences in male/female appearance from their likes on social media and then use them to generate personalized ads with the faces of the most attractive people. This seems like a more feasible scenario than using brain encephalograms 🧠.
Imagine now that, with the help of such techniques, one could create an ideal virtual partner. To go even further, think about how personalized porn could be created with the face/appearance of the most attractive person (who may not even exist).

The terrible new world is almost ready 😅.

📝 Paper
🌐 Blogpost
Learning High Fidelity Depths of Dressed Humans by Watching TikTok Dance Videos

The single-frame depth is refined in a self-supervised way by leveraging local transformations of body parts to enforce geometric consistency across different poses.
First, a depth and normal estimation network is pretrained on synthetic 3D data (RenderPeople). Then this network is refined using geometric consistency between pairs of different frames. Each body-part transformation is modeled independently as a rigid transformation; the estimated 3D coordinates of the points on each body part can then be warped onto a different frame, and the disparity is used as a loss function, as sketched below.
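In pseudocode, the geometric-consistency term boils down to something like this (a simplified sketch under my own assumptions, not the authors' exact formulation): back-project the predicted depth of one body part into 3D in frame i, move it with that part's rigid transform into frame j, and penalize the disparity with respect to the 3D points predicted in frame j.

```python
import torch

def rigid_consistency_loss(points_i, points_j, R, t):
    """
    points_i, points_j: (N, 3) 3D points of one body part, back-projected from the
        predicted depth of frames i and j (same correspondences in both frames).
    R: (3, 3), t: (3,) rigid transform of that body part from frame i to frame j.
    Returns a geometric-consistency loss used to refine the depth network.
    """
    warped = points_i @ R.T + t                     # frame-i points moved into frame j
    return (warped - points_j).norm(dim=-1).mean()  # disparity between the two predictions

# Toy usage with random tensors standing in for real network predictions.
pts_i, pts_j = torch.randn(500, 3), torch.randn(500, 3)
R, t = torch.eye(3), torch.zeros(3)
loss = rigid_consistency_loss(pts_i, pts_j, R, t)
```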

📝 Paper
🛠 Code (will be released soon)
NeX: Real-time View Synthesis with Neural Basis Expansion

An amazing new approach to novel view synthesis: a combination of multiplane images (MPI) and neural basis expansion (NeRF-like networks). It can reproduce spectacular, complex view-dependent effects (see video).

Unlike a traditional MPI, which uses a set of simple RGBα planes, this technique models view-dependent effects by instead parameterizing each pixel as a linear combination of basis functions learned by a neural network.
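In other words, the color of a pixel p seen from direction v is modeled as C(p, v) = k0(p) + Σ_n k_n(p) * H_n(v), where the k's are stored per pixel and the basis functions H_n come from a small MLP fed with the viewing direction. A rough sketch of that parameterization (tensor sizes and network width are just placeholders):

```python
import torch
import torch.nn as nn

class ViewBasis(nn.Module):
    """Tiny MLP mapping a viewing direction to N global basis values H_1..H_N."""
    def __init__(self, n_basis: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, n_basis))

    def forward(self, view_dir: torch.Tensor) -> torch.Tensor:
        return self.mlp(view_dir)                     # (..., n_basis)

H, W, n_basis = 64, 64, 8
k0 = torch.rand(H, W, 3)                              # per-pixel view-independent base color
k = torch.rand(H, W, n_basis, 3)                      # per-pixel reflectance coefficients

basis = ViewBasis(n_basis)
view_dir = torch.tensor([0.0, 0.0, 1.0])              # unit viewing direction
h = basis(view_dir)                                   # H_1..H_N for this direction

# view-dependent pixel colors: C(p, v) = k0(p) + sum_n k_n(p) * H_n(v)
color = k0 + (k * h.view(1, 1, n_basis, 1)).sum(dim=2)
```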

It is stunningly fast to render! The first real-time neural rendering: 60 FPS, 1000x faster than NeRF.
However, training NeX still takes a long time and may require a higher number of input views to replicate view-dependent effects.


By the way, this is the first paper I've seen from Thailand!

📝 Paper
▶️ Video from authors
🌐 Project page
🛠 Code will come soon