Gradient Dude


VR Mind Control from NextMind: decode the act of focusing

Sorry Elon, no need to drill skulls anymore!

The NextMind sensor is non-invasive and reads electrical signals from the brain's visual cortex using small electrodes attached to the skin. Machine learning is then used to decode the brain activity and pinpoint the object of focus, allowing you to control game actions with your mind in real time.

The sensor itself is surprisingly small and light: it fits in the palm of your hand, with two arms that extend slightly beyond that, and it easily fits under a baseball cap. You just need to ensure that the nine sets of two-pronged electrode sensors make contact with your skin.

Currently, it's just a dev kit that can be paired with third-party VR headsets, including Oculus. The kit retails for $399 and can already be preordered. The functionality is limited, but this is only the first step. I'm very excited to see the further development of this technology!

Full review is here.
Thanks @ai_newz for the pointer.


Self-supervised Learning for Medical images

Due to fixed imaging procedures, medical images like X-ray or CT scans are usually well aligned geometrically.
This gives an opportunity to utilize such an alignment to automatically mine similar pairs of image patches for self-supervised training.

The basic idea is to fix K random locations in the unlabeled medical images (K locations are the same for every image) and crop image patches across different images (which correspond to scans of different patients).
Now we create a surrogate classification task by assigning a unique pseudo-label to every location 1...K.
The authors combine the surrogate classification task with image restoration using a denoising autoencoder: they randomly perturb the cropped patches (color jittering, random noise, random cut-outs) and train a decoder to restore the original view.
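Here is a minimal sketch of how such training samples could be mined, using only the Python standard library. It is my own illustration, not the authors' code: the function names, the toy 8x8 "scans", and the specific perturbation (integer noise plus a single cut-out) are all assumptions, and labels run from 0 to K-1 rather than 1 to K.

```python
import random

def sample_locations(height, width, patch, k, seed=0):
    """Pick K top-left corners once; the same corners are reused for every image."""
    rng = random.Random(seed)
    return [(rng.randrange(height - patch + 1), rng.randrange(width - patch + 1))
            for _ in range(k)]

def crop(image, top, left, patch):
    """Crop a patch x patch window from an image stored as a list of rows."""
    return [row[left:left + patch] for row in image[top:top + patch]]

def mine_patches(images, locations, patch):
    """Return (patch, pseudo_label) pairs; the pseudo-label is the location index."""
    samples = []
    for image in images:
        for label, (top, left) in enumerate(locations):
            samples.append((crop(image, top, left, patch), label))
    return samples

def perturb(patch_pixels, rng, noise=2, cut=2):
    """Hypothetical perturbation: add integer noise, then zero one cut x cut square."""
    size = len(patch_pixels)
    noisy = [[v + rng.randint(-noise, noise) for v in row] for row in patch_pixels]
    top, left = rng.randrange(size - cut + 1), rng.randrange(size - cut + 1)
    for r in range(top, top + cut):
        for c in range(left, left + cut):
            noisy[r][c] = 0
    return noisy

# Three toy 8x8 "scans" standing in for aligned images of different patients.
images = [[[10 * p + r + c for c in range(8)] for r in range(8)] for p in range(3)]
locations = sample_locations(height=8, width=8, patch=4, k=5)
samples = mine_patches(images, locations, patch=4)

# The restoration branch would take `noisy` as input and the clean patch as target.
noisy = perturb(samples[0][0], random.Random(1))
```

Each sample then serves two heads: a classifier predicts the pseudo-label from the perturbed patch, and a decoder tries to restore the clean patch.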

However, sometimes the alignment between medical images is not perfect by default and images may depict different body parts. To make sure that the images are aligned, we train an autoencoder on full images (before cropping) and select only similar images by comparing the distances between them in the learned autoencoder latent space.
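The selection step can be pictured as a nearest-neighbor lookup in latent space. The sketch below is an assumption-laden toy: the latent codes are hard-coded instead of coming from a trained autoencoder, and both the Euclidean distance and the "keep the n closest" rule are my choices, not details confirmed by the paper.

```python
import math

def nearest_images(latents, anchor_id, n):
    """Return the ids of the n images closest to the anchor in latent space."""
    anchor = latents[anchor_id]
    others = [(math.dist(anchor, z), image_id)
              for image_id, z in latents.items() if image_id != anchor_id]
    return [image_id for _, image_id in sorted(others)[:n]]

# Hypothetical 2-D latent codes; a real encoder would produce much longer vectors.
latents = {
    "chest_a": [0.9, 0.1],
    "chest_b": [0.8, 0.2],
    "chest_c": [0.95, 0.05],
    "knee_a": [0.1, 0.9],
}
similar = nearest_images(latents, "chest_a", n=2)
```

Patches would then be cropped only from the anchor and its selected neighbors, so the scan of a different body part never contributes to the same pseudo-class.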

The authors show that their method is significantly better than other self-supervised learning approaches on medical data and can even be combined with existing self-supervised methods like RotNet (predicting image rotations). Unfortunately, the comparison is rather limited: they didn't compare to Jigsaw Puzzle, SwAV, or recent contrastive self-supervised methods like MoCo, BYOL, and SimCLR.

📝 Paper
🛠 Code & Models

#paper_tldr #cv #self_supervised


Results. TransVW (transferable visual words) is the proposed method.


LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions

A framework that learns meaningful directions in the latent space of GANs using unsupervised contrastive learning. Unlike previous work, which discovers fixed directions, this method can discover non-linear directions in pretrained StyleGAN2 and BigGAN models. The discovered directions can be used for image manipulation.

The authors use the differences that an edit operation causes in the feature activations to optimize the identifiability of each direction. The edit operations are modeled by several separate neural nets ∆_i(z), which are learned. Given a latent code z and its generated image x = G(z), we seek to find edit operations ∆_i(z) such that the image x' = G(∆_i(z)) has semantically meaningful changes over x while still preserving the identity of x.
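To make the "identifiability" idea concrete, here is a rough sketch of a contrastive loss over feature differences. It is a generic NT-Xent-style loss that I assume is close in spirit to the paper's objective, not necessarily its exact formulation. The input `diffs[i][k]` stands for the change in feature activations caused by edit k on latent code i, and the toy vectors below are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def direction_contrastive_loss(diffs, tau=0.5):
    """Average loss over all (latent code, edit) anchors; needs at least 2 codes."""
    n, k = len(diffs), len(diffs[0])
    total = 0.0
    for i in range(n):
        for a in range(k):
            pos = neg = 0.0
            for j in range(n):
                for b in range(k):
                    if (j, b) == (i, a):
                        continue  # skip the anchor itself
                    weight = math.exp(cosine(diffs[i][a], diffs[j][b]) / tau)
                    if b == a:
                        pos += weight  # same edit applied to another latent code
                    else:
                        neg += weight  # any other edit
            total += -math.log(pos / (pos + neg))
    return total / (n * k)

# Two latent codes, two edits. In `consistent`, each edit changes the features
# in a similar way for both codes; in `entangled`, the two edits are swapped.
consistent = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]
entangled = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]

loss_consistent = direction_contrastive_loss(consistent)
loss_entangled = direction_contrastive_loss(entangled)
```

The loss is lower when the same edit produces similar feature changes across latent codes and different edits produce dissimilar ones, which is what pushes each direction to be distinguishable from the rest.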

📝 Paper
🛠 Code (next week)

#paper_tldr #cv #gan


Spectacular Image Stylization using CLIP and DALL-E

As a Style Transfer Dude, I can say that this is super cool. A statue of David by Michelangelo was used as an input image. Then it was morphed towards different styles of famous artists by steering the latent code towards the embeddings of a textual description in CLIP space.

I especially like Picasso's Cubism, where it created a half-bull, half-human portrait, which is one of Picasso's typical subjects. The Rene Magritte stylization is my second favorite.

I discussed similar techniques for image editing here and here.

📙Colab which contains the most significant parts to reproduce the results: link.

Original YouTube video.
Thanks @NeuralShit for the pointer.

#image_gen #gan #style_transfer


Joker Donald Trump Inauguration Speech🃏

Look Ma, DeepFakes are getting amazingly good! No need to spend thousands of dollars anymore to create such realistic effects.

Borrowed from @NeuroLands


ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement🔥

This paper proposes an improved way to project real images into the StyleGAN latent space (which is required for further image manipulations).

Instead of directly predicting the latent code of a given real image in a single pass, the encoder is tasked with predicting a residual with respect to the current estimate. The initial estimate is simply the average latent code across the dataset. Inversion is done with multiple forward passes, iteratively feeding the encoder the output of the previous step along with the original input.
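The refinement loop itself is simple, and a toy version shows its shape. Everything below is a stand-in that I made up for illustration: `generator` is a linear map instead of StyleGAN, and `encoder` is a hand-written rule instead of a trained network. Only the loop structure (start from the average code, predict a residual, add it, repeat) follows the description above.

```python
def generator(w):
    """Toy stand-in for StyleGAN: maps a latent code to an 'image'."""
    return [2.0 * v + 1.0 for v in w]

def encoder(image, current_reconstruction):
    """Toy stand-in for the trained encoder: predicts a latent residual
    from the target image and the current reconstruction."""
    return [0.4 * (x - y) for x, y in zip(image, current_reconstruction)]

def invert(image, w_avg, steps=5):
    """Iteratively refine the latent estimate, starting from the average code."""
    w = list(w_avg)
    history = []
    for _ in range(steps):
        reconstruction = generator(w)              # image for the current estimate
        residual = encoder(image, reconstruction)  # correction to the latent code
        w = [v + r for v, r in zip(w, residual)]
        history.append(w)
    return w, history

def max_error(w, w_true):
    """Largest absolute difference between two latent codes."""
    return max(abs(a - b) for a, b in zip(w, w_true))

w_true = [0.5, -1.0, 2.0]          # hypothetical latent code of the target image
image = generator(w_true)
w_avg = [0.0, 0.0, 0.0]            # hypothetical dataset-average latent code
w_final, history = invert(image, w_avg, steps=5)
errors = [max_error(w, w_true) for w in history]
```

In this toy setup the error shrinks at every step, which mirrors why a handful of forward passes can be enough in practice.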

Notably, during inference, ReStyle converges its inversion after a small number of steps (e.g., < 5), taking less than 0.5 seconds per image. This is compared to several minutes per image when inverting using optimization techniques.

The results are impressive! The L2 and LPIPS loss values are comparable to those of optimization-based techniques, while inversion is two orders of magnitude faster!

📝 Paper
🛠 Code
👫 Colab

