Designing, Visualizing and Understanding Deep Neural Networks, CS182
Sergey Levine released the lectures for his new deep learning class, CS182! This is an introductory deep learning course (advanced undergraduate + graduate) covering a broad range of deep learning topics. Prof. Levine is an Assistant Professor at UC Berkeley and head of the Robotic Artificial Intelligence and Learning Lab; I posted about him a few months ago.
🔗 Course website
▶️ Lectures playlist
Neural Corgi 🤖
StyleGAN2-ADA trained on cute Corgi images. Looks amazing!
1. Scrape 350k Corgi images from Instagram.
2. Detect dogs using YOLOv3 (see the sketch after this list).
3. Remove small detections and dogs not facing the camera.
4. Remove duplicates and crop the images. Around 130k crops were obtained at this step.
5. Upsample the crops to 1024x1024.
6. Train StyleGAN2-ADA for 5 million iterations over 18 days on a Tesla V100.
7. Profit ?!
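This is not the author's actual code, just a minimal sketch of steps 2-4 under a few assumptions: the COCO-pretrained YOLOv3 from torch.hub, an arbitrary min_side threshold, and no facing-direction filter (that part would need an extra head-pose heuristic, which is not shown):

```python
import torch
from PIL import Image

# Pretrained COCO detector from the ultralytics YOLOv3 repo.
model = torch.hub.load("ultralytics/yolov3", "yolov3")

def crop_dogs(path, min_side=256):  # min_side is an assumed threshold
    det = model(path).pandas().xyxy[0]  # one detections DataFrame per image
    img = Image.open(path)
    crops = []
    for _, d in det[det["name"] == "dog"].iterrows():
        w, h = d.xmax - d.xmin, d.ymax - d.ymin
        if min(w, h) < min_side:  # step 3: drop small detections
            continue
        box = tuple(map(int, (d.xmin, d.ymin, d.xmax, d.ymax)))
        crops.append(img.crop(box))  # step 4: crop out the dog
    return crops
```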
📔 Colab
💻 Code and dataset
MacaquePose: A Novel "In the Wild" Macaque Monkey Pose Dataset
Computer vision for animals has recently been getting more traction. Several works on this topic have already been discussed in this channel: post [1], post [2], post [3].
❓ Why?
Pose estimation is fundamental for analyzing the relationship between an animal's behavior and its brain functions and malfunctions. Macaque monkeys are excellent non-human primate models, especially for studying neuroscience.
Another possible application is Instagram / Snapchat masks and effects for your cute quadruped friends.
📦 Dataset
This dataset provides keypoints for macaques in naturalistic scenes; it consists of 13k images and 16k monkey instances.
- 17 keypoints and instance segmentation for each monkey in COCO format (see the loading sketch after this list).
- Annotations are of high quality: the crowd-sourced annotations were curated and refined by 8 researchers working specifically with macaques.
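Since the annotations follow the COCO keypoint format, they can be read with pycocotools; a minimal sketch (the annotation file name here is a placeholder, not the dataset's real one):

```python
from pycocotools.coco import COCO

coco = COCO("macaque_annotations.json")  # placeholder file name
img_id = coco.getImgIds()[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    kps = ann["keypoints"]  # flat COCO list: x1, y1, v1, x2, y2, v2, ...
    xs, ys, vis = kps[0::3], kps[1::3], kps[2::3]  # 17 keypoints per monkey
    print(f"{sum(v > 0 for v in vis)} of {len(xs)} keypoints labeled")
```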
🔍 Interesting findings
The most challenging task for both human annotators and the DeepLabCut baseline is predicting the positions of shoulders and hips. Another failure mode for neural networks is self-occlusion.
📄 Paper
📦 Dataset
⚙️ Pretrained models in the DeepLabCut Model Zoo
📔 Colab
I totally need glasses that move with my eyebrows. (c) Yann LeCun
The quality is poor because of the pesky Twitter compression.
CLIP + StyleGAN: searching the StyleGAN latent space using a text description embedded with CLIP.
Queries: "A pony that looks like Beyonce", "... like Billie Eilish", "... like Rihanna"
💡 The basic idea
Generate an image with StyleGAN, feed it to CLIP, and compute a loss against the CLIP embedding of the text query. Then backprop through both networks and optimize the latent code in StyleGAN's latent space, keeping both networks' weights frozen.
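A minimal sketch of that loop, assuming a pretrained StyleGAN generator G with a hypothetical synthesize(w) method mapping a latent w to an RGB image in [-1, 1]; the CLIP calls are from the openai/CLIP package, and CLIP's per-channel input normalization is omitted for brevity:

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda"
model, _ = clip.load("ViT-B/32", device=device)
tokens = clip.tokenize(["A pony that looks like Beyonce"]).to(device)
with torch.no_grad():
    text_feat = F.normalize(model.encode_text(tokens).float(), dim=-1)

# G: pretrained StyleGAN generator with a hypothetical interface, loaded elsewhere.
w = torch.randn(1, 512, device=device, requires_grad=True)  # assumed latent shape
opt = torch.optim.Adam([w], lr=0.05)

for step in range(300):
    img = G.synthesize(w)  # hypothetical: returns (1, 3, H, W) in [-1, 1]
    img = F.interpolate((img + 1) / 2, size=224, mode="bilinear")  # resize to CLIP's input
    img_feat = F.normalize(model.encode_image(img).float(), dim=-1)
    loss = 1 - (img_feat * text_feat).sum()  # cosine distance to the text query
    opt.zero_grad()
    loss.backward()  # gradients flow through CLIP and the frozen generator into w
    opt.step()
```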
🤬 Drawbacks: 1) it only works on concepts CLIP knows; 2) it needs some cherry-picking: only about 1 in 5 results are really good.
Source tweet.
Cute RoboCat 🐱 learned how to track objects. A fun application of computer vision. Is anyone among my subscribers working in robotics?
Source: IG @bio.makers
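Not the robot's actual code, of course; for anyone curious, here is a minimal single-object tracking sketch with OpenCV (requires opencv-contrib-python; in some OpenCV 4.5+ builds the tracker factory lives under cv2.legacy instead):

```python
import cv2

cap = cv2.VideoCapture(0)  # webcam
ok, frame = cap.read()
bbox = cv2.selectROI("select object", frame)  # draw a box around the target
tracker = cv2.TrackerKCF_create()
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)  # follow the target frame to frame
    if found:
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
```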
GANs are making their way into production
Adobe has rolled out a super-resolution feature for Photoshop. Now one can upscale an image 2x on each side.
📚 For the curious, here are links to several SOTA super-resolution methods:
1. Structure-Preserving Super Resolution with Gradient Guidance (SPSR), CVPR 2020
2. Learned Image Downscaling for Upscaling using Content Adaptive Resampler (CAR), ECCV 2020
3. Single Image Super-Resolution via a Holistic Attention Network (HAN), ECCV 2020
4. ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic, CVPR 2021
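For context, this is the classical non-learned baseline all of these methods are compared against, a plain bicubic 2x upscale; the deep models above replace this interpolation with a learned network:

```python
from PIL import Image

img = Image.open("input.png")  # placeholder file name
w, h = img.size
up = img.resize((2 * w, 2 * h), Image.BICUBIC)  # 2x on each side = 4x the pixels
up.save("input_x2.png")
```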
❗️ Let me know in the comments if there is a better super-res paper.
Summary of Recent Generative Models
A nice blog post giving a brief overview of several recent generative models, including VAEs, GANs, and diffusion models.
🔗 Read it here
Aran Komatsuzaki
State-of-the-Art Image Generative Models
I have aggregated some of the SotA image generative models released recently, with short summaries, visualizations and comments. The overall development is summarized, and the future trends are spe…
Facebook AI has built TimeSformer, a new architecture for video understanding. It's the first one based exclusively on the self-attention mechanism used in Transformers. It outperforms the state of the art while being more efficient than 3D ConvNets for video.
❓ Why it matters
To train video-understanding models, the best 3D CNNs today can only use video segments that are a few seconds long. With TimeSformer, we are able to train on far longer video clips, up to several minutes long. This may dramatically advance research to teach machines to understand complex long-form actions in videos, which is an important step for many AI applications geared toward human behavior understanding (e.g., an AI assistant).
Furthermore, the low inference cost of TimeSformer is an important step toward supporting future real-time video processing applications, such as AR/VR, or intelligent assistants that provide services based on video taken from wearable cameras.
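The paper's key idea is "divided space-time attention": instead of attending over all T*S patch tokens at once (cost O((T*S)^2)), each block first attends across time at a fixed spatial location, then across space within each frame, dropping the cost to O(T^2*S + S^2*T). A minimal sketch of such a block, simplified (the classification token and some of the paper's projection layers are omitted):

```python
import torch
import torch.nn as nn

class DividedSpaceTimeBlock(nn.Module):
    """Simplified TimeSformer block: time attention, then space attention, then MLP."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.norm_t = nn.LayerNorm(dim)
        self.attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.attn_s = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_m = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, T, S):
        # x: (B, T*S, dim) patch tokens, ordered frame by frame (no class token here).
        B, _, D = x.shape
        # Time attention: each spatial location attends across the T frames.
        xt = x.reshape(B, T, S, D).permute(0, 2, 1, 3).reshape(B * S, T, D)
        h = self.norm_t(xt)
        xt = xt + self.attn_t(h, h, h)[0]
        x = xt.reshape(B, S, T, D).permute(0, 2, 1, 3).reshape(B, T * S, D)
        # Space attention: each frame attends over its own S patches.
        xs = x.reshape(B * T, S, D)
        h = self.norm_s(xs)
        xs = xs + self.attn_s(h, h, h)[0]
        x = xs.reshape(B, T * S, D)
        return x + self.mlp(self.norm_m(x))

# e.g. an 8-frame clip with 14x14 = 196 patches per frame:
tokens = torch.randn(2, 8 * 196, 768)
out = DividedSpaceTimeBlock()(tokens, T=8, S=196)
```

The reduced attention cost per block is what makes training on much longer clips affordable.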
🔗 FAIR Blog
📄 Paper
The well-known implementation-freak lucidrains has already released ⚙️ TimeSformer code.