Fabulous DeepDream straight outta Mexico. The ambient sounds are nicely tuned, too.
Who said that DeepDream is useless? 😂
You don't need EfficientNets. Simple tricks make ResNets better and faster than EfficientNets
Google Brain
The authors introduce a new family of ResNet architectures - ResNet-RS.
🔥 Main Results
- ResNet-RSs are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar or better accuracies on ImageNet.
- In a semi-supervised learning scenario (w/ 130M pseudo-labeled images), ResNet-RS achieves 86.2% top-1 ImageNet accuracy while being 4.7x faster than EfficientNet-NoisyStudent.
- SoTA results for transfer learning.
Continued below👇
🃏 They take advantage of the following ideas:
1. Plain (dense) convolutions are better optimized for GPUs/TPUs than the depthwise convolutions used in EfficientNets.
2. A simple scaling strategy (i.e. increasing model dimensions such as width, depth, and resolution) is key. Scale model depth in regimes where overfitting can occur:
🔸Depth scaling outperforms width scaling for longer epoch regimes.
🔸Width scaling outperforms depth scaling for shorter epoch regimes.
3. Apply weight decay, label smoothing, dropout and stochastic depth for regularization.
4. Use RandAugment instead of AutoAugment.
5. Add two common and simple architectural changes: Squeeze-and-Excitation and ResNet-D (see the sketch after this list).
6. Decrease weight decay when using more regularization such as dropout, augmentations, stochastic depth, etc.
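To make point 5 a bit more concrete, here is a minimal PyTorch sketch of the two tweaks: a Squeeze-and-Excitation block and the ResNet-D deep stem (three 3x3 convs instead of a single 7x7). The module names and the reduction ratio are my own illustrative choices, not taken from the paper's TensorFlow code.

```python
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excitation: reweight channels using globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 4):  # reduction ratio is illustrative
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (N, C, H, W) -> per-channel weights: (N, C, 1, 1)
        w = self.fc(x.mean(dim=(2, 3)))
        return x * w[:, :, None, None]

def resnet_d_stem(out_channels: int = 64) -> nn.Sequential:
    """Deep stem from ResNet-D: three 3x3 convs replace the original 7x7 conv
    (one of several ResNet-D modifications)."""
    def conv_bn(cin, cout, stride=1):
        return nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
        )
    return nn.Sequential(
        conv_bn(3, out_channels // 2, stride=2),
        conv_bn(out_channels // 2, out_channels // 2),
        conv_bn(out_channels // 2, out_channels),
    )
```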
❓How to tune the hyperparameters?
1. Scaling strategies found in small-scale regimes (e.g. on small models or with few training epochs) fail to generalize to larger models or longer training runs.
2. Run a small subset of models across different scales, for the full training epochs, to gain intuition on which dimensions are the most useful across model scales.
3. Increase the image resolution more slowly than previously recommended; larger image resolutions often yield diminishing returns.
⚔️ FLOPs vs Latency
While FLOPs provide a hardware-agnostic metric for assessing computational demand, they may not be indicative of actual training and inference latency. On custom hardware (e.g. TPUs and GPUs), FLOPs are an especially poor proxy because operations are often bounded by memory access costs and have different levels of optimization on modern matrix multiplication units. The inverted bottlenecks used in EfficientNets employ depthwise convolutions with large activations and have a small compute-to-memory ratio (operational intensity) compared to ResNet's bottleneck blocks, which employ dense convolutions on smaller activations. This makes EfficientNets less efficient 😂 on modern accelerators compared to ResNets. A ResNet-RS model with 1.8x more FLOPs than EfficientNet-B6 is 2.7x faster on a TPUv3.
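Since the authors argue for reporting latency instead of FLOPs, here is a rough sketch of how one could measure it for a torchvision ResNet-50 on a GPU. The batch size, warm-up, and iteration counts are arbitrary placeholders; the paper's own numbers come from the authors' TPUv3 setup, not from this script.

```python
import time
import torch
from torchvision.models import resnet50

def measure_latency(model, batch=32, image_size=224, warmup=10, iters=30):
    """Average forward-pass wall-clock time per batch (seconds). Hypothetical helper."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(batch, 3, image_size, image_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):              # warm up kernels / cuDNN autotuning
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print(f"avg forward latency: {measure_latency(resnet50()):.4f} s/batch")
```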
⚔️ Parameters vs Memory
Although ResNet-RS has 3.8x more parameters and FLOPs than an EfficientNet with the same accuracy, the ResNet-RS model requires 2.3x less memory and runs ~3x faster on TPUs and GPUs.
Parameter count does not necessarily dictate memory consumption during training, because memory is often dominated by the size of the activations. EfficientNets have large activations, and therefore a larger memory footprint, because they require large image resolutions to match the performance of ResNet-RS. E.g., to reach 84% top-1 ImageNet accuracy, EfficientNet needs a 528x528 input image, while ResNet-RS needs only 256x256.
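A back-of-the-envelope illustration of why input resolution dominates activation memory (toy numbers of my own, not from the paper): a feature map grows quadratically with input size, so 528x528 inputs inflate every activation by roughly (528/256)^2 ≈ 4.3x relative to 256x256.

```python
def stem_activation_mib(image_size, channels=64, stride=2, bytes_per_elem=2):
    """Rough size of one stem feature map per image (fp16) after a stride-2 conv."""
    h = w = image_size // stride
    return channels * h * w * bytes_per_elem / 2**20

for size in (256, 528):
    print(size, f"{stem_activation_mib(size):.1f} MiB")
# 256 -> ~2.0 MiB, 528 -> ~8.5 MiB for this single layer alone, per image;
# multiply by batch size and the number of stored activations for the full footprint.
```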
☑️ Conclusions:
1. You'd better use ResNets as baselines for your projects now.
2. Latency and memory consumption are generally more relevant metrics for comparing architectures than FLOP counts; FLOPs and parameters are not representative of latency or memory consumption.
3. Training methods can be more task-specific than architectures. E.g., data augmentation is useful for small datasets or when training for many epochs, but the specifics of the augmentation method can be task-dependent (e.g. scale jittering instead of RandAugment is better on Kinetics-400 video classification).
4. The best-performing scaling strategy depends on the training regime and whether overfitting is an issue. When training for 350 epochs on ImageNet, use depth scaling, whereas width scaling is preferable when training for few epochs (e.g. only 10).
5. Future successful architectures will probably emerge from co-design with hardware, particularly in resource-constrained regimes like mobile phones.
🌐 My blogpost at gdude.de
📝 Paper: Revisiting ResNets: Improved Training and Scaling Strategies
🔨 Code (TensorFlow)
📎 Other references:
EfficientNet
ResNet-D (Bag of Tricks)
RandAugment
AutoAugment
Squeeze-and-Excitation
If you prefer reading blogposts, here it is - I've just written it for you. A bit easier to read than in Telegram.
🌐 https://gdude.de/blog/2021-03-15/Revisiting-Resnets
Future of human-computer interaction — the 10-year vision by Facebook Reality Labs
Say you decide to walk to your local cafe to get some work done. You’re wearing a pair of AR glasses and a soft wristband. As you head out the door, your Assistant asks if you’d like to listen to the latest episode of your favorite podcast. A small movement of your finger lets you click “play.”
As you enter the cafe, your Assistant asks, “Do you want me to put in an order for a 12-ounce Americano?” Not in the mood for your usual, you again flick your finger to click “no.”
You head to a table, but instead of pulling out a laptop, you pull out a pair of soft, lightweight haptic gloves. When you put them on, a virtual screen and keyboard show up in front of you and you begin to edit a document. Typing is just as intuitive as typing on a physical keyboard and you’re on a roll, but the noise from the cafe makes it hard to concentrate.
Read more about the vision of the future of HCI in the Facebook Reality Labs (FRL) blogpost.
An ultra-low-friction AR interface will be built on two technological pillars:
1. Ultra-low-friction input, so when you need to act, the path from thought to action is as short and intuitive as possible. You might gesture with your hand, make voice commands, or select items from a menu by looking at them — actions enabled by hand-tracking cameras, a microphone array, and eye-tracking technology.
But ultimately, you'll need a more natural way: neural input, e.g. wrist-based electromyography (EMG).
Wrist-based EMG reads the signals on the motor neurons that run from the spinal cord to the hand. The signals through the wrist are so clear that EMG can detect finger motion of just a millimeter. Ultimately it may even be possible to sense just the intent to move a finger.
2. The second pillar is the use of AI, context, and personalization to scope the effects of your input actions to your needs at any given moment. AI should adapt the input interface to the context/environment and, ideally, anticipate the user's needs.
I strongly recommend watching the Keynote talk by FRL Chief Scientist Michael Abrash. The FRL projects are very ambitious.
Continuing the discussion about novel Human-Computer Interfaces 🦾
Technologies & Startups that Hack The Brain: Beyond the Healthcare Market
A review of 30 startups, their markets, business models, tech, and where machine learning fits in.
This article takes a rather broad view of neurotech, covering brain-computer interfaces (BCIs, both invasive and noninvasive) and various technologies, e.g. electroencephalography (EEG), electromyography (EMG), functional near-infrared spectroscopy (fNIRS), and others. It also covers neuromodulation, which partially overlaps with the BCI space.
Gucci and Belarusian startup Wanna created virtual sneakers.
You can buy them in the Gucci app for $12 or in the Wanna Kicks app for $9 🤭
I'm not a big fan of such applications. While I appreciate the efforts of the Wanna team - they have come a long way since last year and the shoes fit the foot much better now - such sneakers still look a bit toyish in my opinion. To make the material look more realistic, one would need to adapt the rendering to the current lighting conditions and shadows.
Would you use this app?
Video from @futuresailors.
What's up people 🤙🏼,
Today is the ICCV submission deadline, and writing a good Introduction for your paper is very tricky.
But today Prof. Kate Saenko (the Russian-speaking part of the channel probably knows her) shares her experience and the template she gives to new graduate students 🙂.
#phd_tips
🌐 How to Write the Introduction in 3 Easy Steps.
Not bad, HTC! Looks like everyone is trying to build their own VR headset. Face tracking and hand tracking look impressive. However, the manipulation part is still not comfortable - I don't want to hold those sticks all the time 😐
https://tttttt.me/ai_newz/344
More face tracking from HTC Vive. A bit creepy at the beginning, but overall the capabilities are impressive. They are definitely moving in the right direction.
A nice infographic about the amount of data uploaded and consumed every day. It was created in 2019, though - by now the numbers have at least doubled, IMO.
Full resolution
How to easily edit and compose images like in Photoshop using GANs?
MIT
🎯Task:
Given an incomplete image or a collage of images, generate a realistic image from it.
🔑Method:
This paper presents a simple approach: given a fixed pretrained generator (e.g., StyleGAN), they train a regressor network to predict the latent code from an input image. To teach the regressor to handle images with missing pixels, they mask random patches during training.
Now, given an input collage, the regressor projects it into a reasonable location in the latent space, which the generator then maps onto the image manifold. Such an approach enables more localized editing of individual image parts compared to direct editing in the latent space.
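A rough PyTorch-style sketch of the training loop as I read it from the description above. The regressor and generator are placeholders (the generator is a frozen pretrained GAN), and the per-pixel masking and plain L1 loss are simplifications of the paper's patch masking and losses; the official code linked below is the actual reference.

```python
import torch
import torch.nn.functional as F

def train_step(regressor, generator, images, optimizer, mask_ratio=0.5):
    """One step of latent regression with random masking (hypothetical sketch).

    `generator`: frozen pretrained GAN mapping latent codes to images.
    `regressor`: CNN mapping a (masked image, mask) pair to a latent code.
    """
    n, _, h, w = images.shape
    # Random per-pixel mask as a stand-in for the paper's random patch masking.
    mask = (torch.rand(n, 1, h, w, device=images.device) > mask_ratio).float()
    masked = images * mask

    z_pred = regressor(torch.cat([masked, mask], dim=1))  # image -> latent code
    recon = generator(z_pred)                             # latent code -> image

    # Penalize reconstruction error on the visible pixels only.
    loss = F.l1_loss(recon * mask, images * mask)
    optimizer.zero_grad()   # the optimizer holds only the regressor's parameters
    loss.backward()
    optimizer.step()
    return loss.item()
```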
📚Interesting findings:
- Even though the regressor is never trained on unrealistic and incoherent collages, it projects the given image into a reasonable latent code.
- The authors show that the generator's representation is already compositional in the latent code, meaning that altering a part of the input image changes the regressed latent code in the corresponding location.
➕Pros:
- As input, we need only a single example of approximately how we want the generated image to look (it can be a collage of different images).
- Requires only one forward pass of the regressor and generator -> fast, unlike iterative optimization approaches that can require up to a minute to reconstruct an image. https://arxiv.org/abs/1911.11544
- Does not require any labeled attributes.
💬Applications
- Image inpainting.
- Example-based image editing (incoherent collage -> to realistic image).
#paper_explained #cv
📝 Paper: Using latent space regression to analyze and leverage compositionality in GANs
🌐 Project page
⚒ Code
📓 Colab
Learning to resize: replace the front-end resizer in deep networks with a learnable non-linear resizer
Google Research
Deep computer vision models can benefit greatly from replacing the fixed linear resizer used to downsample ImageNet images before training with a well-designed, learned, non-linear resizer.
The structure of the learned resizer is specific; it is not just a matter of adding more generic convolutional layers to the baseline model. It appears to encode extra information in the downsampled image, and that is where the extra performance on ImageNet comes from.
This work shows that a generic deeper model can be improved upon with a well-designed, task-optimized front-end processor.
Looking ahead: probably there’s a lot of room for work on task-optimized pre-processing modules for computer vision and other tasks.
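For intuition, a hedged sketch of what such a learned resizer might look like in PyTorch: a bilinear skip connection plus a small convolutional residual branch, trained jointly with the downstream classifier. The layer sizes and activations here are my guesses, not the architecture from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class LearnedResizer(nn.Module):
    """Downsample to `out_size` via a bilinear skip plus a learned residual correction."""
    def __init__(self, out_size=224, width=16):
        super().__init__()
        self.out_size = out_size
        self.residual = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, width, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, 3, 3, padding=1),
        )

    def forward(self, x):
        base = F.interpolate(x, size=self.out_size, mode="bilinear", align_corners=False)
        return base + self.residual(base)

# Trained end-to-end with the classifier, e.g.:
# logits = classifier(LearnedResizer(224)(high_res_images))
```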
📝 Paper
No code yet
#cv #paper_explained