Infinite image generation and resampling
This method can generate infinite images of diverse and complex scenes that transition naturally from one into another. It does so without any conditioning and trains without any supervision from a dataset of unrelated square images.
You can check out an interactive demo on the project website.
Paper
April 24, 2021
Snap has released a new model for animating the entire human body (not just the face). Looks pretty good.
The principle is similar to their previous method, the First Order Motion Model for animating heads. The differences are that (a) the background motion is explicitly modeled here, and (b) instead of regressing local affine transformations for a set of keypoints, this method learns to find heatmaps of different body parts in an unsupervised way, and the transformation matrix of each body part is computed by applying principal component analysis (PCA) to the predicted heatmap.
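Conceptually, the PCA step reduces to taking the first and second spatial moments of each predicted heatmap. Here is a minimal PyTorch sketch of that idea (the function and the coordinate grid are my own illustration, not the authors' code):

```python
import torch

def heatmap_to_affine(heatmap, grid):
    """Estimate a 2x3 affine transform (rotation/scale + translation) for one
    body part from its predicted heatmap via PCA of the spatial distribution.

    heatmap: (H, W) non-negative map, normalized here to a distribution.
    grid:    (H, W, 2) pixel coordinates, e.g. in [-1, 1].
    """
    p = heatmap / (heatmap.sum() + 1e-8)                      # probability map
    mean = (p.unsqueeze(-1) * grid).sum(dim=(0, 1))           # (2,) part center
    centered = grid - mean                                    # (H, W, 2)
    cov = torch.einsum('hw,hwi,hwj->ij', p, centered, centered)  # (2, 2) covariance
    # PCA: eigenvectors give the part's principal axes, eigenvalues its extent
    eigvals, eigvecs = torch.linalg.eigh(cov)
    basis = eigvecs * eigvals.clamp(min=1e-8).sqrt()          # (2, 2) linear part
    return torch.cat([basis, mean.unsqueeze(1)], dim=1)       # (2, 3) affine matrix
```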
More details are on the project website. Most importantly, there are code and pretrained weights, so go ahead and animate!
P.S. Two years ago another method for animating the whole body, "Everybody Dance Now", was released, but there you had to retrain the network for each new person.
May 4, 2021
Moore's law is still working. Yesterday IBM announced that they have created the first 2 nm chip!
They claim that their 2 nm technology will improve performance by 45% at the same power, or reduce energy use by 75% at the same performance, compared to modern 7 nm processors (e.g., Intel's).
IBM is one of the world's leading research centers for future semiconductor technology, but it sold its manufacturing business to GlobalFoundries in 2014, so currently IBM only develops IP in collaboration with partners (Samsung and, as recently announced, Intel) who own the manufacturing facilities.
The latest NVIDIA GPUs based on the Ampere microarchitecture (2020) use TSMC's 7 nm fabrication process. TSMC's 3 nm is already entering production in 2022. But when is IBM/Intel's 2 nm even coming? I'm also curious whether Intel can even manage its 5 nm chips by 2024/25.
Source article.
May 7, 2021
Another cool work from OpenAI: Diffusion Models Beat GANs on Image Synthesis.
New SOTA for image generation on ImageNet
A new type of generative model is proposed - the Diffusion Probabilistic Model. A diffusion model is a parameterized Markov chain trained using variational inference to produce samples matching the data after finite time. The diffusion process is a Markov chain that gradually adds noise to the data, in the direction opposite to sampling, until the signal is destroyed. So we learn the reverse transitions of this chain, which undo the diffusion process. And of course, we parameterize everything with neural networks.
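For intuition, the training objective of a DDPM-style model fits in a few lines. This is a generic, simplified sketch (not the paper's code); `model` is assumed to be any network that predicts the added noise from the noisy image and the timestep:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative signal-keeping factor

def diffusion_loss(model, x0):
    """One training step: noise the clean images x0 at a random timestep t
    and train the network to predict the noise that was added."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_bar.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward (noising) process
    return torch.nn.functional.mse_loss(model(x_t, t), noise)  # learn to reverse it
```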
It produces very high-quality generations, even better than GANs (this is especially clear in the image of the man with a fish, which looks far less convincing in the BigGAN samples). The current disadvantage of diffusion models is slow training and inference.
Paper
Code
May 12, 2021
Chinese researchers are very fond of writing extensive surveys of particular sub-fields of machine learning, listing the main works and the major breakthrough ideas. So many articles are published every day that it is impossible to read everything. Therefore, such reviews are valuable (if they are well written, of course, which is quite rare).
Recently a very good paper came out reviewing the various Transformer variants with a focus on language modeling (NLP). It is a must-read for anyone getting into the world of NLP and interested in Transformers. The paper covers the basic principles of self-attention and such details of modern Transformer variants as architectural modifications, pre-training, and various applications.
Paper: A Survey of Transformers.
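As a refresher, the scaled dot-product self-attention that every surveyed variant builds on is only a few lines of code (a minimal single-head sketch with explicit weight matrices):

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    x: (batch, seq_len, d_model); w_q, w_k, w_v: (d_model, d_head)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])  # (batch, seq, seq)
    attn = scores.softmax(dim=-1)                              # attention weights
    return attn @ v                                            # weighted sum of values
```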
June 11, 2021
Facebook AI has built a system called TextStyleBrush that can replace text both in scenes and in handwriting, in one shot, using only a single example word.
The model was made self-supervised because it is extremely hard to collect labeled pairs of text in different conditions and to annotate segmentation masks for text (although I think it could be done with synthetic generation).
The model is trained to understand unlimited text styles for not just different typography and calligraphy, but also for different transformations, like rotations, curved text, and deformations that happen between paper and pen when handwriting; background clutter; and image noise. The main idea is to disentangle the content of a text image from all aspects of the appearance of the entire word box. The representation of the overall appearance can then be applied as a one-shot-transfer without retraining on the novel source style samples.
The model consists of a style encoder, content encoder, and stylized text generator (plus a bunch of losses).
The generator architecture is based on the StyleGAN2 model. However, the design of StyleGAN2 has an important limitation: it is an unconditional model, meaning it generates images by sampling a random latent vector. For generating photo-realistic text images, however, one needs to control the output based on two separate sources: the desired text content and the style. This is solved by extracting layer-specific style information and injecting it at each layer of the generator (a sort of conditional instance normalization).
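The injection is essentially adaptive instance normalization driven by the style code. A rough sketch of what one such generator layer could look like (my own simplification, not the paper's implementation):

```python
import torch
import torch.nn as nn

class StyleModulatedConv(nn.Module):
    """Conv layer whose per-channel scale and bias are predicted from a style
    vector - a rough stand-in for layer-specific style injection."""
    def __init__(self, in_ch, out_ch, style_dim):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.norm = nn.InstanceNorm2d(out_ch, affine=False)
        self.to_scale = nn.Linear(style_dim, out_ch)
        self.to_bias = nn.Linear(style_dim, out_ch)

    def forward(self, x, style):
        h = self.norm(self.conv(x))                            # normalize content features
        scale = self.to_scale(style).unsqueeze(-1).unsqueeze(-1)
        bias = self.to_bias(style).unsqueeze(-1).unsqueeze(-1)
        return (1 + scale) * h + bias                          # modulate with the style code
```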
The losses are the following: 1) reconstruction and cycle losses; 2) a discriminator (real/fake); 3) a recognizer - a network that reads the text on the stylized image and makes sure no content is lost; 4) a typeface classifier - a pretrained network that measures how well the generator captures the input style.
Results are quite striking!
Now imagine driving through the busy streets of Hong Kong and seeing street signs projected on the windshield of your car and translated on the fly. Or one day we will send personalized messages by generating creative images with the text embedded in them (instead of stickers).
Blogpost
Paper
June 15, 2021
Just a small announcement.
Our new #CVPR21 paper (with Facebook AI Research) is out!
Discovering Relationships between Object Categories via Universal Canonical Maps
TL;DR: DensePose for animals on steroids, which as a byproduct can automatically discover correspondences between 3D shapes of animals using novel cycle losses.
I will present the paper today (21.06) at 11am EDT / 5pm CET. Feel free to join the live Q&A session and ask me a question.
Project page
Video explanation
Paper
Source code
June 21, 2021
I'm happy to announce that our team (me, Stepan Konev, and Kirill Brodt) was awarded 3rd place in the Waymo Motion Prediction Challenge 2021.
To plan a safe and efficient route, an autonomous vehicle should anticipate the future motions of other agents around it. Motion prediction is an extremely challenging task that has recently gained significant attention from the research community. We present a simple yet very strong baseline for multimodal motion prediction based purely on convolutional neural networks.
The task is the following: given the agents' tracks for the past 1 second on the corresponding map, we had to predict the agents' positions on the road for 8 seconds into the future.
Our model takes a raster image centered around the target agent as input and directly predicts a set of possible trajectories along with their confidences. The raster image is obtained by rasterizing the scene and the history of all agents. While being easy to implement, the proposed approach achieves competitive performance compared to state-of-the-art methods on the Waymo Open Dataset Motion Prediction Challenge (2021): our model ranks 1st by minimum average displacement error and 3rd by mAP.
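Schematically, the prediction head looks something like this (a generic sketch; the backbone, feature size, mode count, and horizon are illustrative assumptions rather than the exact configuration from the report):

```python
import torch
import torch.nn as nn

class TrajectoryHead(nn.Module):
    """CNN backbone + head predicting K candidate trajectories and their
    confidences from a rasterized scene image."""
    def __init__(self, backbone, feat_dim=2048, n_modes=6, horizon=80):
        super().__init__()
        self.backbone = backbone                 # e.g. a ResNet trunk returning (B, feat_dim)
        self.n_modes, self.horizon = n_modes, horizon
        self.head = nn.Linear(feat_dim, n_modes * (2 * horizon + 1))

    def forward(self, raster):
        out = self.head(self.backbone(raster))                    # (B, K*(2T+1))
        B = raster.shape[0]
        trajs = out[:, : self.n_modes * 2 * self.horizon]
        trajs = trajs.view(B, self.n_modes, self.horizon, 2)      # K trajectories of (x, y)
        conf = out[:, self.n_modes * 2 * self.horizon :].softmax(dim=-1)  # K confidences
        return trajs, conf
```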
We wrote a small paper and released our code!
Technical report
Code
June 27, 2021
Forwarded from Denis Sexy IT
Recently I found the Instagram of an artist from Tomsk, Evgeny Schwenk - he redraws characters from Soviet cartoons as if they were real people. I applied the neural.love neural network, which made his drawings even more realistic. Just a bit of Photoshop (mainly for the hats) and here we go.
I guess Karlsson-on-the-Roof is my best result.
June 28, 2021
Aloha guys!
I'm very excited to announce that I have joined Facebook Reality Labs (FRL) as a Research Scientist! Before that, I interned twice at Facebook AI Research, and now I will work in the FRL division, which focuses on virtual and augmented reality. Moving from academia to industry, I hope I will still have enough freedom in choosing research directions.
July 2, 2021
I experimented with generating images from text prompts using VQGAN and CLIP. Some cool results:
1."Minecraft Starcraft"
2. "Polygonal fast food"
3. "Holy war against capitalism"
4. "Modern cubist painting"
π€πΌ Colab notebook
1."Minecraft Starcraft"
2. "Polygonal fast food"
3. "Holy war against capitalism"
4. "Modern cubist painting"
π€πΌ Colab notebook
July 7, 2021
Here's a very recent article from Google Brain that uses diffusion models for super-resolution.
The results are shocking! Their model beats the GAN-based SOTA method. The video shows an example of a 64x64 picture being upscaled to 1024x1024. No source code yet, though.
Project page
Paper
I also wrote about the OpenAI paper on diffusion models earlier.
July 13, 2021
OpenAI disbands its robotics research team. This is the same team that, for example, taught a robotic arm to solve a Rubik's cube using reinforcement learning. The decision was made because the company considers research more promising in areas where physical equipment (other than servers, of course) is not required and where a lot of data is already available. And also for economic reasons, since Software as a Service is a business with a much higher margin. Yes, the joke is that the non-profit organization OpenAI cares more and more about profit. This is understandable, because it takes a lot of money to create artificial general intelligence (AGI) that can learn all the tasks a person can do and even more.
It's no secret that robotics research is also a very costly activity that requires a lot of investment, so not many companies are involved in it. Among the large and successful ones, only Boston Dynamics comes to mind, and it has already changed owners several times. Did you know that Google acquired Boston Dynamics in 2013, then scaled down its own robotics research program, and in 2017 sold Boston Dynamics to the Japanese firm SoftBank? The adventures of Boston Dynamics did not end there: in December 2020 SoftBank resold 80% of the shares (a controlling stake) to the automaker Hyundai. This looks somewhat fishy, as if every company realizes after a few years that it is still difficult to make a profit from Boston Dynamics and sells it to the next patsy.
In any case, it is very interesting to observe which focus areas the titans of AI research choose. But I'm a bit sad that robots are still lagging behind.
Source: VentureBeat - "OpenAI disbands its robotics research team".
July 17, 2021
Researchers from NVIDIA (in particular Tero Karras) have once again "solved" image generation.
This time, the scientists were able to remove aliasing in the generator. In a nutshell, the reason for the artifacts was careless signal processing in the CNN resulting in incorrect discretization: the signal could not be accurately reconstructed, which led to the unnatural "jerks" noticeable in videos. The authors modified the generator to prevent these negative sampling effects.
The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales.
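To give a feel for the fix, the core trick is to apply nonlinearities at a higher sampling rate and band-limit the result. A toy sketch of that idea (not NVIDIA's implementation, which uses carefully designed filters and custom CUDA kernels):

```python
import torch.nn.functional as F

def filtered_leaky_relu(x, up=2):
    """Apply a pointwise nonlinearity in an (approximately) alias-free way:
    upsample, apply the nonlinearity at the higher rate, then low-pass filter
    and downsample. A crude illustration of the idea behind alias-free GANs."""
    x = F.interpolate(x, scale_factor=up, mode='bilinear', align_corners=False)
    x = F.leaky_relu(x, 0.2)              # nonlinearity at the higher sampling rate
    x = F.avg_pool2d(x, kernel_size=up)   # crude low-pass filter + downsampling
    return x
```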
The code is not available yet, but I'm sure NVIDIA will release it soon.
Read more about Alias-Free GAN here.
July 25, 2021
German startup aims to become "Europe's OpenAI"
The German startup Aleph Alpha, based in Heidelberg (the city where I did my PhD), recently raised $27M in a Series A round. The task they set themselves is ambitious (perhaps too ambitious): they want to create another breakthrough in AI, something similar to OpenAI's GPT-3.
The company was founded in 2019, and, strangely, I only discovered it today. I looked at their ML team and did not find a single person with any major scientific achievements (say, at the level of a professor). I was disappointed. Their ML team includes 3 recent PhD students and Connor Leahy, who is known for co-founding EleutherAI. EleutherAI is a non-profit organization created to reproduce and open-source the GPT-3 model. Perhaps they are betting on Connor, but, frankly speaking, Connor is not a researcher: he has no scientific publications, and EleutherAI is simply reproducing OpenAI's results. When OpenAI was founded, it was immediately clear that they had a stellar team that would certainly produce something cool.
My impressions are mixed. Aleph Alpha has partnerships with German government agencies. They promote themselves in the style of "we are Europe's last chance to claim a place in the field of AI" and "we will be based purely in Europe and will push European values and ethical standards." They also promise to be more open than OpenAI (lol) and commit to open source. Although, perhaps, they will just create some kind of large platform with AI solutions and sit on government funding. It will be a kind of AI consulting; they even have a job posted on their website for this purpose - AI Delivery & Consulting. The whole affair smacks of a government cover-up, as in the case of Palantir (at least partially).
I'm not a startup expert, but it seems like Europe is very hungry for innovation. They want to keep up with the United States and China. Therefore, they hand out money at the first opportunity, especially if the company promises to work closely with the government. What do you think about this, gentlemen?
Source: TechCrunch - "German startup Aleph Alpha raises $27M Series A round to build 'Europe's OpenAI'".
August 7, 2021
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Earlier I wrote about a style transfer method that, instead of optimizing pixels, directly optimizes parameterized brush strokes. A new work has been released that uses a Transformer architecture to predict stroke parameters in one (well, almost one) forward pass of the network. In fact, their Transformer operates in the spirit of a recurrent network.
The original image is passed through the network four times at different resolutions, starting from 16x downsampled and ending with the original resolution. At each subsequent forward pass, a canvas rendered with the strokes predicted in the previous passes is added to the input, along with the original image at a higher resolution. Thus, the network gradually adds new strokes to the canvas, starting with larger ones (painted on a low-resolution canvas) and ending with smaller ones (on a high-resolution canvas). The network is trained on synthetic data generated online.
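Schematically, the coarse-to-fine inference loop could be written like this (a sketch of the idea; `predictor` and `renderer` are hypothetical stand-ins for the stroke-prediction network and the differentiable stroke renderer, and the scale schedule is illustrative):

```python
import torch
import torch.nn.functional as F

def paint_coarse_to_fine(image, predictor, renderer, scales=(16, 8, 4, 1)):
    """Coarse-to-fine stroke prediction: at each scale, predict strokes from
    the target image and the current canvas, then paint them onto the canvas."""
    H, W = image.shape[-2:]
    canvas = torch.zeros_like(image)
    for scale in scales:                                   # from coarse to fine
        size = (max(H // scale, 1), max(W // scale, 1))
        target = F.interpolate(image, size=size, mode='bilinear', align_corners=False)
        canvas = F.interpolate(canvas, size=size, mode='bilinear', align_corners=False)
        strokes = predictor(target, canvas)                # predict stroke parameters
        canvas = renderer(strokes, canvas)                 # render them onto the canvas
    return canvas
```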
Paper
Code
August 10, 2021
One more example of an image synthesised by the method mentioned above.
August 10, 2021
Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes
[Another recent "style transfer with brushstrokes" paper, from my colleagues at Heidelberg University]
In this paper, images are also stylized by optimizing parameterized brush strokes instead of pixels. In order to backpropagate the error through the rendered brushstrokes, the authors came up with a simple differentiable renderer for strokes, each of which is parameterized by a Bezier curve.
The results are excellent. You can also constrain the shape of the brush strokes by drawing a couple of lines over the photo. The only drawback is that it takes rather long (10-20 minutes per 1 MP picture), since this is an iterative optimization, and every iteration requires a forward pass through a VGG-16 network.
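The differentiable rendering relies on the fact that points on a Bezier curve are a smooth function of its control points, so gradients can flow back to the stroke parameters. A minimal sketch of sampling a quadratic Bezier stroke (my own illustration, not the authors' renderer):

```python
import torch

def quadratic_bezier_points(p0, p1, p2, n=32):
    """Sample n points along a quadratic Bezier curve with control points
    p0, p1, p2 (each a tensor of shape (2,)). Fully differentiable, so the
    control points can be optimized with gradient descent."""
    t = torch.linspace(0.0, 1.0, n).unsqueeze(1)                      # (n, 1)
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2     # (n, 2)
```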
Project website
Source code
August 11, 2021
Here's another example of how the algorithm mentioned above works. The horse is stylized in Kirchner's style.
The differences from the classic pixel-by-pixel method by Gatys et al. are very clear. The new method, of course, perturbs the content significantly, in contrast to Gatys, but the style really looks much more like the German expressionist Kirchner, and we see prominent brush strokes.
August 11, 2021
StyleGAN3 by NVIDIA!
Do you remember the awesome smooth results by Alias-Free GAN I wrote about earlier? The authors have finally posted the code and now you can build your amazing projects on it.
I don't know about you, but my hands are already itching to try it out.
Source code
Project page
Colab
October 11, 2021
On Neural Rendering
What is neural rendering? In a nutshell, neural rendering is when we take classic image rendering algorithms from computer graphics and replace part of the pipeline with neural networks (stupid, but effective). Neural rendering learns to render and represent a scene from one or more input photos by simulating the physical process of a camera that captures the scene. A key property of 3D neural rendering is the disentanglement of the camera capture process (i.e., the projection and image formation) from the representation of the 3D scene during training. That is, we learn an explicit (voxels, point clouds, parametric surfaces) or an implicit (signed distance function) representation of the 3D scene. For training, we use observations of the scene from several camera viewpoints. The network is trained on these observations by rendering the estimated 3D scene from the training viewpoints and minimizing the difference between the rendered and observed images. The learned scene representation can then be rendered from any virtual camera to synthesize novel views. It is important for learning that the entire rendering pipeline is differentiable.
You may have noticed that the topic of neural rendering, including all sorts of nerfs-schmerfs, is now a big hype in computer vision. You might say that neural rendering is very slow, and you'd be right: a typical training session on a small scene with ~50 input photos takes about 5.5 hours for the fastest method on a single GPU. But neural rendering methods have made significant progress in the last year, improving both fidelity and efficiency. To catch up on all the recent developments in this direction, I highly recommend reading the SOTA report "Advances in Neural Rendering".
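Schematically, the training loop behind most of these methods looks like this (a generic sketch; `scene`, `render`, and `views` are placeholders of my own, not any specific library's API):

```python
import torch

def train_neural_renderer(scene, render, views, n_steps=10000, lr=1e-3):
    """Generic neural rendering training loop: render the learned scene from
    known training viewpoints and minimize the photometric error.
    scene  - an nn.Module holding the learnable scene representation
    render - a differentiable renderer: render(scene, camera) -> image
    views  - a list of (camera, image) pairs captured from the real scene."""
    optimizer = torch.optim.Adam(scene.parameters(), lr=lr)
    for step in range(n_steps):
        camera, target = views[step % len(views)]
        pred = render(scene, camera)                       # differentiable rendering
        loss = torch.nn.functional.mse_loss(pred, target)  # photometric loss
        optimizer.zero_grad()
        loss.backward()                                    # gradients flow through the renderer
        optimizer.step()
    return scene
```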
The GIF is from the Volume Rendering of Neural Implicit Surfaces paper.
November 12, 2021
I'm back, people! After some pause, I decided to continue posting in this channel. I promise to select the most interesting papers and write at least 1-2 posts per week.
Cheers,
Artsiom
May 1, 2022
Chinese researchers take deepfakes to the next level by swapping the entire head
We have all seen deepfakes where faces are swapped. This paper goes further: they substitute the entire head in the driving video. Miracles of Chinese engineering and a lot of losses do the job.
Compared to the usual face swap, the new method better transfers the personality from the target photo to the driving video, preserving the hair, eyebrows, and other important attributes. The temporal stability still needs a slight improvement, though - the edges of the head are a little twitchy. There is no code yet, but the authors have promised to upload it soon.
Paper: Few-Shot Head Swapping in the Wild (CVPR 2022, Oral)
May 1, 2022