AI & ML Papers
🔥 What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
📅 Published on May 8
🔗 Links:
• arXiv: https://arxiv.org/abs/2605.07915
• PDF: https://arxiv.org/pdf/2605.07915
• Project Page: https://zhengrongyue.github.io/pae.github.io/
• GitHub: https://github.com/ZhengrongYue/PAE ⭐ 29
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#LatentDiffusionModels #GenerativeModeling #AutoencoderArchitecture #LatentManifoldLearning #DiffusionBasedGenerativeModels
💡 This paper investigates which properties of a latent manifold make it well suited to diffusion models. The authors argue that existing tokenizers, the components that define the latent space, are primarily designed to improve reconstruction fidelity or to inherit pre-trained representations, and so do not necessarily produce a latent space suited to generative modeling. They identify three key properties of a diffusion-friendly latent manifold: coherent spatial structure, local manifold continuity, and global manifold semantics. They find that these properties are more closely related to downstream generation quality than reconstruction fidelity is.
To explicitly shape the latent manifold with these desirable properties, the authors propose a new method called the Prior-Aligned AutoEncoder, or PAE. The PAE uses refined priors derived from variational autoencoders and perturbation-based regularization to turn the desired properties of the latent manifold into explicit training objectives. This approach allows the PAE to directly optimize the latent space structure for improved generative modeling.
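As a rough illustration of how a property such as local manifold continuity could become a training objective, here is a minimal sketch of a perturbation-based penalty. It is not the PAE loss: the toy linear `decode`, the function `continuity_penalty`, and the parameters `sigma` and `n_samples` are all hypothetical names chosen for this example. The idea is only that small Gaussian perturbations of a latent code should cause small changes in the decoded output.

```python
import random


def decode(z, weights):
    """Toy linear decoder (hypothetical): each output is a weighted sum of z."""
    return [sum(w * v for w, v in zip(row, z)) for row in weights]


def continuity_penalty(z, weights, sigma=0.1, n_samples=8, seed=0):
    """Mean squared change in the decoded output under Gaussian latent noise.

    Assumption for illustration only: a smaller value means the decoder
    varies more smoothly around z.
    """
    rng = random.Random(seed)
    clean = decode(z, weights)
    total = 0.0
    for _ in range(n_samples):
        # Perturb every latent coordinate with independent Gaussian noise.
        noisy_z = [v + rng.gauss(0.0, sigma) for v in z]
        noisy = decode(noisy_z, weights)
        total += sum((a - b) ** 2 for a, b in zip(noisy, clean)) / len(clean)
    return total / n_samples


z = [0.5, -1.0, 2.0]
smooth_weights = [[0.1, 0.2, 0.3], [0.0, 0.1, -0.1]]
sharp_weights = [[10 * w for w in row] for row in smooth_weights]

smooth = continuity_penalty(z, smooth_weights)
sharp = continuity_penalty(z, sharp_weights)
print(smooth < sharp)
```

In a real training loop a term like this would be added to the reconstruction loss with a weighting coefficient; how PAE actually formulates and weights its objectives is described in the paper and repository, not here.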
The authors evaluate the PAE on ImageNet at 256x256 resolution and find that it improves both training efficiency and generation quality compared to existing tokenizers. Specifically, the PAE matches the performance of RAE, the previous state-of-the-art method, while converging up to 13 times faster under the same training setup. The PAE also sets a new state of the art, with a generative fidelity score of 1.03. These results highlight the importance of organizing the latent manifold for latent diffusion models and demonstrate the effectiveness of the PAE for high-quality generation.
🔥 DiffusionBench: On Holistic Evaluation of Diffusion Transformers
📅 Published on Jun 23
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.24888
• PDF: https://arxiv.org/pdf/2606.24888
• Project Page: https://end2end-diffusion.github.io/diffusion-bench/
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#DiffusionTransformers #ImageGenerationTasks #TextToImageGeneration #GenerativeModeling #DiffusionBasedArchitectures
💡 The paper introduces NanoGen, a unified framework for training and evaluating diffusion transformers for image generation. The current evaluation setup for diffusion transformers is limited to class-conditional generation on ImageNet, which may not reflect real progress in generative modeling. The authors argue that text-to-image generation is a more comprehensive task, but it is often skipped because it is perceived as costly and inconvenient. They show that with NanoGen, training and evaluating text-to-image models requires compute comparable to ImageNet.
The NanoGen framework supports various diffusion methods and can be easily configured to train models on both ImageNet and text-to-image tasks. The authors trained 21 latent diffusion models using NanoGen and found that the ranking of methods on ImageNet and text-to-image tasks shows no strong correlation. This suggests that a method that improves performance on ImageNet may not necessarily improve performance on text-to-image generation.
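One common way to check whether two benchmarks rank methods consistently is Spearman's rank correlation. The sketch below is an assumption about how such a comparison could be run, not the paper's procedure, and the scores are invented placeholders rather than results from the paper. The helper names `ranks` and `spearman` are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder scores for five hypothetical methods (lower is better in both).
imagenet_scores = [2.1, 2.5, 3.0, 3.4, 4.0]
text_to_image_scores = [9.0, 7.5, 8.2, 6.9, 8.8]

rho = spearman(imagenet_scores, text_to_image_scores)
print(round(rho, 3))
```

A rho near 1 would mean the two tasks order the methods almost identically; a value near 0 or below, as with these placeholder numbers, means the ImageNet ranking says little about the text-to-image ranking.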
To address this issue, the authors propose a holistic benchmark called DiffusionBench, which summarizes results on both ImageNet and text-to-image tasks. They recommend reporting DiffusionBench in place of ImageNet alone, since methods that improve DiffusionBench are more likely to reflect broader progress in generative modeling. The paper's main contributions are NanoGen and DiffusionBench, which together provide a more comprehensive evaluation setup for diffusion transformers.