AI & ML Papers

🔥 What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

💡 This paper investigates which properties make a latent manifold favorable for diffusion models, a class of generative models. The authors argue that existing tokenizers, the components that define the latent space, are designed primarily to improve reconstruction fidelity or to inherit pretrained representations, and do not necessarily yield a latent space well suited to generative modeling. They identify three key properties of a diffusion-friendly latent manifold: coherent spatial structure, local manifold continuity, and global manifold semantics. They find that these properties track downstream generation quality more closely than reconstruction fidelity does.
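
As a rough illustration of what "local manifold continuity" can mean, the sketch below estimates how far an encoder's latents move per unit of input perturbation (a finite-difference Lipschitz proxy). The function `local_continuity` and the two toy encoders are hypothetical and are not the paper's metric. They only show that a smooth encoder maps nearby inputs to nearby latents while a rapidly oscillating one does not.

```python
import math
import random


def local_continuity(encode, inputs, eps=1e-3, trials=8, seed=0):
    """Mean of ||encode(x + d) - encode(x)|| / ||d|| over random d with ||d|| = eps.

    Hypothetical proxy for local continuity: lower means nearby inputs
    land on nearby latents.
    """
    rng = random.Random(seed)
    ratios = []
    for x in inputs:
        z = encode(x)
        for _ in range(trials):
            d = [rng.gauss(0.0, 1.0) for _ in x]
            norm = math.sqrt(sum(v * v for v in d))
            d = [eps * v / norm for v in d]  # random direction, length eps
            z_pert = encode([a + b for a, b in zip(x, d)])
            dz = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_pert, z)))
            ratios.append(dz / eps)
    return sum(ratios) / len(ratios)


def smooth_encoder(x):
    # Toy linear encoder: latents change proportionally to the input.
    return [0.5 * x[0] + 0.1 * x[1], 0.2 * x[0] - 0.3 * x[1]]


def wiggly_encoder(x):
    # Toy encoder with a high-frequency term: tiny input changes cause large latent jumps.
    return [x[0] + 0.5 * math.sin(200.0 * x[0]), x[1]]


points = [[0.0, 0.0], [0.3, -0.2], [1.0, 0.5]]
print("smooth:", local_continuity(smooth_encoder, points))
print("wiggly:", local_continuity(wiggly_encoder, points))
```

On these toy encoders the linear map gives a ratio below 1, while the oscillating map gives a much larger one, flagging a less continuous latent neighbourhood.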

To shape the latent manifold explicitly around these properties, the authors propose the Prior-Aligned AutoEncoder (PAE). PAE turns the desired properties into explicit training objectives using refined priors derived from variational autoencoders together with perturbation-based regularization, so that the latent space structure is optimized directly for generative modeling.
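
One common way to turn a continuity requirement into a training objective is to decode perturbed latents and penalize their reconstruction error. The sketch below assumes that form; `perturbation_regularized_loss`, `sigma`, `lam` and the toy autoencoders are hypothetical, and this is not PAE's actual objective, which also involves refined priors.

```python
import random


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def perturbation_regularized_loss(x, encode, decode, sigma=0.1, lam=1.0, seed=0):
    """Reconstruction loss plus a latent-perturbation penalty (hypothetical form).

    The penalty asks the decoder to map noisy latents z + sigma * n back close
    to x, which favours latents whose neighbourhoods decode sensibly.
    """
    rng = random.Random(seed)
    z = encode(x)
    recon = mse(decode(z), x)
    z_noisy = [v + sigma * rng.gauss(0.0, 1.0) for v in z]
    perturbed = mse(decode(z_noisy), x)
    return recon + lam * perturbed, recon, perturbed


# Two toy autoencoders that both reconstruct x (almost) perfectly.
def enc_a(x):
    return list(x)  # latents at the input's scale


def dec_a(z):
    return list(z)


def enc_b(x):
    return [0.01 * v for v in x]  # latents squeezed into a tiny region


def dec_b(z):
    return [100.0 * v for v in z]  # decoder must amplify, so latent noise is amplified too


x = [0.5, -1.2, 2.0, 0.3]
print(perturbation_regularized_loss(x, enc_a, dec_a))
print(perturbation_regularized_loss(x, enc_b, dec_b))
```

Both toy autoencoders reconstruct x essentially perfectly, yet the second one's perturbation term is about 10,000 times larger, a difference that reconstruction fidelity alone cannot reveal.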

The authors evaluate PAE on ImageNet 256×256 and find that it improves both training efficiency and generation quality over existing tokenizers. PAE matches the performance of the state-of-the-art tokenizer RAE with up to 13× faster convergence under the same training setup, and it sets a new state-of-the-art generation FID (gFID) of 1.03. These results highlight the importance of organizing the latent manifold for latent diffusion models.

📅 Published on May 8

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.07915
• PDF: https://arxiv.org/pdf/2605.07915
• Project Page: https://zhengrongyue.github.io/pae.github.io/
• GitHub: https://github.com/ZhengrongYue/PAE ⭐ 29

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LatentDiffusionModels #GenerativeModeling #AutoencoderArchitecture #LatentManifoldLearning #DiffusionBasedGenerativeModels

🔥 AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

💡 The paper introduces AnyFlow, a framework for any-step video diffusion distillation that improves on consistency distillation. Consistency-distilled models often degrade as more sampling steps are allocated at test time, which limits their usefulness for any-step video generation. The cause is that consistency distillation replaces the original probability-flow ODE trajectory with a consistency-sampling trajectory, weakening the desirable test-time scaling behavior of ODE sampling.
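
The test-time scaling behavior of ODE sampling can be seen on a toy problem: integrating an ODE with more Euler steps moves the sample closer to the exact endpoint of the trajectory. The sketch below assumes a one-dimensional linear ODE dx/dt = -x as a stand-in for a learned probability-flow velocity field; `velocity` and `euler_sample` are hypothetical names.

```python
import math


def velocity(x, t):
    # Toy stand-in for a learned velocity field: dx/dt = -x (t is unused here).
    return -x


def euler_sample(x0, steps, t0=0.0, t1=1.0):
    """Integrate dx/dt = velocity(x, t) from t0 to t1 with `steps` Euler steps."""
    x, t = x0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x


x0 = 1.0
exact = x0 * math.exp(-1.0)  # exact solution at t = 1
for steps in (1, 2, 4, 8, 16, 32):
    err = abs(euler_sample(x0, steps) - exact)
    print(f"{steps:>2} steps -> error {err:.4f}")
```

For this ODE the N-step Euler result is (1 - 1/N)^N · x0, which approaches e^(-1) · x0 as N grows, so the error shrinks steadily as the step budget increases.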

To address this limitation, AnyFlow optimizes the full ODE sampling trajectory rather than distilling a model for only a few fixed sampling steps. It shifts the distillation target from endpoint consistency mapping to flow-map transition learning over arbitrary time intervals. The authors also propose Flow Map Backward Simulation, which decomposes a full Euler rollout into shortcut flow-map transitions; this enables efficient on-policy distillation that reduces test-time errors.
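
The flow-map target can be illustrated with the same kind of toy ODE, dx/dt = -x, whose exact flow map Φ(x, s, t) = x · e^(-(t - s)) transports a state from time s to time t in a single jump. Because flow maps compose, chaining jumps over any partition of [0, 1] lands on the same endpoint, which is the property an any-step sampler relies on. `flow_map`, `any_step_sample` and `uniform_grid` are hypothetical; a distilled network would only approximate Φ, and this sketch does not implement Flow Map Backward Simulation.

```python
import math


def flow_map(x, s, t):
    """Exact flow map of the toy ODE dx/dt = -x: transports x from time s to t.

    A distilled model would learn an approximation of this map for arbitrary (s, t).
    """
    return x * math.exp(-(t - s))


def any_step_sample(x0, times):
    # Chain flow-map jumps across consecutive time points.
    x = x0
    for s, t in zip(times[:-1], times[1:]):
        x = flow_map(x, s, t)
    return x


def uniform_grid(n, t0=0.0, t1=1.0):
    # n + 1 evenly spaced time points from t0 to t1.
    return [t0 + (t1 - t0) * i / n for i in range(n + 1)]


x0 = 2.0
for n in (1, 3, 8):
    print(n, any_step_sample(x0, uniform_grid(n)))
print("uneven", any_step_sample(x0, [0.0, 0.1, 0.7, 1.0]))
```

Every schedule, uniform or uneven, reaches x0 · e^(-1) up to floating-point rounding; with a learned flow map, the remaining gap between schedules comes from approximation error, which is what the distillation objective tries to drive down.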

The results show that AnyFlow matches or surpasses consistency-based counterparts in the few-step regime while continuing to improve as the sampling step budget grows. Experiments cover both bidirectional and causal architectures at scales from 1.3B to 14B parameters. Overall, the paper contributes a new framework for any-step video diffusion distillation that preserves the test-time scaling behavior of ODE sampling.

📅 Published on May 13

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.13724
• PDF: https://arxiv.org/pdf/2605.13724
• Project Page: https://nvlabs.github.io/AnyFlow/
• GitHub: https://github.com/NVlabs/AnyFlow ⭐ 197

🤖 Models citing this paper:
• https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers
• https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers
• https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers

#VideoDiffusionModels #OnPolicyLearning #FlowMapDistillation #AnyStepSampling #DiffusionBasedGenerativeModels