AI & ML Papers
🔥 Beyond Mode Collapse: Distribution Matching for Diverse Reasoning
📅 Published on May 19
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.19461
• PDF: https://arxiv.org/pdf/2605.19461
📊 Datasets citing this paper:
• https://huggingface.co/datasets/OliverLee/NP_MM
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#ModeCollapseMitigation #DistributionMatching #OnPolicyReinforcementLearning #DiverseReasoningTasks #CombinatorialOptimizationTechniques
💡 The paper addresses mode collapse in on-policy reinforcement learning, where methods like GRPO concentrate probability mass on a single solution and stop exploring alternative strategies. The authors attribute this to the reverse KL objective these methods minimize, which is mode-seeking: it reinforces the first high-reward trajectory found rather than maintaining a distribution over multiple diverse solutions. To address this, they propose DMPO, a distribution-matching policy optimization method that minimizes the forward KL divergence instead, in order to preserve solution diversity and improve performance on combinatorial optimization and reasoning tasks. DMPO constructs a target distribution over sampled trajectories proportional to their rewards and aligns the policy distribution to this target, which gives mode-covering behavior without requiring samples from the intractable global target distribution.
The authors validate DMPO on NP-hard combinatorial optimization tasks and report significant improvements over GRPO, with a 43.9 percent quality ratio on text-based tasks and 43.1 percent on vision-based tasks. These gains generalize to mathematical reasoning and out-of-domain tasks, which the authors take as evidence that diversity-preserving training enhances general reasoning capabilities across modalities. The results show consistent quality improvements and sustained exploration, positioning distribution matching as a practical approach to preventing mode collapse in on-policy reinforcement learning.
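The target construction and forward KL alignment described in this summary can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `eps` smoothing, the restriction to non-negative rewards, and the choice to renormalize the policy over the sampled group with a softmax of sequence log-probabilities are all assumptions made for illustration.

```python
import math

def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

def reward_proportional_target(rewards, eps=1e-8):
    """Target q over the sampled group with q_i proportional to reward_i.

    Assumption: rewards are non-negative; eps keeps every q_i strictly positive.
    """
    if any(r < 0 for r in rewards):
        raise ValueError("this sketch assumes non-negative rewards")
    total = sum(rewards) + eps * len(rewards)
    return [(r + eps) / total for r in rewards]

def forward_kl_group_loss(seq_logps, rewards):
    """Forward KL(q || p) restricted to the sampled trajectories.

    seq_logps: policy log-probability of each sampled trajectory.
    p is the policy renormalized over the group (an assumption of this sketch).
    """
    q = reward_proportional_target(rewards)
    log_p = log_softmax(seq_logps)
    return sum(qi * (math.log(qi) - lpi) for qi, lpi in zip(q, log_p))

# Two equally rewarded trajectories: a collapsed policy is penalized,
# a policy that spreads mass across both is not.
collapsed = forward_kl_group_loss([0.0, -20.0], [1.0, 1.0])
spread = forward_kl_group_loss([0.0, 0.0], [1.0, 1.0])
print(collapsed > spread)
```

Because the forward KL weights each term by the target probability, every rewarded trajectory contributes to the loss, so a policy that ignores one of them pays a large penalty. That is the mode-covering behavior the summary refers to.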
🔥 Representation Distribution Matching for One-Step Visual Generation
📅 Published on Jul 2
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.02375
• PDF: https://arxiv.org/pdf/2607.02375
• Project Page: https://alan-lanfeng.github.io/rdm/
🤖 Models citing this paper:
• https://huggingface.co/epfl-vita/flux2-klein-1step-rdm
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/epfl-vita/flux2-klein-1step-demo
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#VisualGeneration #RepresentationLearning #DistributionMatching #ImageSynthesis #DeepLearning
💡 The paper introduces Representation Distribution Matching, a method for one-step visual generation that matches feature distributions under pretrained encoders. The goal is to generate high-quality images by comparing the distributions of generated and reference features. The authors identify two key design axes: how the distributions are compared and which representations they are compared in. They run controlled studies and report three main results.
First, they show that the Maximum Mean Discrepancy (MMD), a classical measure that was previously ineffective in this setting, becomes a strong and scalable objective when estimated correctly. Second, they find that the batch size of generated images has a significant impact on performance, with the optimum above 2048, much larger than typical batch sizes. Third, they demonstrate that a single representation can be gamed, yielding low scores despite visibly fake images. They instead propose training with a balanced set of encoders and evaluating with a Sliced-Wasserstein distance over 14 encoders.
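For readers unfamiliar with MMD, a minimal sketch of the textbook unbiased estimator of squared MMD is given here. The summary does not say what "estimated correctly" means in the paper, so this should not be read as the authors' estimator. The RBF kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples xs and ys.

    Within-sample sums skip the diagonal (i == j), which is what makes the
    estimate unbiased; as a consequence it can be slightly negative.
    """
    n, m = len(xs), len(ys)
    if n < 2 or m < 2:
        raise ValueError("need at least two samples per set")
    k_xx = sum(rbf_kernel(xs[i], xs[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_yy = sum(rbf_kernel(ys[i], ys[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_xy = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy

# Toy 1-D "features": generated samples far from the reference give a large value.
reference = [[0.0], [0.1], [0.2]]
far = [[5.0], [5.1], [5.2]]
near = [[0.05], [0.15], [0.25]]
print(mmd2_unbiased(reference, far) > mmd2_unbiased(reference, near))
```

The within-sample terms are averaged over pairs, so the estimate is computed from whole batches rather than individual samples. This sketch makes no claim about how the paper's batch-size finding arises.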
The authors combine these findings into an improved Representation Distribution Matching method, which they call iRDM. On ImageNet they report state-of-the-art results, with a Sliced-Wasserstein distance of 1.30. On PickScore, a human-preference proxy, iRDM is preferred over the previous best one-step generator on 71.2% of matched samples. They also apply the same method to post-train FLUX.2, a four-step generator, and report better results than the original four-step version on GenEval and PickScore, using only 90 GPU-hours. Overall, the paper presents a method for one-step visual generation that achieves state-of-the-art results and can be used to improve existing generators.
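The Sliced-Wasserstein distance used for evaluation can be sketched for a single feature space as follows. This is a generic Monte Carlo estimate of the sliced 2-Wasserstein distance, not the paper's evaluation code: the number of projections, the equal-sample-size requirement, and the function names are assumptions, and how the paper aggregates over its 14 encoders is not stated in the summary.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between two equal-size samples.

    Each random direction reduces the problem to 1-D, where optimal transport
    simply pairs the sorted projections of the two samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        d = random_unit_vector(dim, rng)
        px = sorted(sum(a * b for a, b in zip(x, d)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, d)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)

# Toy 2-D "features": identical samples give exactly 0, shifted samples do not.
reference = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[3.0, 4.0], [4.0, 4.0], [3.0, 5.0]]
print(sliced_wasserstein(reference, reference))
print(sliced_wasserstein(reference, shifted) > 0.0)
```

Sorting the projected samples is what makes each 1-D slice cheap to compute, which is the usual reason for preferring the sliced variant over full optimal transport in high-dimensional feature spaces.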