AI & ML Papers

🔥 MARBLE: Multi-Aspect Reward Balance for Diffusion RL

💡 The paper introduces MARBLE, a gradient-space optimization framework for multi-reward reinforcement learning fine-tuning of diffusion models. Existing methods either train a separate model for each reward or aggregate the rewards as a weighted sum, and the weighted sum can perform poorly because of sample-level mismatch: most rollouts are highly informative for some reward dimensions but irrelevant for others, so summing the rewards dilutes their supervision.

To address this issue, MARBLE maintains an independent advantage estimator for each reward and computes per-reward policy gradients. A quadratic program then harmonizes these gradients into a single update direction, with no manual reward weighting. The result is one unified model trained jointly on all rewards, without heavy manual tuning or sequential training.
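The harmonization step can be sketched as follows. This summary does not state the exact quadratic program MARBLE solves, so the sketch assumes a min-norm formulation in the style of multi-objective gradient descent (MGDA): find simplex weights that minimize the norm of the combined gradient. The function name `min_norm_weights`, the Frank-Wolfe solver, and the toy gradients are all illustrative assumptions, not the authors' code.

```python
import numpy as np

def min_norm_weights(grads, iters=100):
    """Frank-Wolfe solver for: min_w ||sum_i w_i * g_i||^2, with w on the simplex.

    grads: array of shape (k, d), one flattened policy gradient per reward.
    This is an assumed MGDA-style QP, not necessarily the one MARBLE uses.
    """
    gram = grads @ grads.T                 # (k, k) pairwise gradient inner products
    k = gram.shape[0]
    w = np.full(k, 1.0 / k)                # start from uniform weights
    for _ in range(iters):
        i = int(np.argmin(gram @ w))       # simplex vertex that most reduces the objective
        step_dir = -w.copy()
        step_dir[i] += 1.0                 # direction from w toward vertex e_i
        curvature = step_dir @ gram @ step_dir
        if curvature <= 1e-12:
            break
        # Exact line search for the quadratic, clipped so w stays on the simplex.
        t = np.clip(-(w @ gram @ step_dir) / curvature, 0.0, 1.0)
        w = w + t * step_dir
    return w

# Two toy per-reward gradients that partly conflict (negative inner product).
grads = np.array([[1.0, 0.0],
                  [-0.5, 1.0]])
weights = min_norm_weights(grads)
update = weights @ grads                   # single harmonized update direction
print(weights, grads @ update)
```

Under this assumed formulation, the combined direction has a non-negative inner product with every per-reward gradient, so no reward is pushed backwards by the update.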

The authors also propose an amortized formulation that reduces MARBLE's computational cost, and they apply exponential moving average (EMA) smoothing to the balancing coefficients to stabilize updates against transient fluctuations.
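A minimal sketch of EMA smoothing on the balancing coefficients, assuming they are non-negative weights that sum to 1. The decay value `beta=0.9`, the function name `ema_update`, and the renormalization step are illustrative assumptions; the summary does not give the paper's actual settings.

```python
import numpy as np

def ema_update(prev, new, beta=0.9):
    """Blend the previous smoothed coefficients with the freshly solved ones."""
    smoothed = beta * prev + (1.0 - beta) * new
    return smoothed / smoothed.sum()       # renormalize so the weights still sum to 1

# A one-step spike toward the first reward only nudges the smoothed weights.
prev = np.array([0.5, 0.5])
spike = np.array([1.0, 0.0])
smoothed = ema_update(prev, spike)
print(smoothed)
```

A larger `beta` damps transient swings more strongly, at the price of reacting more slowly when the balance between rewards genuinely shifts.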

The results show that MARBLE improves all five reward dimensions simultaneously when fine-tuning the SD3.5 Medium model, outperforming the baseline method. Specifically, MARBLE turns the worst-aligned reward's gradient cosine from negative to consistently positive, which suggests the shared update no longer works against that reward. MARBLE also runs at nearly the same training speed as the baseline, with only a 3% slowdown.
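The gradient-cosine diagnostic can be illustrated with a small sketch, assuming it measures the cosine between the shared update direction and each reward's own gradient. The function name `reward_cosines` and the toy gradients are hypothetical and are not results from the paper.

```python
import numpy as np

def reward_cosines(update, grads, eps=1e-12):
    """Cosine between one update direction and each per-reward gradient."""
    norms = np.linalg.norm(grads, axis=1) * np.linalg.norm(update)
    return (grads @ update) / np.maximum(norms, eps)

# Toy case: the second gradient is much larger and points partly against the first.
grads = np.array([[1.0, 0.0],
                  [-3.0, 1.0]])
summed = grads.sum(axis=0)                 # equal-weight sum of the gradients
cosines = reward_cosines(summed, grads)
print(cosines)
```

In this toy case the first reward has a negative cosine with the summed update, so a step along it would be expected to lower that reward while raising the other.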

📅 Published on May 7

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.06507
• PDF: https://arxiv.org/pdf/2605.06507
• Project Page: https://aim-uofa.github.io/MARBLE/
• GitHub: https://github.com/aim-uofa/MARBLE ⭐ 24

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MultiRewardReinforcementLearning #DiffusionModels #GradientSpaceOptimization #MultiAspectRewardBalance #ReinforcementLearningFineTuning

