AI & ML Papers
32.9K subscribers
7.11K photos
531 videos
24 files
7.77K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Download Telegram
AI & ML Papers
Photo
🔥 MARBLE: Multi-Aspect Reward Balance for Diffusion RL

💡 The paper introduces MARBLE, a novel gradient-space optimization framework for multi-reward reinforcement learning fine-tuning of diffusion models. The problem addressed is that existing methods for handling multiple rewards either train separate models for each reward or use a weighted-sum reward aggregation, which can lead to poor performance due to sample-level mismatch. This mismatch occurs because most rollouts are highly informative for certain reward dimensions but irrelevant for others, causing the weighted summation to dilute their supervision.

To address this issue, MARBLE maintains independent advantage estimators for each reward and computes per-reward policy gradients. These gradients are then harmonized into a single update direction without manual reward weighting, by solving a quadratic programming problem. This approach allows for a unified model that can be jointly trained on all rewards, eliminating the need for heavy manual tuning and sequential training.

The authors also propose an amortized formulation that reduces the computational cost of MARBLE, making it more efficient. Additionally, they use exponential moving average smoothing on the balancing coefficients to stabilize updates against transient fluctuations.

The results show that MARBLE improves all five reward dimensions simultaneously on the SD3.5 Medium dataset, outperforming the baseline method. Specifically, MARBLE turns the worst-aligned reward's gradient cosine from negative to consistently positive, indicating better alignment with human preferences. Furthermore, MARBLE runs at nearly the same training speed as the baseline method, with only a 3% slowdown. Overall, MARBLE provides a more effective and efficient approach to multi-reward reinforcement learning fine-tuning of diffusion models.


📅 Published on May 7

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.06507
• PDF: https://arxiv.org/pdf/2605.06507
• Project Page: https://aim-uofa.github.io/MARBLE/
• GitHub: https://github.com/aim-uofa/MARBLE 24

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MultiRewardReinforcementLearning #DiffusionModels #GradientSpaceOptimization #MultiAspectRewardBalance #ReinforcementLearningFineTuning
1