AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔥 DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

💡 The paper introduces DenoiseRL, a reinforcement learning framework that aims to improve reasoning in large language models by training on incorrect reasoning traces. Existing methods rely heavily on stronger teacher models or carefully curated datasets, which limits how far they can scale. DenoiseRL replaces that external supervision with recovery-oriented optimization over failures from weak models: incorrect traces become a training signal, so training scales better and depends less on external resources.

DenoiseRL uses failure-oriented optimization: the model learns from mistakes and is trained to recover from noisy prefixes. According to the paper, this yields a richer and more diverse learning signal and makes exploration from imperfect model behavior more efficient. The reported outcome is better reasoning performance and training efficiency, with less need for expensive data curation or stronger teacher models.
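
To make the idea of recovering from noisy prefixes concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not say how prefixes are cut or how rewards are computed, so the `Trace` type, the `keep_fraction` parameter, and the exact-match outcome reward are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    """One reasoning attempt from a weak model (hypothetical structure)."""
    question: str
    steps: List[str]   # reasoning steps, in order
    is_correct: bool   # whether the attempt reached the right answer


def make_noisy_prefix_prompts(
    traces: List[Trace],
    keep_fraction: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Turn failed traces into prompts that end in a truncated, flawed prefix.

    Assumption: a "noisy prefix" is the first k steps of an incorrect trace,
    with k sampled between 1 and keep_fraction of the trace length.
    """
    rng = rng or random.Random(0)
    prompts = []
    for trace in traces:
        if trace.is_correct or not trace.steps:
            continue  # only failed attempts supply noisy prefixes
        max_keep = max(1, int(len(trace.steps) * keep_fraction))
        k = rng.randint(1, max_keep)
        prefix = "\n".join(trace.steps[:k])
        prompts.append(f"{trace.question}\n{prefix}\n")
    return prompts


def recovery_reward(completion_answer: str, gold_answer: str) -> float:
    """Assumed outcome reward: 1.0 if the continuation recovers the gold answer."""
    return 1.0 if completion_answer.strip() == gold_answer.strip() else 0.0
```

In an RL loop, the policy would continue each prompt from the flawed prefix and be rewarded with `recovery_reward` on its final answer, so credit goes to continuations that repair the earlier mistake. The choice of policy optimizer is left open here.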

The paper reports that DenoiseRL consistently outperforms strong on-policy RL baselines across competitive mathematical and general reasoning benchmarks. It also promotes stronger self-corrective behavior as training difficulty increases, which the authors present as an effective and scalable alternative pathway for improving reasoning in large language models.


📅 Published on May 27

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.28421
• PDF: https://arxiv.org/pdf/2605.28421

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DenoiseRL #ReinforcementLearningForNLP #NoisyPrefixRecovery #ReasoningModelOptimization #LargeLanguageModelImprovement