AI & ML Papers

🔥 DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

💡 The paper introduces DenoiseRL, a reinforcement learning framework that aims to improve reasoning in large language models by learning from incorrect reasoning traces. Existing methods rely heavily on stronger teacher models or carefully curated datasets, which limits how far they can scale and improve. DenoiseRL replaces that external supervision with recovery-oriented optimization over failures from weak models, so incorrect traces become a training signal instead of being discarded.

The method is failure-oriented optimization: the model learns from its own mistakes by recovering from noisy prefixes. According to the summary, this yields a richer and more diverse learning signal and makes exploration from imperfect model behavior more efficient, which improves both reasoning performance and overall training efficiency.
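
To make the idea of recovering from a noisy prefix concrete, here is a minimal Python sketch. It is not the paper's implementation: the post gives no algorithmic details, so the `FailedTrace` type, the truncation fractions, the `Answer: ...` output format, and the binary reward are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class FailedTrace:
    """Hypothetical container for a weak model's incorrect reasoning trace."""
    question: str
    steps: List[str]   # reasoning steps that ended in a wrong answer
    gold_answer: str   # reference answer used only for the reward


def make_noisy_prefix(trace: FailedTrace, rng: random.Random,
                      min_frac: float = 0.25, max_frac: float = 0.75) -> List[str]:
    """Keep a random leading portion of the failed trace (assumed scheme)."""
    frac = rng.uniform(min_frac, max_frac)
    k = max(1, int(len(trace.steps) * frac))
    return trace.steps[:k]


def build_prompt(trace: FailedTrace, prefix: List[str]) -> str:
    """Question followed by the noisy prefix; the policy continues from here."""
    return trace.question + "\n" + "\n".join(prefix) + "\n"


def recovery_reward(completion: str, gold_answer: str) -> float:
    """Binary outcome reward: 1.0 if the last line states the gold answer."""
    lines = completion.strip().splitlines()
    final = lines[-1].strip() if lines else ""
    return 1.0 if final == f"Answer: {gold_answer}" else 0.0


# Example: a flawed trace becomes a prompt the policy must recover from.
trace = FailedTrace("What is 2 + 3 * 4?", ["2 + 3 = 5", "5 * 4 = 20", "So 20"], "14")
prompt = build_prompt(trace, make_noisy_prefix(trace, random.Random(0)))
```

In an RL loop, the policy's continuation of `prompt` would be scored with `recovery_reward`, so a completion that notices the error and ends with the correct answer earns reward even though it started from a wrong prefix.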

DenoiseRL consistently outperforms strong on-policy RL baselines across competitive mathematical and general reasoning benchmarks. It also promotes stronger self-corrective behavior as training difficulty increases, which the paper presents as an effective and scalable alternative pathway for improving reasoning in large language models.

📅 Published on May 27

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.28421
• PDF: https://arxiv.org/pdf/2605.28421

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DenoiseRL #ReinforcementLearningForNLP #NoisyPrefixRecovery #ReasoningModelOptimization #LargeLanguageModelImprovement


