AI & ML Papers

🔥 DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

💡 The paper introduces DenoiseRL, a reinforcement learning framework that improves reasoning in large language models by training on incorrect reasoning traces. Existing methods rely heavily on stronger teacher models or carefully curated datasets, which limits how far they can scale. DenoiseRL replaces that external supervision with recovery-oriented optimization over failures produced by weak models, so incorrect traces become a training signal instead of discarded data, and training depends less on external resources.

The method is failure-oriented: the model is trained to recover from noisy prefixes, learning from mistakes instead of discarding them. The authors report that this yields a richer, more diverse learning signal and more efficient exploration from imperfect model behavior. The stated result is better reasoning performance and training efficiency with less need for expensive data curation or stronger teacher models.
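The post does not give implementation details, so the following is only a rough sketch of what "recovering from a noisy prefix" could look like in data terms. It assumes a failed trace is a list of reasoning steps, that a random-length prefix of it is placed in the prompt, and that the continuation is scored with a binary answer check. `FailedTrace`, `make_noisy_prefix_prompt`, and `recovery_reward` are hypothetical names for illustration, not the paper's code.

```python
import random
from dataclasses import dataclass


@dataclass
class FailedTrace:
    """Hypothetical container for one incorrect trace from a weak model."""
    question: str
    steps: list[str]          # reasoning steps that led to a wrong answer
    reference_answer: str     # known-correct answer used for the reward


def make_noisy_prefix_prompt(trace: FailedTrace, rng: random.Random,
                             min_keep: int = 1) -> str:
    """Build a prompt that ends in a random-length prefix of the failed trace.

    Assumption: the policy continues from this prefix and has to repair it.
    """
    n = len(trace.steps)
    keep = rng.randint(min(min_keep, n), n)   # always a valid range, even if n == 0
    prefix = trace.steps[:keep]
    return trace.question + "\n" + "".join(step + "\n" for step in prefix)


def recovery_reward(completion: str, reference_answer: str) -> float:
    """Assumed binary reward: 1.0 if the last line equals the reference answer."""
    lines = completion.strip().splitlines()
    final = lines[-1].strip() if lines else ""
    return 1.0 if final == reference_answer.strip() else 0.0


if __name__ == "__main__":
    trace = FailedTrace(
        question="Q: What is 2 + 3 * 4?",
        steps=["2 + 3 = 5", "5 * 4 = 20", "Answer: 20"],
        reference_answer="14",
    )
    print(make_noisy_prefix_prompt(trace, random.Random(0)))
    print(recovery_reward("Multiplication comes first: 3 * 4 = 12.\n14", "14"))
```

In a real RL loop the prompt would be fed to the policy being trained and the reward passed to the optimizer; both are left out here because the post does not describe them.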

According to the paper, DenoiseRL consistently outperforms strong on-policy RL baselines on competitive mathematical and general reasoning benchmarks. It also shows stronger self-corrective behavior as training difficulty increases. The authors present it as an effective, scalable alternative path for improving reasoning in large language models.

📅 Published on May 27

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.28421
• PDF: https://arxiv.org/pdf/2605.28421

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DenoiseRL #ReinforcementLearningForNLP #NoisyPrefixRecovery #ReasoningModelOptimization #LargeLanguageModelImprovement

🔥 VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

💡 The paper introduces VibeThinker-3B, a compact language model with 3 billion parameters that reports state-of-the-art performance on verifiable reasoning tasks, challenging the assumption that such tasks require large models. It was trained with a specialized pipeline that combines curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation.

On demanding verifiable tasks the model scores 94.3 on AIME26 and 80.2 Pass@1 on LiveCodeBench v6, and reaches a 96.1 percent acceptance rate on recent unseen LeetCode contests. These results place VibeThinker-3B in the performance band of first-tier reasoning systems, matching or exceeding much larger models. Instruction controllability is not sacrificed: the model scores 93.4 on IFEval.

The authors take these results as support for the Parametric Compression-Coverage Hypothesis: verifiable reasoning can be compressed into compact reasoning cores, while open-domain knowledge and general-purpose competence require larger models with broader parameter coverage. On this view, compact models are a complementary path to frontier-level performance on verifiable reasoning, not just efficient substitutes for larger models.
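For readers unfamiliar with the Pass@1 figure quoted above, here is a minimal sketch of the estimator commonly used in code-generation evaluation: with n samples per problem of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k), which reduces to c/n for k = 1. The post does not say how the paper computed its score, so this is an assumption about the metric, not a description of the authors' evaluation; `pass_at_k` and `benchmark_pass_at_k` are illustrative names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples with c correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up counts for three problems, purely for illustration.
    toy_results = [(10, 3), (10, 10), (10, 0)]
    print(f"pass@1 = {benchmark_pass_at_k(toy_results, k=1):.3f}")
```

A benchmark score such as "80.2 Pass@1" would correspond to this average expressed as a percentage.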

📅 Published on Jun 15

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.16140
• PDF: https://arxiv.org/pdf/2606.16140
• Project Page: https://github.com/WeiboAI/VibeThinker

🤖 Models citing this paper:
• https://huggingface.co/WeiboAI/VibeThinker-3B
• https://huggingface.co/KakTakOne/VibeThinker-3B-GGUF
• https://huggingface.co/ffkbblu/pepekberbulu

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/Mike0021/vibethinker-3b-zerogpu
• https://huggingface.co/spaces/ffkbblu/trst

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#VerifiableReasoning #SmallLanguageModels #CompactModelArchitecture #ReinforcementLearningForNLP #EfficientLanguageModeling
