AI & ML Papers
🔥 AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
📅 Published on May 30, 2025
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2505.24298
• PDF: https://arxiv.org/pdf/2505.24298
🤖 Models citing this paper:
• https://huggingface.co/inclusionAI/AReaL-boba-2-8B
• https://huggingface.co/inclusionAI/AReaL-boba-2-14B
• https://huggingface.co/inclusionAI/AReaL-boba-2-8B-Open
📊 Datasets citing this paper:
• https://huggingface.co/datasets/inclusionAI/AReaL-tau2-data
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/rzvn/Medieval-Village-AI
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#AsynchronousReinforcementLearning #LanguageReasoningTasks #LargeScaleLanguageModels #ReinforcementLearningSystems #DeepLearningForNaturalLanguageProcessing
💡 The paper presents AReaL, a large-scale asynchronous reinforcement learning system for training large language models on reasoning tasks. Existing synchronous reinforcement learning systems alternate between generation and training in a batch setting: generation must wait until the longest output in the batch is complete before the model can be updated. This causes severe system-level inefficiency and leaves GPUs underutilized.
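To make the waiting cost concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model that is not taken from the paper: each sequence occupies one decode slot, every slot produces one token per step, and the whole batch is held until its longest output finishes. The function name and the token counts are hypothetical.

```python
def batch_utilization(output_lengths):
    """Fraction of decode slot-steps that do useful work when the
    whole batch waits for its longest output (simplified model)."""
    if not output_lengths:
        raise ValueError("need at least one output length")
    longest = max(output_lengths)
    useful_steps = sum(output_lengths)            # tokens actually generated
    total_steps = longest * len(output_lengths)   # slots held until the longest finishes
    return useful_steps / total_steps


# Hypothetical output lengths in tokens: one long reasoning trace
# dominates the batch, so the other three slots sit idle most of the time.
lengths = [200, 400, 800, 6000]
print(f"utilization: {batch_utilization(lengths):.1%}")
```

Under these assumptions, a single long output is enough to keep most of the batch idle, which is the inefficiency the summary describes.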
To address this, AReaL decouples generation from training: rollout workers continuously generate new outputs without waiting, while training workers update the model as soon as a batch of data has been collected. This asynchronous design leads to substantially higher GPU utilization. To keep reinforcement learning training stable, AReaL balances the workload of rollout and training workers to control data staleness, and adopts a staleness-enhanced PPO variant to better handle outdated training samples.
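The staleness control described above can be illustrated with a small single-process sketch. This is not AReaL's implementation: the class names, the `max_staleness` parameter, and the gating rule are assumptions chosen for illustration, and generation is treated as instantaneous. The idea shown is that a rollout is tagged with the policy version that produced it, and new generation requests are refused once they would land in a batch that the trainer reaches more than `max_staleness` versions later.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    text: str
    policy_version: int  # trainer version at the time the sample was generated


class StalenessBoundedBuffer:
    """Hypothetical FIFO buffer that bounds how stale a sample can be."""

    def __init__(self, batch_size, max_staleness):
        self.batch_size = batch_size
        self.max_staleness = max_staleness
        self.trainer_version = 0  # number of completed model updates
        self.submitted = 0        # rollouts accepted so far
        self.queue = deque()

    def can_submit(self):
        # The next sample lands in batch index submitted // batch_size, which
        # the trainer consumes at that version. Refuse it if that is more than
        # max_staleness versions ahead of the current weights.
        target_batch = self.submitted // self.batch_size
        return target_batch <= self.trainer_version + self.max_staleness

    def submit(self, text):
        if not self.can_submit():
            return False  # rollout worker should pause until the trainer catches up
        self.queue.append(Rollout(text, self.trainer_version))
        self.submitted += 1
        return True

    def train_step(self):
        # Train as soon as a full batch is available; otherwise do nothing.
        if len(self.queue) < self.batch_size:
            return None
        batch = [self.queue.popleft() for _ in range(self.batch_size)]
        self.trainer_version += 1  # stands in for a real gradient update
        return batch


buf = StalenessBoundedBuffer(batch_size=2, max_staleness=1)
accepted = [buf.submit(f"sample-{i}") for i in range(5)]
print(accepted)  # the fifth request is refused until the trainer advances
```

With `max_staleness=0` the gate degenerates to the synchronous case, where no sample for the next batch is accepted until the current update finishes. Larger values trade staleness for throughput, and that trade-off is what a staleness-aware training objective has to tolerate.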
On math and code reasoning benchmarks, AReaL achieves up to a 2.57× training speedup over the best synchronous systems with the same number of GPUs, while matching or even improving final performance. The code for AReaL is publicly available, so others can use and build on the system. Overall, AReaL offers a more efficient and scalable way to train large language models on reasoning tasks with reinforcement learning.