AI & ML Papers
🔥 GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
📅 Published on May 19
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.19577
• PDF: https://arxiv.org/pdf/2605.19577
• Project Page: https://huggingface.co/collections/Kwai-Klear/golongrl
🤖 Models citing this paper:
• https://huggingface.co/Kwai-Klear/GoLongRL-4B
• https://huggingface.co/Kwai-Klear/GoLongRL-30B-A3B
📊 Datasets citing this paper:
• https://huggingface.co/datasets/Kwai-Klear/GoLongRL
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#ReinforcementLearning #LongContextLearning #MultitaskAlignment #CapabilityOrientedLearning #DeepLearning
💡 The paper introduces GoLongRL, a long-context reinforcement learning approach built on capability-oriented data construction and multitask alignment. Existing long-context reinforcement learning methods often have homogeneous task coverage and use reward formulations that do not reflect real-world requirements. To address this, the authors make two main contributions.
First, a capability-oriented data construction method, used to build a dataset of 23,000 reinforcement learning samples with verifiable rewards. The samples span 9 task types, each scored with its own evaluation metric. The dataset is released openly together with the construction pipeline and training code. Under the same training setup, it outperforms the closed-source QwenLong-L1.5 dataset.
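To make "verifiable rewards with a per-task metric" concrete, here is a minimal sketch, assuming each sample carries a task-type label and a reference answer. The task names (`multiple_choice`, `span_extraction`), the two metrics, and every function name below are hypothetical illustrations; the post does not list the paper's 9 task types or their metrics.

```python
import re
from collections import Counter


def _tokens(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(prediction, reference):
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(_tokens(prediction) == _tokens(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = _tokens(prediction), _tokens(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical registry: one metric per task type (not the paper's actual list).
REWARD_FNS = {
    "multiple_choice": exact_match,
    "span_extraction": token_f1,
}


def verifiable_reward(task_type, prediction, reference):
    """Dispatch to the metric registered for this sample's task type."""
    return REWARD_FNS[task_type](prediction, reference)


print(verifiable_reward("multiple_choice", "B", "b"))
print(verifiable_reward("span_extraction", "the red car", "red car"))
```

Keeping the metric in a registry keyed by task type means adding a new task only requires registering one more scoring function; the rest of the training loop sees a single scalar reward per rollout.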
Second, TMN-Reweight, a method for heterogeneous multitask optimization. It combines task-level mean normalization, which aligns reward scales across tasks, with difficulty-adaptive weighting, which makes advantage estimation more reliable. TMN-Reweight improves average performance over vanilla GRPO while preserving or improving general capabilities across evaluations.
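The contrast with vanilla GRPO can be sketched in a few lines. This is only one plausible reading of TMN-Reweight, not the paper's formulation: it assumes rewards in [0, 1], replaces GRPO's per-group standard deviation with a task-level scale (the mean within-group spread over all groups of that task), and uses a hypothetical difficulty weight `1 + alpha * (1 - pass_rate)` so harder prompts count more. The names `tmn_reweight_advantages` and `alpha` are illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Vanilla GRPO baseline: z-score each reward within its own rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def tmn_reweight_advantages(groups, alpha=1.0, eps=1e-6):
    """Hypothetical sketch: task-level normalization plus difficulty weighting.

    groups: list of (task_type, rewards) pairs, one pair per prompt, where
    rewards holds the scores of that prompt's rollouts (assumed in [0, 1]).
    Returns one list of advantages per group, in the same order.
    """
    # 1) Per-task scale: average the within-group spread over all groups of a
    #    task, so tasks whose metrics live on different scales become comparable.
    spreads = defaultdict(list)
    for task, rewards in groups:
        spreads[task].append(pstdev(rewards))
    task_scale = {task: mean(values) for task, values in spreads.items()}

    advantages = []
    for task, rewards in groups:
        mu = mean(rewards)
        # 2) Difficulty-adaptive weight (assumed form): a low group pass rate
        #    marks a harder prompt, which gets a larger weight.
        weight = 1.0 + alpha * (1.0 - mu)
        advantages.append(
            [weight * (r - mu) / (task_scale[task] + eps) for r in rewards]
        )
    return advantages


batch = [
    ("qa", [1.0, 0.0, 1.0, 1.0]),           # easy prompt: 3 of 4 rollouts correct
    ("qa", [0.0, 0.0, 1.0, 0.0]),           # hard prompt: 1 of 4 rollouts correct
    ("summary", [0.62, 0.55, 0.70, 0.58]),  # F1-style scores on a narrower scale
]
for (task, _), adv in zip(batch, tmn_reweight_advantages(batch)):
    print(task, [round(a, 2) for a in adv])
```

Advantages within each group still sum to zero, as in GRPO, but the lone correct rollout on the hard prompt receives a larger advantage than the lone wrong rollout on the easy prompt, and the narrow-range summary scores are rescaled by their own task's spread rather than by a single shared scale.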
The authors also train Qwen3-30B-A3B on the new dataset; the resulting model reaches long-context performance comparable to state-of-the-art models such as DeepSeek-R1-0528 and Qwen3-235B-A22B-Thinking-2507. This suggests that the dataset and TMN-Reweight together can substantially improve long-context capability. Overall, the paper pairs capability-oriented data construction with multitask alignment for long-context reinforcement learning and reports results competitive with state-of-the-art models.