AI & ML Papers
🔥 Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs
📅 Published on Aug 6, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2508.04660
• PDF: https://arxiv.org/pdf/2508.04660
• Project Page: https://dspy.ai
• GitHub: https://github.com/stanfordnlp/dspy ⭐ 34.2k
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#MultiModuleLearning #LanguageModelOptimization #PolicyGradientMethods #ModularAISystems #PromptOptimizationTechniques
💡 The paper introduces mmGRPO, a multi-module extension of Group Relative Policy Optimization (GRPO), for improving modular AI systems that combine multiple language model calls with distinct prompts. GRPO is an effective post-training method for a single language model, but it is not obvious how to apply it to a program whose rollouts consist of several module invocations. mmGRPO addresses this by grouping language model calls by module across rollouts and by handling variable-length and interrupted trajectories. The method can also be composed with automatic prompt optimization. In the reported results, mmGRPO composed with prompt optimization improves accuracy by 11% on average over the post-trained language model across classification, many-hop search, and privacy-preserving delegation tasks, and by 5% over prompt optimization alone. The authors have open-sourced mmGRPO in DSPy as the dspy.GRPO optimizer.
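To make the "group by module" idea concrete, here is a minimal sketch in plain Python. It is not the dspy.GRPO implementation. It assumes that every module call in a rollout inherits that rollout's final reward and that advantages are standardized within each module's group, as in ordinary GRPO. The function name, the rollout format, and the module names `generate_query` and `answer` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def module_level_advantages(rollouts, eps=1e-8):
    """Group LM calls by module across rollouts and standardize rewards per group.

    Assumed (hypothetical) rollout format:
        {"reward": float, "calls": [(module_name, completion), ...]}
    Rollouts may differ in length, and an interrupted rollout may never
    reach some modules, so groups can have different sizes.
    """
    groups = defaultdict(list)
    for rollout in rollouts:
        for module_name, completion in rollout["calls"]:
            # Assumption: each call is credited with its rollout's final reward.
            groups[module_name].append((completion, rollout["reward"]))

    advantages = {}
    for module_name, members in groups.items():
        rewards = [reward for _, reward in members]
        mu = mean(rewards)
        sigma = pstdev(rewards)  # population std; 0.0 when all rewards agree
        advantages[module_name] = [
            (completion, (reward - mu) / (sigma + eps))
            for completion, reward in members
        ]
    return advantages


# Three rollouts of a hypothetical two-module program for the same input.
rollouts = [
    {"reward": 1.0, "calls": [("generate_query", "q1"),
                              ("generate_query", "q2"),
                              ("answer", "a1")]},
    {"reward": 0.0, "calls": [("generate_query", "q3")]},  # interrupted early
    {"reward": 1.0, "calls": [("generate_query", "q4"),
                              ("answer", "a2")]},
]
advantages = module_level_advantages(rollouts)
```

In this toy input the `generate_query` group contains four calls and the `answer` group only two, because one rollout stopped before answering. A policy-gradient update would then weight each completion's log-probability by its advantage within its own module group.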