✨SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
📝 Summary:
SWE-rebench V2 presents a new language-agnostic automated pipeline to create a large-scale dataset of over 32,000 software engineering tasks across 20 languages and 3,600 repositories. It provides reproducible environments and reliable tests, validated by LLMs, to advance training for SWE agents.
🔹 Publication Date: Published on Feb 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23866
• PDF: https://arxiv.org/pdf/2602.23866
• Hugging Face Collection: https://huggingface.co/collections/nebius/swe-rebench-v2
✨ Datasets citing this paper:
• https://huggingface.co/datasets/nebius/SWE-rebench-V2
• https://huggingface.co/datasets/nebius/SWE-rebench-V2-PRs
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#SoftwareEngineering #LLMs #AI #Dataset #SWEAgents
✨daVinci-Env: Open SWE Environment Synthesis at Scale
📝 Summary:
OpenSWE is the largest open framework for training software engineering agents, featuring 45,320 executable Python environments. It achieves state-of-the-art performance on SWE-bench Verified and shows substantial out-of-domain reasoning improvements.
🔹 Publication Date: Published on Mar 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13023
• PDF: https://arxiv.org/pdf/2603.13023
• Github: https://github.com/GAIR-NLP/OpenSWE
#SoftwareEngineering #AIagents #MachineLearning #OpenSWE #DeepLearning
✨SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?
📝 Summary:
Research using SWE-Skills-Bench shows agent skills offer limited benefits in real-world software engineering. Most skills yield no improvement, with an average pass-rate gain of only 1.2 percent. Only specialized skills provide meaningful gains, while some can even degrade performance.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15401
• PDF: https://arxiv.org/pdf/2603.15401
• Github: https://github.com/GeniusHTX/SWE-Skills-Bench
#SoftwareEngineering #AIagents #Benchmarking #AIresearch #LLM
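To make the "average pass-rate gain" figure in the SWE-Skills-Bench summary concrete, here is a minimal sketch of how such a number could be computed. It assumes the gain is measured in percentage points between runs with and without a skill; the paper's exact protocol may differ. The function names, skill names, and data below are hypothetical and are not results from the paper.

```python
from statistics import mean

def pass_rate(results):
    """Fraction of tasks passed; `results` is a list of booleans."""
    return sum(results) / len(results)

def average_gain(runs):
    """Per-skill gain in percentage points, plus the mean gain across skills."""
    gains = {
        skill: 100 * (pass_rate(r["with"]) - pass_rate(r["without"]))
        for skill, r in runs.items()
    }
    return gains, mean(gains.values())

# Hypothetical toy data: one skill with no effect, one that helps, one that hurts.
runs = {
    "skill_a": {"without": [True, False, True, False], "with": [True, False, True, False]},
    "skill_b": {"without": [True, False, False, False], "with": [True, True, True, False]},
    "skill_c": {"without": [True, True, False, False], "with": [True, False, False, False]},
}

gains, avg = average_gain(runs)
for skill, gain in gains.items():
    print(f"{skill}: {gain:+.1f} points")
print(f"average gain: {avg:+.2f} points")
```

A small average can hide a mix of neutral, helpful, and harmful skills, which is why the per-skill numbers are kept alongside the mean.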
✨SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
📝 Summary:
Software development is iterative, yet agentic coding benchmarks overwhelmingly evaluate single-shot solutions against complete specifications. Code can pass the test suite but become progressively ha...
🔹 Publication Date: Published on Mar 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24755
• PDF: https://arxiv.org/pdf/2603.24755
• Project Page: https://www.scbench.ai
• Github: https://github.com/SprocketLab/slop-code-bench
#AICoding #Benchmarking #LLMAgents #SoftwareEngineering #CodeGeneration
✨IQuest-Coder-V1 Technical Report
📝 Summary:
The IQuest-Coder-V1 series presents new code LLMs using a multi-stage training paradigm to capture dynamic software logic. This approach achieves state-of-the-art performance in agentic software engineering and competitive programming tasks. The Loop variant also optimizes deployment efficiency.
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16733
• PDF: https://arxiv.org/pdf/2603.16733
• Project Page: https://iquestlab.github.io/release-1.0-2603/index.html
• Github: https://github.com/IQuestLab/IQuest-Coder-V1
#CodeLLM #SoftwareEngineering #LargeLanguageModels #AIResearch #MachineLearning
✨Natural-Language Agent Harnesses
📝 Summary:
Natural-Language Agent Harnesses (NLAHs) and the Intelligent Harness Runtime (IHR) enable portable, executable agent harness design through natural language. This externalizes control logic from code, making harnesses easier to transfer, compare, and study.
🔹 Publication Date: Published on Mar 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25723
• PDF: https://arxiv.org/pdf/2603.25723
#NaturalLanguageProcessing #AI #AIAgents #SoftwareEngineering #CodePortability
✨Learning to Commit: Generating Organic Pull Requests via Online Repository Memory
📝 Summary:
Learning to Commit uses Online Repository Memory to make pull requests from LLM coding agents more organic. It distills project-specific coding skills from historical commits, guiding agents to generate code that adheres to project conventions and architectural patterns, leading to more acceptable pull requests.
🔹 Publication Date: Published on Mar 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26664
• PDF: https://arxiv.org/pdf/2603.26664
#LLMAgents #SoftwareEngineering #CodeGeneration #AIResearch #MachineLearning
✨Composer 2 Technical Report
📝 Summary:
Composer 2 is a specialized coding model trained via phased learning for real-world software engineering tasks. It demonstrates superior performance on new and public benchmarks, showcasing strong long-term planning and coding intelligence.
🔹 Publication Date: Published on Mar 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24477
• PDF: https://arxiv.org/pdf/2603.24477
#AI #Coding #SoftwareEngineering #MachineLearning #CodeGeneration
✨Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time
📝 Summary:
Researchers analyzed AI coding agent contributions to open source projects. They found increasing agent activity but higher code churn over time compared to human-authored code.
🔹 Publication Date: Published on Apr 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00917
• PDF: https://arxiv.org/pdf/2604.00917
• Project Page: https://arxiv.org/html/2604.00917v1
#AIAgents #SoftwareEngineering #OpenSource #CodeQuality #AIResearch
✨QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization
📝 Summary:
PRepair tackles over-editing in AI program repair by maximizing correct code reuse. It combines controlled bug injection and edit-aware policy optimization using an edit-aware reward. This framework significantly improves repair precision and decoding throughput.
🔹 Publication Date: Published on Apr 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05963
• PDF: https://arxiv.org/pdf/2604.05963
• Github: https://github.com/kcxain/QiMeng-PRepair
🔹 Models citing this paper:
• https://huggingface.co/kcxain/Prepair-Python-7B-EA
• https://huggingface.co/kcxain/Prepair-Verilog-7B-EA
#ProgramRepair #AI #MachineLearning #ReinforcementLearning #SoftwareEngineering
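The idea of an edit-aware reward in the PRepair summary can be illustrated with a rough sketch: reward a repair only if it passes the tests, then discount it by how much of the original code it rewrote. This is an assumption-laden illustration, not the paper's reward. The function names, the line-level `difflib` similarity measure, and the `penalty` weight are all hypothetical choices made here.

```python
import difflib

def edit_fraction(buggy: str, fixed: str) -> float:
    """Rough share of lines changed between two versions, in [0, 1]."""
    a, b = buggy.splitlines(), fixed.splitlines()
    # ratio() is 2 * matching_lines / total_lines; 1.0 means identical.
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def edit_aware_reward(buggy: str, fixed: str, passes_tests: bool,
                      penalty: float = 0.5) -> float:
    """Hypothetical reward: 0 for a failing repair, otherwise 1 minus an edit penalty."""
    if not passes_tests:
        return 0.0
    return 1.0 - penalty * edit_fraction(buggy, fixed)

buggy = "def add(a, b):\n    return a - b\n"
minimal_fix = "def add(a, b):\n    return a + b\n"
full_rewrite = "def add(x, y):\n    result = x + y\n    return result\n"

r_minimal = edit_aware_reward(buggy, minimal_fix, passes_tests=True)
r_rewrite = edit_aware_reward(buggy, full_rewrite, passes_tests=True)
r_failing = edit_aware_reward(buggy, buggy, passes_tests=False)
print(r_minimal, r_rewrite, r_failing)
```

Both passing repairs are correct, but the one-line fix scores higher than the full rewrite because it reuses more of the original code, which is the behavior an over-editing penalty is meant to encourage.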