AI & ML Papers

✨SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale

📝 Summary:
SWE-rebench V2 presents a new language-agnostic automated pipeline to create a large-scale dataset of over 32,000 software engineering tasks across 20 languages and 3,600 repositories. It provides reproducible environments and reliable tests, validated by LLMs, to advance training for SWE agents.
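The summary mentions "reliable tests". One common way to check that a mined task is usable comes from the original SWE-bench convention, and the post does not confirm this pipeline uses it: some tests should fail before the reference patch and pass after it, and no previously passing test should break. Below is a minimal sketch under that assumption. The names `TestRun`, `classify_tests` and `is_valid_task` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TestRun:
    """Outcome of one test-suite run: test name -> passed? (hypothetical structure)."""
    results: dict[str, bool]


def classify_tests(before: TestRun, after: TestRun) -> tuple[list[str], list[str]]:
    """Split tests into fail-to-pass and pass-to-pass sets (SWE-bench-style convention)."""
    fail_to_pass = sorted(
        t for t, ok in after.results.items() if ok and not before.results.get(t, False)
    )
    pass_to_pass = sorted(
        t for t, ok in after.results.items() if ok and before.results.get(t, False)
    )
    return fail_to_pass, pass_to_pass


def is_valid_task(before: TestRun, after: TestRun) -> bool:
    """Assumed filter: the patch must fix at least one test and break none."""
    fail_to_pass, _ = classify_tests(before, after)
    # A test that passed before the patch but fails after it is a regression.
    regressions = [t for t, ok in before.results.items() if ok and not after.results.get(t, False)]
    return bool(fail_to_pass) and not regressions


before = TestRun({"test_a": False, "test_b": True})
after = TestRun({"test_a": True, "test_b": True})
print(classify_tests(before, after))  # (['test_a'], ['test_b'])
print(is_valid_task(before, after))   # True
```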

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23866
• PDF: https://arxiv.org/pdf/2602.23866
• Hugging Face Collection: https://huggingface.co/collections/nebius/swe-rebench-v2

✨ Datasets citing this paper:
• https://huggingface.co/datasets/nebius/SWE-rebench-V2
• https://huggingface.co/datasets/nebius/SWE-rebench-V2-PRs

==================================

For more data science resources:
✓ https://xn--r1a.website/DataScienceT

#SoftwareEngineering #LLMs #AI #Dataset #SWEAgents

✨daVinci-Env: Open SWE Environment Synthesis at Scale

📝 Summary:
The paper introduces OpenSWE, described as the largest open framework for training software engineering agents, with 45,320 executable Python environments. The authors report state-of-the-art performance on SWE-bench Verified and substantial out-of-domain reasoning improvements.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13023
• PDF: https://arxiv.org/pdf/2603.13023
• Github: https://github.com/GAIR-NLP/OpenSWE

==================================

#SoftwareEngineering #AIagents #MachineLearning #OpenSWE #DeepLearning

✨SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?

📝 Summary:
Research using SWE-Skills-Bench shows agent skills offer limited benefits in real-world software engineering. Most skills yield no improvement, with an average pass-rate gain of only 1.2 percent. Only specialized skills provide meaningful gains, while some can even degrade performance.
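The headline figure is an average of per-skill pass-rate changes. The sketch below shows that arithmetic. It assumes each skill is scored on the same tasks with and without the skill, and that the gain is measured in percentage points; the post does not specify either. The skill names and outcomes are made up for illustration and are not results from the paper.

```python
def pass_rate(outcomes: list[bool]) -> float:
    """Percentage of tasks solved (assumes at least one task)."""
    return 100.0 * sum(outcomes) / len(outcomes)


def skill_gains(baseline: dict[str, list[bool]], with_skill: dict[str, list[bool]]) -> dict[str, float]:
    """Per-skill gain in percentage points: pass rate with the skill minus without it."""
    return {s: pass_rate(with_skill[s]) - pass_rate(baseline[s]) for s in baseline}


def average_gain(gains: dict[str, float]) -> float:
    """Unweighted mean of per-skill gains."""
    return sum(gains.values()) / len(gains)


# Made-up outcomes on four tasks per skill.
baseline = {"skill_x": [True, False, False, True], "skill_y": [True, True, False, False]}
with_skill = {"skill_x": [True, True, False, True], "skill_y": [True, False, False, False]}
gains = skill_gains(baseline, with_skill)
print(gains)                # {'skill_x': 25.0, 'skill_y': -25.0}
print(average_gain(gains))  # 0.0
```

A negative entry corresponds to the case the summary mentions, where a skill degrades performance. Averaging across skills can hide both large gains and losses, which is why per-skill numbers matter.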

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15401
• PDF: https://arxiv.org/pdf/2603.15401
• Github: https://github.com/GeniusHTX/SWE-Skills-Bench

==================================

#SoftwareEngineering #AIagents #Benchmarking #AIresearch #LLM

✨SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

📝 Summary:
Software development is iterative, yet agentic coding benchmarks overwhelmingly evaluate single-shot solutions against complete specifications. Code can pass the test suite but become progressively ha...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24755
• PDF: https://arxiv.org/pdf/2603.24755
• Project Page: https://www.scbench.ai
• Github: https://github.com/SprocketLab/slop-code-bench

==================================

#AICoding #Benchmarking #LLMAgents #SoftwareEngineering #CodeGeneration

✨IQuest-Coder-V1 Technical Report

📝 Summary:
The IQuest-Coder-V1 series presents new code LLMs using a multi-stage training paradigm to capture dynamic software logic. This approach achieves state-of-the-art performance in agentic software engineering and competitive programming tasks. The Loop variant also optimizes deployment efficiency.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16733
• PDF: https://arxiv.org/pdf/2603.16733
• Project Page: https://iquestlab.github.io/release-1.0-2603/index.html
• Github: https://github.com/IQuestLab/IQuest-Coder-V1

==================================

#CodeLLM #SoftwareEngineering #LargeLanguageModels #AIResearch #MachineLearning

✨Natural-Language Agent Harnesses

📝 Summary:
Natural-Language Agent Harnesses (NLAHs) and the Intelligent Harness Runtime (IHR) enable portable, executable agent harness design through natural language. This externalizes control logic from code, making harnesses easier to transfer, compare, and study.
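To make "externalizing control logic from code" concrete, here is a toy sketch. It is not the paper's IHR. The harness is a list of plain-language steps stored as data, and a small generic runtime feeds each step, plus the outputs so far, to a model callable. `HARNESS`, `run_harness` and `fake_model` are hypothetical names, and `fake_model` stands in for a real LLM call.

```python
from typing import Callable

# A harness written as natural-language steps instead of hard-coded control flow.
HARNESS = [
    "Read the issue and restate the bug in one sentence.",
    "Propose a minimal patch.",
    "List the tests that should be run.",
]


def run_harness(steps: list[str], task: str, model: Callable[[str], str]) -> list[str]:
    """Execute each natural-language step in order, threading earlier results as context."""
    context = f"Task: {task}"
    outputs = []
    for i, step in enumerate(steps, start=1):
        prompt = f"{context}\n\nStep {i}: {step}"
        reply = model(prompt)
        outputs.append(reply)
        context += f"\nStep {i} result: {reply}"
    return outputs


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM: returns the last line of the prompt (the current instruction)."""
    return prompt.splitlines()[-1]


for line in run_harness(HARNESS, "fix crash on empty input", fake_model):
    print(line)
```

In this design, changing the agent's procedure means editing the `HARNESS` text, not the runtime, which is the portability the summary describes.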

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25723
• PDF: https://arxiv.org/pdf/2603.25723

==================================

#NaturalLanguageProcessing #AI #AIAgents #SoftwareEngineering #CodePortability

✨Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

📝 Summary:
Learning to Commit makes the pull requests of LLM coding agents more "organic" by using an Online Repository Memory. The memory distills project-specific coding skills from historical commits and guides agents to generate code that adheres to project conventions and architectural patterns, so their pull requests are more likely to be accepted.
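The post does not say how the memory is stored or queried, so the sketch below is hypothetical throughout. It assumes the memory is a list of convention notes distilled from past commits, each tagged with keywords, and that retrieval ranks notes by keyword overlap with the task description. The method `add` keeps the memory "online", meaning it can be updated as new commits arrive.

```python
from dataclasses import dataclass, field


@dataclass
class RepoMemory:
    """Hypothetical store of project conventions distilled from historical commits."""
    notes: list[tuple[set[str], str]] = field(default_factory=list)

    def add(self, keywords: set[str], note: str) -> None:
        """Record a new convention, e.g. after mining a freshly merged commit."""
        self.notes.append((keywords, note))

    def retrieve(self, task: str, k: int = 2) -> list[str]:
        """Return up to k notes whose keywords overlap most with the task description."""
        words = set(task.lower().split())
        scored = [(len(kw & words), note) for kw, note in self.notes]
        scored = [(score, note) for score, note in scored if score > 0]
        scored.sort(key=lambda pair: -pair[0])  # stable: ties keep insertion order
        return [note for _, note in scored[:k]]


mem = RepoMemory()
mem.add({"logging", "logger"}, "Use the module-level logger, never print().")
mem.add({"config", "settings"}, "Read settings through the central config object.")
mem.add({"test", "tests"}, "Add a regression test under tests/ for every fix.")
print(mem.retrieve("add logging to the config loader"))
```

The retrieved notes would then be placed in the agent's prompt. A real system would likely use richer distillation and retrieval than keyword overlap.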

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26664
• PDF: https://arxiv.org/pdf/2603.26664

==================================

#LLMAgents #SoftwareEngineering #CodeGeneration #AIResearch #MachineLearning

✨Composer 2 Technical Report

📝 Summary:
Composer 2 is a specialized coding model trained via phased learning for real-world software engineering tasks. It demonstrates superior performance on new and public benchmarks, showcasing strong long-term planning and coding intelligence.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24477
• PDF: https://arxiv.org/pdf/2603.24477

==================================

#AI #Coding #SoftwareEngineering #MachineLearning #CodeGeneration

✨Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time

📝 Summary:
Researchers analyzed AI coding agent contributions to open-source projects. They found that agent activity is increasing, and that agent-authored code shows higher churn over time than human-authored code.
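The summary does not define churn. This sketch assumes one common operationalization: the fraction of lines a contribution added that are later modified or deleted within some observation window. The `Contribution` records and numbers are made up for illustration and are not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    author_kind: str          # "agent" or "human" (hypothetical labels)
    lines_added: int
    lines_later_changed: int  # added lines modified or deleted within the window


def churn_rate(contribs: list[Contribution], kind: str) -> float:
    """Aggregate churn for one author kind: changed lines / added lines."""
    subset = [c for c in contribs if c.author_kind == kind]
    added = sum(c.lines_added for c in subset)
    changed = sum(c.lines_later_changed for c in subset)
    return changed / added if added else 0.0


data = [
    Contribution("agent", 100, 30),
    Contribution("agent", 50, 15),
    Contribution("human", 200, 20),
]
print(churn_rate(data, "agent"))  # 0.3
print(churn_rate(data, "human"))  # 0.1
```

The rate pools lines across contributions rather than averaging per-contribution ratios, so large contributions weigh more. Either choice is defensible, and the paper's exact metric may differ.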

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00917
• PDF: https://arxiv.org/pdf/2604.00917
• HTML version: https://arxiv.org/html/2604.00917v1

==================================

#AIAgents #SoftwareEngineering #OpenSource #CodeQuality #AIResearch

✨QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization

📝 Summary:
PRepair tackles over-editing in AI program repair by maximizing reuse of already-correct code. It pairs controlled bug injection with edit-aware policy optimization driven by an edit-aware reward. The authors report significant gains in both repair precision and decoding throughput.
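The post names an edit-aware reward but does not give its form. This sketch assumes a reward of zero for a repair that fails the tests. For a passing repair, the reward shrinks as more of the buggy program is rewritten. The edit measure (line-level `difflib.SequenceMatcher` similarity) and the weight `alpha` are illustrative choices, not the paper's.

```python
import difflib


def edit_ratio(buggy: str, repaired: str) -> float:
    """Share of the buggy program changed by the repair (0 = identical, 1 = no lines kept)."""
    matcher = difflib.SequenceMatcher(None, buggy.splitlines(), repaired.splitlines())
    return 1.0 - matcher.ratio()


def edit_aware_reward(buggy: str, repaired: str, tests_pass: bool, alpha: float = 0.5) -> float:
    """Reward correct repairs, discounted by how much existing code they rewrote."""
    if not tests_pass:
        return 0.0
    return 1.0 - alpha * edit_ratio(buggy, repaired)


buggy = "def add(a, b):\n    return a - b\n"
minimal = "def add(a, b):\n    return a + b\n"                        # fixes one line
rewrite = "def add(x, y):\n    result = x + y\n    return result\n"   # also correct, but rewrites everything
print(edit_aware_reward(buggy, minimal, tests_pass=True))  # 0.75
print(edit_aware_reward(buggy, rewrite, tests_pass=True))  # 0.5
```

Both repairs are correct, but the minimal one scores higher, which is the over-editing pressure the summary describes. Line-level matching is coarse because a one-character fix counts as a whole changed line. A token-level diff would be finer-grained.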

🔹 Publication Date: Published on Apr 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05963
• PDF: https://arxiv.org/pdf/2604.05963
• Github: https://github.com/kcxain/QiMeng-PRepair

🔹 Models citing this paper:
• https://huggingface.co/kcxain/Prepair-Python-7B-EA
• https://huggingface.co/kcxain/Prepair-Verilog-7B-EA

==================================

#ProgramRepair #AI #MachineLearning #ReinforcementLearning #SoftwareEngineering
