AI & ML Papers

✨SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

📝 Summary:
SWE-Bench++ is an automated framework generating scalable, multilingual, repository-level coding tasks from live GitHub pull requests. It overcomes manual curation limits and static datasets, offering a benchmark to evaluate and improve code generation models across 11 languages.

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17419
• PDF: https://arxiv.org/pdf/2512.17419
• Project Page: https://research.turing.com/swebench

==================================

For more data science resources:
✓ https://xn--r1a.website/DataScienceT

#SoftwareEngineering #CodeGeneration #AIBenchmarking #MachineLearning #OpenSource


✨SWE-RM: Execution-free Feedback For Software Engineering Agents

📝 Summary:
This paper introduces SWE-RM, a robust, execution-free reward model for software engineering agents. It overcomes limitations of execution-based feedback, improving coding agent performance in both test-time scaling and reinforcement learning. SWE-RM achieves new state-of-the-art results for open...

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21919
• PDF: https://arxiv.org/pdf/2512.21919

#SoftwareEngineering #AI #ReinforcementLearning #CodingAgents #RewardModels


✨SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

📝 Summary:
SWE-Lego achieves state-of-the-art software issue resolution through a lightweight supervised fine-tuning approach. It uses a high-quality dataset and refined training procedures like error masking and a difficulty-based curriculum, outperforming complex methods. Performance is further boosted by...

🔹 Publication Date: Published on Jan 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01426
• PDF: https://arxiv.org/pdf/2601.01426
• Project Page: https://github.com/SWE-Lego/SWE-Lego
• Github: https://github.com/SWE-Lego/SWE-Lego

🔹 Models citing this paper:
• https://huggingface.co/SWE-Lego/SWE-Lego-Qwen3-8B
• https://huggingface.co/SWE-Lego/SWE-Lego-Qwen3-32B

✨ Datasets citing this paper:
• https://huggingface.co/datasets/SWE-Lego/SWE-Lego-Real-Data
• https://huggingface.co/datasets/SWE-Lego/SWE-Lego-Synthetic-Data

#SoftwareEngineering #MachineLearning #LLM #FineTuning #AIforCode


✨SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents

📝 Summary:
SWE-Pruner is a self-adaptive context pruning framework for coding agents. It performs task-aware adaptive pruning, guided by explicit agent goals and a neural skimmer, to reduce long context token usage by 23-54 percent with minimal performance loss.

🔹 Publication Date: Published on Jan 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16746
• PDF: https://arxiv.org/pdf/2601.16746
• Github: https://github.com/Ayanami1314/swe-pruner

#AIAgents #ContextPruning #LLM #AI #SoftwareEngineering


✨Prometheus: Unified Knowledge Graphs for Issue Resolution in Multilingual Codebases

📝 Summary:
Prometheus is a multi-agent system that uses a unified knowledge graph of code repositories to resolve real-world issues across multiple programming languages. It improves upon existing methods by handling diverse languages and real-world scenarios.

🔹 Publication Date: Published on Jul 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.19942
• PDF: https://arxiv.org/pdf/2507.19942
• Github: https://github.com/Pantheon-temple/Prometheus

#KnowledgeGraphs #MultiAgentSystems #CodeAnalysis #SoftwareEngineering #AI


✨RM -RF: Reward Model for Run-Free Unit Test Evaluation

📝 Summary:
RM-RF is a lightweight reward model that predicts unit test outcomes directly from source code, skipping the compile and run steps. It forecasts test suite success, coverage, and mutation kill rate, offering faster, cheaper evaluation of AI-generated tests. This enables scalable feedback for test generation.

🔹 Publication Date: Published on Jan 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13097
• PDF: https://arxiv.org/pdf/2601.13097
• Github: https://github.com/trndcenter/RM-RF-unit-tests

#RewardModels #UnitTesting #AIGeneratedTests #SoftwareEngineering #MachineLearning


✨TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance

📝 Summary:
TAM-Eval is a new framework and benchmark for evaluating LLMs on comprehensive test suite maintenance tasks like creation, repair, and updating across Python, Java, and Go. It operates at the test file level with full repository context. Empirical results show current LLMs have limited capabiliti...

🔹 Publication Date: Published on Jan 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.18241
• PDF: https://arxiv.org/pdf/2601.18241
• Github: https://github.com/trndcenter/TAM-Eval

#LLM #SoftwareEngineering #TestAutomation #AI4Code #TAMEval


✨Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents

📝 Summary:
This study finds that agent-generated tests for LLM software engineering agents may have limited value. Test-writing frequency doesn't correlate with issue resolution, and agents prefer informal print statements. Varying test volume showed little impact, suggesting marginal utility in current prac...

🔹 Publication Date: Published on Feb 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.07900
• PDF: https://arxiv.org/pdf/2602.07900

#LLMAgents #SoftwareEngineering #AutomatedTesting #AIResearch #GenerativeAI


✨AutoDev: Automated AI-Driven Development

📝 Summary:
AutoDev is an automated AI framework that uses autonomous agents to perform diverse software engineering tasks like coding, testing, and git operations in a secure Docker environment. It achieved high performance on HumanEval, significantly advancing AI-driven development.

🔹 Publication Date: Published on Mar 13, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2403.08299
• PDF: https://arxiv.org/pdf/2403.08299
• Github: https://github.com/vxcontrol/pentagi

#AI #SoftwareEngineering #AutomatedDevelopment #AutonomousAgents #GenAI


✨CL4SE: A Context Learning Benchmark For Software Engineering Tasks

📝 Summary:
CL4SE presents a benchmark for evaluating context learning in software engineering tasks, defining four SE-specific context types. It demonstrates an average 24.7% performance improvement for LLMs across tasks like code generation and review, establishing a standardized evaluation framework.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23047
• PDF: https://arxiv.org/pdf/2602.23047
• Github: https://github.com/Tomsawyerhu/CodeCL

✨ Datasets citing this paper:
• https://huggingface.co/datasets/tomhu/codecl

#ContextLearning #SoftwareEngineering #LLMs #CodeGeneration #Benchmarks


✨SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale

📝 Summary:
SWE-rebench V2 presents a new language-agnostic automated pipeline to create a large-scale dataset of over 32,000 software engineering tasks across 20 languages and 3,600 repositories. It provides reproducible environments and reliable tests, validated by LLMs, to advance training for SWE agents.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23866
• PDF: https://arxiv.org/pdf/2602.23866
• Collection: https://huggingface.co/collections/nebius/swe-rebench-v2

✨ Datasets citing this paper:
• https://huggingface.co/datasets/nebius/SWE-rebench-V2
• https://huggingface.co/datasets/nebius/SWE-rebench-V2-PRs

#SoftwareEngineering #LLMs #AI #Dataset #SWEAgents


✨daVinci-Env: Open SWE Environment Synthesis at Scale

📝 Summary:
OpenSWE is the largest open framework for training software engineering agents, featuring 45,320 executable Python environments. It achieves state-of-the-art performance on SWE-bench Verified and shows substantial out-of-domain reasoning improvements.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13023
• PDF: https://arxiv.org/pdf/2603.13023
• Github: https://github.com/GAIR-NLP/OpenSWE

#SoftwareEngineering #AIagents #MachineLearning #OpenSWE #DeepLearning


✨SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?

📝 Summary:
Research using SWE-Skills-Bench shows agent skills offer limited benefits in real-world software engineering. Most skills yield no improvement, with an average pass-rate gain of only 1.2 percent. Only specialized skills provide meaningful gains, while some can even degrade performance.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15401
• PDF: https://arxiv.org/pdf/2603.15401
• Github: https://github.com/GeniusHTX/SWE-Skills-Bench

#SoftwareEngineering #AIagents #Benchmarking #AIresearch #LLM


✨SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

📝 Summary:
Software development is iterative, yet agentic coding benchmarks overwhelmingly evaluate single-shot solutions against complete specifications. Code can pass the test suite but become progressively ha...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24755
• PDF: https://arxiv.org/pdf/2603.24755
• Project Page: https://www.scbench.ai
• Github: https://github.com/SprocketLab/slop-code-bench

#AICoding #Benchmarking #LLMAgents #SoftwareEngineering #CodeGeneration


✨IQuest-Coder-V1 Technical Report

📝 Summary:
The IQuest-Coder-V1 series presents new code LLMs using a multi-stage training paradigm to capture dynamic software logic. This approach achieves state-of-the-art performance in agentic software engineering and competitive programming tasks. The Loop variant also optimizes deployment efficiency.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16733
• PDF: https://arxiv.org/pdf/2603.16733
• Project Page: https://iquestlab.github.io/release-1.0-2603/index.html
• Github: https://github.com/IQuestLab/IQuest-Coder-V1

#CodeLLM #SoftwareEngineering #LargeLanguageModels #AIResearch #MachineLearning


✨Natural-Language Agent Harnesses

📝 Summary:
Natural-Language Agent Harnesses (NLAHs) and the Intelligent Harness Runtime (IHR) enable portable, executable agent harness design through natural language. This externalizes control logic from code, making harnesses easier to transfer, compare, and study.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25723
• PDF: https://arxiv.org/pdf/2603.25723

#NaturalLanguageProcessing #AI #AIAgents #SoftwareEngineering #CodePortability


✨Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

📝 Summary:
Learning to Commit uses Online Repository Memory to make pull requests from LLM coding agents more organic. It distills project-specific coding skills from historical commits, guiding agents to generate code that adheres to project conventions and architectural patterns, leading to more acceptable pull requests.

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26664
• PDF: https://arxiv.org/pdf/2603.26664

#LLMAgents #SoftwareEngineering #CodeGeneration #AIResearch #MachineLearning


✨Composer 2 Technical Report

📝 Summary:
Composer 2 is a specialized coding model trained via phased learning for real-world software engineering tasks. It demonstrates superior performance on new and public benchmarks, showcasing strong long-term planning and coding intelligence.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24477
• PDF: https://arxiv.org/pdf/2603.24477

#AI #Coding #SoftwareEngineering #MachineLearning #CodeGeneration


✨Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time

📝 Summary:
Researchers analyzed AI coding agent contributions to open source projects. They found increasing agent activity but higher code churn over time compared to human-authored code.

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00917
• PDF: https://arxiv.org/pdf/2604.00917
• Project Page: https://arxiv.org/html/2604.00917v1

#AIAgents #SoftwareEngineering #OpenSource #CodeQuality #AIResearch


✨QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization

📝 Summary:
PRepair tackles over-editing in AI program repair by maximizing correct code reuse. It combines controlled bug injection and edit-aware policy optimization using an edit-aware reward. This framework significantly improves repair precision and decoding throughput.

🔹 Publication Date: Published on Apr 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05963
• PDF: https://arxiv.org/pdf/2604.05963
• Github: https://github.com/kcxain/QiMeng-PRepair

🔹 Models citing this paper:
• https://huggingface.co/kcxain/Prepair-Python-7B-EA
• https://huggingface.co/kcxain/Prepair-Verilog-7B-EA

#ProgramRepair #AI #MachineLearning #ReinforcementLearning #SoftwareEngineering
