AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

📝 Summary:
Nanbeige4.1-3B is a 3B-parameter model excelling in agentic behavior, code generation, and reasoning. It outperforms larger models through advanced reward modeling and training, demonstrating broad competence for a small language model.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13367
• PDF: https://arxiv.org/pdf/2602.13367
• Project Page: https://huggingface.co/Nanbeige/Nanbeige4.1-3B

🔹 Models citing this paper:
https://huggingface.co/Nanbeige/Nanbeige4.1-3B

🔹 Spaces citing this paper:
https://huggingface.co/spaces/PioTio/AIMan

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#LLM #AI #SmallLanguageModels #AgenticAI #CodeGeneration
TAROT: Test-driven and Capability-adaptive Curriculum Reinforcement Fine-tuning for Code Generation with Large Language Models

📝 Summary:
TAROT proposes a reinforcement fine-tuning method for code generation that combines a four-tier test suite with a capability-adaptive curriculum. The approach tailors curriculum progression to a model's skill, improving functional correctness and robustness.
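
A minimal sketch of the idea in the summary above, not the paper's implementation. It assumes the four tiers are ordered from easiest to hardest and that the curriculum advances once a tier's recent pass rate clears a threshold. The tier names, the threshold, the weights, and both function names are hypothetical and used only for illustration.

```python
from typing import Dict, List

# Hypothetical tier names: the summary only states that the suite has four tiers.
TIERS: List[str] = ["tier1", "tier2", "tier3", "tier4"]


def pick_tier(pass_rates: Dict[str, float], threshold: float = 0.8) -> str:
    """Return the easiest tier the model has not yet mastered.

    A tier counts as mastered when its recent pass rate reaches the
    (assumed) threshold. If every tier is mastered, stay on the last one.
    """
    for tier in TIERS:
        if pass_rates.get(tier, 0.0) < threshold:
            return tier
    return TIERS[-1]


def tiered_reward(results: Dict[str, List[bool]], weights: Dict[str, float]) -> float:
    """Weighted average of per-tier test pass rates, in the range [0, 1].

    `results` maps a tier name to the pass/fail outcomes of its tests.
    Tiers with no tests are ignored.
    """
    scored = {tier: outcomes for tier, outcomes in results.items() if outcomes}
    total_weight = sum(weights[tier] for tier in scored)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        weights[tier] * (sum(outcomes) / len(outcomes))
        for tier, outcomes in scored.items()
    )
    return weighted / total_weight
```

A training loop built on this sketch would call `pick_tier` with the model's running pass rates to choose which tests to emphasize, then use `tiered_reward` as the scalar reward for each generated program.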

🔹 Publication Date: Published on Feb 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15449
• PDF: https://arxiv.org/pdf/2602.15449
• Github: https://github.com/deep-diver/TAROT

#LLM #CodeGeneration #ReinforcementLearning #AI #MachineLearning
CL4SE: A Context Learning Benchmark For Software Engineering Tasks

📝 Summary:
CL4SE presents a benchmark for evaluating context learning in software engineering tasks, defining four SE-specific context types. It demonstrates an average 24.7% performance improvement for LLMs across tasks like code generation and review, establishing a standardized evaluation framework.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23047
• PDF: https://arxiv.org/pdf/2602.23047
• Project Page: https://huggingface.co/papers?q=project-specific%20context
• Github: https://github.com/Tomsawyerhu/CodeCL

🔹 Datasets citing this paper:
https://huggingface.co/datasets/tomhu/codecl

#ContextLearning #SoftwareEngineering #LLMs #CodeGeneration #Benchmarks
V_1: Unifying Generation and Self-Verification for Parallel Reasoners

📝 Summary:
V_1 unifies generation and verification for complex reasoning tasks. It exploits the observation that models are better at pairwise self-verification than at scoring candidates independently, improving performance and efficiency in code generation and math.
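
To illustrate the contrast between pairwise verification and independent scoring, here is a minimal sketch that selects one answer from several parallel candidates by round-robin comparison. It is an assumption-laden illustration, not the paper's algorithm: `select_by_pairwise` and the `prefer` callback are hypothetical names, and `prefer` stands in for a model call that judges which of two candidates is better.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def select_by_pairwise(
    candidates: Sequence[str],
    prefer: Callable[[str, str], bool],
) -> str:
    """Pick the candidate that wins the most head-to-head comparisons.

    `prefer(a, b)` is a hypothetical judge that returns True when `a`
    is judged better than `b`. In practice this would be a model prompt
    that sees both candidates at once. Ties go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    wins: List[int] = [0] * len(candidates)
    # Compare every unordered pair once (round-robin tournament).
    for i, j in combinations(range(len(candidates)), 2):
        if prefer(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1

    best = max(range(len(candidates)), key=lambda k: wins[k])
    return candidates[best]
```

A round-robin needs n(n-1)/2 comparisons for n candidates, so a practical system would likely compare only a subset of pairs. That trade-off is outside the scope of this sketch.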

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04304
• PDF: https://arxiv.org/pdf/2603.04304
• Project Page: https://harmandotpy.github.io/v1-verification/
• Github: https://github.com/HarmanDotpy/pairwise-self-verification

#AI #LLMs #MachineLearning #CodeGeneration #AIReasoning
ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

📝 Summary:
ReflexiCoder uses reinforcement learning to teach large language models autonomous code reflection and self-correction. It internalizes the debugging process into the model, achieving state-of-the-art performance on coding benchmarks, rivaling proprietary models, and reducing inference compute by...

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05863
• PDF: https://arxiv.org/pdf/2603.05863
• Github: https://github.com/juyongjiang/ReflexiCoder

#LLM #ReinforcementLearning #CodeGeneration #AI #DeepLearning
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges

📝 Summary:
Researchers introduced CreativeBench, a benchmark for evaluating machine creativity in code generation using a quality-novelty metric. They found scaling improves combinatorial creativity but yields diminishing returns for exploration. They also proposed EvoRePE, an inference-time strategy to enh...

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11863
• PDF: https://arxiv.org/pdf/2603.11863
• Project Page: https://zethwang.github.io/creativebench.github.io/
• Github: https://github.com/ZethWang/CreativeBench

#MachineCreativity #CodeGeneration #AIBenchmark #GenerativeAI #AIResearch
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

📝 Summary:
Software development is iterative, yet agentic coding benchmarks overwhelmingly evaluate single-shot solutions against complete specifications. Code can pass the test suite but become progressively ha...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24755
• PDF: https://arxiv.org/pdf/2603.24755
• Project Page: https://www.scbench.ai
• Github: https://github.com/SprocketLab/slop-code-bench

#AICoding #Benchmarking #LLMAgents #SoftwareEngineering #CodeGeneration
Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

📝 Summary:
Learning to Commit uses Online Repository Memory to make pull requests from LLM coding agents more organic. It distills project-specific coding skills from historical commits and uses them to guide agents toward code that follows the project's conventions and architectural patterns, leading to more acceptable pull requests.

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26664
• PDF: https://arxiv.org/pdf/2603.26664

#LLMAgents #SoftwareEngineering #CodeGeneration #AIResearch #MachineLearning
Composer 2 Technical Report

📝 Summary:
Composer 2 is a specialized coding model trained via phased learning for real-world software engineering tasks. It demonstrates superior performance on new and public benchmarks, showcasing strong long-term planning and coding intelligence.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24477
• PDF: https://arxiv.org/pdf/2603.24477

#AI #Coding #SoftwareEngineering #MachineLearning #CodeGeneration
InCoder-32B-Thinking: Industrial Code World Model for Thinking

📝 Summary:
Industrial software development lacks expert reasoning traces for hardware constraints, so a model was trained on error-driven reasoning chains and domain-specific execution traces to generate high-qu...

🔹 Publication Date: Published on Apr 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.03144
• PDF: https://arxiv.org/pdf/2604.03144

#AI #CodeGeneration #IndustrialAI #WorldModels #SoftwareDevelopment