AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
DiRL: An Efficient Post-Training Framework for Diffusion Language Models

📝 Summary:
DiRL is an efficient post-training framework for Diffusion Language Models, integrating online updates and introducing DiPO for unbiased policy optimization. It achieves state-of-the-art math performance for dLLMs, surpassing comparable models.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22234
• PDF: https://arxiv.org/pdf/2512.22234
• Github: https://github.com/OpenMOSS/DiRL

🔹 Models citing this paper:
https://huggingface.co/OpenMOSS-Team/DiRL-8B-Instruct

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#DiffusionModels #LLM #ModelOptimization #MachineLearning #AI
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

📝 Summary:
FlowBlending speeds up video generation by matching model capacity to each sampling stage. It uses a large model for the critical early and late timesteps and a small model for the intermediate ones. This yields faster inference and fewer FLOPs without losing the large model's fidelity.
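
A minimal sketch of the stage-aware idea described above, not the paper's implementation. The function name `pick_model` and the 20% early / 20% late boundaries are hypothetical; the summary does not say how the stage boundaries are chosen.

```python
def pick_model(step: int, num_steps: int,
               early_frac: float = 0.2, late_frac: float = 0.2) -> str:
    """Return which model handles a 0-indexed sampling step.

    Assumption: the stages are fixed fractions of the schedule.
    These fractions are illustrative, not values from the paper.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step out of range")
    early_end = int(num_steps * early_frac)               # [0, early_end) -> large
    late_start = num_steps - int(num_steps * late_frac)   # [late_start, end) -> large
    if step < early_end or step >= late_start:
        return "large"
    return "small"


# Build the per-step schedule for a 10-step sampler.
schedule = [pick_model(s, 10) for s in range(10)]
```

With these placeholder fractions, 6 of the 10 steps run on the small model, which is where the compute saving would come from.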

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24724
• PDF: https://arxiv.org/pdf/2512.24724

#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ModelOptimization
PACED: Distillation at the Frontier of Student Competence

📝 Summary:
PACED improves distillation by concentrating training on the frontier of student competence using Beta-kernel weighting. Derived from gradient analysis, the weighting avoids wasting compute on examples at either extreme, boosting both distillation and self-distillation performance.
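
A hedged sketch of what a Beta-kernel weight could look like. It assumes student competence on an example is measured as a pass rate in [0, 1]; the function name and the exponents `a` and `b` are hypothetical, and the paper's exact parameterization may differ.

```python
def beta_kernel_weight(pass_rate: float, a: float = 1.0, b: float = 1.0) -> float:
    """Unnormalized Beta-shaped weight p**a * (1 - p)**b.

    The weight is zero when the student always fails (p = 0) or always
    succeeds (p = 1) and is largest in between, so training focuses on
    examples near the edge of what the student can do.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be in [0, 1]")
    return (pass_rate ** a) * ((1.0 - pass_rate) ** b)


# Hypothetical pass rates for five training examples.
weights = [beta_kernel_weight(p) for p in (0.0, 0.1, 0.5, 0.9, 1.0)]
```

For positive exponents the peak sits at a / (a + b), so the two exponents control where the "frontier" is placed.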

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11178
• PDF: https://arxiv.org/pdf/2603.11178

#KnowledgeDistillation #DeepLearning #ModelOptimization #AIResearch #ComputeEfficiency
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

📝 Summary:
LookaheadKV enhances KV cache eviction in LLMs by accurately predicting future importance scores. It uses parameter-efficient modules, avoiding costly draft generation while maintaining high accuracy. This lightweight method significantly reduces eviction overhead and speeds up inference.
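
A minimal sketch of score-based KV cache eviction in general, assuming the per-token importance scores are already available. How LookaheadKV predicts those scores is not modeled here, and `evict_kv` is a hypothetical name.

```python
def evict_kv(scores: list[float], budget: int) -> list[int]:
    """Return the positions of the cache entries to keep.

    Keeps the `budget` highest-scoring entries and returns their
    indices in original order, so positional order is preserved.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Four cached tokens, room for two.
kept = evict_kv([0.1, 0.9, 0.3, 0.7], budget=2)
```

The quality of the eviction depends entirely on how well the scores anticipate future attention, which is the part the paper targets.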

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10899
• PDF: https://arxiv.org/pdf/2603.10899
• Github: https://github.com/SamsungLabs/LookaheadKV

#LLM #KVCache #ModelOptimization #DeepLearning #AI
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

📝 Summary:
Astrolabe is an efficient online reinforcement learning framework for distilled autoregressive video models. It improves generation quality using a forward-process RL formulation and streaming training with a multi-reward objective, avoiding expensive re-distillation or reverse-process optimization.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17051
• PDF: https://arxiv.org/pdf/2603.17051
• Project Page: https://franklinz233.github.io/projects/astrolabe/
• Github: https://github.com/franklinz233/Astrolabe

#ReinforcementLearning #VideoGeneration #DeepLearning #AI #ModelOptimization
BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection

📝 Summary:
BEAVER is a training-free framework that improves long-context LLM inference using structure-aware hierarchical selection and dense tensor mapping. It maintains semantic integrity, performs comparably to SOTA methods, and cuts latency by 26.4x on large contexts.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19635
• PDF: https://arxiv.org/pdf/2603.19635
• Project Page: https://cslikai.cn/BEAVER/
• Github: https://github.com/JusperLee/BEAVER

#LLM #AI #PromptEngineering #DeepLearning #ModelOptimization
6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models

📝 Summary:
This paper introduces a mixed-precision quantization framework for video diffusion transformers. It dynamically assigns NVFP4 or INT8 precision to each layer based on layer volatility and uses a Temporal Delta Cache to skip computations, significantly reducing memory and compute cost while preserving quality.
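
A rough sketch of volatility-based precision assignment. Everything specific here is an assumption: the threshold rule, the direction (more volatile layers get the wider INT8 format), the layer names, and the volatility values are all illustrative and not taken from the paper.

```python
def assign_precision(volatility: dict[str, float], threshold: float) -> dict[str, str]:
    """Map each layer name to a number format.

    Assumption: layers whose volatility exceeds `threshold` get INT8,
    and the rest get the narrower NVFP4 format.
    """
    return {
        name: ("INT8" if value > threshold else "NVFP4")
        for name, value in volatility.items()
    }


# Hypothetical per-layer volatility measurements.
plan = assign_precision({"attn.0": 0.9, "mlp.0": 0.1, "attn.1": 0.4}, threshold=0.5)
```

In a dynamic setting the plan would be recomputed as volatility changes across timesteps, rather than fixed once.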

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18742
• PDF: https://arxiv.org/pdf/2603.18742

#Quantization #DiffusionModels #VideoAI #DeepLearning #ModelOptimization
S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models

📝 Summary:
S0 tuning optimizes recurrent state matrices in hybrid models, outperforming LoRA with zero inference overhead. It significantly improves performance on benchmarks like HumanEval and enables efficient task switching.

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01168
• PDF: https://arxiv.org/pdf/2604.01168
• Project Page: https://www.jackyoung.io/research/s0-tuning
• Github: https://github.com/JackYoung27/s0-tuning

#S0Tuning #DeepLearning #LLMs #ModelOptimization #MachineLearning
Test-Time Scaling Makes Overtraining Compute-Optimal

📝 Summary:
New Train-to-Test (T^2) scaling laws jointly optimize model size, training, and the number of inference samples under a fixed budget. Once inference costs are counted, optimal pretraining shifts into an overtraining regime, yielding better performance for modern LLMs.
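
A back-of-the-envelope sketch of why counting inference cost favors smaller, longer-trained models. It uses the common approximations of 6 FLOPs per parameter per training token and 2 FLOPs per parameter per generated token; these are assumptions for illustration, not the paper's cost model, and the workload numbers are hypothetical.

```python
def affordable_samples(budget: int, params: int, train_tokens: int,
                       queries: int, tokens_per_sample: int) -> int:
    """Samples per query that fit in `budget` FLOPs after training.

    Assumed costs: training ~ 6 * N * D, inference ~ 2 * N per token.
    """
    remaining = budget - 6 * params * train_tokens
    if remaining <= 0:
        return 0
    cost_per_sample = 2 * params * queries * tokens_per_sample
    return remaining // cost_per_sample


BUDGET = 2 * 10**20  # total train + test FLOPs (hypothetical)

# Same training compute, different split between size and tokens.
big = affordable_samples(BUDGET, params=10**9, train_tokens=2 * 10**10,
                         queries=10**6, tokens_per_sample=10**3)
small = affordable_samples(BUDGET, params=5 * 10**8, train_tokens=4 * 10**10,
                           queries=10**6, tokens_per_sample=10**3)
```

Under these assumptions the half-size model trained on twice the tokens can afford twice as many samples per query. Whether that trade improves accuracy is what the scaling laws are meant to answer; this arithmetic only covers the cost side.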

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01411
• PDF: https://arxiv.org/pdf/2604.01411

#LLM #MachineLearning #AIResearch #ScalingLaws #ModelOptimization
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

📝 Summary:
SPEED-Bench is a new benchmark for evaluating speculative decoding (SD). It covers diverse semantic domains and realistic serving regimes to address the limitations of existing benchmarks. This enables accurate measurement of SD performance in production environments, setting a unified ...
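
For context on what SD benchmarks typically measure, here is a small sketch of the standard idealized estimate of tokens produced per verification step. It assumes each drafted token is accepted independently with probability `alpha`, which real workloads violate; it is not a metric defined by SPEED-Bench, and the function name is hypothetical.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model pass.

    alpha: per-token acceptance probability (assumed i.i.d.).
    gamma: number of tokens the draft model proposes per step.
    Formula: (1 - alpha**(gamma + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0 or gamma < 0:
        raise ValueError("need 0 <= alpha <= 1 and gamma >= 0")
    if alpha == 1.0:
        return float(gamma + 1)  # every draft token accepted, plus one bonus token
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)
```

Acceptance rates vary with domain and serving conditions, so a fixed `alpha` is only a first approximation of what a benchmark would observe.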

🔹 Publication Date: Published on Feb 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09557
• PDF: https://arxiv.org/pdf/2604.09557
• Project Page: https://huggingface.co/blog/nvidia/speed-bench
• Github: https://github.com/NVIDIA/Model-Optimizer

#SpeculativeDecoding #AIBenchmarks #LLMs #DeepLearning #ModelOptimization