AI & ML Papers – Telegram

AI & ML Papers

32.9K subscribers

7.09K photos

529 videos

24 files

7.76K links

Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho


✨Qwen3.5-Omni Technical Report

📝 Summary:
Qwen3.5-Omni is a large multimodal model excelling in audio-visual understanding and generation, achieving SOTA results across many benchmarks. It features a Hybrid Attention MoE architecture, introduces ARIA for improved speech synthesis, and exhibits a new Audio-Visual Vibe Coding capability.

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15804
• PDF: https://arxiv.org/pdf/2604.15804

==================================

For more data science resources:
✓ https://xn--r1a.website/DataScienceT

#MultimodalAI #AIResearch #DeepLearning #GenerativeAI #SpeechSynthesis

✨Learning Adaptive Reasoning Paths for Efficient Visual Reasoning

📝 Summary:
Existing visual reasoning models often overthink, using redundant steps. AVR is an adaptive framework that dynamically chooses efficient reasoning formats. It reduces token usage by 50-90 percent while maintaining accuracy.

🔹 Publication Date: Published on Apr 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.14568
• PDF: https://arxiv.org/pdf/2604.14568
• Github: https://github.com/RunRiotComeOn/AVR

==================================

#VisualReasoning #AI #MachineLearning #Efficiency #DeepLearning

✨Repurposing 3D Generative Model for Autoregressive Layout Generation

📝 Summary:
LaviGen is a 3D layout generation framework that repurposes 3D generative models. It uses an adapted 3D diffusion model for autoregressive generation, explicitly modeling geometric relations and physical constraints. This achieves superior, more plausible 3D layouts 65% faster than previous methods.

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16299
• PDF: https://arxiv.org/pdf/2604.16299
• Project Page: https://fenghora.github.io/LaviGen-Page/
• Github: https://github.com/fenghora/LaviGen

==================================

#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning

✨Hierarchical Codec Diffusion for Video-to-Speech Generation

📝 Summary:
HiCoDiT generates speech from videos by leveraging the hierarchical structure of discrete speech tokens, achieving better audio-visual alignment through coarse-to-fine conditioning with dual-scale nor...

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15923
• PDF: https://arxiv.org/pdf/2604.15923

==================================

#VideoToSpeech #DiffusionModels #GenerativeAI #SpeechSynthesis #DeepLearning

✨Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

📝 Summary:
This paper improves vision-language models for compositional reasoning by using concreteness-based negative sample selection and a novel margin-based loss. The resulting framework, Slipform, achieves state-of-the-art accuracy on compositional benchmarks and cross-modal retrieval.

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13313
• PDF: https://arxiv.org/pdf/2604.13313

==================================

#VisionLanguage #DeepLearning #AIResearch #ComputerVision #NLP

✨UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

📝 Summary:
UDM-GRPO integrates Uniform Discrete Diffusion Models with reinforcement learning, solving training instability issues. It optimizes using final samples as actions and reconstructed trajectories. This achieves state-of-the-art performance in text-to-image generation and OCR tasks.

🔹 Publication Date: Published on Apr 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18518
• PDF: https://arxiv.org/pdf/2604.18518
• Project Page: https://yovecent.github.io/UDM-GRPO.github.io/
• Github: https://github.com/Yovecent/UDM-GRPO

🔹 Models citing this paper:
• https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-GenEval
• https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-PickScore

==================================

#DiffusionModels #ReinforcementLearning #GenerativeAI #TextToImage #DeepLearning

✨Scaling Test-Time Compute for Agentic Coding

📝 Summary:
This framework improves long-horizon agentic coding by using compact trajectory representations for test-time scaling. It employs Recursive Tournament Voting and adapted Parallel-Distill-Refine to significantly boost coding agent performance on benchmarks.

🔹 Publication Date: Published on Apr 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16529
• PDF: https://arxiv.org/pdf/2604.16529

==================================

#AgenticAI #CodingAgents #MachineLearning #AIResearch #DeepLearning

✨DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

📝 Summary:
DeVI enables physically plausible dexterous robot control by leveraging text-conditioned synthetic videos through a hybrid tracking reward that combines 3D and 2D tracking for improved hand-object int...

🔹 Publication Date: Published on Apr 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.20841
• PDF: https://arxiv.org/pdf/2604.20841
• Project Page: https://snuvclab.github.io/devi/
• Github: https://github.com/snuvclab/devi

==================================

#Robotics #AI #ComputerVision #HumanRobotInteraction #DeepLearning

✨Encoder-Free Human Motion Understanding via Structured Motion Descriptions

📝 Summary:
Structured Motion Description (SMD) converts human motion into natural language, enabling large language models (LLMs) to reason about it directly. This encoder-free method achieves state-of-the-art performance on motion question answering and captioning.

🔹 Publication Date: Published on Apr 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.21668
• PDF: https://arxiv.org/pdf/2604.21668
• Project Page: https://yaozhang182.github.io/motion-smd/
• Github: https://yaozhang182.github.io/motion-smd/

🔹 Models citing this paper:
• https://huggingface.co/zyyy12138/motion-smd-lora

✨ Datasets citing this paper:
• https://huggingface.co/datasets/zyyy12138/motion-smd-data

==================================

#HumanMotionUnderstanding #LLMs #NLP #AI #DeepLearning

✨A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

📝 Summary:
Mixture-of-Experts (MoE) models enhance large AI model efficiency and performance by dynamically selecting sub-models for diverse data. This survey details MoE design, algorithms, theory, and applications in various machine learning fields.
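
A minimal sketch of the "dynamically selecting sub-models" idea, assuming a softmax gate with top-k routing. This is one common MoE design and not necessarily the formulation the survey emphasizes; the function names, toy experts, and gate scores below are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_scores, k):
    """Pick the k experts with the highest gate probability.

    Returns (expert_index, weight) pairs; weights are the softmax
    probabilities renormalized over the selected experts only.
    """
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, gate_scores, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))

# Toy experts (hypothetical): each "sub-model" is a scalar function.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x * x, lambda x: -x]
gate_scores = [0.1, 2.0, 1.5, -1.0]  # assumed output of a gating network

routes = top_k_route(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)
```

Only the two selected experts are evaluated for this input, which is where the efficiency gain of sparse routing comes from.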

🔹 Publication Date: Published on Mar 10, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.07137
• PDF: https://arxiv.org/pdf/2503.07137
• Github: https://github.com/deepseek-ai/DeepEP

==================================

#MixtureOfExperts #MoE #AI #MachineLearning #DeepLearning

✨LLM Safety From Within: Detecting Harmful Content with Internal Representations

📝 Summary:
SIREN is a lightweight guard model that uses LLM internal layer features to detect harmful content, outperforming current models. It is more efficient, generalizes better, and requires significantly fewer parameters than existing guard models.

🔹 Publication Date: Published on Apr 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18519
• PDF: https://arxiv.org/pdf/2604.18519
• Github: https://github.com/CSSLab/SIREN

🔹 Models citing this paper:
• https://huggingface.co/UofTCSSLab/SIREN-Qwen3-0.6B
• https://huggingface.co/UofTCSSLab/SIREN-Qwen3-4B
• https://huggingface.co/UofTCSSLab/SIREN-Llama-3.2-1B

==================================

#LLMSafety #AIethics #HarmfulContent #DeepLearning #NLP

✨DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction

📝 Summary:
DiffNR enhances sparse-view CT reconstruction with neural representations by employing SliceFixer, a single-step diffusion model. It corrects artifacts via pseudo-reference volumes, offering 3D supervision for better accuracy and efficient optimization, with a 3.99 dB PSNR gain.

🔹 Publication Date: Published on Apr 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.21518
• PDF: https://arxiv.org/pdf/2604.21518
• Project Page: https://ooonesevennn.github.io/DiffNR/
• Github: https://github.com/ooonesevennn/DiffNR

==================================

#3DReconstruction #DiffusionModels #NeuralNetworks #CTReconstruction #DeepLearning

✨FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing

📝 Summary:
FlowAnchor stabilizes inversion-free video editing by addressing signal instability in high-dimensional latent spaces. It uses spatial-aware attention refinement and adaptive magnitude modulation to ensure precise localization and sufficient editing strength, leading to faithful and coherent vide...

🔹 Publication Date: Published on Apr 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.22586
• PDF: https://arxiv.org/pdf/2604.22586

==================================

#VideoEditing #DeepLearning #ComputerVision #GenerativeAI #AIResearch

✨Sessa: Selective State Space Attention

📝 Summary:
Sessa is a new decoder architecture that puts attention inside a recurrent feedback path. This allows it to model long contexts better than Transformers and state-space models, achieving power-law memory decay and flexible selective retrieval. It outperforms both on long-context tasks.

🔹 Publication Date: Published on Apr 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18580
• PDF: https://arxiv.org/pdf/2604.18580
• Github: https://github.com/LibratioAI/sessa

==================================

#Sessa #DeepLearning #AttentionMechanisms #StateSpaceModels #LongContextAI

✨Sapiens2

📝 Summary:
Sapiens2 is a high-resolution transformer model for human-centric vision. It achieves state-of-the-art performance by combining unified pretraining objectives, a large 1-billion image dataset, and architectural improvements, excelling in tasks like pose and segmentation.

🔹 Publication Date: Published on Apr 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.21681
• PDF: https://arxiv.org/pdf/2604.21681
• Github: https://github.com/facebookresearch/sapiens2

🔹 Models citing this paper:
• https://huggingface.co/facebook/sapiens2
• https://huggingface.co/facebook/sapiens2-seg-5b
• https://huggingface.co/facebook/sapiens2-seg-1b

✨ Spaces citing this paper:
• https://huggingface.co/spaces/facebook/sapiens2-seg
• https://huggingface.co/spaces/facebook/sapiens2-pointmap
• https://huggingface.co/spaces/facebook/sapiens2-normal

==================================

#Sapiens2 #ComputerVision #TransformerModels #HumanCentricAI #DeepLearning

✨Large Language Models Explore by Latent Distilling

📝 Summary:
Exploratory Sampling (ESamp) boosts LLM diversity beyond lexical variation. It uses a lightweight Distiller to predict hidden representations and uses the prediction error to bias decoding toward novel semantic patterns. ESamp improves reasoning efficiency and creative writing with low overhead.

🔹 Publication Date: Published on Apr 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.24927
• PDF: https://arxiv.org/pdf/2604.24927
• Github: https://github.com/LinesHogan/tllm

==================================

#LLM #AI #NLP #DeepLearning #GenerativeAI

✨ViPO: Visual Preference Optimization at Scale

📝 Summary:
ViPO scales visual preference optimization by using Poly-DPO to handle noisy data and by constructing ViPO, a large, high-quality dataset. This dual approach yields superior performance, emphasizing that algorithmic adaptability and data quality are both crucial.

🔹 Publication Date: Published on Apr 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.24953
• PDF: https://arxiv.org/pdf/2604.24953
• Project Page: https://liming-ai.github.io/ViPO
• Github: https://liming-ai.github.io/ViPO

==================================

#VisualAI #MachineLearning #DeepLearning #Optimization #DataScience

✨SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

📝 Summary:
SignRoundV2 is a post-training quantization method for LLMs. It achieves competitive, near-full-precision accuracy even at extremely low bit widths such as 2 bits. It does so via layer-wise bit allocation and a pre-tuning scale search.
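
To illustrate why 2-bit quantization is hard, here is a minimal sketch of plain round-to-nearest (RTN) uniform quantization. This is a generic baseline and not SignRoundV2 itself; the helper names and the toy weight values are hypothetical.

```python
def quantize_rtn(weights, bits):
    """Uniform min/max round-to-nearest quantization.

    Returns (integer codes, dequantized floats). With `bits` bits there
    are only 2**bits representable values between min and max.
    """
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits
    scale = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    dequant = [lo + c * scale for c in codes]
    return codes, dequant

def mean_abs_error(original, restored):
    """Average absolute difference between two equally long lists."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)

# Toy weight vector (hypothetical values).
weights = [-0.82, -0.31, -0.05, 0.0, 0.07, 0.18, 0.44, 0.93]

# Reconstruction error at 2, 4, and 8 bits.
errors = {
    bits: mean_abs_error(weights, quantize_rtn(weights, bits)[1])
    for bits in (2, 4, 8)
}
codes_2bit, _ = quantize_rtn(weights, 2)
```

At 2 bits every weight must collapse onto one of four values, so the reconstruction error is much larger than at 4 or 8 bits. Methods such as the one summarized above aim to close that gap by tuning how scales and bit widths are chosen rather than rounding naively.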

🔹 Publication Date: Published on Dec 4, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04746
• PDF: https://arxiv.org/pdf/2512.04746
• Project Page: https://github.com/intel/auto-round
• Github: https://github.com/intel/auto-round

🔹 Models citing this paper:
• https://huggingface.co/Intel/MiroThinker-v1.5-30B-gguf-q2ks-mixed-AutoRound
• https://huggingface.co/Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound

==================================

#LLMs #Quantization #DeepLearning #AI #MachineLearning

✨Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

📝 Summary:
Nemotron 3 Nano Omni is a new efficient, open multimodal AI model. It natively supports audio, text, images, and video inputs, improving accuracy and efficiency over previous versions. It excels in document understanding and long audio-video comprehension.

🔹 Publication Date: Published on Apr 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.24954
• PDF: https://arxiv.org/pdf/2604.24954

🔹 Models citing this paper:
• https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
• https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4
• https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8

✨ Spaces citing this paper:
• https://huggingface.co/spaces/akhaliq/Nemotron-3-Nano-Omni
• https://huggingface.co/spaces/developerjeremylive/Nemotron-3-Nano-Omni-etheroi

==================================

#AI #MultimodalAI #DeepLearning #OpenSourceAI #AIResearch

✨DeepSeek-OCR: Contexts Optical Compression

📝 Summary:
DeepSeek-OCR compresses long contexts via optical 2D mapping to achieve high OCR precision with significantly reduced vision tokens. It shows 97% accuracy at 10x compression, outperforming other OCR models efficiently. This innovation holds practical value for document processing and LLM training...

🔹 Publication Date: Published on Oct 21, 2025

🔹 Paper Links:
• Explainer: https://arxivexplained.com/papers/deepseek-ocr-contexts-optical-compression
• PDF: https://arxiv.org/pdf/2510.18234
• Github: https://github.com/deepseek-ai/DeepSeek-OCR

🔹 Models citing this paper:
• https://huggingface.co/deepseek-ai/DeepSeek-OCR
• https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
• https://huggingface.co/unsloth/DeepSeek-OCR

✨ Spaces citing this paper:
• https://huggingface.co/spaces/merterbak/DeepSeek-OCR-Demo
• https://huggingface.co/spaces/davidpcm/openclaw-stock-analyst
• https://huggingface.co/spaces/khang119966/DeepSeek-OCR-DEMO

==================================

#OCR #AI #DeepLearning #ContextCompression #LLM

By Haoran Wei, Yaofeng Sun, Yukun Li.

🔥 GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

💡 The paper introduces GoLongRL, an approach to long-context reinforcement learning built on capability-oriented data construction and multitask alignment. Existing long-context reinforcement learning methods often have homogeneous task coverage and reward formulations that do not accurately reflect real-world requirements. To address this, the authors make two main contributions.

First, they build a capability-oriented dataset of 23,000 reinforcement learning samples with verifiable rewards, spanning 9 task types, each with its own evaluation metric. The dataset is openly released along with the construction pipeline and training code. Under the same training setup, it outperforms the closed-source QwenLong-L1.5 dataset.

Second, the authors propose TMN-Reweight for heterogeneous multitask optimization. It combines task-level mean normalization, which aligns reward scales across tasks, with difficulty-adaptive weighting for more reliable advantage estimation. TMN-Reweight improves average performance over vanilla GRPO while preserving or improving general capabilities across evaluations.
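
The post does not give the TMN-Reweight formulas, so the following is only an illustrative sketch of the two ingredients it names. Dividing rewards by the task's mean reward and the specific difficulty weight (`2.0 - solve_rate`) are assumptions made for this example, not the authors' method; all names and reward values are hypothetical.

```python
from statistics import mean

def tmn_advantages(rewards_by_task):
    """rewards_by_task maps task name -> list of groups, where each group
    holds the rewards of several rollouts for one prompt.
    Returns the same structure holding per-rollout advantages."""
    advantages = {}
    for task, groups in rewards_by_task.items():
        # Task-level mean normalization (assumed form): rescale by the
        # task's mean reward so different metrics share a common scale.
        task_mean = mean(r for group in groups for r in group)
        scale = task_mean if task_mean > 0 else 1.0
        task_adv = []
        for group in groups:
            normed = [r / scale for r in group]
            baseline = mean(normed)  # group-relative baseline, as in GRPO
            # Difficulty-adaptive weight (hypothetical): prompts where few
            # rollouts beat the task mean get a larger weight.
            solve_rate = sum(r > 1.0 for r in normed) / len(normed)
            weight = 2.0 - solve_rate
            task_adv.append([weight * (r - baseline) for r in normed])
        advantages[task] = task_adv
    return advantages

# Two tasks with very different reward scales (hypothetical values).
rewards = {
    "qa_exact_match": [[0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    "summary_score_0_100": [[40.0, 60.0, 80.0, 20.0], [10.0, 30.0, 20.0, 20.0]],
}
adv = tmn_advantages(rewards)
```

Without the normalization step, the 0 to 100 task would produce advantages tens of times larger than the 0/1 task and would dominate the policy update; after it, both tasks yield advantages of similar magnitude.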

The authors also train Qwen3-30B-A3B on the new dataset and reach long-context performance comparable to DeepSeek-R1-0528 and Qwen3-235B-A22B-Thinking-2507, which suggests that the dataset and TMN-Reweight together can substantially improve long-context capability.

📅 Published on May 19

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.19577
• PDF: https://arxiv.org/pdf/2605.19577
• Project Page: https://huggingface.co/collections/Kwai-Klear/golongrl

🤖 Models citing this paper:
• https://huggingface.co/Kwai-Klear/GoLongRL-4B
• https://huggingface.co/Kwai-Klear/GoLongRL-30B-A3B

📊 Datasets citing this paper:
• https://huggingface.co/datasets/Kwai-Klear/GoLongRL

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#ReinforcementLearning #LongContextLearning #MultitaskAlignment #CapabilityOrientedLearning #DeepLearning
