✨Qwen3.5-Omni Technical Report
📝 Summary:
Qwen3.5-Omni is a large multimodal model excelling in audio-visual understanding and generation, achieving SOTA results across many benchmarks. It features a Hybrid Attention MoE architecture, introduces ARIA for improved speech synthesis, and exhibits a new Audio-Visual Vibe Coding capability.
🔹 Publication Date: Published on Apr 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15804
• PDF: https://arxiv.org/pdf/2604.15804
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#MultimodalAI #AIResearch #DeepLearning #GenerativeAI #SpeechSynthesis
✨Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
📝 Summary:
Existing visual reasoning models often overthink, using redundant steps. AVR is an adaptive framework that dynamically chooses efficient reasoning formats. It reduces token usage by 50-90 percent while maintaining accuracy.
🔹 Publication Date: Published on Apr 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.14568
• PDF: https://arxiv.org/pdf/2604.14568
• Github: https://github.com/RunRiotComeOn/AVR
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#VisualReasoning #AI #MachineLearning #Efficiency #DeepLearning
✨Repurposing 3D Generative Model for Autoregressive Layout Generation
📝 Summary:
LaviGen is a 3D layout generation framework that repurposes 3D generative models. It uses an adapted 3D diffusion model for autoregressive generation, explicitly modeling geometric relations and physical constraints. This achieves superior, more plausible 3D layouts 65% faster than previous methods.
🔹 Publication Date: Published on Apr 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16299
• PDF: https://arxiv.org/pdf/2604.16299
• Project Page: https://fenghora.github.io/LaviGen-Page/
• Github: https://github.com/fenghora/LaviGen
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning
✨Hierarchical Codec Diffusion for Video-to-Speech Generation
📝 Summary:
HiCoDiT generates speech from videos by leveraging the hierarchical structure of discrete speech tokens, achieving better audio-visual alignment through coarse-to-fine conditioning with dual-scale nor...
🔹 Publication Date: Published on Apr 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15923
• PDF: https://arxiv.org/pdf/2604.15923
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#VideoToSpeech #DiffusionModels #GenerativeAI #SpeechSynthesis #DeepLearning
✨Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding
📝 Summary:
This paper improves vision-language models for compositional reasoning by using concreteness-based negative sample selection and a novel margin-based loss. Their framework, Slipform, achieves state-of-the-art accuracy on compositional benchmarks and cross-modal retrieval.
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13313
• PDF: https://arxiv.org/pdf/2604.13313
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#VisionLanguage #DeepLearning #AIResearch #ComputerVision #NLP
✨UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models
📝 Summary:
UDM-GRPO integrates Uniform Discrete Diffusion Models with reinforcement learning, solving training instability issues. It optimizes using final samples as actions and reconstructed trajectories. This achieves state-of-the-art performance in text-to-image generation and OCR tasks.
🔹 Publication Date: Published on Apr 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18518
• PDF: https://arxiv.org/pdf/2604.18518
• Project Page: https://yovecent.github.io/UDM-GRPO.github.io/
• Github: https://github.com/Yovecent/UDM-GRPO
🔹 Models citing this paper:
• https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-GenEval
• https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-PickScore
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#DiffusionModels #ReinforcementLearning #GenerativeAI #TextToImage #DeepLearning
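For readers new to GRPO, the "group relative" part of UDM-GRPO can be sketched as follows. This shows only the generic GRPO normalization (reward minus the group mean, divided by the group standard deviation); it is not the paper's handling of final samples or reconstructed trajectories. The function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of all samples generated for one prompt (one group).

    Assumption: population standard deviation; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four hypothetical samples for the same prompt, scored by some reward model.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Samples that score above their group's average get a positive advantage and those below get a negative one, so no separate value network is needed.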
✨Scaling Test-Time Compute for Agentic Coding
📝 Summary:
This framework improves long-horizon agentic coding by using compact trajectory representations for test-time scaling. It employs Recursive Tournament Voting and adapted Parallel-Distill-Refine to significantly boost coding agent performance on benchmarks.
🔹 Publication Date: Published on Apr 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16529
• PDF: https://arxiv.org/pdf/2604.16529
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#AgenticAI #CodingAgents #MachineLearning #AIResearch #DeepLearning
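The summary above does not spell out how Recursive Tournament Voting works, so the following is only a rough sketch of the general idea of a recursive tournament over candidate rollouts: split candidates into small groups, keep one winner per group, and repeat until one remains. The names `tournament_select` and `pick_best`, the group size, and the toy judge are all hypothetical; in the paper's setting the judge would presumably compare compact trajectory representations rather than a test count.

```python
def tournament_select(candidates, pick_best, group_size=2):
    """Recursively reduce candidates to a single winner via group-wise comparison."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if len(candidates) == 1:
        return candidates[0]
    # One winner per group advances to the next round.
    winners = [
        pick_best(candidates[i:i + group_size])
        for i in range(0, len(candidates), group_size)
    ]
    return tournament_select(winners, pick_best, group_size)

# Toy judge (an assumption): prefer the rollout that reports the most passing tests.
rollouts = [
    {"id": "a", "tests_passed": 3},
    {"id": "b", "tests_passed": 7},
    {"id": "c", "tests_passed": 5},
]
winner = tournament_select(
    rollouts, lambda group: max(group, key=lambda r: r["tests_passed"])
)
```

Each comparison only sees a few candidates at a time, which is one reason a tournament can be attractive when each candidate is a long trajectory.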
✨DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
📝 Summary:
DeVI enables physically plausible dexterous robot control by leveraging text-conditioned synthetic videos through a hybrid tracking reward that combines 3D and 2D tracking for improved hand-object int...
🔹 Publication Date: Published on Apr 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.20841
• PDF: https://arxiv.org/pdf/2604.20841
• Project Page: https://snuvclab.github.io/devi/
• Github: https://github.com/snuvclab/devi
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#Robotics #AI #ComputerVision #HumanRobotInteraction #DeepLearning
✨Encoder-Free Human Motion Understanding via Structured Motion Descriptions
📝 Summary:
Structured Motion Description (SMD) converts human motion into natural language, enabling large language models (LLMs) to reason about it directly. This encoder-free method achieves state-of-the-art performance on motion question answering and captioning.
🔹 Publication Date: Published on Apr 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.21668
• PDF: https://arxiv.org/pdf/2604.21668
• Project Page: https://yaozhang182.github.io/motion-smd/
• Github: https://yaozhang182.github.io/motion-smd/
🔹 Models citing this paper:
• https://huggingface.co/zyyy12138/motion-smd-lora
✨ Datasets citing this paper:
• https://huggingface.co/datasets/zyyy12138/motion-smd-data
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#HumanMotionUnderstanding #LLMs #NLP #AI #DeepLearning
✨A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
📝 Summary:
Mixture-of-Experts (MoE) models improve the efficiency and performance of large AI models by dynamically selecting sub-models (experts) for different inputs. This survey covers MoE design, algorithms, theory, and applications across machine learning fields.
🔹 Publication Date: Published on Mar 10, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.07137
• PDF: https://arxiv.org/pdf/2503.07137
• Github: https://github.com/deepseek-ai/DeepEP
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#MixtureOfExperts #MoE #AI #MachineLearning #DeepLearning
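The "dynamically selecting sub-models" idea from the MoE survey above can be illustrated with a minimal top-k routing sketch. This is a generic toy, not code from the survey: the experts are plain Python functions, and the gate logits are passed in by hand, whereas a real MoE layer computes them from the input with a learned router.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_logits, experts, k=2):
    """Route x to the k experts with the highest gate logits and mix their outputs."""
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalize over chosen experts
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Four toy experts; only two of them run for this input.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
y, chosen = moe_forward(3.0, gate_logits=[0.1, 2.0, -1.0, 2.0], experts=experts, k=2)
# chosen == [1, 3]; y == 0.5 * 6.0 + 0.5 * 9.0 == 7.5
```

Because only `k` experts are evaluated per input, the parameter count can grow without a proportional increase in compute per token.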
✨LLM Safety From Within: Detecting Harmful Content with Internal Representations
📝 Summary:
SIREN is a lightweight guard model that uses LLM internal layer features to detect harmful content, outperforming current models. It is more efficient, generalizes better, and requires significantly fewer parameters than existing guard models.
🔹 Publication Date: Published on Apr 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18519
• PDF: https://arxiv.org/pdf/2604.18519
• Github: https://github.com/CSSLab/SIREN
🔹 Models citing this paper:
• https://huggingface.co/UofTCSSLab/SIREN-Qwen3-0.6B
• https://huggingface.co/UofTCSSLab/SIREN-Qwen3-4B
• https://huggingface.co/UofTCSSLab/SIREN-Llama-3.2-1B
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#LLMSafety #AIethics #HarmfulContent #DeepLearning #NLP
✨DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction
📝 Summary:
DiffNR enhances sparse-view CT reconstruction with neural representations by employing SliceFixer, a single-step diffusion model. It corrects artifacts via pseudo-reference volumes, offering 3D supervision for better accuracy and efficient optimization, with a 3.99 dB PSNR gain.
🔹 Publication Date: Published on Apr 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.21518
• PDF: https://arxiv.org/pdf/2604.21518
• Project Page: https://ooonesevennn.github.io/DiffNR/
• Github: https://github.com/ooonesevennn/DiffNR
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#3DReconstruction #DiffusionModels #NeuralNetworks #CTReconstruction #DeepLearning
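To put the 3.99 dB PSNR gain reported for DiffNR in perspective, the standard PSNR definition, PSNR = 10 · log10(MAX² / MSE), lets a dB gain be converted into a reduction factor in mean squared error. The helper names below are illustrative, and the baseline MSE in the example is an arbitrary placeholder, not a number from the paper.

```python
import math

def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)

ratio = mse_ratio_for_gain(3.99)        # about 2.5x lower MSE
baseline_mse = 0.01                     # arbitrary placeholder value
gain = psnr(baseline_mse / ratio) - psnr(baseline_mse)
```

Under this definition, a gain of about 4 dB corresponds to roughly 2.5 times lower mean squared error at the same peak value.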
✨FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing
📝 Summary:
FlowAnchor stabilizes inversion-free video editing by addressing signal instability in high-dimensional latent spaces. It uses spatial-aware attention refinement and adaptive magnitude modulation to ensure precise localization and sufficient editing strength, leading to faithful and coherent vide...
🔹 Publication Date: Published on Apr 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.22586
• PDF: https://arxiv.org/pdf/2604.22586
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#VideoEditing #DeepLearning #ComputerVision #GenerativeAI #AIResearch
✨Sessa: Selective State Space Attention
📝 Summary:
Sessa is a new decoder architecture that places attention inside a recurrent feedback path. This lets it model long contexts better than Transformers and state-space models, with power-law memory decay and flexible selective retrieval. It outperforms these baselines on long-context tasks.
🔹 Publication Date: Published on Apr 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18580
• PDF: https://arxiv.org/pdf/2604.18580
• Github: https://github.com/LibratioAI/sessa
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#Sessa #DeepLearning #AttentionMechanisms #StateSpaceModels #LongContextAI
✨Sapiens2
📝 Summary:
Sapiens2 is a high-resolution transformer model for human-centric vision. It achieves state-of-the-art performance by combining unified pretraining objectives, a 1-billion-image dataset, and architectural improvements, and it excels at tasks such as pose estimation and segmentation.
🔹 Publication Date: Published on Apr 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.21681
• PDF: https://arxiv.org/pdf/2604.21681
• Github: https://github.com/facebookresearch/sapiens2
🔹 Models citing this paper:
• https://huggingface.co/facebook/sapiens2
• https://huggingface.co/facebook/sapiens2-seg-5b
• https://huggingface.co/facebook/sapiens2-seg-1b
✨ Spaces citing this paper:
• https://huggingface.co/spaces/facebook/sapiens2-seg
• https://huggingface.co/spaces/facebook/sapiens2-pointmap
• https://huggingface.co/spaces/facebook/sapiens2-normal
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#Sapiens2 #ComputerVision #TransformerModels #HumanCentricAI #DeepLearning
✨Large Language Models Explore by Latent Distilling
📝 Summary:
Exploratory Sampling (ESamp) boosts LLM output diversity beyond lexical variation. It uses a lightweight Distiller to predict hidden representations and uses the prediction error to bias decoding toward novel semantic patterns. ESamp improves reasoning efficiency and creative writing with low overhead.
🔹 Publication Date: Published on Apr 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.24927
• PDF: https://arxiv.org/pdf/2604.24927
• Github: https://github.com/LinesHogan/tllm
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#LLM #AI #NLP #DeepLearning #GenerativeAI
✨ViPO: Visual Preference Optimization at Scale
📝 Summary:
ViPO scales visual preference optimization in two ways: it uses Poly-DPO to handle noisy data, and it constructs ViPO, a large, high-quality dataset. This dual approach yields superior performance and underscores that both algorithmic adaptability and data quality are crucial.
🔹 Publication Date: Published on Apr 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.24953
• PDF: https://arxiv.org/pdf/2604.24953
• Project Page: https://liming-ai.github.io/ViPO
• Github: https://liming-ai.github.io/ViPO
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#VisualAI #MachineLearning #DeepLearning #Optimization #DataScience
✨SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs
📝 Summary:
SignRoundV2 is a post-training quantization method for LLMs. It achieves competitive, near-full-precision accuracy even at extremely low bit widths such as 2 bits, using layer-wise bit allocation and a pre-tuning scale search.
🔹 Publication Date: Published on Dec 4, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04746
• PDF: https://arxiv.org/pdf/2512.04746
• Project Page: https://github.com/intel/auto-round
• Github: https://github.com/intel/auto-round
🔹 Models citing this paper:
• https://huggingface.co/Intel/MiroThinker-v1.5-30B-gguf-q2ks-mixed-AutoRound
• https://huggingface.co/Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#LLMs #Quantization #DeepLearning #AI #MachineLearning
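For context on what low-bit post-training quantization involves, here is a minimal sketch of symmetric round-to-nearest quantization with a simple grid search over the scale. It is a generic illustration, not the SignRoundV2 algorithm: the summary above does not describe how the method's scale search or layer-wise bit allocation actually work, and the function names, grid size, and example weights are assumptions.

```python
def quantize(weights, scale, bits=2):
    """Symmetric round-to-nearest quantization, returned in dequantized form."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(w / scale))) * scale for w in weights]

def search_scale(weights, bits=2, steps=50):
    """Grid-search the scale that minimizes squared reconstruction error.

    Assumes at least one weight is non-zero.
    """
    base = max(abs(w) for w in weights) / (2 ** (bits - 1))
    best_scale, best_err = base, float("inf")
    for i in range(1, steps + 1):
        scale = base * i / steps
        dequantized = quantize(weights, scale, bits)
        err = sum((w - q) ** 2 for w, q in zip(weights, dequantized))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

weights = [0.9, -0.4, 0.1, -1.2, 0.3]   # toy weight vector
scale, err = search_scale(weights, bits=2)
```

With 2 bits there are only four representable levels, so the choice of scale largely determines how much error the rounding introduces.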
✨Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence
📝 Summary:
Nemotron 3 Nano Omni is a new efficient, open multimodal AI model. It natively supports audio, text, images, and video inputs, improving accuracy and efficiency over previous versions. It excels in document understanding and long audio-video comprehension.
🔹 Publication Date: Published on Apr 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.24954
• PDF: https://arxiv.org/pdf/2604.24954
🔹 Models citing this paper:
• https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
• https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4
• https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8
✨ Spaces citing this paper:
• https://huggingface.co/spaces/akhaliq/Nemotron-3-Nano-Omni
• https://huggingface.co/spaces/developerjeremylive/Nemotron-3-Nano-Omni-etheroi
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#AI #MultimodalAI #DeepLearning #OpenSourceAI #AIResearch
✨DeepSeek-OCR: Contexts Optical Compression
📝 Summary:
DeepSeek-OCR compresses long contexts via optical 2D mapping to achieve high OCR precision with significantly reduced vision tokens. It shows 97% accuracy at 10x compression, outperforming other OCR models efficiently. This innovation holds practical value for document processing and LLM training...
🔹 Publication Date: Published on Oct 21, 2025
🔹 Paper Links:
• arXiv Page: https://arxivexplained.com/papers/deepseek-ocr-contexts-optical-compression
• PDF: https://arxiv.org/pdf/2510.18234
• Github: https://github.com/deepseek-ai/DeepSeek-OCR
🔹 Models citing this paper:
• https://huggingface.co/deepseek-ai/DeepSeek-OCR
• https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
• https://huggingface.co/unsloth/DeepSeek-OCR
✨ Spaces citing this paper:
• https://huggingface.co/spaces/merterbak/DeepSeek-OCR-Demo
• https://huggingface.co/spaces/davidpcm/openclaw-stock-analyst
• https://huggingface.co/spaces/khang119966/DeepSeek-OCR-DEMO
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#OCR #AI #DeepLearning #ContextCompression #LLM
Arxivexplained
DeepSeek-OCR: Contexts Optical Compression - Explained Simply
By Haoran Wei, Yaofeng Sun, Yukun Li.
🔥 GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
📅 Published on May 19
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.19577
• PDF: https://arxiv.org/pdf/2605.19577
• Project Page: https://huggingface.co/collections/Kwai-Klear/golongrl
🤖 Models citing this paper:
• https://huggingface.co/Kwai-Klear/GoLongRL-4B
• https://huggingface.co/Kwai-Klear/GoLongRL-30B-A3B
📊 Datasets citing this paper:
• https://huggingface.co/datasets/Kwai-Klear/GoLongRL
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#ReinforcementLearning #LongContextLearning #MultitaskAlignment #CapabilityOrientedLearning #DeepLearning
💡 The paper introduces GoLongRL, an approach to long-context reinforcement learning built on capability-oriented data construction and multitask alignment. Existing long-context RL methods often have homogeneous task coverage and reward formulations that do not reflect real-world requirements. To address this, the authors make two main contributions.
First, they construct a capability-oriented dataset of 23,000 reinforcement learning samples with verifiable rewards, spanning 9 task types, each with its own evaluation metric. The dataset is openly released along with the construction pipeline and training code. Under the same training setup, it outperforms the closed-source QwenLong-L1.5 dataset.
Second, the authors propose TMN-Reweight for heterogeneous multitask optimization. It combines task-level mean normalization, which aligns reward scales across tasks, with difficulty-adaptive weighting for more reliable advantage estimation. TMN-Reweight improves average performance over vanilla GRPO while preserving or improving general capabilities across evaluations.
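The post does not give the TMN-Reweight formulas, so the sketch below is only one plausible reading of the two ingredients it names: rewards are divided by their task's mean reward to align scales across tasks, and each prompt group's centered advantages are then scaled by a difficulty-dependent weight. The function name, the data layout, the division by the task mean, and the default weight `4p(1-p)` are all assumptions made for illustration, not the authors' definitions.

```python
from statistics import mean

def tmn_reweight_sketch(groups, difficulty_weight=lambda p: 4.0 * p * (1.0 - p)):
    """groups: list of {"task": str, "rewards": [floats in 0..1]}, one per prompt."""
    # 1) Task-level mean: average reward over every rollout of that task in the batch.
    per_task = {}
    for g in groups:
        per_task.setdefault(g["task"], []).extend(g["rewards"])
    task_mean = {t: mean(rs) for t, rs in per_task.items()}

    advantages = []
    for g in groups:
        scale = task_mean[g["task"]] or 1.0          # avoid dividing by zero
        normed = [r / scale for r in g["rewards"]]   # align reward scales across tasks
        center = mean(normed)                        # group baseline, as in GRPO
        w = difficulty_weight(mean(g["rewards"]))    # hypothetical difficulty weighting
        advantages.append([w * (r - center) for r in normed])
    return advantages

batch = [
    {"task": "qa", "rewards": [1.0, 0.0]},           # binary-style metric
    {"task": "summarization", "rewards": [0.2, 0.1]} # low-magnitude metric
]
adv = tmn_reweight_sketch(batch)
```

In this toy batch the summarization rewards are small in absolute terms, yet after task-level normalization their spread becomes comparable to that of the QA task before the difficulty weight is applied.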
The authors also train Qwen3-30B-A3B on the new dataset and reach long-context performance comparable to state-of-the-art models such as DeepSeek-R1-0528 and Qwen3-235B-A22B-Thinking-2507, which suggests that the dataset and TMN-Reweight together can substantially improve long-context capability.