✨FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption
📝 Summary:
FlashRT significantly enhances the efficiency of optimization-based prompt injection and knowledge corruption attacks for long-context LLMs. It delivers 2x-7x speedup and 2x-4x GPU memory reduction, enabling systematic and scalable security evaluations.
🔹 Publication Date: Published on Apr 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.28157
• PDF: https://arxiv.org/pdf/2604.28157
• Github: https://github.com/wang-yanting/FlashRT
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Step-level Optimization for Efficient Computer-use Agents
📝 Summary:
Computer-use agents are inefficient when using large models for every step. This paper proposes an event-driven cascade that uses small policies by default, escalating to stronger models only when lightweight monitors detect high risk like stalls or semantic drift, thereby optimizing compute.
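A minimal sketch of the cascade idea described in the summary, not the paper's implementation. The monitor heuristics (repeated actions as a stall signal, zero word overlap with the goal as a crude drift signal) and all names such as `StepState` and `run_episode` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepState:
    goal: str
    history: List[str] = field(default_factory=list)  # actions taken so far


Policy = Callable[[StepState], str]
Monitor = Callable[[StepState], bool]


def stall_monitor(state: StepState, window: int = 3) -> bool:
    """Flag a stall when the last `window` actions are identical."""
    recent = state.history[-window:]
    return len(recent) == window and len(set(recent)) == 1


def drift_monitor(state: StepState) -> bool:
    """Crude stand-in for semantic drift: the latest action shares no word with the goal."""
    if not state.history:
        return False
    goal_words = set(state.goal.lower().split())
    action_words = set(state.history[-1].lower().split())
    return not (goal_words & action_words)


def run_episode(goal: str, small: Policy, large: Policy, max_steps: int = 5) -> List[str]:
    """Use the small policy by default; escalate for a step when any monitor fires."""
    monitors: List[Monitor] = [stall_monitor, drift_monitor]
    state = StepState(goal)
    used: List[str] = []
    for _ in range(max_steps):
        escalate = any(monitor(state) for monitor in monitors)
        policy = large if escalate else small
        used.append("large" if escalate else "small")
        state.history.append(policy(state))
    return used
```

With a small policy that keeps repeating one action, the stall monitor fires after three identical steps and the next step is routed to the large policy.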
🔹 Publication Date: Published on Apr 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.27151
• PDF: https://arxiv.org/pdf/2604.27151
• Github: https://github.com/yale-nlp/StepWise
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#AI #AgentSystems #ResourceOptimization #EfficientAI #AdaptiveSystems
✨SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs
📝 Summary:
SignRoundV2 is a post-training quantization method for LLMs. It achieves competitive accuracy, close to full precision, even at extremely low bit widths such as 2 bits. It does this through layer-wise bit allocation and a pre-tuning scale search.
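To make the idea of a scale search concrete, here is a generic sketch: a grid search for the quantization scale that minimizes reconstruction error under symmetric round-to-nearest quantization. This is a swapped-in textbook technique, not SignRoundV2's actual procedure, and it does not cover layer-wise bit allocation. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize(weights: List[float], scale: float, bits: int) -> List[float]:
    """Symmetric round-to-nearest quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1   # 1 for 2-bit
    qmin = -(2 ** (bits - 1))    # -2 for 2-bit
    return [max(qmin, min(qmax, round(w / scale))) * scale for w in weights]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_scale(weights: List[float], bits: int, steps: int = 50) -> Tuple[float, float]:
    """Grid-search a scale no larger than the max-abs scale (assumed search range)."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return 1.0, 0.0  # all-zero weights quantize exactly
    qmax = 2 ** (bits - 1) - 1
    base = max_abs / qmax  # the usual max-abs scale
    best_scale, best_err = base, mse(weights, quantize(weights, base, bits))
    for i in range(1, steps + 1):
        scale = base * i / steps
        err = mse(weights, quantize(weights, scale, bits))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err
```

With a single large outlier, a smaller scale typically clips the outlier but represents the remaining weights more faithfully, which is why a search can beat the max-abs default at 2 bits.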
🔹 Publication Date: Published on Dec 4, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04746
• PDF: https://arxiv.org/pdf/2512.04746
• Project Page: https://github.com/intel/auto-round
• Github: https://github.com/intel/auto-round
🔹 Models citing this paper:
• https://huggingface.co/Intel/MiroThinker-v1.5-30B-gguf-q2ks-mixed-AutoRound
• https://huggingface.co/Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#LLMs #Quantization #DeepLearning #AI #MachineLearning
✨Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence
📝 Summary:
Nemotron 3 Nano Omni is a new efficient, open multimodal AI model. It natively supports audio, text, images, and video inputs, improving accuracy and efficiency over previous versions. It excels in document understanding and long audio-video comprehension.
🔹 Publication Date: Published on Apr 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.24954
• PDF: https://arxiv.org/pdf/2604.24954
🔹 Models citing this paper:
• https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
• https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4
• https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8
✨ Spaces citing this paper:
• https://huggingface.co/spaces/akhaliq/Nemotron-3-Nano-Omni
• https://huggingface.co/spaces/developerjeremylive/Nemotron-3-Nano-Omni-etheroi
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#AI #MultimodalAI #DeepLearning #OpenSourceAI #AIResearch
✨DeepSeek-OCR: Contexts Optical Compression
📝 Summary:
DeepSeek-OCR compresses long contexts via optical 2D mapping, achieving high OCR precision with significantly fewer vision tokens. It reports 97% accuracy at 10x compression and outperforms other OCR models while using fewer vision tokens. This innovation holds practical value for document processing and LLM training...
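As a small worked example of what "10x compression" means here, the sketch below assumes the compression ratio is the number of text tokens on a page divided by the number of vision tokens used to represent it. That definition and both helper names are assumptions for illustration.

```python
import math


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token (assumed definition)."""
    if text_tokens < 0 or vision_tokens <= 0:
        raise ValueError("text_tokens must be >= 0 and vision_tokens must be > 0")
    return text_tokens / vision_tokens


def vision_token_budget(text_tokens: int, target_ratio: float) -> int:
    """Smallest vision-token count that keeps the ratio at or below target_ratio."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    return math.ceil(text_tokens / target_ratio)
```

Under this definition, a page holding 1000 text tokens that is encoded as 100 vision tokens sits at a ratio of 10.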
🔹 Publication Date: Published on Oct 21, 2025
🔹 Paper Links:
• arXiv Page: https://arxivexplained.com/papers/deepseek-ocr-contexts-optical-compression
• PDF: https://arxiv.org/pdf/2510.18234
• Github: https://github.com/deepseek-ai/DeepSeek-OCR
🔹 Models citing this paper:
• https://huggingface.co/deepseek-ai/DeepSeek-OCR
• https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
• https://huggingface.co/unsloth/DeepSeek-OCR
✨ Spaces citing this paper:
• https://huggingface.co/spaces/merterbak/DeepSeek-OCR-Demo
• https://huggingface.co/spaces/davidpcm/openclaw-stock-analyst
• https://huggingface.co/spaces/khang119966/DeepSeek-OCR-DEMO
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#OCR #AI #DeepLearning #ContextCompression #LLM
AI & ML Papers
🔥 SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models
📅 Published on Apr 14
🔗 Links:
• arXiv: https://arxiv.org/abs/2604.12617
• PDF: https://arxiv.org/pdf/2604.12617
• Project Page: https://hy-soar.github.io/
• GitHub: https://github.com/Tencent-Hunyuan/HY-SOAR ⭐ 350
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#DiffusionModels #SelfCorrectionTechniques #OptimalAlignmentMethods #RefinementInAI #PostTrainingMethods
💡 The paper introduces a new post-training method called SOAR for diffusion models, which addresses the gap between supervised fine-tuning and reinforcement learning. Currently, supervised fine-tuning optimizes the denoiser only on ground-truth states, but once inference deviates from these ideal states, it relies on out-of-distribution generalization rather than learned correction, leading to exposure bias. Reinforcement learning can address this mismatch, but its terminal reward signal is sparse and suffers from credit-assignment difficulty.
SOAR is a bias-correction post-training method that fills this gap by providing dense, reward-free supervision through self-correction. It starts from a real sample, performs a single stop-gradient rollout with the current model, re-noises the resulting off-trajectory state, and supervises the model to steer back toward the original clean target. This approach is on-policy, reward-free, and provides dense per-timestep supervision with no credit-assignment problem.
The results show that SOAR improves diffusion models on text-to-image benchmarks. On SD3.5-Medium, SOAR improves the GenEval score from 0.70 to 0.78 and the OCR score from 0.64 to 0.67 over supervised fine-tuning. SOAR also surpasses Flow-GRPO in final metric value on both aesthetic and text-image alignment tasks, despite having no access to a reward model. The paper concludes that SOAR can directly replace supervised fine-tuning as a stronger first post-training stage after pretraining, while remaining fully compatible with subsequent reinforcement learning alignment.
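The self-correction step described above can be sketched as a data-construction routine. This is a toy illustration under stated assumptions, not the authors' code: samples are plain float lists, noising uses a linear interpolation `x_t = (1 - t) * x + t * eps`, the denoiser predicts the clean sample, and the names `add_noise`, `build_correction_example`, and `correction_loss` are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
Denoiser = Callable[[Vector, float], Vector]  # (noisy sample, time) -> predicted clean sample


def add_noise(x: Vector, t: float, rng: random.Random) -> Vector:
    """Assumed linear-interpolation noising: x_t = (1 - t) * x + t * eps."""
    return [(1.0 - t) * xi + t * rng.gauss(0.0, 1.0) for xi in x]


def build_correction_example(
    x0: Vector, model: Denoiser, t: float, s: float, rng: random.Random
) -> Tuple[Vector, float, Vector]:
    """Return (input, time, target) for one self-correction training example."""
    x_t = add_noise(x0, t, rng)       # start from a real sample and noise it
    rollout = list(model(x_t, t))     # one rollout with the current model; plain numbers,
                                      # so no gradient would flow through this step
    x_s = add_noise(rollout, s, rng)  # re-noise the off-trajectory state
    return x_s, s, x0                 # supervise back toward the original clean target


def correction_loss(model: Denoiser, x_s: Vector, s: float, target: Vector) -> float:
    """Mean squared error between the model's prediction and the clean target."""
    pred = model(x_s, s)
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)
```

Because the target is always the original clean sample, every sampled timestep yields a supervised signal, which matches the summary's point about dense supervision without a reward model.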
🔥 TradingAgents: Multi-Agents LLM Financial Trading Framework
📅 Published on Dec 28, 2024
🔗 Links:
• arXiv: https://arxiv.org/abs/2412.20138
• PDF: https://arxiv.org/pdf/2412.20138
• GitHub: https://github.com/tauricresearch/tradingagents ⭐ 66.0k
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/shanghengdu/LLM-Agent-Optimization-PaperList
• https://huggingface.co/spaces/tahp0604/ai-stock-watchlist
• https://huggingface.co/spaces/Ervin2077/qiu
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#MultiAgentSystems #LargeLanguageModels #FinancialTrading #ArtificialIntelligenceInFinance #AgentBasedModeling
💡 The paper introduces TradingAgents, a multi-agent framework that utilizes large language models for stock trading, simulating the collaborative dynamics of real-world trading firms. The framework consists of various agents, including fundamental analysts, sentiment analysts, technical analysts, and traders with different risk profiles, all powered by large language models. These agents work together to assess market conditions, manage risk, and make informed trading decisions. The framework also includes researcher agents that evaluate market conditions and a risk management team that monitors exposure.
The authors propose this framework as a solution to the limitations of existing single-agent systems and multi-agent frameworks that gather data independently. By simulating a dynamic and collaborative trading environment, TradingAgents aims to improve trading performance metrics such as cumulative returns and Sharpe ratio.
The results of the experiments show that the TradingAgents framework outperforms baseline models, with significant improvements in cumulative returns, Sharpe ratio, and maximum drawdown. The framework is made available to the public, demonstrating the potential of multi-agent large language model frameworks in financial trading. Overall, the paper contributes to the development of more sophisticated and collaborative trading systems, inspired by the dynamics of real-world trading firms.
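For readers unfamiliar with the three metrics named above, here is a short sketch using their standard textbook definitions. The paper's exact conventions (risk-free rate, annualization factor) are not stated in this summary, so the defaults below are assumptions.

```python
import math
from typing import List


def cumulative_return(returns: List[float]) -> float:
    """Compound per-period simple returns into a total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns: List[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized mean excess return over its sample standard deviation.

    Needs at least two returns; yields NaN when returns do not vary.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        return float("nan")
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(returns: List[float]) -> float:
    """Largest peak-to-trough loss of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst
```

Note that for cumulative return and Sharpe ratio higher is better, while for maximum drawdown an improvement means a smaller value.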
🔥 VibeVoice Technical Report
📅 Published on Aug 26, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2508.19205
• PDF: https://arxiv.org/pdf/2508.19205
• Project Page: https://microsoft.github.io/VibeVoice/
• GitHub: https://github.com/microsoft/VibeVoice ⭐ 46.4k
🤖 Models citing this paper:
• https://huggingface.co/microsoft/VibeVoice-1.5B
• https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B
• https://huggingface.co/aoi-ot/VibeVoice-Large
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/ChaitanyaChandra/VibeVoice
• https://huggingface.co/spaces/lths/VibeVoice-Demo
• https://huggingface.co/spaces/vibingvoice/vibe-voice-custom-voices
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#SpeechSynthesis #MultiSpeakerModeling #DiffusionBasedModeling #ContinuousSpeechTokenization #LatentVectorGeneration
💡 The VibeVoice Technical Report presents a novel model for synthesizing long-form multi-speaker speech. The problem addressed is the need for a method that can efficiently and effectively generate high-quality long-form speech with multiple speakers. To solve this problem, the authors propose a method called next-token diffusion, which is a unified approach for modeling continuous data by generating latent vectors via diffusion.
The authors introduce a novel continuous speech tokenizer that significantly improves data compression and computational efficiency. This tokenizer achieves an 80 times improvement in data compression compared to the popular Encodec model while maintaining comparable performance. The tokenizer preserves audio fidelity and enables the efficient processing of long sequences.
VibeVoice can synthesize long-form speech of up to 90 minutes with a maximum of 4 speakers. The model captures an authentic conversational tone and surpasses open-source and proprietary dialogue models in performance and fidelity.
🔥 Kronos: A Foundation Model for the Language of Financial Markets
📅 Published on Aug 2, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2508.02739
• PDF: https://arxiv.org/pdf/2508.02739
• GitHub: https://github.com/shiyu-coder/Kronos ⭐ 22.7k
🤖 Models citing this paper:
• https://huggingface.co/NeoQuasar/Kronos-base
• https://huggingface.co/NeoQuasar/Kronos-Tokenizer-base
• https://huggingface.co/NeoQuasar/Kronos-mini
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/yingfeng64/kronos-api
• https://huggingface.co/spaces/almascp/kronos-eurusd-dashboard
• https://huggingface.co/spaces/superyan/kronos-jp
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#FinancialLanguageModels #KLineDataAnalysis #TimeSeriesForecasting #VolatilityPrediction #FinancialMarketModeling
💡 The paper introduces Kronos, a pre-training framework for financial K-line data that outperforms existing models in forecasting and synthetic data generation. The problem addressed is that current time series foundation models often underperform non-pre-trained architectures when applied to financial candlestick data and overlook important tasks such as volatility prediction and synthetic data generation.
To solve this, the authors propose a specialized tokenizer that converts continuous market information into token sequences, preserving price dynamics and trade activity patterns. Kronos is pre-trained using an autoregressive objective on a large dataset of over 12 billion K-line records from 45 global exchanges, enabling it to learn nuanced temporal and cross-asset representations.
The results show that Kronos excels in a zero-shot setting across various financial tasks, achieving a 93 percent improvement in price series forecasting over the leading time series foundation model and an 87 percent improvement over the best non-pre-trained baseline. Additionally, Kronos achieves a 9 percent lower mean absolute error in volatility forecasting and a 22 percent improvement in generative fidelity for synthetic K-line sequences. The pre-trained model is publicly available, establishing Kronos as a robust and versatile foundation model for end-to-end financial time series analysis.
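To illustrate what turning continuous market data into tokens can look like, the sketch below bins close-to-close log returns into a small integer vocabulary. This is plain uniform binning chosen for simplicity, not the Kronos tokenizer, and it ignores the open, high, low, and volume fields. The function name, bin count, and clipping range are all assumptions.

```python
import math
from typing import List


def tokenize_closes(closes: List[float], n_bins: int = 16, clip: float = 0.05) -> List[int]:
    """Map each close-to-close log return to one of `n_bins` integer tokens."""
    tokens: List[int] = []
    for prev, cur in zip(closes, closes[1:]):
        log_return = math.log(cur / prev)
        log_return = max(-clip, min(clip, log_return))  # clip extreme moves
        # Rescale [-clip, clip] to [0, n_bins) and take the bin index.
        index = int((log_return + clip) / (2 * clip) * n_bins)
        tokens.append(min(index, n_bins - 1))           # top edge falls in the last bin
    return tokens
```

A sequence of such tokens can then be fed to any autoregressive next-token model, which is the general setup the summary describes.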
🔥 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
📅 Published on Sep 26, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2509.22186
• PDF: https://arxiv.org/pdf/2509.22186
• Project Page: https://opendatalab.github.io/MinerU/
• GitHub: https://github.com/opendatalab/MinerU ⭐ 61.9k
🤖 Models citing this paper:
• https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
• https://huggingface.co/opendatalab/MinerU-Diffusion-V1-0320-2.5B
• https://huggingface.co/freakynit/MinerU2.5-2509-1.2B
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/xiaoye-winters/MinerU-API
• https://huggingface.co/spaces/opendatalab/MinerU-Diffusion-V1-0320-2.5B
• https://huggingface.co/spaces/Instantnewdesign/document_extract
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#DocumentParsing #VisionLanguageModel #HighResolutionImageProcessing #LayoutAnalysis #ContentRecognition
💡 The paper introduces MinerU2.5, a 1.2 billion parameter vision-language model designed for efficient high-resolution document parsing. The model achieves state-of-the-art recognition accuracy while maintaining computational efficiency through a two-stage parsing strategy.
In the first stage, the model performs layout analysis on downsampled images to identify structural elements, reducing computational overhead. In the second stage, it performs targeted content recognition on native-resolution crops extracted from the original image, preserving fine-grained details in dense text, complex formulas, and tables. To support this strategy, the authors developed a comprehensive data engine that generates diverse, large-scale training corpora for both pretraining and fine-tuning.
The results demonstrate that MinerU2.5 achieves state-of-the-art performance on multiple benchmarks, surpassing both general-purpose and domain-specific models across various recognition tasks, while maintaining significantly lower computational overhead. Overall, the paper contributes a novel approach to document parsing that balances accuracy and efficiency, making it suitable for a wide range of applications.
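The hand-off between the two stages comes down to mapping layout boxes found on the downsampled image back to the original resolution, so that crops can be taken from the full-size page. The sketch below shows only that coordinate mapping. The box format and the names `to_native_box` and `plan_crops` are assumptions, and no model is involved.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Size = Tuple[int, int]           # (width, height) in pixels


def to_native_box(box: Box, thumb_size: Size, native_size: Size) -> Box:
    """Scale a box from the downsampled image to the native-resolution image."""
    scale_x = native_size[0] / thumb_size[0]
    scale_y = native_size[1] / thumb_size[1]
    left, top, right, bottom = box
    # Round outward so the crop never cuts into the detected region,
    # then clamp to the image bounds.
    return (
        max(0, math.floor(left * scale_x)),
        max(0, math.floor(top * scale_y)),
        min(native_size[0], math.ceil(right * scale_x)),
        min(native_size[1], math.ceil(bottom * scale_y)),
    )


def plan_crops(layout_boxes: List[Box], thumb_size: Size, native_size: Size) -> List[Box]:
    """Stage-one boxes in, native-resolution crop rectangles out."""
    return [to_native_box(box, thumb_size, native_size) for box in layout_boxes]
```

Each returned rectangle would then be cropped from the original image and passed to the recognition stage, so fine detail is only processed where the layout stage found content.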
