AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔥 EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

💡 The paper introduces EverMemOS, a self-organizing memory operating system designed to enhance the long-term interaction capabilities of large language models. The problem addressed is that current large language models have limited context windows, making it difficult to sustain coherent behavior over extended interactions. Existing memory systems store isolated records and retrieve fragments, which limits their ability to consolidate evolving user states and resolve conflicts.

The method proposed by EverMemOS involves an engram-inspired lifecycle for computational memory, which includes three main components: Episodic Trace Formation, Semantic Consolidation, and Reconstructive Recollection. Episodic Trace Formation converts dialogue streams into memory cells that capture episodic traces, atomic facts, and time-bounded foresight signals. Semantic Consolidation organizes these memory cells into thematic scenes, distilling stable semantic structures and updating user profiles. Reconstructive Recollection performs scene-guided agentic retrieval to compose the necessary and sufficient context for downstream reasoning.
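
The three-stage lifecycle above can be pictured with a small data-structure sketch. Everything here is an assumption made for illustration: the `MemCell` and `MemoryStore` names, the `theme` label, and the keyword-matching retrieval are hypothetical stand-ins, not the EverMemOS implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MemCell:
    """Hypothetical memory cell produced from one dialogue span."""
    episode: str                           # short episodic trace
    facts: List[str]                       # atomic facts extracted from it
    theme: str                             # lowercase label used to group cells
    foresight_until: Optional[int] = None  # last time step the cell stays valid


@dataclass
class MemoryStore:
    scenes: Dict[str, List[MemCell]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def consolidate(self, cell: MemCell) -> None:
        # Toy semantic consolidation: file the cell under its thematic scene.
        self.scenes[cell.theme].append(cell)

    def recollect(self, query: str, now: int) -> List[str]:
        # Toy scene-guided retrieval: select scenes named in the query,
        # then keep only facts whose time bound has not expired.
        context: List[str] = []
        for theme, cells in self.scenes.items():
            if theme in query.lower():
                for cell in cells:
                    if cell.foresight_until is None or now <= cell.foresight_until:
                        context.extend(cell.facts)
        return context
```

The time bound on each cell is what lets retrieval drop stale foresight signals instead of returning every stored fragment.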

The results show that EverMemOS achieves state-of-the-art performance on memory-augmented reasoning tasks, as demonstrated by experiments on LoCoMo and LongMemEval. Additionally, a profile study on PersonaMem v2 and qualitative case studies illustrate the chat-oriented capabilities of EverMemOS, such as user profiling and foresight. The code for EverMemOS is available, making it possible for others to build upon and extend this work. Overall, the paper presents a significant contribution to the development of large language models, enabling them to engage in more coherent and effective long-term interactions.


📅 Published on Jan 5

🔗 Links:
• arXiv: https://arxiv.org/abs/2601.02163
• PDF: https://arxiv.org/pdf/2601.02163
• GitHub: https://github.com/EverMind-AI/EverMemOS 4.4k

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#SelfOrganizingMemory #LongHorizonReasoning #LargeLanguageModels #MemoryOperatingSystem #StructuredReasoning
🔥 DFlash: Block Diffusion for Flash Speculative Decoding

💡 The paper introduces DFlash, a speculative decoding framework designed to improve the speed of large language models while maintaining their quality. The problem with current large language models is that they require sequential decoding, which leads to high latency and poor GPU utilization. Speculative decoding has been proposed as a solution, where a fast draft model generates outputs that are then verified in parallel by the target model. However, existing speculative decoding methods still rely on sequential drafting, which limits their speedup.

To address this, the authors propose a lightweight block diffusion model as the drafter. It generates a block of draft tokens in a single forward pass and is conditioned on context features extracted from the target model. The result is a framework that drafts efficiently while achieving high-quality outputs and higher acceptance rates.
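
The draft-then-verify loop that speculative decoding relies on can be sketched as follows. This is a minimal greedy-acceptance sketch under stated assumptions: `speculative_step`, `toy_target`, and `toy_drafter` are hypothetical names, the drafter is a plain function rather than a block diffusion model, and verification is written as a loop although a real target model would score the whole block in one parallel pass.

```python
from typing import Callable, List

Tokens = List[int]


def speculative_step(target_next: Callable[[Tokens], int],
                     draft_block: Callable[[Tokens, int], Tokens],
                     prefix: Tokens,
                     block_size: int) -> Tokens:
    """One draft-then-verify step with greedy acceptance."""
    draft = draft_block(prefix, block_size)  # whole block drafted in one call
    accepted: Tokens = []
    for token in draft:
        expected = target_next(prefix + accepted)
        if token != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)
    # Every draft token matched, so the target adds one more token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def toy_target(prefix: Tokens) -> int:
    # Toy target model: counts upward modulo 10.
    return (prefix[-1] + 1) % 10


def toy_drafter(prefix: Tokens, block_size: int) -> Tokens:
    # Toy drafter: mostly agrees with the target but gets position 2 wrong.
    block = [(prefix[-1] + 1 + i) % 10 for i in range(block_size)]
    if block_size >= 3:
        block[2] = (block[2] + 5) % 10
    return block
```

Because every emitted token equals what the target would have produced greedily, the output is unchanged; the drafter only affects how many tokens are accepted per step.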

The experiments show that DFlash achieves significant speedup over existing autoregressive methods, with over 6x lossless acceleration across a range of models and tasks. This is up to 2.5x higher speedup than the state-of-the-art speculative decoding method. The method contributes to improving the efficiency of large language models, making them more suitable for practical applications. Overall, DFlash offers a promising solution for speeding up large language models without sacrificing their performance.


📅 Published on Feb 5

🔗 Links:
• arXiv: https://arxiv.org/abs/2602.06036
• PDF: https://arxiv.org/pdf/2602.06036
• Project Page: https://z-lab.ai/projects/dflash/
• GitHub: https://github.com/z-lab/dflash 3.1k

🤖 Models citing this paper:
https://huggingface.co/z-lab/Qwen3.6-27B-DFlash
https://huggingface.co/z-lab/Qwen3.6-35B-A3B-DFlash
https://huggingface.co/z-lab/Qwen3.5-27B-DFlash

🚀 Spaces citing this paper:
https://huggingface.co/spaces/Jackrong/qwen36-eval

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#SpeculativeDecoding #BlockDiffusionModels #LargeLanguageModels #ParallelDecodingTechniques #FlashSpeculativeDecoding
🔥 Adam's Law: Textual Frequency Law on Large Language Models

💡 The paper proposes a novel framework to improve large language model performance through textual frequency analysis. The authors argue that textual frequency, which is the frequency of certain words or phrases in a language, is relevant to human cognition and can also be applied to large language models. However, this topic has been understudied in the context of large language models.

The proposed framework consists of three main components. First, the authors introduce the Textual Frequency Law, which states that frequent textual data should be preferred for large language models, both for prompting and fine-tuning. To estimate the sentence-level frequency, the authors use online resources, as many large language models are closed-source in their training data. They also utilize an input paraphraser to paraphrase the input into a more frequent textual expression.

The second component is Textual Frequency Distillation, which involves querying large language models to conduct story completion by extending sentences in the datasets. The resulting corpora are used to adjust the initial estimation of textual frequency.

The third component is Curriculum Textual Frequency Training, which fine-tunes large language models in increasing order of sentence-level frequency. This means that the models are first trained on the least frequent sentences and then gradually moved to more frequent ones.
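
Ordering training sentences by frequency can be sketched in a few lines. The scoring function is an assumption: `WORD_COUNTS` is a made-up table standing in for an external frequency resource, and averaging log word counts is only one plausible sentence-level estimate, not the paper's estimator.

```python
import math

# Hypothetical word counts standing in for an external frequency resource.
WORD_COUNTS = {"the": 1000, "cat": 50, "sat": 40, "feline": 2, "reposed": 1}


def sentence_frequency(sentence: str, counts=WORD_COUNTS) -> float:
    """Average log count of the words; unseen words are treated as count 1."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    return sum(math.log(counts.get(word, 1)) for word in words) / len(words)


def curriculum_order(sentences):
    # Sort in increasing order of estimated sentence-level frequency.
    return sorted(sentences, key=sentence_frequency)
```

The same score could also rank candidate paraphrases of a prompt, picking the one with the highest estimated frequency.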

The authors conducted experiments on a curated dataset called Textual Frequency Paired Dataset, which covers tasks such as math reasoning, machine translation, commonsense reasoning, and agentic tool calling. The results show that the proposed framework is effective in improving large language model performance.

Overall, the paper contributes to the understanding of textual frequency in large language models and provides a novel framework for improving their performance. The proposed framework has the potential to be applied to various natural language processing tasks and can lead to more efficient and effective large language models.


📅 Published on Apr 2

🔗 Links:
• arXiv: https://arxiv.org/abs/2604.02176
• PDF: https://arxiv.org/pdf/2604.02176
• GitHub: https://github.com/HongyuanLuke/frequencylaw 658

📊 Datasets citing this paper:
https://huggingface.co/datasets/Akaashiiii/TFPD

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#AdamSLaw #TextualFrequencyAnalysis #LargeLanguageModels #NaturalLanguageProcessing #LanguageModelOptimization
🔥 QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading

💡 The paper introduces QuantAgent, a multi-agent large language model framework designed specifically for high-frequency trading. High-frequency trading requires rapid and precise decisions based on short-term market signals, which is different from traditional financial applications that involve long-term semantic reasoning. Existing large language models are not well-suited for high-frequency trading due to their lack of structured reasoning capabilities and domain-specific tools.

To address this problem, the QuantAgent framework decomposes trading into four specialized agents: Indicator, Pattern, Trend, and Risk. Each agent is equipped with domain-specific tools and structured reasoning capabilities to capture distinct aspects of market dynamics over short temporal windows. The Indicator agent focuses on technical indicators, the Pattern agent focuses on chart patterns, the Trend agent focuses on trend-based features, and the Risk agent focuses on risk management.
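
The four-way decomposition can be pictured as separate functions whose outputs are combined into one decision. This is only a structural sketch: in the paper each agent is an LLM with domain tools, whereas the rules below (moving-average comparison, three-bar pattern, net change, range-based stop) and the majority vote are hypothetical placeholders.

```python
from statistics import mean


def indicator_agent(closes):
    # Placeholder indicator: last close versus the window's average.
    return 1 if closes[-1] > mean(closes) else -1


def pattern_agent(closes):
    # Placeholder pattern: last three closes strictly rising or falling.
    a, b, c = closes[-3:]
    if a < b < c:
        return 1
    if a > b > c:
        return -1
    return 0


def trend_agent(closes):
    # Placeholder trend: sign of the net change across the window.
    change = closes[-1] - closes[0]
    return (change > 0) - (change < 0)


def risk_agent(closes, direction, risk_fraction=0.5):
    # Placeholder risk rule: stop-loss at a fraction of the window's range.
    width = (max(closes) - min(closes)) * risk_fraction
    return closes[-1] - direction * width


def decide(closes):
    votes = [indicator_agent(closes), pattern_agent(closes), trend_agent(closes)]
    total = sum(votes)
    direction = (total > 0) - (total < 0)
    if direction == 0:
        return {"action": "HOLD", "stop": None}
    action = "LONG" if direction > 0 else "SHORT"
    return {"action": action, "stop": risk_agent(closes, direction)}
```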

The results show that QuantAgent outperforms strong neural and rule-based baselines in terms of predictive accuracy and cumulative return over 4-hour trading intervals. The evaluation was conducted across ten financial instruments, including Bitcoin and Nasdaq futures, using zero-shot evaluations. The findings suggest that combining structured financial priors with language-native reasoning can unlock new potential for real-time decision systems in high-frequency financial markets.

The main contribution of the paper is the introduction of a multi-agent large language model framework that is specifically designed for high-frequency trading. The framework's ability to decompose trading into specialized agents and leverage domain-specific tools and structured reasoning capabilities makes it well-suited for the high-speed and precision-critical demands of high-frequency trading. The results demonstrate the effectiveness of the QuantAgent framework and highlight its potential for use in real-world high-frequency trading applications.


📅 Published on Sep 12, 2025

🔗 Links:
• arXiv: https://arxiv.org/abs/2509.09995
• PDF: https://arxiv.org/pdf/2509.09995
• Project Page: https://Y-Research-SBU.github.io/QuantAgent/
• GitHub: https://github.com/Y-Research-SBU/QuantAgent 2.5k

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#HighFrequencyTrading #MultiAgentSystems #LargeLanguageModels #FinancialMachineLearning #AlgorithmicTrading
🔥 LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

💡 The paper proposes a novel approach to improve the performance of large language models through test-time scaling, which involves allocating additional computation during inference. Existing test-time scaling strategies are typically hand-crafted, relying on manual design and tuning of reasoning patterns and heuristics. This approach leaves much of the computation-allocation space unexplored, resulting in potential inefficiencies.

To address this limitation, the authors introduce AutoTTS, an environment-driven framework that automates the discovery of test-time scaling strategies. Instead of designing individual strategies, researchers can create environments where optimal strategies can be discovered automatically. The key to AutoTTS lies in constructing a discovery environment that provides a tractable control space and frequent, low-cost feedback for strategy search.

The authors formulate test-time scaling as a controller synthesis problem over pre-collected reasoning trajectories and probe signals. In this framework, controllers decide when to branch, continue, probe, prune, or stop, and can be evaluated cheaply without requiring repeated calls to the language model. To make the search tractable, the authors introduce beta parameterization, which enables fine-grained execution trace feedback to improve discovery efficiency.
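
Evaluating a controller against cached trajectories, with no model calls, might look like the sketch below. The cache contents, the `agree_k` stopping rule, and the function names are all assumptions chosen for illustration; the paper's controllers also branch, probe, and prune, which this sketch omits.

```python
from collections import Counter

# Hypothetical pre-collected trajectories: (final_answer, token_cost).
CACHE = {
    "q1": {"gold": "4", "trajs": [("4", 120), ("4", 90), ("5", 200), ("4", 110)]},
    "q2": {"gold": "7", "trajs": [("3", 150), ("7", 130), ("7", 100), ("7", 140)]},
}


def run_controller(trajs, agree_k):
    """Consume cached trajectories in order; stop once an answer has agree_k votes."""
    votes, cost = Counter(), 0
    for answer, tokens in trajs:
        votes[answer] += 1
        cost += tokens
        if votes[answer] >= agree_k:
            break
    return votes.most_common(1)[0][0], cost


def evaluate(agree_k, cache=CACHE):
    """Return (accuracy, total token cost) of the controller over the cache."""
    correct = 0
    total_cost = 0
    for item in cache.values():
        answer, cost = run_controller(item["trajs"], agree_k)
        correct += answer == item["gold"]
        total_cost += cost
    return correct / len(cache), total_cost
```

Since each evaluation only replays stored data, a search procedure can score many candidate controllers cheaply and compare their accuracy-cost tradeoffs.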

The proposed approach is evaluated on mathematical reasoning benchmarks, where the discovered strategies demonstrate improved accuracy-cost tradeoffs over strong manually designed baselines. The discovered strategies also generalize to held-out benchmarks and model scales, indicating their robustness and flexibility. Notably, the entire discovery process costs only $39.9 and takes 160 minutes, making it practical to run.

Overall, the paper contributes a novel framework for automating test-time scaling strategy discovery, which has the potential to improve the performance of large language models while reducing the need for manual design and tuning. The authors also make their data and code available, facilitating further research and development in this area.


📅 Published on May 8

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.08083
• PDF: https://arxiv.org/pdf/2605.08083
• Project Page: https://zhengkid.github.io/AutoTTS-web/
• GitHub: https://github.com/zhengkid/AutoTTS 43

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #TestTimeScaling #AgenticDiscovery #AutomatedReasoning #LanguageModelOptimization
🔥 UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification

💡 The paper introduces UniPrefill, a universal prefill acceleration framework designed to improve the inference efficiency of long-context processing in large language models. The problem addressed is that existing prefill acceleration methods are limited to specific model architectures and suffer performance degradation when applied to emerging architectures. Additionally, these methods are often incompatible with continuous batching, making it difficult to integrate them into modern inference engines.

The proposed UniPrefill framework overcomes these limitations by directly accelerating the model's computation at the token level, making it applicable to virtually any model architecture. UniPrefill is implemented as a continuous batching operator and is integrated into the vLLM inference engine, enabling seamless support for prefill-decode co-processing and tensor parallelism.

The results show that UniPrefill achieves significant speedup, with up to 2.1x improvement in Time-To-First-Token, and the acceleration becomes more pronounced as the number of concurrent requests grows. This makes UniPrefill a valuable contribution to the field, enabling more efficient and scalable long-context processing in large language models.


📅 Published on May 7

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.06221
• PDF: https://arxiv.org/pdf/2605.06221
• GitHub: https://github.com/qhfan/UniPrefill 22

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LongContextProcessing #PrefillAcceleration #DynamicSparsification #LargeLanguageModels #BlockWiseOptimization
🔥 δ-mem: Efficient Online Memory for Large Language Models

💡 The paper proposes a lightweight memory mechanism called delta-mem to enhance large language models by providing a compact online state of associative memory. The problem addressed is the need for large language models to accumulate and reuse historical information in long-term assistants and agent systems, which is challenging due to the high cost of expanding the context window and ineffective context utilization.

The proposed method, delta-mem, augments a frozen full-attention backbone with a compact online state that compresses past information into a fixed-size state matrix updated by delta-rule learning. This online state is used to generate low-rank corrections to the backbone's attention computation during generation, allowing for efficient online memory.
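
The delta rule itself is a standard associative-memory update, S ← S + β (v − S k) kᵀ, and a pure-Python sketch of it is shown below. How delta-mem gates the update or turns the state into low-rank attention corrections is not described in this summary, so only the textbook rule is shown; `delta_update` and `read` are hypothetical names.

```python
def delta_update(S, k, v, beta=1.0):
    """One delta-rule step: S <- S + beta * (v - S k) k^T.

    S is a fixed-size matrix (list of rows), k a key vector, v a value vector.
    """
    predicted = [sum(row[j] * k[j] for j in range(len(k))) for row in S]
    error = [v[i] - predicted[i] for i in range(len(v))]
    return [
        [S[i][j] + beta * error[i] * k[j] for j in range(len(k))]
        for i in range(len(S))
    ]


def read(S, q):
    """Retrieve the value currently associated with query q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]
```

With a unit-norm key and beta set to 1, a second write under the same key replaces the old value rather than adding to it, which is how a fixed-size state can track changing information.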

The results show that delta-mem improves the average score of the frozen backbone and achieves larger gains on memory-heavy benchmarks, such as MemoryAgentBench and LoCoMo, while preserving general capabilities. Notably, delta-mem achieves these results with only an 8x8 online memory state, demonstrating that effective memory can be realized through a compact online state directly coupled with attention computation, without requiring full fine-tuning, backbone replacement, or explicit context extension. Overall, the paper contributes a novel and efficient approach to enhancing large language models with online memory, which has the potential to improve performance in a range of applications.


📅 Published on May 12

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.12357
• PDF: https://arxiv.org/pdf/2605.12357
• GitHub: https://github.com/declare-lab/delta-Mem 46

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #AssociativeMemoryMechanisms #EfficientOnlineLearning #DeltaRuleLearning #CompactStateRepresentations
🔥 Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

💡 The paper introduces Orthrus, a dual-architecture framework that combines the strengths of autoregressive large language models and diffusion models to achieve fast parallel token generation while maintaining exact inference fidelity. The problem with standard autoregressive decoding is that it is sequential, which is a fundamental bottleneck for high-throughput inference. Diffusion language models try to address this issue with parallel generation, but they suffer from performance degradation, high training costs, and a lack of convergence guarantees.

The Orthrus framework resolves this issue by augmenting a frozen large language model with a lightweight trainable module to create a parallel diffusion view alongside the standard autoregressive view. Both views attend to the same high-fidelity key-value cache: the autoregressive head executes context prefilling to construct accurate key-value representations, and the diffusion head executes parallel generation. The framework employs an exact consensus mechanism between the two views to guarantee lossless inference.

The results show that Orthrus delivers a speedup of up to 7.8 times with only a constant memory cache overhead and minimal parameter additions. This is achieved by sharing key-value caches and using a consensus mechanism, which allows the framework to maintain exact inference fidelity while generating tokens in parallel. Overall, the Orthrus framework provides a simple and efficient solution to the problem of slow sequential decoding in autoregressive large language models, and it has the potential to be seamlessly integrated into existing transformer architectures.


📅 Published on May 12

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.12825
• PDF: https://arxiv.org/pdf/2605.12825

🤖 Models citing this paper:
https://huggingface.co/chiennv/Orthrus-Qwen3-8B
https://huggingface.co/chiennv/Orthrus-Qwen3-4B
https://huggingface.co/chiennv/Orthrus-Qwen3-1.7B

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DiffusionLanguageModels #ParallelTokenGeneration #AutoregressiveDecoding #DualViewDiffusion #LargeLanguageModels
🔥 DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

💡 The paper introduces DataFlow, a framework for unified data preparation and workflow automation in the context of large language models. The problem addressed is the current lack of scalable and reliable data preparation pipelines, which are often dominated by ad-hoc scripts and loosely specified workflows, hindering reproducibility and model performance.

To address this challenge, the authors propose DataFlow, a framework that provides system-level abstractions for modular, reusable, and composable data transformations. It includes a PyTorch-style pipeline construction API and nearly 200 reusable operators, as well as six domain-general pipelines for various tasks such as text, mathematical reasoning, and code.
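
A PyTorch-style pipeline API usually means declaring components in `__init__` and composing them in `forward`. The sketch below illustrates that pattern only; `Operator`, `Deduplicate`, `MinLengthFilter`, and `Pipeline` are hypothetical classes written for this example and are not DataFlow's actual API.

```python
class Operator:
    """Hypothetical base class: maps a list of records to a list of records."""

    def run(self, records):
        raise NotImplementedError


class Deduplicate(Operator):
    def run(self, records):
        # Keep the first record for each distinct text.
        seen, kept = set(), []
        for record in records:
            if record["text"] not in seen:
                seen.add(record["text"])
                kept.append(record)
        return kept


class MinLengthFilter(Operator):
    def __init__(self, min_words):
        self.min_words = min_words

    def run(self, records):
        # Drop records with fewer than min_words whitespace-separated words.
        return [r for r in records if len(r["text"].split()) >= self.min_words]


class Pipeline:
    """Operators are declared in __init__ and composed in forward()."""

    def __init__(self):
        self.dedup = Deduplicate()
        self.length = MinLengthFilter(min_words=3)

    def forward(self, records):
        return self.length.run(self.dedup.run(records))
```

Keeping each transformation behind the same `run` interface is what makes operators reusable across pipelines and easy to reorder.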

The framework also includes DataFlow-Agent, which can automatically translate natural-language specifications into executable pipelines. This is achieved through operator synthesis, pipeline planning, and iterative verification.

The results show that DataFlow consistently improves downstream large language model performance across six representative use cases. The framework outperforms curated human datasets and specialized synthetic baselines, achieving significant gains in execution accuracy and average improvements on code benchmarks.

For example, the math, code, and text pipelines achieve gains of up to 3 percent in Text-to-SQL execution accuracy, 7 percent average improvements on code benchmarks, and 1-3 point gains on math benchmarks. Additionally, a unified dataset produced by DataFlow enables base models to surpass counterparts trained on larger datasets.

Overall, the paper demonstrates that DataFlow provides a practical and high-performance substrate for reliable, reproducible, and scalable large language model data preparation, and establishes a system-level foundation for future data-centric AI development.


📅 Published on Dec 18, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2512.16676
• PDF: https://arxiv.org/pdf/2512.16676
• Project Page: https://github.com/OpenDCAI/DataFlow

📊 Datasets citing this paper:
https://huggingface.co/datasets/OpenDCAI/dataflow-demo-Text2SQL
https://huggingface.co/datasets/OpenDCAI/dataflow-mm-context_vqa
https://huggingface.co/datasets/OpenDCAI/dataflow-instruct-10k

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DataCentricAI #LLMDrivenFrameworks #UnifiedDataPreparation #WorkflowAutomation #LargeLanguageModels
🔥 SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

💡 The paper introduces SkillsVote, a governance framework for managing reusable skills in long-horizon large language model agents. The problem addressed is that raw trajectories of agent experiences are noisy and hard to govern, making it difficult to reuse and improve agent skills. To solve this, the authors propose treating agent skills as an experience schema that combines executable scripts with non-executable guidance on procedures.

The SkillsVote framework consists of three main processes: collection, recommendation, and evolution of agent skills. It starts by profiling a large open-source corpus of skills to identify environment requirements, quality, and verifiability. Then, it synthesizes tasks for verifiable skills and performs a search over a structured skill library to provide instructional context before execution. After execution, it decomposes trajectories into skill-linked subtasks, attributes outcomes to skill use, and admits only successful reusable discoveries to updates.
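
The final admission step, where only successful and reusable discoveries enter the library, can be sketched as a simple gate. The `Subtask` fields and the `uses`/`wins` counters are assumptions made for this illustration, not the framework's real schema.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    """Hypothetical skill-linked piece of a decomposed trajectory."""
    skill: str
    succeeded: bool
    is_new: bool = False    # True if the skill was discovered in this run
    reusable: bool = False  # True if the discovery generalizes beyond the task


def update_library(library, subtasks):
    for subtask in subtasks:
        if subtask.is_new:
            # Admit a new skill only if it both worked and is reusable.
            if subtask.succeeded and subtask.reusable:
                library[subtask.skill] = {"uses": 1, "wins": 1}
        elif subtask.skill in library:
            # Attribute the outcome to the existing skill that was used.
            library[subtask.skill]["uses"] += 1
            library[subtask.skill]["wins"] += int(subtask.succeeded)
    return library
```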

The evaluation of SkillsVote shows promising results, with offline evolution improving performance on Terminal-Bench 2.0 by up to 7.9 percentage points and online evolution improving performance on SWE-Bench Pro by up to 2.6 percentage points. The key contribution of the paper is that governed external skill libraries can improve frozen agents without requiring model updates, as long as systems control exposure, credit, and preservation of skills. Overall, the SkillsVote framework provides a structured approach to managing and improving agent skills, enabling more efficient and effective reuse of experience and knowledge.


📅 Published on May 18

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.18401
• PDF: https://arxiv.org/pdf/2605.18401
• Project Page: https://skills.vote

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#AgentGovernance #LargeLanguageModels #SkillEvolution #ReusableSkills #LifecycleManagement
🔥 Code as Agent Harness

💡 The paper surveys the concept of code as agent harness, where code serves as the operational substrate for agent reasoning and execution in agentic systems built on large language models. The authors argue that code is no longer just a target output, but serves as a unified infrastructure layer across multiple domains and applications. They introduce a unified view that centers code as the basis for agent infrastructure, and organize their survey around three connected layers: the harness interface, harness mechanisms, and scaling the harness.

The harness interface layer explores how code connects agents to reasoning, action, and environment modeling. The harness mechanisms layer examines planning, memory, and tool use for long-horizon execution, as well as feedback-driven control and optimization. The scaling layer discusses how to extend the harness from single-agent systems to multi-agent settings, where shared code artifacts support multi-agent coordination, review, and verification.

The authors summarize representative methods and practical applications of code as agent harness, including coding assistants, GUI/OS automation, embodied agents, scientific discovery, personalization and recommendation, DevOps, and enterprise workflows. They also outline open challenges for harness engineering, such as evaluation beyond final task success, verification under incomplete feedback, regression-free harness improvement, consistent shared state across multiple agents, human oversight for safety-critical actions, and extensions to multimodal environments.

The paper provides a unified roadmap toward executable, verifiable, and stateful AI agent systems by centering code as the harness of agentic AI. The authors demonstrate the potential of code as agent harness to enable more efficient, adaptable, and reliable agent systems, and highlight the need for further research in harness engineering to address the open challenges and limitations of this approach. Overall, the paper contributes to the development of agentic systems by providing a new perspective on the role of code in agent infrastructure and highlighting the potential benefits and challenges of this approach.


📅 Published on May 18

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.18747
• PDF: https://arxiv.org/pdf/2605.18747

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#AgenticSystems #LargeLanguageModels #AgentReasoning #CodeAsInfrastructure #ArtificialIntelligence
🔥 MemOS: A Memory OS for AI System

💡 The paper introduces MemOS, a memory operating system designed for Large Language Models to address the challenges of memory management. Current models lack a well-defined memory management system, relying on static parameters and short-lived contextual states, which limits their ability to track user preferences or update knowledge over time. The proposed MemOS system unifies plaintext, activation-based, and parameter-level memories, enabling efficient storage, retrieval, and continual learning.

The key contribution of MemOS is the introduction of a basic unit called a MemCube, which encapsulates both memory content and metadata such as provenance and versioning. MemCubes can be composed, migrated, and fused over time, allowing for flexible transitions between memory types and bridging retrieval with parameter-based learning.
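
A unit that carries content together with provenance and version metadata could be modeled as below. This is a sketch under stated assumptions: the field names, the tuple-based provenance, and the `fuse` and `migrate` helpers are hypothetical and do not reflect the MemOS codebase.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MemCube:
    """Hypothetical memory unit: content plus provenance and version metadata."""
    content: str
    kind: str                        # "plaintext", "activation", or "parametric"
    provenance: Tuple[str, ...] = ()
    version: int = 1


def fuse(a: MemCube, b: MemCube) -> MemCube:
    # Merge two cubes, keeping both provenance trails and bumping the version.
    return MemCube(
        content=a.content + "\n" + b.content,
        kind=a.kind,
        provenance=a.provenance + b.provenance,
        version=max(a.version, b.version) + 1,
    )


def migrate(cube: MemCube, new_kind: str) -> MemCube:
    # Move a cube to another memory type without mutating the original.
    return replace(cube, kind=new_kind, version=cube.version + 1)
```

Making the cube immutable means every fusion or migration yields a new version, so earlier states remain available for auditing.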

By treating memory as a manageable system resource, MemOS establishes a memory-centric system framework that brings controllability, plasticity, and evolvability to Large Language Models. This framework enables cost-efficient storage and retrieval, laying the foundation for continual learning and personalized modeling. The proposed system has the potential to address the broader challenges of managing heterogeneous knowledge spanning different temporal scales and sources, and can substantially reduce the training and inference costs of Large Language Models.

Overall, the paper proposes a novel approach to memory management for Large Language Models, which can improve their ability to learn and adapt over time, and can pave the way for the development of more advanced Artificial General Intelligence systems. The results of the paper demonstrate the effectiveness of the proposed MemOS system in addressing the challenges of memory management in Large Language Models, and highlight its potential to enable more efficient and effective learning and adaptation in these models.


📅 Published on Jul 4, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2507.03724
• PDF: https://arxiv.org/pdf/2507.03724
• Project Page: https://memos.openmem.net/

🤖 Models citing this paper:
https://huggingface.co/kagvi13/HMP

📊 Datasets citing this paper:
https://huggingface.co/datasets/MemTensor/MemOS_eval_result

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MemoryOperatingSystem #LargeLanguageModels #MemoryManagementSystems #ContinualLearningAlgorithms #ArtificialIntelligenceArchitecture
🔥 OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform

💡 OpenGuardrails is an open-source project that provides a unified model for detecting content safety and model manipulation risks in large language models. The project aims to safeguard large language models against unsafe, malicious, or privacy-violating content. The platform combines a context-aware safety and manipulation detection model with a separate named entity recognition pipeline for identifying and redacting sensitive data.

The platform protects against various types of risks, including content safety risks, model manipulation attacks such as prompt injection and jailbreaking, and data leakage. The content safety and model manipulation detection are implemented using a unified large model, while data leakage identification and redaction are performed using a separate lightweight named entity recognition pipeline.
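
The identify-and-redact step can be illustrated with a small sketch. Note the substitution: OpenGuardrails uses a named entity recognition pipeline, while this example swaps in two simple regular expressions purely to show the redaction flow; the patterns and the `redact` function are hypothetical and far less robust than an NER model.

```python
import re

# Regex stand-ins for an NER model (assumption made for this sketch only).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}


def redact(text):
    """Replace each detected span with a label and report what was found."""
    found = []
    for label, pattern in PATTERNS.items():
        found.extend((label, match.group()) for match in pattern.finditer(text))
        text = pattern.sub(f"<{label}>", text)
    return text, found
```

Returning the detected spans alongside the redacted text lets a gateway log what was removed without passing the sensitive values to the model.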

The OpenGuardrails system can be deployed in various ways, including as a security gateway or an API-based service, with enterprise-grade options that allow fully private deployment. The project achieves state-of-the-art performance on safety benchmarks, excelling in both prompt and response classification across English, Chinese, and multilingual tasks.

The key contributions are the unified model for content safety and model manipulation detection, the separate named entity recognition pipeline for data leakage identification and redaction, and state-of-the-art performance on safety benchmarks. The project also makes all models available under the Apache 2.0 license for public use, allowing for widespread adoption and further development of the technology. Overall, OpenGuardrails provides a comprehensive solution for safeguarding large language models against various types of risks, and its open-source nature makes it a valuable resource for the data science community.


📅 Published on Oct 22, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2510.19169
• PDF: https://arxiv.org/pdf/2510.19169
• Project Page: https://openguardrails.com

🤖 Models citing this paper:
https://huggingface.co/openguardrails/OpenGuardrails-Text-2510
https://huggingface.co/openguardrails/OpenGuardrails-Text-4B-0124

📊 Datasets citing this paper:
https://huggingface.co/datasets/openguardrails/OpenGuardrailsMixZh_97k
https://huggingface.co/datasets/qtqtqtqt/OpenGuardrailsMixZh_97k
https://huggingface.co/datasets/ruishen123/OpenGuardrailsMixZh_97k

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#ContextAwareAI #LargeLanguageModels #ContentSafety #ModelManipulation #NamedEntityRecognition
🔥 MiniCPM4: Ultra-Efficient LLMs on End Devices

💡 The paper introduces MiniCPM4, a highly efficient large language model designed for end-side devices. The goal is to achieve superior performance while being efficient, which is a challenge for large language models due to their computational requirements. To address this, the authors propose innovations in four key areas: model architecture, training data, training algorithms, and inference systems.

In terms of model architecture, the authors propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. For training data, they propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens.

The authors also propose ModelTunnel v2 for efficient pre-training strategy search and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and a data-efficient ternary LLM, BitCPM. For inference systems, they propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding.

The MiniCPM4 model is available in two versions, with 0.5B and 8B parameters, respectively. The evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences.

The results also show that MiniCPM4 can be adapted to diverse applications, including trustworthy survey generation and tool use with the Model Context Protocol. Overall, the paper presents a highly efficient large language model that achieves strong performance on end-side devices.


📅 Published on Jun 9, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2506.07900
• PDF: https://arxiv.org/pdf/2506.07900
• Project Page: https://huggingface.co/collections/openbmb/minicpm4-6841ab29d180257e940baa9b

🤖 Models citing this paper:
https://huggingface.co/openbmb/MiniCPM4.1-8B
https://huggingface.co/openbmb/MiniCPM5-1B
https://huggingface.co/openbmb/MiniCPM4-8B

📊 Datasets citing this paper:
https://huggingface.co/datasets/openbmb/Ultra-FineWeb

🚀 Spaces citing this paper:
https://huggingface.co/spaces/openbmb/MiniCPM5-1B-Demo
https://huggingface.co/spaces/openbmb/Ultra-FineWeb-L2-Selector
https://huggingface.co/spaces/openbmb/MiniCPM4.1-8B-Demo

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#EfficientLLMs #LargeLanguageModels #SparseAttentionMechanisms #EndDeviceComputing #LowResourceNLP
🔥 ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

💡 The paper presents ProRL Agent, a scalable infrastructure for reinforcement learning training of multi-turn large language model agents. The problem addressed is the difficulty in generating and managing large numbers of sandboxed rollout trajectories required for reinforcement learning, which is a key component for improving the long-horizon behavior of these agents. Existing infrastructures often combine rollout orchestration with the training loop, making systems hard to migrate and maintain.

To solve this problem, the authors propose a rollout-as-a-service approach, where ProRL Agent serves the full agentic rollout lifecycle through an API service. This allows for decoupling rollout orchestration from the training loop, making the system more flexible and easier to maintain. Additionally, ProRL Agent provides standardized and extensible sandbox environments that support diverse agentic tasks in high-performance computing settings.
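
The decoupling described above can be sketched as a small service boundary. This is a toy illustration only, not the ProRL Agent API: `RolloutService`, `submit`, `fetch`, and `run_rollout` are hypothetical names, and a thread pool stands in for sandboxed workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import itertools

@dataclass
class Trajectory:
    task_id: str
    steps: list
    reward: float

def run_rollout(task_id, policy, max_turns=3):
    """Toy multi-turn rollout: the 'environment' state is just a counter."""
    steps, state = [], 0
    for _ in range(max_turns):
        action = policy(state)
        state += action
        steps.append((action, state))  # (action taken, resulting state)
    return Trajectory(task_id, steps, reward=float(state))

class RolloutService:
    """Hypothetical boundary: the trainer only ever calls submit() and fetch()."""
    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}
        self._ids = itertools.count()

    def submit(self, task_id, policy):
        job_id = next(self._ids)
        self._jobs[job_id] = self._pool.submit(run_rollout, task_id, policy)
        return job_id

    def fetch(self, job_id):
        return self._jobs.pop(job_id).result()

    def close(self):
        self._pool.shutdown()

# Trainer side: no knowledge of how or where rollouts are executed.
service = RolloutService()
job_ids = [service.submit(f"task-{i}", lambda state: 1) for i in range(4)]
batch = [service.fetch(j) for j in job_ids]
service.close()
print([t.reward for t in batch])  # [3.0, 3.0, 3.0, 3.0]
```

Because the trainer depends only on `submit` and `fetch`, the rollout backend could be swapped or moved to another machine without touching the training loop.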

The authors validate ProRL Agent by applying it to reinforcement learning training on various tasks, including software engineering, math, STEM, and coding. The results demonstrate the effectiveness of ProRL Agent in supporting scalable and efficient reinforcement learning training. Furthermore, ProRL Agent is open-sourced and integrated as part of NVIDIA NeMo Gym, making it accessible to the research community. Overall, the paper contributes a scalable and flexible infrastructure for reinforcement learning training of multi-turn large language model agents, which can facilitate advancements in complex, interactive tasks.


📅 Published on Mar 19

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2603.18815
• PDF: https://arxiv.org/pdf/2603.18815

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#ReinforcementLearning #LargeLanguageModels #MultiTurnDialogue #RolloutOptimization #RLTrainingInfrastructure
🔥 LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

💡 The LongTraceRL paper addresses long-context reasoning in large language models, which often fail to locate and integrate key information buried in extensive distracting content. Existing methods using reinforcement learning with verifiable rewards have shown promise but are limited by low-confusability distractors and sparse reward signals that cannot supervise intermediate reasoning steps.

To address these issues, the authors introduce LongTraceRL, a method that uses tiered distractor construction and rubric reward design to improve reasoning quality. For data construction, the authors generate multi-hop questions via knowledge graph random walks and leverage search agent trajectories to build tiered distractors. These distractors include documents the agent read but did not cite, which are high in confusability, and documents that appeared in search results but were never opened, which are low in confusability. This approach produces training contexts that are far more challenging than those built by random sampling or one-shot search.
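
A minimal sketch of the tiering rule described above, assuming a trajectory is logged as three lists of document IDs. The field names and the "evidence" label for cited documents are illustrative assumptions, not the authors' data format.

```python
def tier_documents(search_results, opened, cited):
    """Split the documents seen in one search-agent trajectory into tiers."""
    opened, cited = set(opened), set(cited)
    tiers = {"evidence": [], "high_confusability": [], "low_confusability": []}
    for doc in search_results:
        if doc in cited:
            tiers["evidence"].append(doc)            # read and cited
        elif doc in opened:
            tiers["high_confusability"].append(doc)  # read but not cited
        else:
            tiers["low_confusability"].append(doc)   # listed, never opened
    return tiers

# Hypothetical trajectory log for one question.
trajectory = {
    "search_results": ["d1", "d2", "d3", "d4", "d5"],
    "opened": ["d1", "d2", "d4"],
    "cited": ["d1"],
}
tiers = tier_documents(**trajectory)
print(tiers)
# {'evidence': ['d1'], 'high_confusability': ['d2', 'd4'], 'low_confusability': ['d3', 'd5']}
```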

The authors also propose a rubric reward that uses gold entities along each reasoning chain as fine-grained, entity-level process supervision. This reward is applied only to responses with correct final answers, which distinguishes the reasoning quality among correct responses and prevents reward hacking.
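
The gating described above can be sketched as follows. The exact-match answer check, the substring-based entity matching, and the 0.5 weight are all assumptions made for illustration; the summary does not give the paper's actual formula.

```python
def rubric_reward(response, final_answer, gold_answer, gold_entities, weight=0.5):
    """Outcome reward plus an entity-coverage bonus, gated on correctness."""
    if final_answer.strip().lower() != gold_answer.strip().lower():
        return 0.0  # wrong answers get no rubric credit at all
    text = response.lower()
    hits = sum(1 for entity in gold_entities if entity.lower() in text)
    coverage = hits / len(gold_entities) if gold_entities else 0.0
    return 1.0 + weight * coverage

# Hypothetical reasoning chain: Marie Curie -> Sorbonne -> Paris.
gold_entities = ["Marie Curie", "Sorbonne", "Paris"]
grounded = "Marie Curie taught at the Sorbonne, which is located in Paris."
shortcut = "The answer is Paris."

r_grounded = rubric_reward(grounded, "Paris", "Paris", gold_entities)
r_shortcut = rubric_reward(shortcut, "Paris", "Paris", gold_entities)
r_wrong = rubric_reward(grounded, "Lyon", "Paris", gold_entities)
print(r_grounded, r_shortcut, r_wrong)
```

Two correct responses receive different rewards depending on how much of the chain they cover, while a wrong answer cannot earn credit by merely naming the right entities.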

Experiments on three reasoning large language models across five long-context benchmarks show that LongTraceRL consistently outperforms strong baselines and encourages comprehensive, evidence-grounded reasoning. The code, datasets, and models are available for further research and development.


📅 Published on May 29

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.31584
• PDF: https://arxiv.org/pdf/2605.31584

🤖 Models citing this paper:
https://huggingface.co/THU-KEG/LongTraceRL-4B
https://huggingface.co/THU-KEG/LongTraceRL-8B
https://huggingface.co/THU-KEG/LongTraceRL-30B

📊 Datasets citing this paper:
https://huggingface.co/datasets/THU-KEG/LongTraceRL

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LongContextReasoning #ReinforcementLearning #LargeLanguageModels #RubricRewards #SearchAgentTrajectories
🔥 KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks

💡 The paper introduces KVarN, a new method for quantizing KV-cache in large language models to reduce error accumulation during autoregressive decoding. The problem addressed is that test-time scaling, which improves reasoning in large language models, becomes memory-bottlenecked during long-horizon decoding due to the growing KV-cache. Existing KV-cache quantization methods are not effective in this setting because they are evaluated under prefill-like settings, where errors behave differently than in autoregressive decoding. In autoregressive decoding, quantization errors accumulate across timesteps, primarily due to incorrect token scales.

The KVarN method addresses this issue by applying a Hadamard rotation followed by a dual-scaling variance normalization across both axes of the K and V matrices. This combination fixes outlying token-scale errors and substantially reduces error accumulation. The method is calibration-free, meaning it does not require any additional calibration steps.
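
A rough sketch of the two ingredients named above, under stated assumptions: the rotation uses a Sylvester-constructed orthonormal Hadamard matrix, the dual scaling is a single pass that divides by per-token and then per-channel standard deviation, and the quantizer is a plain uniform 2-bit grid. The function names are hypothetical and the actual KVarN procedure may differ in all of these details.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix via Sylvester construction (n a power of 2)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def normalize_both_axes(x, eps=1e-6):
    """Divide by per-row (token) std, then by per-column (channel) std."""
    row = x.std(axis=1, keepdims=True) + eps
    x = x / row
    col = x.std(axis=0, keepdims=True) + eps
    return x / col, row, col

def fake_quantize(x, bits=2):
    """Uniform quantize-dequantize onto 2**bits levels over the tensor range."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    step = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / step) * step + lo

def compress_roundtrip(k, bits=2):
    """Rotate, normalize, quantize, then undo the scales and the rotation."""
    h = hadamard(k.shape[1])
    normed, row, col = normalize_both_axes(k @ h)
    restored = fake_quantize(normed, bits) * col * row
    return restored @ h.T

rng = np.random.default_rng(0)
k = rng.normal(size=(16, 8))   # 16 cached tokens, head dimension 8
k[3] *= 20.0                   # one token with an outlying scale
k_hat = compress_roundtrip(k, bits=2)
print("mean abs error:", float(np.abs(k - k_hat).mean()))
```

The per-token scale is what keeps the outlying row from stretching the quantization grid for every other token in this toy setup.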

The results show that KVarN establishes a new state of the art for KV-cache quantization at 2-bit precision on generative benchmarks, including MATH500, AIME24, and HumanEval. In other words, it performs better than existing methods while using less memory for the cache. Overall, the paper contributes an effective, calibration-free method for quantizing the KV-cache, which can improve the efficiency of large language models on reasoning tasks.


📅 Published on Jun 2

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.03458
• PDF: https://arxiv.org/pdf/2606.03458
• Project Page: https://github.com/huawei-csl/KVarN

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#KVCacheQuantization #AutoregressiveDecoding #LargeLanguageModels #ErrorAccumulationMitigation #QuantizationMethodsForReasoningTasks
🔥 Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

💡 The paper introduces the Role-Agent framework, which aims to improve the performance of Large Language Model agents by addressing the limitations of inefficient interaction feedback and static training environments. The problem with current Large Language Model agents is that their learning is hindered by the lack of effective feedback and the inability to adapt to changing environments, resulting in limited generalization.

To address this issue, the Role-Agent framework enables a single Large Language Model to function as both the agent and the environment, allowing for a bootstrapped co-evolution process. This framework consists of two components: World-In-Agent and Agent-In-World. The World-In-Agent component uses the Large Language Model as the agent to predict future states after each action, and the alignment between predicted and actual states is used as a reward to encourage environment-aware reasoning.
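
As an illustration of the World-In-Agent signal, the sketch below scores how well a predicted next state matches the observed one. Token-level Jaccard overlap is an assumed stand-in for the alignment measure, which the summary does not specify, and `state_alignment_reward` is a hypothetical name.

```python
def state_alignment_reward(predicted, actual):
    """Jaccard overlap between predicted and observed state descriptions."""
    p = set(predicted.lower().split())
    a = set(actual.lower().split())
    if not p and not a:
        return 1.0
    return len(p & a) / len(p | a)

# Hypothetical text-environment states after the action "take key".
predicted = "door is open and key is in inventory"
actual = "door is open and key is on table"
reward = state_alignment_reward(predicted, actual)
print(round(reward, 3))  # 0.556
```

A reward of this shape pays the agent for anticipating the consequences of its actions, not only for finishing the task.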

The Agent-In-World component analyzes failure modes from failed trajectories and retrieves tasks with similar failure patterns, thereby reshaping the training data distribution for targeted practice. This allows the Large Language Model to focus on improving its performance in areas where it is struggling.
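
The retrieval step can be pictured as ranking candidate tasks by how often their known failure modes appeared in recent failed trajectories. The failure-mode tags, the task pool, and the counting rule below are illustrative assumptions; the paper's analysis is presumably done by the model itself.

```python
from collections import Counter

def select_practice_tasks(failed_modes, task_pool, k=2):
    """Rank tasks by overlap with observed failure modes and keep the top k."""
    counts = Counter(failed_modes)
    scored = [(sum(counts[m] for m in modes), name)
              for name, modes in task_pool.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored[:k] if score > 0]

# Failure modes extracted from failed trajectories (hypothetical labels).
failed_modes = ["wrong_tool", "wrong_tool", "lost_goal"]
task_pool = {
    "book_flight": {"wrong_tool"},
    "fix_bug": {"lost_goal", "wrong_tool"},
    "sum_list": {"off_by_one"},
}
practice = select_practice_tasks(failed_modes, task_pool)
print(practice)  # ['fix_bug', 'book_flight']
```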

The results of the experiments show that the Role-Agent framework consistently improves performance, with an average gain of over 4 percent over strong baselines. This demonstrates the effectiveness of the Role-Agent framework in improving the performance of Large Language Model agents by enabling them to adapt to changing environments and focus on targeted practice. Overall, the Role-Agent framework provides a novel approach to improving the performance of Large Language Model agents, and its results have significant implications for the development of more effective and adaptive language models.


📅 Published on Jun 9

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.10917
• PDF: https://arxiv.org/pdf/2606.10917

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #AgentEnvironmentInteraction #DualRoleEvolution #BootstrappedLearning #CoEvolutionaryAlgorithms
🔥 LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference

💡 The paper presents LMCache, an efficient key-value cache layer for large language model inference at the enterprise scale. The problem addressed is the traditional storage of key-value caches in GPU memory, which limits cache reuse across different queries and inference engines. As the total key-value cache stored by users grows rapidly, exceeding the capacity of GPU memory, there is a need to move caches outside GPU devices.

The authors propose LMCache as a solution, which extracts and stores key-value caches generated by modern large language model engines out of the GPU memory and shares them across engines and queries. LMCache supports cache offloading and prefill-decode disaggregation, allowing for cross-engine and GPU cache transfer. The key contributions of LMCache include highly optimized key-value cache data movement, a modular cache connector component that decouples LMCache from the evolution of inference engines, and a control API for flexible cache orchestration across different layers.
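
To illustrate sharing caches across queries and engines, here is a toy prefix-keyed store. It is not the LMCache API: `SharedKVStore`, the chunk size, and the rolling-hash keying scheme are assumptions, and strings stand in for real KV tensors held outside GPU memory.

```python
import hashlib

CHUNK = 4  # tokens per cached chunk (arbitrary for this sketch)

def chunk_keys(tokens, chunk=CHUNK):
    """One key per full chunk; each key depends on the entire prefix so far."""
    keys, running = [], hashlib.sha256()
    full = len(tokens) - len(tokens) % chunk
    for start in range(0, full, chunk):
        running.update(repr(tokens[start:start + chunk]).encode())
        keys.append(running.copy().hexdigest())
    return keys

class SharedKVStore:
    """Stand-in for an off-GPU store that any engine or query can read."""
    def __init__(self):
        self._store = {}

    def put(self, tokens, kv_chunks):
        for key, kv in zip(chunk_keys(tokens), kv_chunks):
            self._store[key] = kv

    def longest_prefix(self, tokens):
        hits = []
        for key in chunk_keys(tokens):
            if key not in self._store:
                break
            hits.append(self._store[key])
        return hits

store = SharedKVStore()
doc = list(range(10))
store.put(doc, ["kv0", "kv1"])          # engine A stores two full chunks

query = doc[:8] + [99, 100]             # engine B: same prefix, new suffix
truncated = doc[1:]                     # first token dropped from the context

print(len(store.longest_prefix(query)) * CHUNK)   # 8 tokens reusable
print(store.longest_prefix(truncated))            # []
```

In this scheme, dropping even one token from the front changes every key, which is one plausible way context truncation could reduce prefix cache hits.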

The evaluation shows significant gains, with throughput up to 15 times higher when LMCache is combined with a large language model engine. Its adoption in enterprise settings also yields practical insights, such as the benefits of fetching key-value caches from remote storage and the impact of context truncation on prefix cache hit ratio. Overall, LMCache is presented as an efficient, open-source key-value caching solution for large language model inference.


📅 Published on Oct 8, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2510.09665
• PDF: https://arxiv.org/pdf/2510.09665
• Project Page: https://huggingface.co/collections/dvps/dvps-scientific-watch

🤖 Models citing this paper:
https://huggingface.co/enfinity7B/apac

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #LLMInference #KVCacheOptimization #EnterpriseScaleAI #GPUAcceleratedInference
🔥 Foundations of Large Language Models

💡 The book Foundations of Large Language Models provides a comprehensive overview of the fundamental concepts underlying large language models. The book is structured into four main chapters, each focusing on a key area: pre-training, generative models, prompting techniques, and alignment methods. The authors aim to provide a foundational understanding of large language models, rather than a comprehensive coverage of all cutting-edge technologies. The book is intended for college students, professionals, and practitioners in natural language processing and related fields, serving as a reference for anyone interested in large language models.

The book addresses the need for a clear account of the concepts behind large language models, which are increasingly central to natural language processing. It does so by dividing the topic into the four areas above and exploring each in depth, rather than surveying every cutting-edge technique.


📅 Published on Jan 16, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2501.09223
• PDF: https://arxiv.org/pdf/2501.09223

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #NaturalLanguageProcessing #PreTrainingMethods #GenerativeModels #LanguageModelAlignment