AI & ML Papers

The AI community building the future. Hugging Face has 438 repositories available. Follow their code on GitHub.

🔥 HRM-Text: Efficient Pretraining Beyond Scaling

💡 Training large language models currently requires massive compute and large amounts of raw text, which creates a significant barrier to research. Inspired by the efficient learning of biological systems, the authors propose HRM-Text, built on a Hierarchical Recurrent Model architecture. The architecture decouples computation into two layers: a slow-evolving strategic layer and a fast-evolving execution layer. To stabilize the model, the authors introduce two techniques, MagicNorm and warmup deep credit assignment.

Instead of training on raw text, HRM-Text is trained exclusively on instruction-response pairs using a task-completion objective. The model is also trained with PrefixLM masking, which improves its performance. A 1-billion-parameter HRM-Text model, trained from scratch on only 40 billion unique tokens with a budget of $1,500, achieves competitive performance on several benchmarks, including MMLU, ARC-C, DROP, GSM8K, and MATH.
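As a rough illustration of PrefixLM masking on an instruction-response pair, the sketch below builds the attention mask in plain Python. The function names and the choice to compute the loss only on response positions are assumptions made for illustration; they are not taken from the HRM-Text code.

```python
def prefix_lm_mask(prefix_len: int, total_len: int) -> list[list[bool]]:
    """Return mask[i][j], True when position i may attend to position j.

    Prefix (instruction) positions see the whole prefix in both directions.
    Later (response) positions see the prefix plus earlier response tokens.
    """
    return [
        [(j < prefix_len) or (j <= i) for j in range(total_len)]
        for i in range(total_len)
    ]


def response_loss_positions(prefix_len: int, total_len: int) -> list[int]:
    """Positions that contribute to the loss (assumed: response tokens only)."""
    return list(range(prefix_len, total_len))


if __name__ == "__main__":
    # 2 instruction tokens followed by 2 response tokens
    for row in prefix_lm_mask(prefix_len=2, total_len=4):
        print("".join("x" if allowed else "." for allowed in row))
```

With a prefix length of 0 the mask reduces to an ordinary causal mask, which is one way to check the construction.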

Notably, HRM-Text achieves this with 100-900 times fewer training tokens and 96-432 times less estimated compute than standard baselines. This suggests that co-designing architectures and objectives can radically reduce the compute-to-performance ratio, making it possible to train language models from scratch with limited resources. The approach makes pretraining more accessible to the broader research community, which could lead to further advances in natural language processing.

📅 Published on May 20

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.20613
• PDF: https://arxiv.org/pdf/2605.20613
• Project Page: https://github.com/sapientinc/HRM-Text

🤖 Models citing this paper:
• https://huggingface.co/sapientinc/HRM-Text-1B

📊 Datasets citing this paper:
• https://huggingface.co/datasets/sapientinc/HRM-Text-data-io-cleaned-20260515

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/nikravan/HRM-Text-1B
• https://huggingface.co/spaces/Bhaddy392/GPT_AI
• https://huggingface.co/spaces/bunnycore/HRM-Text-1B

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#HierarchicalRecurrentModels #EfficientPretrainingMethods #LargeLanguageModelOptimization #InstructionResponsePairLearning #NeuralArchitectureInnovation

🔥 MemOS: A Memory OS for AI System

💡 The paper introduces MemOS, a memory operating system designed for Large Language Models to address the challenges of memory management. Current models lack a well-defined memory management system, relying on static parameters and short-lived contextual states, which limits their ability to track user preferences or update knowledge over time. The proposed MemOS system unifies plaintext, activation-based, and parameter-level memories, enabling efficient storage, retrieval, and continual learning.

The key contribution of MemOS is the introduction of a basic unit called a MemCube, which encapsulates both memory content and metadata such as provenance and versioning. MemCubes can be composed, migrated, and fused over time, allowing for flexible transitions between memory types and bridging retrieval with parameter-based learning.
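To make the MemCube idea concrete, here is a minimal sketch of a unit that carries content together with provenance and version metadata, plus two toy operations for fusing and migrating cubes. Every name here (`MemCube`, `fuse`, `migrate`, the field names, the version-bump rule) is hypothetical and chosen for illustration; it is not the MemOS API.

```python
from dataclasses import dataclass, replace

# The three memory types named in the summary.
MEMORY_TYPES = ("plaintext", "activation", "parameter")


@dataclass(frozen=True)
class MemCube:
    content: str                  # the memory payload (a string in this toy version)
    memory_type: str              # one of MEMORY_TYPES
    provenance: tuple[str, ...]   # where the content came from
    version: int = 1              # bumped on every change


def fuse(a: MemCube, b: MemCube) -> MemCube:
    """Combine two cubes of the same type, merging their provenance."""
    if a.memory_type != b.memory_type:
        raise ValueError("fuse expects cubes of the same memory type")
    merged = a.provenance + tuple(p for p in b.provenance if p not in a.provenance)
    return MemCube(
        content=a.content + "\n" + b.content,
        memory_type=a.memory_type,
        provenance=merged,
        version=max(a.version, b.version) + 1,
    )


def migrate(cube: MemCube, new_type: str) -> MemCube:
    """Relabel a cube as another memory type; a real system would convert the payload."""
    if new_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {new_type}")
    return replace(cube, memory_type=new_type, version=cube.version + 1)
```

Keeping the cube immutable means every fuse or migrate step yields a new version, so the history stays traceable through the metadata.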

By treating memory as a manageable system resource, MemOS establishes a memory-centric system framework that brings controllability, plasticity, and evolvability to Large Language Models. This framework enables cost-efficient storage and retrieval, laying the foundation for continual learning and personalized modeling. The proposed system has the potential to address the broader challenges of managing heterogeneous knowledge spanning different temporal scales and sources, and can substantially reduce the training and inference costs of Large Language Models.

Overall, the paper proposes a novel approach to memory management that can improve the ability of Large Language Models to learn and adapt over time, and could pave the way for more advanced Artificial General Intelligence systems. The reported results support the effectiveness of MemOS in addressing these memory-management challenges.

📅 Published on Jul 4, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2507.03724
• PDF: https://arxiv.org/pdf/2507.03724
• Project Page: https://memos.openmem.net/

🤖 Models citing this paper:
• https://huggingface.co/kagvi13/HMP

📊 Datasets citing this paper:
• https://huggingface.co/datasets/MemTensor/MemOS_eval_result

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MemoryOperatingSystem #LargeLanguageModels #MemoryManagementSystems #ContinualLearningAlgorithms #ArtificialIntelligenceArchitecture

🔥 AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

💡 AutoResearchClaw is a new autonomous research system that improves scientific discovery by incorporating human collaboration and iterative learning. Existing autonomous research systems often model the research process as a linear pipeline: they rely on single-agent reasoning, stop when execution fails, and do not carry experience across runs.

The authors address this issue by introducing a multi-agent autonomous research pipeline built on five key mechanisms. The first is structured multi-agent debate for hypothesis generation and result analysis, which allows multiple perspectives to be considered. The second is a self-healing executor with a pivot/refine decision loop that transforms failures into information, enabling the system to learn from its mistakes. The third is verifiable result reporting that prevents fabricated numbers and hallucinated citations. The fourth is human-in-the-loop collaboration with seven intervention modes, allowing for varying levels of human oversight. The fifth is cross-run evolution that converts past mistakes into future safeguards, enabling the system to improve over time.

AutoResearchClaw outperforms a previous system, AI Scientist v2, by 54.7 percent on a 25-topic experiment-stage benchmark. The authors also conducted a human-in-the-loop ablation study, which revealed that precise, targeted collaboration at high-leverage decision points consistently outperforms both full autonomy and exhaustive step-by-step oversight. Overall, AutoResearchClaw is positioned as a research amplifier that augments rather than replaces human scientific judgment, and its code is available for further development and use.

📅 Published on May 19

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.20025
• PDF: https://arxiv.org/pdf/2605.20025
• Project Page: https://github.com/aiming-lab/AutoResearchClaw

📊 Datasets citing this paper:
• https://huggingface.co/datasets/AIMING-Lab-UNC/ARC-Bench

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#AutonomousResearchSystems #HumanAICollaboration #MultiAgentLearning #ArtificialIntelligenceInScience #SelfReinforcingSystems

🔥 Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

💡 The paper introduces Gated DeltaNet-2, a new linear attention model that improves upon existing models by decoupling erase and write operations through distinct channel-wise gates. The problem with existing linear attention models is that they use a single scalar gate to control both erasing old content and writing new content, which can lead to scrambling existing associations in the compressed memory. Gated DeltaNet-2 addresses this limitation by introducing separate channel-wise gates for erasing and writing, allowing for more precise control over the editing process.

The method builds upon previous models such as Delta-rule models and Kimi Delta Attention, which use a delta rule to subtract the current read before writing a new value and sharpen forgetting with channel-wise decay. Gated DeltaNet-2 generalizes these models by inheriting adaptive forgetting and channel-wise decay while separating the roles of erasing and writing. The model uses a fast-weight update view, a chunkwise WY algorithm with channel-wise decay absorbed into asymmetric erase factors, and a gate-aware backward pass that preserves efficient parallel training.
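The fast-weight view can be illustrated with a small sketch of a delta-rule state update in which erasing and writing are controlled by separate channel-wise gates. The exact equations of Gated DeltaNet-2 are not given in this summary, so the form below is an assumption: it is chosen only because it reduces to the classic delta rule when both gates equal the same scalar. The names `read` and `delta_update` are illustrative.

```python
def read(S, k):
    """Read from the memory: returns S @ k for a d_v x d_k state S."""
    return [sum(row[j] * k[j] for j in range(len(k))) for row in S]


def delta_update(S, k, v, erase, write, decay=None):
    """One recurrent step on the fast-weight state S (d_v x d_k).

    erase, write: per-key-channel gates in [0, 1] (assumed form).
    decay: optional per-key-channel forgetting factors applied first.
    With erase == write == [beta] * d_k this is the classic delta rule
    S - beta * (S k - v) k^T.
    """
    d_k = len(k)
    if decay is not None:
        S = [[row[j] * decay[j] for j in range(d_k)] for row in S]
    old = read(S, k)  # what the memory currently returns for key k
    return [
        [
            S[i][j] - old[i] * erase[j] * k[j] + v[i] * write[j] * k[j]
            for j in range(d_k)
        ]
        for i in range(len(S))
    ]


if __name__ == "__main__":
    S = [[0.0, 0.0], [0.0, 0.0]]
    k = [1.0, 0.0]
    S = delta_update(S, k, [2.0, 3.0], erase=[1.0, 1.0], write=[1.0, 1.0])
    S = delta_update(S, k, [5.0, 7.0], erase=[1.0, 1.0], write=[1.0, 1.0])
    print(read(S, k))  # the second value replaces the first for this key
```

Setting the erase gate to zero while keeping the write gate open makes new values accumulate on top of old ones, which shows why tying both roles to a single gate limits how precisely the memory can be edited.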

The results show that Gated DeltaNet-2 achieves the strongest overall results among the compared models, which include Mamba-2, Gated DeltaNet, KDA, and Mamba-3, across language modeling, commonsense reasoning, and retrieval tasks. It is particularly effective on long-context tasks such as the RULER needle-in-a-haystack benchmarks, where it improves results in the evaluated multi-key retrieval setting and remains strong in both recurrent and hybrid settings. The models have 1.3B parameters and were trained on 100B FineWeb-Edu tokens, and the code is available for further research.

📅 Published on May 21

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.22791
• PDF: https://arxiv.org/pdf/2605.22791

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LinearAttentionMechanisms #GatedDeltaNet2 #DecoupledEraseWriteOperations #ChannelWiseGating #DeltaRuleModels

🔥 π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

💡 The paper introduces a new benchmark called π-Bench to evaluate proactive personal assistant agents in long-horizon workflows. Current benchmarks fail to assess whether agents can identify hidden user intents through sustained multi-turn interactions. To address this gap, the authors created π-Bench, which consists of 100 multi-turn tasks across 5 domain-specific user personas. The benchmark incorporates hidden user intents, inter-task dependencies, and cross-session continuity to evaluate agents' ability to anticipate and address user needs over extended interactions.

The tasks in π-Bench require agents to identify and act on hidden user intents before they are explicitly stated. The benchmark measures agents' proactivity and task completion in long-horizon trajectories, which better reflect real-world use. The authors used π-Bench to evaluate the performance of proactive assistance agents.

The results show that proactive assistance remains challenging, and that task completion and proactivity are clearly distinct. The experiments also demonstrate the value of prior interaction for proactive intent resolution in later tasks. Overall, the paper contributes a benchmark that can be used to evaluate and improve proactive personal assistant agents in realistic settings, with the aim of improving their ability to support users across everyday life and work.

📅 Published on May 19

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.14678
• PDF: https://arxiv.org/pdf/2605.14678
• Project Page: https://simplified-reasoning.github.io/Pi-Bench

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#ProactivePersonalAssistants #LongHorizonWorkflows #MultiTurnInteractions #HiddenUserIntents #PersonalAssistantAgents

🔥 PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

💡 The paper introduces PhysX-Omni, a unified framework for generating simulation-ready 3D assets with physical properties across multiple categories. The problem addressed is that existing 3D generation methods either neglect physical properties or are limited to a single asset category, such as rigid, deformable, or articulated objects. To address this, the authors develop a novel geometry representation tailored for vision-language models, which directly encodes high-resolution 3D structures without compression, significantly improving generation performance.

The PhysX-Omni framework generates simulation-ready physical 3D assets using this novel geometry representation. The authors also construct the first general simulation-ready 3D dataset, PhysXVerse, covering diverse indoor and outdoor categories. To evaluate the framework, they propose PhysX-Bench, a benchmark that encompasses six key attributes: geometry, absolute scale, material, affordance, kinematics, and function description.

The results show that PhysX-Omni performs strongly in both generation and understanding, as measured by conventional metrics and by PhysX-Bench. Additional studies validate the potential of PhysX-Omni for simulation-ready scene generation and robotic policy learning. The authors believe that PhysX-Omni can significantly advance a wide range of downstream applications, particularly in embodied AI and physics-based simulation.

The key contributions of the paper are the development of a novel geometry representation, the construction of the PhysXVerse dataset, and the proposal of the PhysX-Bench benchmark. These contributions enable the generation of simulation-ready physical 3D assets across multiple categories, which can be used in various applications such as robotics, computer vision, and simulation. Overall, the paper presents a significant advancement in the field of 3D generation and simulation, with potential applications in a wide range of areas.

📅 Published on May 20

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.21572
• PDF: https://arxiv.org/pdf/2605.21572
• Project Page: https://physx-omni.github.io

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#ComputerVision #3DModeling #PhysicsBasedSimulation #ArticulatedObjectSimulation #DeformableObjectModeling

🔥 GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

💡 The paper proposes a self-evolving image generation framework called GenEvolve that improves generative capabilities through iterative learning and reference-based prompting. The problem addressed is that high-quality image generation often requires combining a model's internal generative ability with external resources, and existing methods have limitations in handling diverse and demanding requests.

The GenEvolve framework models each generation attempt as a tool-orchestrated trajectory, where the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Unlike existing methods that rely on image-level scalar rewards, GenEvolve compares multiple trajectories for the same request and abstracts best-worst differences into structured visual experience.

This visual experience is provided to a privileged teacher branch, which uses visual experience distillation to provide dense token-level supervision to a student branch. This helps the student internalize better search, knowledge activation, reference selection, and prompt construction. The authors also construct GenEvolve-Data and GenEvolve-Bench to evaluate the framework.

The results show that GenEvolve achieves substantial gains over strong baselines and reaches state-of-the-art performance among current image-generation frameworks, both on public benchmarks and on GenEvolve-Bench. Overall, the paper contributes a novel self-evolving image generation framework that can effectively handle diverse and demanding generation requests.

📅 Published on May 20

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.21605
• PDF: https://arxiv.org/pdf/2605.21605
• Project Page: https://ephemeral182.github.io/GenEvolve/

🤖 Models citing this paper:
• https://huggingface.co/MeiGen-AI/GenEvolve

📊 Datasets citing this paper:
• https://huggingface.co/datasets/MeiGen-AI/GenEvolve-Data-Bench

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#ComputerVision #ImageGeneration #GenerativeModels #SelfEvolvingSystems #DeepLearning

🔥 LongCat-Video Technical Report

💡 The paper introduces LongCat-Video, a 13.6 billion parameter video generation model based on the Diffusion Transformer framework. The model is designed to generate high-quality long videos efficiently, which is a crucial step towards creating world models. LongCat-Video has a unified architecture that can perform multiple tasks, including text-to-video, image-to-video, and video continuation, using a single model.

The model achieves efficient long video generation through a coarse-to-fine generation strategy and block sparse attention, allowing it to generate 720p, 30fps videos within minutes. The coarse-to-fine generation strategy works by gradually increasing the resolution and detail of the video, both in terms of time and space. Block sparse attention is a technique that reduces the computational cost of the model by only attending to certain parts of the input data.
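As a generic illustration of block sparse attention, the sketch below turns per-block relevance scores into a token-level attention mask by keeping only a few key blocks for each query block. The top-k selection rule and the names `block_sparse_mask`, `block_scores`, and `keep` are assumptions made for this example; the summary does not say how LongCat-Video chooses which blocks to attend to.

```python
def block_sparse_mask(block_scores, block_size, keep):
    """Build a token-level mask from block-level scores.

    block_scores[q][k]: relevance of key block k to query block q.
    keep: number of key blocks each query block may attend to (assumed rule).
    Returns mask[i][j], True when query token i may attend to key token j.
    """
    n_q, n_k = len(block_scores), len(block_scores[0])
    mask = [[False] * (n_k * block_size) for _ in range(n_q * block_size)]
    for qb, scores in enumerate(block_scores):
        # Pick the `keep` highest-scoring key blocks for this query block.
        kept = sorted(range(n_k), key=lambda kb: scores[kb], reverse=True)[:keep]
        for kb in kept:
            for i in range(qb * block_size, (qb + 1) * block_size):
                for j in range(kb * block_size, (kb + 1) * block_size):
                    mask[i][j] = True
    return mask


if __name__ == "__main__":
    scores = [[0.9, 0.1, 0.5],
              [0.2, 0.8, 0.7]]
    for row in block_sparse_mask(scores, block_size=2, keep=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Each query token then attends to `keep * block_size` keys instead of all of them, so the attention cost scales with the fraction of blocks kept.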

The model was trained with multi-reward reinforcement learning from human feedback (RLHF), which lets it learn from several human-preference reward signals and reach performance comparable to state-of-the-art models.

The results show that LongCat-Video excels in generating high-quality long videos, maintaining temporal coherence and quality even in videos that are several minutes long. The model's efficiency and performance make it a significant contribution to the field of video generation, and the fact that the code and model weights are publicly available will accelerate progress in this area. Overall, LongCat-Video is a foundational model that takes an important step towards creating world models, which are complex models that can simulate and generate realistic videos and other types of data.

📅 Published on Oct 25, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2510.22200
• PDF: https://arxiv.org/pdf/2510.22200

🤖 Models citing this paper:
• https://huggingface.co/meituan-longcat/LongCat-Video
• https://huggingface.co/Nishant2414/LongCat-Video
• https://huggingface.co/fjkane/LongCat-Video-bf16

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/cpuai/LongCat-Video-Avatar
• https://huggingface.co/spaces/multimodalart/LongCat-Video
• https://huggingface.co/spaces/armaishere/meituan-longcat-LongCat-Video

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#VideoGenerationModels #DiffusionTransformer #LongVideoSynthesis #TextToVideoSynthesis #ImageToVideoGeneration
