AI & ML Papers
🔥 Zep: A Temporal Knowledge Graph Architecture for Agent Memory
📅 Published on Jan 20, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2501.13956
• PDF: https://arxiv.org/pdf/2501.13956
• GitHub: https://github.com/getzep/graphiti ⭐ 25.7k
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#TemporalKnowledgeGraphs #ArtificialIntelligenceAgents #KnowledgeGraphArchitecture #RetrievalAugmentedGeneration #DynamicKnowledgeIntegration
💡 The paper introduces Zep, a memory layer service for AI agents that outperforms MemGPT, the current state-of-the-art system, on the Deep Memory Retrieval (DMR) benchmark. It targets a limitation of existing retrieval-augmented generation frameworks: they are restricted to static document retrieval and cannot integrate dynamic knowledge from diverse sources such as ongoing conversations and business data.
To address this limitation, Zep uses a core component called Graphiti, a temporally-aware knowledge graph engine that dynamically synthesizes both unstructured conversational data and structured business data while maintaining historical relationships. This allows Zep to excel in dynamic knowledge integration and temporal reasoning, critical for enterprise use cases.
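The idea of keeping historical relationships instead of overwriting them can be illustrated with a minimal sketch. This is not Graphiti's API: the `Fact` and `TemporalGraph` classes, their fields, and the assumption that each subject/relation pair holds one value at a time are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    """One edge in a toy temporal graph: subject -[relation]-> obj."""
    subject: str
    relation: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still true"


class TemporalGraph:
    """Hypothetical store that closes old facts instead of deleting them."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, relation: str, obj: str, at: datetime) -> None:
        # Close any open fact for the same subject/relation so that the
        # historical relationship is preserved rather than overwritten.
        for fact in self.facts:
            if (fact.subject, fact.relation) == (subject, relation) and fact.valid_to is None:
                fact.valid_to = at
        self.facts.append(Fact(subject, relation, obj, valid_from=at))

    def query(self, subject: str, relation: str, at: datetime) -> Optional[str]:
        # Return the object that was valid at time `at`, if any.
        for fact in self.facts:
            if (fact.subject, fact.relation) != (subject, relation):
                continue
            if fact.valid_from <= at and (fact.valid_to is None or at < fact.valid_to):
                return fact.obj
        return None


graph = TemporalGraph()
graph.assert_fact("alice", "works_at", "Acme", datetime(2023, 1, 1))
graph.assert_fact("alice", "works_at", "Globex", datetime(2024, 6, 1))

past = graph.query("alice", "works_at", datetime(2023, 7, 1))
now = graph.query("alice", "works_at", datetime(2025, 1, 1))
print(past, now)  # Acme Globex
```

Because the older fact is closed rather than removed, the same store can answer both "where does Alice work now?" and "where did she work in mid-2023?".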
On the Deep Memory Retrieval benchmark, Zep reaches 94.8 percent accuracy against MemGPT's 93.4 percent. Zep is also evaluated on LongMemEval, which better reflects enterprise use cases through complex temporal reasoning tasks. There it improves accuracy by up to 18.5 percent while reducing response latency by 90 percent compared to baseline implementations.
Overall, the paper presents Zep as an effective solution for enterprise-critical tasks such as cross-session information synthesis and long-term context maintenance, demonstrating its potential for deployment in real-world applications.
🔥 AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets
📅 Published on Dec 1, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2512.10971
• PDF: https://arxiv.org/pdf/2512.10971
• Project Page: https://ai4trade.ai/
• GitHub: https://github.com/HKUDS/AI-Trader ⭐ 14.0k
📊 Datasets citing this paper:
• https://huggingface.co/datasets/T1anyu/AI-Trader
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#AIBenchmarking #FinancialMarketAnalysis #AutonomousTradingAgents #LargeLanguageModels #RealTimeFinancialDecisionMaking
💡 The paper introduces AI-Trader, a fully automated live benchmark for evaluating large language models in financial decision-making across multiple markets. The benchmark is designed to address the gap in systematic benchmarking for real-world financial applications, where autonomous agents must make decisions in fully dynamic and live environments. The authors argue that existing efforts have not adequately addressed the challenge of evaluating large language models in real-time financial markets, where stringent requirements exist for live strategic responsiveness.
To address this gap, the authors developed AI-Trader, which spans three major financial markets: US stocks, A-shares, and cryptocurrencies, with multiple trading granularities to simulate live financial environments. The benchmark implements a fully autonomous minimal information paradigm, where agents receive only essential context and must independently search, verify, and synthesize live market information without human intervention.
The authors evaluated six mainstream large language models across three markets and multiple trading frequencies. The results show that general intelligence does not automatically translate to effective trading capability, with most agents exhibiting poor returns and weak risk management. The analysis reveals that risk control capability determines cross-market robustness, and that AI trading strategies achieve excess returns more readily in highly liquid markets than policy-driven environments.
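Return and risk are usually read together from an agent's equity curve. The sketch below computes two common quantities, cumulative return and maximum drawdown. It is only an illustration: the post does not say which metrics the benchmark uses, and the function names and sample numbers are made up.

```python
def cumulative_return(equity: list[float]) -> float:
    """Total return over the whole period, as a fraction of the starting value."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough loss, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical portfolio values at four consecutive checkpoints.
equity_curve = [100.0, 120.0, 90.0, 110.0]
total = cumulative_return(equity_curve)
drawdown = max_drawdown(equity_curve)
print(f"return={total:.2%} max_drawdown={drawdown:.2%}")
```

In this made-up curve the agent ends up 10% ahead but loses a quarter of its peak value along the way, which is the kind of gap between return and risk control that the paragraph above describes.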
The paper contributes a novel benchmark for evaluating large language models in real-time financial markets and identifies critical limitations in current autonomous agents. The findings point to directions for improvement, including better risk control and more effective trading strategies. The code and evaluation data are open-sourced to foster community research. Overall, the paper is a step forward for autonomous agents in financial decision-making and highlights the need for further research in this area.
🔥 AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
📅 Published on Aug 22, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2508.16279
• PDF: https://arxiv.org/pdf/2508.16279
• GitHub: https://github.com/agentscope-ai/agentscope ⭐ 24.6k
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/yashu2000/TemporalBenchEnv_Blog
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#AgenticApplications #AgentScope #LargeLanguageModels #ReActParadigm #AgenticFrameworkDevelopment
💡 The paper introduces AgentScope 1.0, a framework for developing agentic applications. Driven by rapid advances in Large Language Models, it addresses the need for flexible and efficient tool-based interaction between agents and their environment. AgentScope provides unified interfaces, extensible modules, and advanced agent-level infrastructure based on the ReAct paradigm, with a systematic asynchronous design that enriches human-agent and agent-agent interaction patterns while improving execution efficiency.
The framework also includes built-in agents tailored to specific practical scenarios and robust engineering support for a developer-friendly experience. It features a scalable evaluation module with a visual studio interface and a runtime sandbox to ensure safe agent execution and facilitate rapid deployment in production environments.
The overall goal is a practical foundation for building scalable, adaptive, and effective agentic applications. Developers can adopt the latest progress, such as new models and MCPs, and build long-trajectory agentic applications that are more manageable and easier to trace.
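The ReAct paradigm referred to above alternates between choosing an action and observing its result. The loop below is a minimal sketch of that pattern and does not use AgentScope's API: the `TOOLS` registry, `scripted_policy` (a stand-in for a model call), and `react` are all hypothetical names.

```python
from typing import Callable

# Hypothetical tool registry; a real framework would expose richer tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda args: str(sum(int(x) for x in args.split())),
}


def scripted_policy(history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call: returns (action, argument)."""
    if not any(step.startswith("Observation:") for step in history):
        return "add", "2 3"
    last_observation = history[-1].removeprefix("Observation: ")
    return "finish", f"The sum is {last_observation}"


def react(question: str, policy: Callable[[list[str]], tuple[str, str]], max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        action, argument = policy(history)  # reasoning step picks an action
        if action == "finish":
            return argument
        history.append(f"Action: {action}({argument})")
        history.append(f"Observation: {TOOLS[action](argument)}")  # acting step
    return "No answer within the step budget"


answer = react("What is 2 + 3?", scripted_policy)
print(answer)  # The sum is 5
```

The step budget is the only safeguard here; long-trajectory applications of the kind the post mentions would also need tracing and error handling around each tool call.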
🔥 Very Large-Scale Multi-Agent Simulation in AgentScope
📅 Published on Jul 25, 2024
🔗 Links:
• arXiv: https://arxiv.org/abs/2407.17789
• PDF: https://arxiv.org/pdf/2407.17789
• GitHub: https://github.com/modelscope/agentscope ⭐ 24.6k
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#MultiAgentSimulation #AgentBasedModeling #DistributedSimulation #ScalableComputing #ParallelProcessing
💡 The paper addresses the challenges of running large-scale multi-agent simulations on existing platforms: limited scalability, low efficiency, and effort-intensive management. To overcome them, the authors enhance the AgentScope platform with several new features and components. They propose an actor-based distributed mechanism to improve scalability and efficiency, and provide flexible environment support to simulate various real-world scenarios. This allows for parallel execution of multiple agents, centralized workflow orchestration, and interactions among agents.
The authors also integrate a configurable tool and an automatic background generation pipeline to simplify creating agents with diverse background settings, and provide a web-based interface for monitoring and managing a large number of agents across multiple devices.
A comprehensive simulation demonstrates the effectiveness of the enhancements, and the source code is released on GitHub to inspire further research and development in large-scale multi-agent simulations. The results show the great potential of multi-agent systems in large-scale simulations, and the enhancements make AgentScope more convenient and flexible for very large-scale use.
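The actor idea, where each agent owns a mailbox and runs concurrently while a central loop orchestrates the workflow, can be sketched with Python's `asyncio`. This single-process toy is not the platform's distributed mechanism: the `Actor` class, the `None` stop sentinel, and the sleep that stands in for a slow model call are assumptions made for illustration.

```python
import asyncio


class Actor:
    """Toy actor: owns a mailbox and processes one message at a time."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.handled: list[str] = []

    async def run(self) -> None:
        while True:
            message = await self.mailbox.get()
            if message is None:  # sentinel: stop the actor
                break
            await asyncio.sleep(0.01)  # stands in for a slow model call
            self.handled.append(f"{self.name} got {message}")


async def simulate(num_agents: int) -> list[str]:
    actors = [Actor(f"agent-{i}") for i in range(num_agents)]
    tasks = [asyncio.create_task(actor.run()) for actor in actors]
    # Central orchestration: send one message to every actor, then stop them.
    for actor in actors:
        await actor.mailbox.put("round-1")
        await actor.mailbox.put(None)
    await asyncio.gather(*tasks)  # actors overlap their waiting time
    return [line for actor in actors for line in actor.handled]


results = asyncio.run(simulate(3))
print(results)
```

Because the actors wait concurrently, total time stays close to one simulated call rather than growing with the number of agents, which is the scalability property the paragraph above is concerned with.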
🔥 EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery
📅 Published on Mar 9
🔗 Links:
• arXiv: https://arxiv.org/abs/2603.08127
• PDF: https://arxiv.org/pdf/2603.08127
• GitHub: https://github.com/EvoScientist/EvoScientist ⭐ 2.6k
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#MultiAgentSystems #EvolvingAI #ScientificDiscovery #ArtificialIntelligenceResearch #AutonomousScience
💡 The paper introduces EvoScientist, a multi-agent framework designed to enhance scientific discovery by learning from past interactions. The problem with current AI scientist systems is that they rely on static pipelines and fail to adapt based on accumulated interaction histories, leading to overlooked research directions, repeated failed experiments, and pursuit of infeasible ideas.
To address this, EvoScientist uses three specialized agents: a Researcher Agent for idea generation, an Engineer Agent for experiment implementation, and an Evolution Manager Agent that distills insights from prior interactions into reusable knowledge. The framework also includes two persistent memory modules: an ideation memory that summarizes feasible research directions and records unsuccessful ones, and an experimentation memory that captures effective data processing and model training strategies. These modules enable the agents to retrieve relevant prior strategies, improving idea quality and code execution success rates over time.
The results show that EvoScientist outperforms seven state-of-the-art systems in scientific idea generation, achieving higher novelty, feasibility, relevance, and clarity, and also improves code execution success rates through multi-agent evolution, demonstrating the effectiveness of persistent memory for end-to-end scientific discovery. Overall, the paper contributes a novel framework that enables AI scientists to learn from their past interactions and adapt their research strategies, leading to more effective and efficient scientific discovery.
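The role of the ideation memory, keeping feasible directions and filtering out ones that already failed, can be sketched as follows. This is a toy under stated assumptions, not the paper's implementation: the `IdeationMemory` class, its method names, and the exact-string matching are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class IdeationMemory:
    """Toy ideation memory: remembers which directions worked and which failed."""
    feasible: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)

    def record(self, direction: str, succeeded: bool) -> None:
        target = self.feasible if succeeded else self.failed
        if direction not in target:
            target.append(direction)

    def filter_candidates(self, candidates: list[str]) -> list[str]:
        # Exact string matching is a simplification; a real system would
        # need some notion of similarity between ideas.
        return [idea for idea in candidates if idea not in self.failed]


memory = IdeationMemory()
memory.record("idea A: larger batch size", succeeded=False)
memory.record("idea B: curriculum learning", succeeded=True)

candidates = ["idea A: larger batch size", "idea C: data augmentation"]
kept = memory.filter_candidates(candidates)
print(kept)  # ['idea C: data augmentation']
```

An experimentation memory could follow the same shape, storing strategies that led to successful runs instead of research directions.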
🔥 Recursive Language Models
📅 Published on Dec 31, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2512.24601
• PDF: https://arxiv.org/pdf/2512.24601
• Project Page: https://alexzhang13.github.io/blog/2025/rlm/
• GitHub: https://github.com/alexzhang13/rlm ⭐ 4.2k
🤖 Models citing this paper:
• https://huggingface.co/mit-oasys/rlm-qwen3-8b-v0.1
• https://huggingface.co/nightmedia/Qwen3.5-9B-Claude-4.6-Opus-Deckard-V4.2-Uncensored-Heretic-Thinking-qx86-hi-mlx
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/sergiopaniego/repl
• https://huggingface.co/spaces/openenv/repl
• https://huggingface.co/spaces/sergiopaniego/repl-env
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#RecursiveLanguageModels #LargeLanguageModels #LongContextProcessing #LanguageModelArchitectures #NaturalLanguageProcessing
💡 The paper introduces Recursive Language Models, a novel approach to enable large language models to process arbitrarily long prompts. The problem addressed is that current language models have limited context windows, which restricts their ability to handle long inputs. The proposed method treats long prompts as part of an external environment and allows the language model to programmatically examine, decompose, and recursively call itself over snippets of the prompt. This approach enables the model to handle inputs that are up to two orders of magnitude beyond the model context window. The results show that Recursive Language Models successfully handle long inputs and outperform base language models and common long-context scaffolds across four diverse long-context tasks, while having comparable or cheaper cost per query. Overall, the paper contributes a general inference strategy that improves the ability of large language models to process long prompts, making them more effective and efficient.
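The decompose-and-recurse idea can be sketched with a stub in place of the model. Everything here is an assumption made for illustration: `toy_llm` only filters lines, the window is measured in lines, and the split is a fixed halving, whereas the post describes the model itself deciding programmatically how to examine and decompose the prompt.

```python
WINDOW = 64  # toy context limit, measured in lines (hypothetical)
call_log: list[int] = []  # size of every snippet sent to the stub model


def toy_llm(lines: list[str], query: str) -> list[str]:
    """Stand-in for a model call: keeps the lines that mention the query."""
    assert len(lines) <= WINDOW, "snippet exceeds the toy context window"
    call_log.append(len(lines))
    return [line for line in lines if query in line]


def recursive_call(lines: list[str], query: str) -> list[str]:
    # Base case: the snippet fits in the window, so call the model directly.
    if len(lines) <= WINDOW:
        return toy_llm(lines, query)
    # Recursive case: split the prompt and recurse over each half.
    middle = len(lines) // 2
    return recursive_call(lines[:middle], query) + recursive_call(lines[middle:], query)


# A 1000-line prompt, far beyond the 64-line window, with one relevant line.
document = [f"filler line {i}" for i in range(1000)]
document[777] = "the secret code is 4217"

answer = recursive_call(document, "secret")
print(answer, len(call_log))  # ['the secret code is 4217'] 16
```

No single call ever sees more than `WINDOW` lines, yet the full prompt is covered; the prompt lives in ordinary program state rather than in the model's context.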
🔥 EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning
📅 Published on Jan 5
🔗 Links:
• arXiv: https://arxiv.org/abs/2601.02163
• PDF: https://arxiv.org/pdf/2601.02163
• GitHub: https://github.com/EverMind-AI/EverMemOS ⭐ 4.4k
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#SelfOrganizingMemory #LongHorizonReasoning #LargeLanguageModels #MemoryOperatingSystem #StructuredReasoning
💡 The paper introduces EverMemOS, a self-organizing memory operating system designed to enhance the long-term interaction capabilities of large language models. The problem addressed is that current large language models have limited context windows, making it difficult to sustain coherent behavior over extended interactions. Existing memory systems store isolated records and retrieve fragments, which limits their ability to consolidate evolving user states and resolve conflicts.
The method proposed by EverMemOS involves an engram-inspired lifecycle for computational memory, which includes three main components: Episodic Trace Formation, Semantic Consolidation, and Reconstructive Recollection. Episodic Trace Formation converts dialogue streams into memory cells that capture episodic traces, atomic facts, and time-bounded foresight signals. Semantic Consolidation organizes these memory cells into thematic scenes, distilling stable semantic structures and updating user profiles. Reconstructive Recollection performs scene-guided agentic retrieval to compose the necessary and sufficient context for downstream reasoning.
The results show that EverMemOS achieves state-of-the-art performance on memory-augmented reasoning tasks, as demonstrated by experiments on LoCoMo and LongMemEval. Additionally, a profile study on PersonaMem v2 and qualitative case studies illustrate the chat-oriented capabilities of EverMemOS, such as user profiling and foresight. The code for EverMemOS is available, making it possible for others to build upon and extend this work. Overall, the paper presents a significant contribution to the development of large language models, enabling them to engage in more coherent and effective long-term interactions.
🔥 Qwen3-TTS Technical Report
📅 Published on Jan 22
🔗 Links:
• arXiv: https://arxiv.org/abs/2601.15621
• PDF: https://arxiv.org/pdf/2601.15621
• GitHub: https://github.com/QwenLM/Qwen3-TTS ⭐ 11.2k
🤖 Models citing this paper:
• https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
• https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base
• https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
📊 Datasets citing this paper:
• https://huggingface.co/datasets/Izzyzlin/CFSDD
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/Qwen/Qwen3-TTS
• https://huggingface.co/spaces/Sovenok-Hacker/Qwen3-TTS
• https://huggingface.co/spaces/katyado/Qwen3-TTS
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#MultilingualTextToSpeech #VoiceCloningTechnology #ControllableSpeechGeneration #DualTrackLMArchitecture #TextToSpeechSynthesis
💡 The Qwen3-TTS series presents advanced multilingual text-to-speech models with voice cloning and controllable speech generation capabilities. The problem addressed by this research is the need for efficient and high-quality text-to-speech models that can support multiple languages and allow for fine-grained control over the output speech.
The method used to address this problem is a dual-track LM architecture, which enables real-time synthesis, coupled with two specialized speech tokenizers. The first tokenizer, Qwen-TTS-Tokenizer-25Hz, emphasizes semantic content and enables streaming waveform reconstruction. The second tokenizer, Qwen-TTS-Tokenizer-12Hz, achieves extreme bitrate reduction and ultra-low-latency streaming, enabling immediate first-packet emission.
The Qwen3-TTS models were trained on over 5 million hours of speech data spanning 10 languages. The results of the research indicate state-of-the-art performance across diverse objective and subjective benchmarks, including the TTS multilingual test set, InstructTTSEval, and a long speech test set. The models support state-of-the-art 3-second voice cloning and description-based control, allowing for the creation of entirely novel voices and fine-grained manipulation over the output speech.
The researchers have released both tokenizers and models under the Apache 2.0 license to facilitate community research and development. Overall, the Qwen3-TTS series presents a significant contribution to the field of text-to-speech synthesis, offering advanced multilingual and controllable speech generation capabilities.
🔥 Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
📅 Published on Feb 17, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2502.11880
• PDF: https://arxiv.org/pdf/2502.11880
• GitHub: https://github.com/microsoft/BitNet ⭐ 38.9k
🤖 Models citing this paper:
• https://huggingface.co/Lgr54HFi/chimera
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/knoxel/bitnet-b158-cpu-explorer
• https://huggingface.co/spaces/knoxel/bitnet-cpp-explorer
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#TernaryLLMs #EdgeInferenceOptimization #MixedPrecisionMatrixMultiplication #EfficientInferenceSystems #TernaryNeuralNetworks
💡 The paper introduces Bitnet.cpp, a system designed to improve edge inference for ternary large language models. Ternary large language models, such as BitNet b1.58, have gained attention but efficient edge inference for these models is still lacking. The main challenge is that mixed-precision matrix multiplication, which is a significant part of the inference time, is not optimized for ternary models.
To address this issue, Bitnet.cpp uses a novel mixed-precision matrix multiplication library that enables efficient and lossless inference. The library has two key components: the Ternary Lookup Table, which reduces spatial inefficiencies, and Int2 with a Scale, which ensures lossless edge inference.
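A rough sketch of why a 2-bit code plus a scale can represent ternary weights without loss: each weight in {-1, 0, 1} fits in two bits, four to a byte, and one scale is applied after an integer dot product. This is not the library's kernel and does not show the lookup-table path; the function names and the bit layout are assumptions made for illustration.

```python
def pack_ternary(weights: list[int]) -> bytes:
    """Pack values from {-1, 0, 1} into 2 bits each, four per byte."""
    assert len(weights) % 4 == 0
    packed = bytearray()
    for start in range(0, len(weights), 4):
        byte = 0
        for offset, w in enumerate(weights[start:start + 4]):
            assert w in (-1, 0, 1)
            byte |= (w + 1) << (2 * offset)  # map -1/0/1 to codes 0/1/2
        packed.append(byte)
    return bytes(packed)


def unpack_ternary(packed: bytes) -> list[int]:
    """Invert pack_ternary: read four 2-bit codes from every byte."""
    return [((byte >> (2 * offset)) & 0b11) - 1 for byte in packed for offset in range(4)]


def dot(packed: bytes, scale: float, activations: list[int]) -> float:
    # The integer part is exact; the scale is applied once at the end.
    total = sum(w * a for w, a in zip(unpack_ternary(packed), activations))
    return scale * total


weights = [1, -1, 0, 1, 0, 0, -1, 1]
packed = pack_ternary(weights)
roundtrip = unpack_ternary(packed)
result = dot(packed, 0.5, [3, 1, 4, 1, 5, 9, 2, 6])
print(len(packed), roundtrip == weights, result)  # 2 True 3.5
```

Eight weights occupy two bytes and unpack to exactly the original values, so the only floating-point operation is the final multiplication by the scale.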
The experiments show that Bitnet.cpp significantly outperforms full-precision and low-bit baselines, achieving up to a 6.25 times increase in speed over full-precision baselines and up to 2.32 times increase in speed over low-bit baselines. The system is publicly available, providing a practical solution for the efficient deployment of edge large language models. Additionally, the paper expands the Ternary Lookup Table to an element-wise lookup table for low-bit large language models, showing its potential for further improvement.
Overall, the paper contributes to the field by providing a novel and efficient solution for edge inference in ternary large language models, setting new benchmarks and offering a publicly available system for practical deployment.
🔥 Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
📅 Published on Apr 1, 2025
🔗 Links:
• arXiv: https://arxiv.org/abs/2504.00906
• PDF: https://arxiv.org/pdf/2504.00906
• Project Page: https://www.simular.ai/articles/agent-s2-technical-review
• GitHub: https://github.com/simular-ai/Agent-S ⭐ 11.1k
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#CompositionalAI #GraphicalUserInterfaceAutomation #GeneralistSpecialistModels #MixtureOfGrounding #HierarchicalTaskPlanning
💡 The paper introduces Agent S2, a compositional framework for computer use agents, which automate digital tasks by interacting with graphical user interfaces. Current agents struggle with imprecise grounding of GUI elements, long-horizon task planning, and performance bottlenecks caused by relying on a single generalist model. To address these challenges, Agent S2 delegates cognitive responsibilities across various generalist and specialist models. It uses a Mixture-of-Grounding technique to achieve precise GUI localization and Proactive Hierarchical Planning to dynamically refine action plans in response to evolving observations.
In evaluations, Agent S2 achieves state-of-the-art performance on three prominent computer use benchmarks, with relative improvements of 18.9% and 32.7% over leading baseline agents on the OSWorld 15-step and 50-step evaluations, respectively. It also generalizes to other operating systems and applications, surpassing the previous best methods by 52.8% on WindowsAgentArena and by 16.52% on AndroidWorld.
The code for Agent S2 is available for others to build upon. Overall, the paper contributes a new approach to improving computer use agents, with implications for enhancing human productivity by automating digital tasks.
🔥 DFlash: Block Diffusion for Flash Speculative Decoding
📅 Published on Feb 5
🔗 Links:
• arXiv: https://arxiv.org/abs/2602.06036
• PDF: https://arxiv.org/pdf/2602.06036
• Project Page: https://z-lab.ai/projects/dflash/
• GitHub: https://github.com/z-lab/dflash ⭐ 3.1k
🤖 Models citing this paper:
• https://huggingface.co/z-lab/Qwen3.6-27B-DFlash
• https://huggingface.co/z-lab/Qwen3.6-35B-A3B-DFlash
• https://huggingface.co/z-lab/Qwen3.5-27B-DFlash
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/Jackrong/qwen36-eval
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#SpeculativeDecoding #BlockDiffusionModels #LargeLanguageModels #ParallelDecodingTechniques #FlashSpeculativeDecoding
💡 The paper introduces DFlash, a speculative decoding framework designed to improve the speed of large language models while maintaining their quality. The problem with current large language models is that they require sequential decoding, which leads to high latency and poor GPU utilization. Speculative decoding has been proposed as a solution, where a fast draft model generates outputs that are then verified in parallel by the target model. However, existing speculative decoding methods still rely on sequential drafting, which limits their speedup.
To address this, the authors propose using a lightweight block diffusion model for parallel drafting. This model generates draft tokens in a single forward pass and conditions the draft model on context features extracted from the target model. The result is a framework that enables efficient drafting with high-quality outputs and higher acceptance rates.
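The verification half of speculative decoding, which any drafter has to pass through, can be sketched with a toy deterministic target model. The sketch assumes greedy decoding and checks draft tokens one at a time for clarity, whereas a real system scores the whole block in one parallel pass; `target_next` and `verify_block` are hypothetical names, and the block diffusion drafter itself is not modeled.

```python
def target_next(context: list[int]) -> int:
    """Toy deterministic target model: next token is (sum of context) mod 10."""
    return sum(context) % 10


def verify_block(context: list[int], draft: list[int]) -> list[int]:
    """Accept the longest draft prefix the target agrees with, plus one target token."""
    accepted: list[int] = []
    for token in draft:
        expected = target_next(context + accepted)
        if token != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(token)
    accepted.append(target_next(context + accepted))  # bonus token when all match
    return accepted


context = [1, 2]
partly_right = verify_block(context, [3, 6, 5, 0])  # third draft token is wrong
all_right = verify_block(context, [3, 6, 2, 4])     # every draft token matches
print(partly_right, all_right)  # [3, 6, 2] [3, 6, 2, 4, 8]
```

Every emitted token is one the target would have produced on its own, which is why the speedup is lossless; a draft block with a higher acceptance rate simply yields more tokens per verification step.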
The experiments show that DFlash achieves significant speedup over existing autoregressive methods, with over 6x lossless acceleration across a range of models and tasks. This is up to 2.5x higher speedup than the state-of-the-art speculative decoding method. The method contributes to improving the efficiency of large language models, making them more suitable for practical applications. Overall, DFlash offers a promising solution for speeding up large language models without sacrificing their performance.