AI & ML Papers
🔥 Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis
📅 Published on May 18
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.18451
• PDF: https://arxiv.org/pdf/2605.18451
• Project Page: https://code-as-room.github.io/
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#CodeAsRoom #3DRoomGeneration #AgenticCodeSynthesis #IndoorSceneUnderstanding #ArchitectureGeneration
💡 The paper proposes a novel framework called Code-as-Room for generating 3D indoor rooms from top-down view images. The problem addressed is the difficulty in designing realistic and functional 3D indoor rooms, which is essential for various applications such as interior design, virtual reality, and gaming. Existing methods that use text-based descriptions or reference images struggle to capture precise spatial information and suffer from instability and infinite looping when tasked with holistic room generation.
The proposed method, Code-as-Room, uses a multimodal large language model (MLLM)-based agentic framework with a structured execution harness to generate executable Blender code from top-down images. The framework parses the reference image to extract scene elements and their spatial relationships, then synthesizes code for geometry, materials, and lighting in a multi-stage pipeline. A cross-stage memory module maintains context across stages and mitigates context forgetting.
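A minimal sketch of the staged idea, not the authors' implementation: each stage emits a fragment of Blender Python and reads a shared memory dict, so a later stage can refer to objects created earlier. The stage functions, the `memory` layout, and the hard-coded `scene` are illustrative assumptions; in a real system a model would produce them from the parsed image. The emitted strings call Blender's `bpy.ops.mesh.primitive_cube_add` and `bpy.ops.object.light_add`, but they are only generated here, not executed.

```python
def geometry_stage(scene, memory):
    """Emit one placeholder cube per parsed element and record its name in memory."""
    lines = ["import bpy"]
    for item in scene["elements"]:
        x, y = item["xy"]
        lines.append(
            f"bpy.ops.mesh.primitive_cube_add(size={item['size']}, location=({x}, {y}, 0))"
        )
        lines.append(f"bpy.context.active_object.name = {item['name']!r}")
        memory["objects"].append(item["name"])
    return lines


def lighting_stage(scene, memory):
    """Place a point light above the centroid of the objects recorded so far."""
    placed = [e for e in scene["elements"] if e["name"] in memory["objects"]]
    cx = sum(e["xy"][0] for e in placed) / len(placed)
    cy = sum(e["xy"][1] for e in placed) / len(placed)
    return [f"bpy.ops.object.light_add(type='POINT', location=({cx}, {cy}, 2.5))"]


def synthesize(scene):
    """Run the stages in order, threading one memory dict through all of them."""
    memory = {"objects": []}
    script = []
    for stage in (geometry_stage, lighting_stage):
        script.extend(stage(scene, memory))
    return "\n".join(script), memory


# Hypothetical parse result for a top-down image with two pieces of furniture.
scene = {
    "elements": [
        {"name": "bed", "xy": (1.0, 2.0), "size": 2.0},
        {"name": "desk", "xy": (3.0, 0.0), "size": 1.0},
    ]
}
script, memory = synthesize(scene)
print(script)
```

The lighting stage never re-parses the image; it relies only on what the geometry stage wrote to memory, which is the kind of context the cross-stage memory module is described as preserving.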
The authors also introduce a dedicated benchmark for code-based 3D room synthesis with several evaluation protocols. Comparisons against existing agent-based methods validate the effectiveness of the proposed execution harness. The paper contributes a principled approach to 3D room synthesis from top-down views and demonstrates the potential of executable code as a representation for 3D rooms.
🔥 SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution
📅 Published on May 18
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.18401
• PDF: https://arxiv.org/pdf/2605.18401
• Project Page: https://skills.vote
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#AgentGovernance #LargeLanguageModels #SkillEvolution #ReusableSkills #LifecycleManagement
💡 The paper introduces SkillsVote, a governance framework for managing reusable skills in long-horizon large language model agents. The problem addressed is that raw trajectories of agent experiences are noisy and hard to govern, making it difficult to reuse and improve agent skills. To solve this, the authors propose treating agent skills as an experience schema that combines executable scripts with non-executable guidance on procedures.
The SkillsVote framework consists of three main processes: collection, recommendation, and evolution of agent skills. It starts by profiling a large open-source corpus of skills to identify environment requirements, quality, and verifiability. Then, it synthesizes tasks for verifiable skills and performs a search over a structured skill library to provide instructional context before execution. After execution, it decomposes trajectories into skill-linked subtasks, attributes outcomes to skill use, and admits only successful reusable discoveries to updates.
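The post-execution admission rule can be sketched as a simple filter. This is an illustration under assumptions, not the SkillsVote code: the `Subtask` fields, the vote-count library, and the skill names are all hypothetical, and outcome attribution is taken as given rather than computed.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    skill: str        # skill this subtask is linked to
    succeeded: bool   # outcome attributed to this use of the skill
    reusable: bool    # whether the discovery generalizes beyond this task


def admit_updates(library, subtasks):
    """Return a new library where only successful, reusable subtasks add a vote."""
    updated = dict(library)
    for st in subtasks:
        if st.succeeded and st.reusable:
            updated[st.skill] = updated.get(st.skill, 0) + 1
    return updated


library = {"git-bisect": 2}
trajectory = [
    Subtask("git-bisect", succeeded=True, reusable=True),     # admitted
    Subtask("patch-by-hand", succeeded=True, reusable=False), # task-specific, rejected
    Subtask("docker-build", succeeded=False, reusable=True),  # failed, rejected
]
new_library = admit_updates(library, trajectory)
print(new_library)
```

Returning a new dict instead of mutating the library keeps the previous state available, which fits the summary's emphasis on controlling what gets preserved.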
The evaluation of SkillsVote shows promising results, with offline evolution improving performance on Terminal-Bench 2.0 by up to 7.9 percentage points and online evolution improving performance on SWE-Bench Pro by up to 2.6 percentage points. The key contribution of the paper is that governed external skill libraries can improve frozen agents without requiring model updates, as long as systems control exposure, credit, and preservation of skills. Overall, the SkillsVote framework provides a structured approach to managing and improving agent skills, enabling more efficient and effective reuse of experience and knowledge.
🔥 AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs
📅 Published on May 15
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.15565
• PDF: https://arxiv.org/pdf/2605.15565
• Project Page: https://infini-ai-lab.github.io/astraflow/
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#DataflowOrientedRL #ReinforcementLearningForLLMs #AgenticLanguageModels #LargeLanguageModelAgents #ScalableRLSystems
💡 The paper introduces AstraFlow, a dataflow-oriented reinforcement learning system designed to improve the efficiency and scalability of large language model agents. The problem addressed is that current reinforcement learning systems are prohibitively expensive and struggle to support complex workloads, such as multi-policy collaborative training, while efficiently using diverse compute resources.
The authors propose AstraFlow as a solution, which replaces conventional trainer-centered control with principled component abstractions. In AstraFlow, rollout services, dataflow management, and training are decoupled into autonomous components, allowing the system to natively support complex multi-policy agentic RL workloads and efficiently exploit diverse compute resources.
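To make the decoupling concrete, here is a toy sketch in which rollout, dataflow, and training run as independent workers that communicate only through queues. It is not AstraFlow's API: the function names, the trajectory fields, and the filter rule are assumptions chosen for illustration.

```python
import queue
import threading


def rollout_service(policy_name, n, out_q):
    """Produce n toy trajectories for one policy and push them downstream."""
    for i in range(n):
        out_q.put({"policy": policy_name, "step": i, "reward": float(i % 2)})


def dataflow_manager(in_q, out_q, total):
    """Filter and route trajectories; it knows nothing about the trainer."""
    for _ in range(total):
        traj = in_q.get()
        if traj["reward"] > 0:
            out_q.put(traj)
    out_q.put(None)  # sentinel: no more data


def trainer(in_q, counts):
    """Consume routed trajectories per policy until the sentinel arrives."""
    while (traj := in_q.get()) is not None:
        counts[traj["policy"]] = counts.get(traj["policy"], 0) + 1


raw, routed, counts = queue.Queue(), queue.Queue(), {}
workers = [
    threading.Thread(target=rollout_service, args=("planner", 4, raw)),
    threading.Thread(target=rollout_service, args=("coder", 4, raw)),
    threading.Thread(target=dataflow_manager, args=(raw, routed, 8)),
    threading.Thread(target=trainer, args=(routed, counts)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counts)
```

Adding a third policy here means starting one more rollout worker; neither the dataflow manager nor the trainer changes, which is the property the summary attributes to the component abstraction.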
The results show that AstraFlow supports multi-policy training, elastic scaling, heterogeneous cross-region execution, and composable data algorithms without requiring system-level code changes. The system achieves comparable or better accuracy than existing RL systems while training 2.7 times faster in multi-policy collaborative training. The evaluation spans math, code, search, and AgentBench workloads.
Overall, AstraFlow contributes a principled abstraction for reinforcement learning system components that supports complex multi-policy workloads and makes efficient use of diverse compute resources.
🔥 SAM 3: Segment Anything with Concepts
📅 Published on Nov 20, 2025
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2511.16719
• PDF: https://arxiv.org/pdf/2511.16719
• Project Page: https://ai.meta.com/sam3/
🤖 Models citing this paper:
• https://huggingface.co/AllanVester/SAM3.1-CoreML-FP16
• https://huggingface.co/AllanVester/SAM3.1-CoreML
• https://huggingface.co/embedl/sam3
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/kith777/rag_agent
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#ComputerVision #ObjectSegmentation #ConceptLearning #ImageTracking #PromptableSegmentation
💡 The paper introduces Segment Anything Model 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts. The model achieves state-of-the-art performance in promptable concept segmentation and tracking by leveraging a unified model architecture with decoupled recognition and localization. The concept prompts can be short noun phrases, image exemplars, or a combination of both, and the model returns segmentation masks and unique identities for all matching object instances.
To advance promptable concept segmentation, the authors built a scalable data engine that produces a high-quality dataset with 4 million unique concept labels, including hard negatives, across images and videos. The model consists of an image-level detector and a memory-based video tracker that share a single backbone. The recognition and localization are decoupled with a presence head, which boosts detection accuracy.
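One way to picture the decoupling is a single image-level presence score that gates every per-query localization score. The sketch below assumes the two are combined by multiplication and thresholded at 0.5; the function names, logits, and threshold are illustrative and not taken from the released model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_detections(presence_logit, query_logits, threshold=0.5):
    """Gate per-query scores by one image-level 'is the concept present?' score."""
    presence = sigmoid(presence_logit)
    scores = [presence * sigmoid(q) for q in query_logits]
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return scores, keep


# The same three candidate boxes, scored on two images.
query_logits = [3.0, 0.0, -3.0]
_, keep_present = score_detections(4.0, query_logits)   # concept clearly present
_, keep_absent = score_detections(-4.0, query_logits)   # concept clearly absent
print(keep_present, keep_absent)
```

With this split, the per-query scores only have to answer where a match is, while the presence score answers whether the concept appears at all, so a confident "absent" suppresses every candidate at once.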
The results show that Segment Anything Model 3 doubles the accuracy of existing systems in both image and video promptable concept segmentation, and improves previous capabilities on visual segmentation tasks. The authors also open source Segment Anything Model 3 along with a new benchmark for promptable concept segmentation, called Segment Anything with Concepts.
The main contributions are a unified architecture for promptable concept segmentation and tracking, a large-scale dataset of unique concept labels, and a new benchmark for evaluating promptable concept segmentation models.
🔥 PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
📅 Published on May 11
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.10341
• PDF: https://arxiv.org/pdf/2605.10341
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#VisionInLoopTypesetting #ScientificDocumentOptimization #LaTeXTypesetting #DocumentLayoutOptimization #TypesettingAutomation
💡 The paper addresses the problem of visual typesetting optimization for scientific documents, which involves transforming a compilable LaTeX paper into a visually polished and page-budget-compliant PDF. The authors argue that existing methods, such as rule-based tools and text-only language models, are insufficient because they operate only on source code and log files, and are unable to predict or verify the two-dimensional layout consequences of their changes.
To solve this problem, the authors introduce a vision-in-the-loop agent called PaperFit, which iteratively renders pages, diagnoses defects, and applies constrained repairs. The authors also formalize the problem as Visual Typesetting Optimization, and introduce a five-category taxonomy of typesetting defects to guide diagnosis.
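The render, diagnose, repair cycle can be sketched as a small loop. Everything here is a stand-in: `render` fakes compilation with simple arithmetic, the two defect kinds are hypothetical and do not reproduce the paper's five-category taxonomy, and the repairs are deliberately crude.

```python
def render(doc):
    """Stand-in for compile + rasterize: report layout facts for the document."""
    overflow = [f for f, w in doc["figure_widths"].items() if w > doc["text_width"]]
    pages = -(-doc["lines"] // doc["lines_per_page"])  # ceiling division
    return {"overflow": overflow, "pages": pages}


def diagnose(layout, page_budget):
    """Turn rendered layout facts into a list of (kind, target) defects."""
    defects = [("overflow", f) for f in layout["overflow"]]
    if layout["pages"] > page_budget:
        defects.append(("over_budget", layout["pages"] - page_budget))
    return defects


def repair(doc, defect):
    """Apply one constrained repair for the given defect."""
    kind, target = defect
    if kind == "overflow":
        doc["figure_widths"][target] = doc["text_width"]  # shrink to text width
    elif kind == "over_budget":
        doc["lines"] -= 10  # hypothetical tightening step
    return doc


def fit(doc, page_budget, max_iters=20):
    """Re-render after every repair until no defects remain or iterations run out."""
    for i in range(max_iters):
        defects = diagnose(render(doc), page_budget)
        if not defects:
            return doc, i
        doc = repair(doc, defects[0])
    return doc, max_iters


doc = {"figure_widths": {"fig1": 1.2}, "text_width": 1.0, "lines": 415, "lines_per_page": 50}
fixed, iterations = fit(doc, page_budget=8)
print(iterations, fixed["lines"], fixed["figure_widths"])
```

The point of the loop is that each repair is verified against a fresh render rather than assumed to work, and the `max_iters` cap bounds the process if a repair never converges.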
To evaluate PaperFit, the authors construct a benchmark called PaperFit-Bench, which consists of 200 papers across 10 venue templates and 13 defect types at different difficulty levels. Extensive experiments show that PaperFit outperforms all baselines by a large margin, demonstrating the effectiveness of vision-in-the-loop optimization for this task.
The authors conclude that bridging the gap from compilable source to publication-ready PDF requires vision-in-the-loop optimization, and that Visual Typesetting Optimization constitutes a critical missing stage in the document automation pipeline. Overall, the paper contributes a new approach to visual typesetting optimization, a benchmark for evaluating VTO methods, and a demonstration of the importance of vision-in-the-loop optimization for producing high-quality scientific documents.
🔥 EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
📅 Published on May 18
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.18703
• PDF: https://arxiv.org/pdf/2605.18703
🤖 Models citing this paper:
• https://huggingface.co/LARK-Lab/EnvFactory-1.7B
• https://huggingface.co/LARK-Lab/EnvFactory-4B
• https://huggingface.co/LARK-Lab/EnvFactory-8B
📊 Datasets citing this paper:
• https://huggingface.co/datasets/LARK-Lab/EnvFactory-SFT-ALL
• https://huggingface.co/datasets/LARK-Lab/EnvFactory-SFT-FILTERED
• https://huggingface.co/datasets/LARK-Lab/EnvFactory-RL
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#ExecutableEnvironments #ToolUseAgents #AgenticReinforcementLearning #RobustRL #LanguageModelTraining
💡 The paper introduces EnvFactory, a framework that automates the creation of executable tool environments and natural multi-turn trajectories for training large language models with agentic reinforcement learning. The problem addressed is that current approaches to equip large language models with tool-use capabilities are limited by the lack of scalable and robust execution environments and the scarcity of realistic training data. Existing methods rely on costly real-world APIs, simulators that are prone to hallucination, or synthetic environments that are often single-turn or based on pre-collected documents.
EnvFactory addresses these challenges by autonomously exploring and verifying stateful, executable tool environments from authentic resources, and synthesizing natural multi-turn trajectories through topology-aware sampling and calibrated refinement. This approach produces grounded queries with implicit intents, which are more effective for reinforcement learning training.
Environment and trajectory generation is fully automated. Using only 85 verified environments across 7 domains, EnvFactory generates a large number of trajectories and achieves superior training efficiency and downstream performance. It improves Qwen3-series models by up to 15 percent on certain benchmarks, and by up to 8.6 percent and 6 percent on other conversational benchmarks.
The paper contributes a scalable, extensible, and robust foundation for agentic reinforcement learning that achieves superior performance with fewer resources than prior work.
🔥 Semantic Generative Tuning for Unified Multimodal Models
📅 Published on May 18
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.18714
• PDF: https://arxiv.org/pdf/2605.18714
• Project Page: https://song2yu.github.io/SGT/
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#MultimodalLearning #SemanticSegmentation #GenerativeModels #UnifiedMultimodalModels #MultimodalRepresentationLearning
💡 The paper addresses the issue of unified multimodal models where visual understanding and generation are not well aligned due to separate training objectives. The prevailing approach of optimizing understanding through text signals and generation through pixel objectives leads to isolated representation spaces. To bridge this gap, the authors propose a novel approach called Semantic Generative Tuning, which uses semantic segmentation as a generative proxy to align and synergize multimodal capabilities.
The method involves formulating hierarchical visual tasks as generative proxies, with a focus on high-level semantic tasks like image segmentation. The authors find that segmentation provides structural semantics that enhance both vision-centric perception and generative layout fidelity. Unlike low-level tasks, segmentation does not distract models with texture details, making it an optimal proxy.
The results show that Semantic Generative Tuning improves feature linear separability and optimizes visual-textual attention allocation patterns. Extensive evaluations demonstrate that this approach consistently improves both multimodal comprehension and generative fidelity across mainstream benchmarks. The authors provide a systematic investigation into generative post-training and introduce a new paradigm that leverages segmentation to align multimodal capabilities. The code for the proposed method is made available.
🔥 GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
📅 Published on May 19
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.19577
• PDF: https://arxiv.org/pdf/2605.19577
• Project Page: https://huggingface.co/collections/Kwai-Klear/golongrl
🤖 Models citing this paper:
• https://huggingface.co/Kwai-Klear/GoLongRL-4B
• https://huggingface.co/Kwai-Klear/GoLongRL-30B-A3B
📊 Datasets citing this paper:
• https://huggingface.co/datasets/Kwai-Klear/GoLongRL
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#ReinforcementLearning #LongContextLearning #MultitaskAlignment #CapabilityOrientedLearning #DeepLearning
💡 The paper introduces GoLongRL, a new approach to long-context reinforcement learning that focuses on capability-oriented data construction and multitask alignment. Existing methods for long-context reinforcement learning often yield homogeneous task coverage and reward formulations that do not accurately reflect real-world requirements. To address this, the authors make two main contributions.
First, they introduce a capability-oriented data construction method and use it to build a dataset of 23,000 reinforcement learning samples with verifiable rewards, spanning 9 task types, each with its own evaluation metric. The dataset is openly released along with the construction pipeline and training code. Under the same training setup, training on this dataset outperforms training on the closed-source QwenLong-L1.5 data.
Second, the authors propose TMN-Reweight, a method for heterogeneous multitask optimization. It combines task-level mean normalization, which aligns reward scales across tasks, with difficulty-adaptive weighting for more reliable advantage estimation. TMN-Reweight improves average performance over vanilla GRPO while preserving or improving general capabilities across evaluations.
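The summary does not give TMN-Reweight's formulas, so the following is only one plausible reading, labeled as an assumption throughout: rewards are divided by a per-task mean before computing group-relative advantages, and a separate weight peaks for groups of medium difficulty. The function names, the `4p(1-p)` weight, and the sample rewards are all hypothetical.

```python
from statistics import mean


def tmn_advantages(group_rewards, task_mean, eps=1e-6):
    """Group-relative advantages after dividing rewards by a task-level mean (assumed form)."""
    scaled = [r / (task_mean + eps) for r in group_rewards]
    baseline = mean(scaled)
    return [s - baseline for s in scaled]


def difficulty_weight(group_rewards, max_reward):
    """Hypothetical weight: largest when the group earns about half the possible reward."""
    p = mean(group_rewards) / max_reward
    return 4.0 * p * (1.0 - p)


# Two task types whose raw rewards live on very different scales (0-1 vs 0-100).
qa_group = [0.0, 1.0, 1.0, 0.0]
summ_group = [20.0, 60.0, 60.0, 20.0]

qa_adv = tmn_advantages(qa_group, task_mean=0.5)
summ_adv = tmn_advantages(summ_group, task_mean=40.0)
print(qa_adv, summ_adv)
print(difficulty_weight(qa_group, max_reward=1.0))
```

Without the normalization, the raw group-relative advantages would be about ±0.5 for the first task and ±20 for the second, so the second task would dominate the gradient; after it, both fall within the same order of magnitude.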
The authors also train Qwen3-30B-A3B on the new dataset; the resulting model achieves long-context performance comparable to state-of-the-art models such as DeepSeek-R1-0528 and Qwen3-235B-A22B-Thinking-2507. This suggests that the dataset and TMN-Reweight can substantially improve long-context capability.
🔥 Code as Agent Harness
📅 Published on May 18
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.18747
• PDF: https://arxiv.org/pdf/2605.18747
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#AgenticSystems #LargeLanguageModels #AgentReasoning #CodeAsInfrastructure #ArtificialIntelligence
💡 The paper surveys the concept of code as agent harness, where code serves as the operational substrate for agent reasoning and execution in large language model-based agentic systems. The authors argue that code is no longer just a target output, but a unified infrastructure layer across multiple domains and applications. They organize the survey around three connected layers: the harness interface, harness mechanisms, and scaling the harness.
The harness interface layer explores how code connects agents to reasoning, action, and environment modeling. The harness mechanisms layer examines planning, memory, and tool use for long-horizon execution, as well as feedback-driven control and optimization. The scaling layer discusses how to extend the harness from single-agent systems to multi-agent settings, where shared code artifacts support multi-agent coordination, review, and verification.
The authors summarize representative methods and practical applications of code as agent harness, including coding assistants, GUI/OS automation, embodied agents, scientific discovery, personalization and recommendation, DevOps, and enterprise workflows. They also outline open challenges for harness engineering, such as evaluation beyond final task success, verification under incomplete feedback, regression-free harness improvement, consistent shared state across multiple agents, human oversight for safety-critical actions, and extensions to multimodal environments.
The paper provides a unified roadmap toward executable, verifiable, and stateful AI agent systems by centering code as the harness of agentic AI, and highlights the need for further research in harness engineering to address the open challenges above.
🔥 TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization
📅 Published on May 19
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.20150
• PDF: https://arxiv.org/pdf/2605.20150
• Project Page: https://sponge-lab.github.io/TideGS/
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#3DGaussianSplatting #ScalableDeepLearning #OutofCoreOptimization #GPUAcceleration #ComputerVisionTechniques
💡 The paper introduces TideGS, a scalable training framework for 3D Gaussian Splatting with over one billion primitives on a single GPU. The problem with training 3D Gaussian Splatting at a large scale is that it is memory-bound, with each Gaussian primitive having a large attribute vector that quickly exceeds GPU capacity. Prior systems were limited to tens of millions of Gaussians on commodity single-GPU hardware.
The authors observe that 3D Gaussian Splatting training is inherently sparse and trajectory-conditioned, meaning that each iteration only activates the Gaussians visible from the current camera batch. This insight allows the authors to manage parameters across an SSD-CPU-GPU hierarchy using three techniques: block-virtualized geometry for spatial locality, a hierarchical asynchronous pipeline to overlap I/O with computation, and trajectory-adaptive differential streaming that transfers only incremental working-set deltas between iterations.
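The differential streaming idea can be illustrated with plain set arithmetic over block IDs. This is a toy sketch, not the TideGS implementation: the block IDs, the `stream_delta` helper, and the camera batches are assumptions, and real transfers would move parameter blocks across the SSD-CPU-GPU hierarchy rather than integers.

```python
def stream_delta(resident, visible):
    """Return (blocks to load, blocks to evict) to move from resident to visible."""
    load = visible - resident
    evict = resident - visible
    return load, evict


# Visible block IDs for three consecutive camera batches along a trajectory.
camera_batches = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}]

resident = set()
transferred = 0
for visible in camera_batches:
    load, evict = stream_delta(resident, visible)
    transferred += len(load)              # only the delta crosses the bus
    resident = (resident - evict) | load

full_reload = sum(len(v) for v in camera_batches)
print(transferred, full_reload)
```

Because consecutive batches along a camera trajectory overlap heavily, the delta transfers 6 blocks in this toy run where reloading every working set from scratch would transfer 12.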
The TideGS framework enables training with over one billion Gaussians on a single 24 GB GPU, achieving the best reconstruction quality among evaluated single-GPU baselines on large-scale scenes. This is a significant improvement over prior out-of-core baselines, which were limited to approximately 100 million Gaussians, and standard in-memory training, which was limited to approximately 11 million Gaussians. The results demonstrate that TideGS can scale beyond prior systems, making it a promising solution for large-scale 3D Gaussian Splatting applications.
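A rough back-of-the-envelope check shows why these scales are memory-bound. The sketch assumes the standard 3D Gaussian Splatting parameterization of 59 float32 values per primitive (position, scale, rotation quaternion, opacity, and degree-3 spherical harmonics) and Adam-style training state of roughly four copies; the summary does not state TideGS's exact per-Gaussian layout, so treat the numbers as estimates.

```python
# Assumed layout: 3 position + 3 scale + 4 rotation + 1 opacity + 48 SH coefficients.
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48
BYTES_PER_FLOAT = 4  # float32


def param_bytes(n_gaussians):
    """Bytes needed for the parameters alone."""
    return n_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT


def training_bytes(n_gaussians):
    """Parameters + gradients + two Adam moment buffers (assumed 4x parameters)."""
    return 4 * param_bytes(n_gaussians)


for n in (11_000_000, 100_000_000, 1_000_000_000):
    print(f"{n:>13,} Gaussians: "
          f"{param_bytes(n) / 1e9:7.1f} GB params, "
          f"{training_bytes(n) / 1e9:7.1f} GB with optimizer state")
```

Under these assumptions, about 11 million Gaussians already need roughly 10 GB of training state before any rendering buffers, while one billion need about 236 GB for parameters alone, far beyond a 24 GB GPU.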