AI & ML Papers
🔥 SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
📅 Published on Jun 11
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.13673
• PDF: https://arxiv.org/pdf/2606.13673
• Project Page: https://spatialclaw.github.io/
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#SpatialReasoning #VisionLanguageModels #AgenticInterfaces #SpatialArtificialIntelligence #CodeBasedActionInterfaces
💡 The paper introduces SpatialClaw, a training-free framework for flexible, stateful spatial reasoning with vision-language models. It argues that current spatial agents struggle with open-ended spatial reasoning because of how their action interface invokes specialist perception modules: existing agents use either single-pass code execution or a structured tool-call interface, and both offer limited flexibility for complex 3D/4D spatial reasoning.
SpatialClaw uses code as the action interface: a vision-language-model-backed agent writes executable code conditioned on prior outputs. This lets the agent flexibly compose and manipulate perception results and adapt its analysis to intermediate text and visual observations. To support this, SpatialClaw maintains a stateful Python kernel pre-loaded with the input frames and a suite of perception and geometry primitives.
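To make the idea of a stateful kernel concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: `StatefulKernel`, `run_cell`, `centroid`, and the toy `objects` data are all hypothetical names made up for illustration, and the "agent-written" cells are hard-coded strings rather than model output. It only shows why persistent state matters: the second cell reuses a variable created by the first.

```python
import contextlib
import io


class StatefulKernel:
    """Toy stand-in for a persistent Python kernel (hypothetical, not the paper's code)."""

    def __init__(self, preloaded):
        # One namespace shared by every cell, so variables survive between steps.
        self.namespace = dict(preloaded)

    def run_cell(self, code):
        """Execute one code cell and return whatever it printed as the observation."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue().strip()


def centroid(points):
    """Hypothetical geometry primitive: mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Assumed placeholder for perception output: a few 3D points per detected object.
kernel = StatefulKernel({
    "centroid": centroid,
    "objects": {
        "chair": [(0.0, 0.0, 2.0), (2.0, 0.0, 2.0)],
        "table": [(4.0, 0.0, 6.0), (4.0, 0.0, 6.0)],
    },
})

# Step 1: compute per-object centers and report which objects were found.
obs1 = kernel.run_cell(
    "centers = {name: centroid(pts) for name, pts in objects.items()}\n"
    "print(sorted(centers))"
)

# Step 2: written after seeing obs1; reuses `centers` from the previous cell.
obs2 = kernel.run_cell(
    "import math\n"
    "dist = math.dist(centers['chair'], centers['table'])\n"
    "print(round(dist, 2))"
)

print(obs1)
print(obs2)
```

With a single-pass interface, both steps would have to be written up front; here the second cell can be chosen after inspecting the first observation, which is the kind of adaptivity the summary attributes to the code-based interface.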
SpatialClaw reaches an average accuracy of 59.9% across 20 benchmarks covering diverse 3D/4D spatial reasoning tasks, an improvement of 11.2 points over the recent spatial agent it is compared against. The gains are consistent across six vision-language model backbones from two model families, without any benchmark- or model-specific adaptation. The paper's contribution is a flexible framework for spatial reasoning that applies to a wide range of tasks without requiring training or adaptation.