AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔥 SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

💡 The paper introduces SpatialClaw, a training-free framework for flexible, stateful spatial reasoning with vision-language models. It targets a limitation of current spatial agents on open-ended spatial reasoning tasks, which the paper attributes to the design of the action interface used to invoke specialist perception modules. Existing spatial agents rely on either single-pass code execution or a structured tool-call interface, and both offer limited flexibility for complex 3D/4D spatial reasoning.

SpatialClaw uses code as the action interface: an agent backed by a vision-language model writes executable code conditioned on the outputs of its earlier code. The agent can therefore compose and manipulate perception results flexibly and adapt its analysis to intermediate text and visual observations. The framework maintains a stateful Python kernel pre-loaded with the input frames and a suite of perception and geometry primitives.
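
To make the "stateful kernel" idea concrete, here is a minimal sketch. It is not SpatialClaw's implementation. The `StatefulKernel` class, the `centroid` primitive, the `objects` data, and the hard-coded `actions` are all hypothetical stand-ins: a real agent would have a vision-language model write each snippet after reading the previous observation, and the kernel would hold real frames and perception modules.

```python
import contextlib
import io


class StatefulKernel:
    """Toy stand-in for a persistent Python kernel: variables survive across steps."""

    def __init__(self, preload=None):
        # One shared namespace for every step; this is what makes the kernel stateful.
        self.namespace = dict(preload or {})

    def run(self, code):
        """Execute one code action and return whatever it printed (the observation)."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue().strip()


def centroid(points):
    """Hypothetical geometry primitive: mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical pre-loaded state: fake per-object 3D points instead of real frames.
kernel = StatefulKernel(preload={
    "centroid": centroid,
    "objects": {
        "chair": [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
        "table": [(3.0, 0.0, 2.0), (5.0, 0.0, 2.0)],
    },
})

# Hard-coded actions stand in for code a vision-language model would write.
actions = [
    "centers = {name: centroid(pts) for name, pts in objects.items()}\nprint(sorted(centers))",
    "dx = centers['table'][0] - centers['chair'][0]\nprint('right' if dx > 0 else 'left')",
]

observations = [kernel.run(code) for code in actions]
print(observations)
```

The second action reuses `centers`, which only exists because the first action ran in the same namespace. A single-pass script or a fixed tool-call schema would have to recompute or re-pass that intermediate result.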

SpatialClaw reaches an average accuracy of 59.9% across 20 benchmarks covering diverse 3D/4D spatial reasoning tasks, an improvement of 11.2 points over a recent spatial agent. The gains are consistent across six vision-language model backbones from two model families, with no benchmark- or model-specific adaptation. The paper's contribution is a flexible framework for spatial reasoning that applies to a wide range of tasks without training or adaptation.
📅 Published on Jun 11

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.13673
• PDF: https://arxiv.org/pdf/2606.13673
• Project Page: https://spatialclaw.github.io/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#SpatialReasoning #VisionLanguageModels #AgenticInterfaces #SpatialArtificialIntelligence #CodeBasedActionInterfaces