AI & ML Papers

🔥 RepWAM: World Action Modeling with Representation Visual-Action Tokenizers

💡 The paper introduces RepWAM, a representation-centric world action model that improves robot manipulation through language-guided future state prediction and action modeling. Existing world action models rely on reconstruction-oriented video tokenizers, which prioritize visual fidelity over instruction-following dynamics and so limit how well future prediction can be connected to robot control. To address this, the authors propose a semantic visual-action latent space that maps visual inputs into aligned visual tokens and latent action tokens. They first train a representation visual-action tokenizer, then pretrain the world action model to jointly model future visual states and latent actions under language instructions, and finally adapt it to real robot trajectories for closed-loop manipulation. According to the authors, RepWAM delivers strong performance across diverse manipulation settings and outperforms reconstruction-oriented alternatives. They present semantic visual-action tokenization as a promising foundation for world action models and a step toward generalist robot policies, and state that the code and weights will be made available.

📅 Published on Jun 11

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.13674
• PDF: https://arxiv.org/pdf/2606.13674
• Project Page: https://wdrink.github.io/RepWAM/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#RobotManipulation #WorldActionModeling #VisualActionTokenizers #LanguageGuidedControl #FutureStatePrediction

