AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔥 RepWAM: World Action Modeling with Representation Visual-Action Tokenizers

💡 The paper introduces RepWAM, a representation-centric world action model that improves robot manipulation through language-guided future state prediction and action modeling. Existing world action models rely on reconstruction-oriented video tokenizers that prioritize visual fidelity over instruction-following dynamics, which limits their ability to connect future prediction with robot control. To address this, the authors propose a semantic visual-action latent space that maps visual inputs into aligned visual tokens and latent action tokens. They train a representation visual-action tokenizer, pretrain the world action model to jointly model future visual states and latent actions under language instructions, and then adapt it to real robot trajectories for closed-loop manipulation. According to the authors, RepWAM performs strongly across diverse manipulation settings and outperforms reconstruction-oriented alternatives. They present semantic visual-action tokenization as a promising foundation for world action models and a step toward generalist robot policies. The code and weights for RepWAM will be made available.
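
One way to picture the joint modeling of visual states and latent actions under a language instruction is as a single interleaved token sequence. The sketch below is only an illustration under assumptions: the `Step` class, the `[instruction, v_0, a_0, v_1, a_1, ...]` layout, and the integer token ids are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Token = Tuple[str, int]  # (kind, id); kinds and ids are hypothetical


@dataclass
class Step:
    visual_tokens: List[int]  # assumed semantic tokens for one frame
    action_tokens: List[int]  # assumed latent action tokens leading to the next frame


def build_sequence(instruction_tokens: List[int], steps: List[Step]) -> List[Token]:
    """Interleave tokens as [instruction, v_0, a_0, v_1, a_1, ...] (assumed layout)."""
    seq: List[Token] = [("text", t) for t in instruction_tokens]
    for step in steps:
        seq += [("visual", t) for t in step.visual_tokens]
        seq += [("action", t) for t in step.action_tokens]
    return seq


def prediction_targets(seq: List[Token]) -> List[Tuple[int, str, int]]:
    """Return (position, kind, id) for every non-text token.

    In this toy setup the instruction is conditioning only, so a model
    would be trained to predict the visual and action tokens.
    """
    return [(i, kind, tok) for i, (kind, tok) in enumerate(seq) if kind != "text"]


if __name__ == "__main__":
    steps = [Step([10, 11], [20]), Step([12, 13], [21])]
    sequence = build_sequence([1, 2], steps)
    for position, kind, token in prediction_targets(sequence):
        print(position, kind, token)
```

In this toy layout the instruction tokens only condition the sequence, while both the future visual tokens and the latent action tokens are prediction targets, which is the sense in which the two are modeled jointly.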


📅 Published on Jun 11

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.13674
• PDF: https://arxiv.org/pdf/2606.13674
• Project Page: https://wdrink.github.io/RepWAM/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#RobotManipulation #WorldActionModeling #VisualActionTokenizers #LanguageGuidedControl #FutureStatePrediction