AI & ML Papers
🔥 Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?
📅 Published on Jun 6
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.08063
• PDF: https://arxiv.org/pdf/2606.08063
• Project Page: https://huggingface.co/spaces/Jiaqi-hkust/Robust-U1
🤖 Models citing this paper:
• https://huggingface.co/Jiaqi-hkust/Robust-U1-SFT
• https://huggingface.co/Jiaqi-hkust/Robust-U1-RL
• https://huggingface.co/Jiaqi-hkust/Robust-U1
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/Jiaqi-hkust/Robust-U1
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus
#MultimodalLearning #VisualContentRecovery #RobustLanguageModels #SelfRecoveryMechanisms #CorruptionResistantAI
💡 The paper proposes Robust-U1, a framework that makes multimodal large language models (MLLMs) more robust to visual corruptions. Existing models perform poorly on real-world corruptions such as noise or blur, and current robustness methods either lack interpretability or cannot restore lost pixel-level details.
Robust-U1 equips the model with an explicit visual self-recovery capability, so the model reconstructs corrupted visual content itself. The approach has three stages: supervised fine-tuning for initial reconstruction, reinforcement learning with dual rewards to align the recovered output with high visual quality, and multimodal reasoning over both the corrupted input and the recovered image.
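The recover-then-reason flow of the third stage can be sketched as below. This is a minimal illustration, not the authors' implementation: `answer_with_self_recovery`, `clamp_recover`, and `toy_reason` are hypothetical names, the image is a plain nested list rather than a real tensor, and the two callables stand in for what would be calls to the trained model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Grayscale pixels nominally in [0, 1]; a stand-in for a real image type.
Image = List[List[float]]


@dataclass
class RobustAnswer:
    recovered: Image
    answer: str


def answer_with_self_recovery(
    corrupted: Image,
    question: str,
    recover: Callable[[Image], Image],
    reason: Callable[[Image, Image, str], str],
) -> RobustAnswer:
    """Recover the image first, then reason over both views."""
    recovered = recover(corrupted)
    # The reasoning step sees the corrupted input AND the recovered image.
    answer = reason(corrupted, recovered, question)
    return RobustAnswer(recovered=recovered, answer=answer)


# Toy stand-ins so the sketch runs; a real system would call the model here.
def clamp_recover(image: Image) -> Image:
    """Hypothetical 'recovery': clamp out-of-range pixels back into [0, 1]."""
    return [[min(1.0, max(0.0, p)) for p in row] for row in image]


def toy_reason(corrupted: Image, recovered: Image, question: str) -> str:
    """Hypothetical 'reasoning': count pixels that recovery changed."""
    changed = sum(
        1
        for c_row, r_row in zip(corrupted, recovered)
        for c, r in zip(c_row, r_row)
        if c != r
    )
    return f"{question} -> {changed} pixel(s) repaired"


result = answer_with_self_recovery(
    [[0.2, 1.4], [-0.3, 0.8]],
    "How many pixels were out of range?",
    clamp_recover,
    toy_reason,
)
print(result.answer)
```

Passing the two steps in as callables keeps the control flow visible without assuming anything about the real model interface.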
Robust-U1 achieves state-of-the-art robustness on a real-world corruption benchmark and maintains superior performance under adversarial corruptions on general visual question answering benchmarks. The analysis finds that high-quality visual recovery directly improves reasoning performance, which establishes self-recovery as a critical mechanism for robust visual understanding in multimodal large language models.
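One common way to put a number on recovery quality is peak signal-to-noise ratio (PSNR). The sketch below is only an illustration under that assumption: the summary does not say which quality metric the paper uses, and `psnr` and the tiny 2x2 images are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 1]


def psnr(reference: Image, candidate: Image, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    squared_errors = [
        (r - c) ** 2
        for ref_row, cand_row in zip(reference, candidate)
        for r, c in zip(ref_row, cand_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical images: a clean reference, a heavily corrupted copy,
# and a recovered copy that is close to the reference.
clean = [[0.0, 1.0], [1.0, 0.0]]
corrupted = [[0.5, 0.5], [0.5, 0.5]]
recovered = [[0.1, 0.9], [0.9, 0.1]]

print(psnr(clean, corrupted) < psnr(clean, recovered))
```

A higher PSNR for the recovered image than for the corrupted one indicates that recovery moved the pixels closer to the clean reference.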