AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔥 UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

💡 The paper introduces UniVidX, a unified multimodal framework for versatile video generation built on video diffusion model priors. Existing methods train a separate model for each task, which limits how well they can model correlations across modalities. UniVidX instead formulates pixel-aligned tasks as conditional generation in a shared multimodal space, so it can adapt to modality-specific distributions while preserving the native priors of the video diffusion model.

The framework consists of three key designs:
• Stochastic Condition Masking enables omni-directional conditional generation by randomly partitioning modalities into clean conditions and noisy targets during training.
• Decoupled Gated LoRA preserves the strong priors of the video diffusion model by introducing per-modality LoRAs that are activated only when a modality serves as the generation target.
• Cross-Modal Self-Attention facilitates information exchange and inter-modal alignment by sharing keys and values across modalities while keeping modality-specific queries.
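
A minimal sketch of how the three designs might fit together, written in plain Python for readability. It is an illustration under stated assumptions, not the authors' implementation: the function names, the modality names ("rgb", "albedo", "normal"), the rule that at least one modality is always a target, the binary 0/1 gate, and the reading of "shared keys and values" as concatenation across modalities are all hypothetical choices made for this example.

```python
import math
import random


def partition_modalities(modalities, rng):
    """Randomly split modalities into clean conditions and noisy targets.

    Assumption: at least one modality is always a target, and the number
    of targets is drawn uniformly. The paper's sampling scheme may differ.
    """
    shuffled = list(modalities)
    rng.shuffle(shuffled)
    num_targets = rng.randint(1, len(shuffled))  # inclusive on both ends
    targets = set(shuffled[:num_targets])
    conditions = [m for m in modalities if m not in targets]
    return conditions, [m for m in modalities if m in targets]


def lora_gates(modalities, targets):
    """Per-modality gate: 1.0 if the modality is a generation target.

    Assumption: a hard 0/1 gate. With gate 0.0 the LoRA branch is off and
    the modality passes through the unmodified base weights.
    """
    return {m: 1.0 if m in targets else 0.0 for m in modalities}


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Each modality keeps its own queries but attends over the keys and
    values of all modalities (here: simple concatenation of token lists).

    Inputs are dicts mapping modality name -> list of token vectors.
    """
    shared_keys = [k for m in keys for k in keys[m]]
    shared_values = [v for m in values for v in values[m]]
    dim = len(shared_values[0])
    output = {}
    for modality, tokens in queries.items():
        output[modality] = []
        for q in tokens:
            scale = math.sqrt(len(q))
            scores = [sum(a * b for a, b in zip(q, k)) / scale
                      for k in shared_keys]
            weights = softmax(scores)
            output[modality].append(
                [sum(w * v[d] for w, v in zip(weights, shared_values))
                 for d in range(dim)]
            )
    return output


if __name__ == "__main__":
    rng = random.Random(0)
    modalities = ["rgb", "albedo", "normal"]  # hypothetical modality names
    conditions, targets = partition_modalities(modalities, rng)
    print("conditions:", conditions, "targets:", targets)
    print("LoRA gates:", lora_gates(modalities, targets))
```

Resampling the partition at every training step is what would let a single model cover every direction of conditioning, for example RGB to intrinsic maps as well as the reverse.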

The authors instantiate UniVidX in two domains: UniVid-Intrinsic for RGB videos and intrinsic maps, and UniVid-Alpha for blended RGB videos and their constituent RGBA layers. Both models achieve performance competitive with state-of-the-art methods across distinct tasks and generalize robustly to in-the-wild scenarios, even though they are trained on fewer than 1000 videos. Overall, UniVidX replaces separate task-specific models with a single framework that models correlations across modalities.


📅 Published on May 1

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.00658
• PDF: https://arxiv.org/pdf/2605.00658
• Project Page: https://houyuanchen111.github.io/UniVidX.github.io/
• GitHub: https://github.com/houyuanchen111/UniVidX

🤖 Models citing this paper:
https://huggingface.co/houyuanchen/UniVidX

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MultimodalVideoGeneration #VideoDiffusionModels #ConditionalGeneration #CrossModalLearning #MultimodalFusionArchitectures