🦄 Native Unified Multimodal 🦄
👉META unveils TUNA, a novel UMM that builds a unified continuous visual representation by cascading a VAE encoder with a representation encoder. This unified space enables SOTA E2E processing of images/videos for both understanding and generation. Code under legal review💙 Toy sketch of the cascade below.
👉Review https://t.ly/7wmKP
👉Paper https://lnkd.in/djT4WGEU
👉Project https://tuna-ai.org/
👉Repo github.com/wren93/tuna
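A minimal PyTorch sketch of the cascaded design, for intuition only: every module, size, and layer count below is an assumption, not TUNA's released architecture.

```python
import torch
import torch.nn as nn

class UnifiedVisualEncoder(nn.Module):
    """VAE encoder -> representation encoder: one continuous token
    space shared by understanding and generation heads (stand-ins)."""
    def __init__(self, latent_ch=16, dim=768):
        super().__init__()
        # Stand-in VAE encoder: image -> continuous latent grid.
        self.vae_enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=4), nn.SiLU(),
            nn.Conv2d(64, latent_ch, 4, stride=4),
        )
        # Stand-in representation encoder over the flattened latents.
        self.proj = nn.Linear(latent_ch, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.rep_enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, images):                  # (B, 3, H, W)
        z = self.vae_enc(images)                # (B, C, H/16, W/16)
        tokens = z.flatten(2).transpose(1, 2)   # (B, N, C)
        return self.rep_enc(self.proj(tokens))  # unified tokens (B, N, dim)

enc = UnifiedVisualEncoder()
print(enc(torch.randn(1, 3, 256, 256)).shape)   # torch.Size([1, 256, 768])
```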
✌️SOTA Generative SLP✌️
👉Stable Signer is a new sign language production (SLP) generative model. It redefines SLP as a hierarchical, end-to-end generation task consisting only of text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid. Repo with data💙 Pipeline wiring sketched below.
👉Review https://t.ly/yKZhn
👉Paper arxiv.org/pdf/2512.04048
👉Project stablesigner.github.io/
👉Data github.com/SignLLM/Prompt2Sign/tree/main/tools-new-2025
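A wiring-only sketch of the hierarchy: the stage names mirror the decomposition above, but every function body is a dummy stand-in, and the gloss-to-pose step is my inferred assumption, not Stable Signer's API.

```python
def prompt2gloss(prompt: str) -> list[str]:
    # Text understanding: free-form prompt -> gloss sequence (dummy).
    return prompt.upper().split()

def gloss2pose(glosses: list[str]) -> list[list[float]]:
    # Assumed intermediate: one dummy pose keyframe per gloss.
    return [[float(i), float(len(g))] for i, g in enumerate(glosses)]

def pose2vid(poses: list[list[float]]) -> list[str]:
    # Pose2Vid: render the pose sequence into video frames (dummy labels).
    return [f"frame_{i:03d}" for i in range(len(poses))]

print(pose2vid(gloss2pose(prompt2gloss("nice to meet you"))))
```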
🐘TTSC for 3D Generative🐘
👉SpaceControl is the new SOTA training-free test-time method for explicit spatial control of 3D generation. Repo announced💙
👉Review https://t.ly/1zrah
👉Paper https://lnkd.in/dEWh3vep
👉Project https://lnkd.in/dScftUmm
👉Repo TBA
🎷Layered PSD Diffusion🎷
👉OmniPSD produces layered PSD files with transparent alpha channels, separating text, foreground elements, and background into clean RGBA layers that can be directly edited in design tools. Online Demo💙 Compositing sketch below.
👉Review https://t.ly/YNRAC
👉Paper arxiv.org/pdf/2512.09247
👉Project showlab.github.io/OmniPSD/
👉Demo https://www.lovart.ai/it
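Not OmniPSD code: just the standard Pillow "over" compositing that such layered RGBA output enables, showing why a clean alpha per layer matters for editing and re-flattening.

```python
from PIL import Image

def flatten_layers(layers):
    """Composite RGBA layers bottom-up with Porter-Duff 'over'."""
    canvas = layers[0].convert("RGBA")
    for layer in layers[1:]:
        canvas = Image.alpha_composite(canvas, layer.convert("RGBA"))
    return canvas

bg   = Image.new("RGBA", (256, 256), (30, 30, 30, 255))   # background layer
fg   = Image.new("RGBA", (256, 256), (200, 50, 50, 128))  # semi-transparent fg
text = Image.new("RGBA", (256, 256), (255, 255, 255, 0))  # empty text layer
flatten_layers([bg, fg, text]).save("flattened.png")
```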
🧱Pixel Art Volumetric Rendering🧱
👉Voxify3D is a novel differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Repo announced💙 Toy 2D-supervised voxel fit below.
👉Review https://t.ly/qPyNl
👉Paper https://lnkd.in/du5ikJGN
👉Project https://lnkd.in/dpiAjj5m
👉Repo TBA
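A toy illustration of "3D optimization under 2D supervision" (not Voxify3D's renderer or losses): fit a voxel occupancy grid so its differentiable orthographic projection matches a target pixel-art silhouette.

```python
import torch

target = torch.zeros(8, 8)
target[2:6, 2:6] = 1.0                          # pixel-art silhouette target
vox = torch.zeros(8, 8, 8, requires_grad=True)  # voxel occupancy logits
opt = torch.optim.Adam([vox], lr=0.1)

for _ in range(300):
    occ = torch.sigmoid(vox)
    # Differentiable orthographic "render" along z: 1 - prod(1 - occ).
    proj = 1.0 - torch.prod(1.0 - occ, dim=2)
    loss = torch.nn.functional.mse_loss(proj, target)
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final 2D loss: {loss.item():.5f}")
```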
🫎 MoCapAnything is out 🫎
👉MoCapAnything is a novel reference-guided, factorized framework that first predicts 3D joint trajectories and then recovers asset-specific rotations via constraint-aware IK fitting. No code announced 🥲 Toy IK sketch below.
👉Review https://t.ly/_Tw6t
👉Paper arxiv.org/pdf/2512.10881
👉Project animotionlab.github.io/MoCapAnything
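A toy version of the second stage only, assuming a planar 2-bone rig: recover joint angles from a "predicted" end-point position by gradient descent, with a soft joint-limit penalty standing in for the paper's constraint-aware IK.

```python
import torch

L1, L2 = 1.0, 0.8                     # bone lengths (assumed rig)
target = torch.tensor([1.2, 0.9])     # predicted end-effector position

theta = torch.zeros(2, requires_grad=True)  # joint angles to recover
opt = torch.optim.Adam([theta], lr=0.05)

for _ in range(500):
    a, b = theta[0], theta[0] + theta[1]
    elbow = torch.stack([L1 * torch.cos(a), L1 * torch.sin(a)])
    hand = elbow + torch.stack([L2 * torch.cos(b), L2 * torch.sin(b)])
    fit = torch.sum((hand - target) ** 2)          # position residual
    limit = torch.relu(theta[1].abs() - 2.5) ** 2  # soft elbow limit
    loss = fit + 10.0 * limit
    opt.zero_grad(); loss.backward(); opt.step()

print(theta.detach(), hand.detach())  # recovered angles, fitted end-point
```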
💚 MatAnyone 2 is out! 💚
👉MatAnyone 2 is the most advanced human video matting framework: it preserves fine details by avoiding segmentation-like boundaries, while also showing enhanced robustness under challenging real-world conditions. Repo & Dataset announced💙 Composite sketch below.
👉Review https://t.ly/vxOBO
👉Paper arxiv.org/pdf/2512.11782
👉Project pq-yang.github.io/projects/MatAnyone2
👉Repo github.com/pq-yang/MatAnyone2
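Not the network, just the standard matting composite I = αF + (1 − α)B that a fine-grained alpha feeds, e.g. for background replacement; a crisp segmentation-like alpha would leave visible halos here.

```python
import numpy as np

def composite(frame: np.ndarray, alpha: np.ndarray, bg: np.ndarray):
    """I = alpha * F + (1 - alpha) * B, per pixel; alpha in [0, 1]."""
    a = alpha[..., None].astype(np.float32)   # (H, W, 1)
    return (a * frame + (1.0 - a) * bg).astype(np.uint8)

frame = np.full((4, 4, 3), 200, np.uint8)     # dummy video frame
alpha = np.tile(np.linspace(0, 1, 4, dtype=np.float32), (4, 1))
bg = np.zeros((4, 4, 3), np.uint8)            # new background
print(composite(frame, alpha, bg)[0])
```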
💷 SOTA Zero-Shot Stereo Matching💷
👉Fast-FoundationStereo by #Nvidia is a novel family of architectures that achieves, for the first time, strong zero-shot generalization at real-time frame rates via divide-&-conquer acceleration. Code & Data announced💙 Classic baseline sketch below.
👉Review https://t.ly/XD6pO
👉Paper https://lnkd.in/d9_YKW2A
👉Project https://lnkd.in/dKDxm7EX
👉Repo https://lnkd.in/dR4-PdsW
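For intuition on the task only, a classic block-matching stereo baseline; Fast-FoundationStereo is a learned architecture and shares nothing with this beyond the input/output contract (rectified pair in, disparity map out).

```python
import numpy as np

def block_match(left, right, max_disp=16, patch=5):
    """Per-pixel disparity via SAD matching over a horizontal search."""
    H, W = left.shape
    r = patch // 2
    disp = np.zeros((H, W), np.int32)
    for y in range(r, H - r):
        for x in range(r + max_disp, W - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32)
            costs = [np.abs(ref - right[y - r:y + r + 1,
                                        x - d - r:x - d + r + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp

left = np.random.rand(32, 48).astype(np.float32)
right = np.roll(left, -4, axis=1)   # synthetic pair: true disparity = 4
print(np.median(block_match(left, right)[8:-8, 24:-8]))  # ~4.0
```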
👀DriverGaze360: Driver SOTA👀
👉DriverGaze360 is a large-scale 360° field-of-view driver attention dataset containing ~1M gaze-labeled frames. Code & Dataset announced💙
👉Review https://t.ly/ZcoUw
👉Paper arxiv.org/pdf/2512.14266
👉Project av.dfki.de/drivergaze360/
👉Repo github.com/dfki-av/drivergaze360
👉Data av.dfki.de/drivergaze360/dataset
🫠FlexAvatar: 3D Heads🫠
👉TUM introduces FlexAvatar, a novel method for creating HQ and complete 3D head avatars from a single image. Code announced💙
👉Review https://t.ly/Rkdtd
👉Paper arxiv.org/pdf/2512.15599
👉Project tobias-kirschstein.github.io/flexavatar/
👉Repo TBA
🏜️ Depth Any Panoramas 🏜️
👉DAP is the new SOTA foundation model for panoramic depth estimation, with a large-scale dataset. Data & Repo under MIT💙 Back-projection sketch below.
👉Review https://t.ly/LaUmd
👉Paper arxiv.org/pdf/2512.16913
👉Project https://lnkd.in/dvqNV9jx
👉Repo https://lnkd.in/dmNzhb-7
👉Demo https://lnkd.in/dDwjMF3u
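A common downstream step for panoramic depth (standard equirectangular geometry, not code from the DAP repo): back-project a per-pixel depth map into a 3D point cloud, y-up convention assumed.

```python
import numpy as np

def equirect_depth_to_points(depth: np.ndarray) -> np.ndarray:
    """depth: (H, W) metric depth -> (H, W, 3) 3D points."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    lon = u / W * 2 * np.pi - np.pi        # longitude in [-pi, pi)
    lat = np.pi / 2 - v / H * np.pi        # latitude in [pi/2, -pi/2]
    dirs = np.stack([np.cos(lat) * np.sin(lon),   # unit ray directions
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    return depth[..., None] * dirs

pts = equirect_depth_to_points(np.ones((256, 512), np.float32))
print(pts.shape, np.allclose(np.linalg.norm(pts, axis=-1), 1.0))
```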
🎯Generative Refocusing is out🎯
👉Generative Refocusing is a two-step process: DeblurNet recovers an all-in-focus image from varied inputs, and BokehNet synthesizes controllable bokeh (trained in semi-supervised mode). Repo under Apache-2.0💙 Naive baseline sketch below.
👉Review https://t.ly/8t7PA
👉Paper arxiv.org/pdf/2512.16923
👉Project generative-refocusing.github.io/
👉Repo github.com/rayray9999/Genfocus
👉Demo huggingface.co/spaces/nycu-cplab/Genfocus-Demo
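A naive depth-guided blur, purely to illustrate the refocusing idea: BokehNet is a learned generator, and the banded Gaussian blend below is my stand-in, not the paper's method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def naive_refocus(img, depth, focus_depth, max_sigma=6.0, bands=6):
    """Blend blurred copies, picked per pixel by |depth - focus_depth|
    (a crude circle-of-confusion proxy)."""
    out = np.zeros_like(img, dtype=np.float32)
    coc = np.clip(np.abs(depth - focus_depth), 0.0, 1.0)
    edges = np.linspace(0.0, 1.0, bands + 1)
    for i in range(bands):
        sigma = max_sigma * (edges[i] + edges[i + 1]) / 2.0
        blurred = gaussian_filter(img.astype(np.float32),
                                  sigma=(sigma, sigma, 0))
        mask = ((coc >= edges[i]) & (coc <= edges[i + 1]))[..., None]
        out = np.where(mask, blurred, out)
    return out.astype(np.uint8)

img = np.random.randint(0, 255, (64, 64, 3)).astype(np.uint8)
depth = np.tile(np.linspace(0.0, 1.0, 64, dtype=np.float32), (64, 1))
print(naive_refocus(img, depth, focus_depth=0.2).shape)
```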