Monocular 3D Clothed Human
MultiGO++ is a novel framework for monocular 3D clothed human reconstruction via geometry-texture collaboration. New SOTA, but no code announced.
Review https://t.ly/YKY44
Paper arxiv.org/pdf/2603.04993
Project 3dagentworld.github.io/multigo++

SOTA Arbitrary Tracking
TAPFormer is a new SOTA transformer-based framework that performs asynchronous, temporally consistent fusion of frames and events for robust, high-frequency point tracking. Repo & dataset under MIT.
Review https://t.ly/-q4wm
Paper https://arxiv.org/pdf/2603.04989
Project http://tapformer.github.io/
Repo https://github.com/ljx1002/TAPFormer
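
Frames and events arrive at different, irregular rates, so any fusion of the two has to cope with asynchronous inputs. The minimal sketch below only illustrates that input-ordering problem, not TAPFormer's learned transformer fusion: it merges two time-sorted streams with Python's `heapq.merge` and pairs each event with the most recent frame. `Measurement`, `pair_events_with_latest_frame` and the timestamps are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Measurement:
    """One timestamped sensor reading; only the timestamp takes part in ordering."""
    t: float  # timestamp in seconds
    source: str = field(compare=False)  # hypothetical labels: "frame" or "event"


def pair_events_with_latest_frame(frames, events):
    """Walk both time-sorted streams in global time order and attach each event
    to the most recent frame seen so far (events before the first frame are dropped)."""
    latest_frame = None
    pairs = []
    for m in heapq.merge(frames, events):
        if m.source == "frame":
            latest_frame = m
        elif latest_frame is not None:
            pairs.append((m.t, latest_frame.t))
    return pairs


frames = [Measurement(0.000, "frame"), Measurement(0.033, "frame")]  # ~30 fps camera
events = [Measurement(0.001, "event"), Measurement(0.010, "event"), Measurement(0.040, "event")]
pairs = pair_events_with_latest_frame(frames, events)
print(pairs)  # [(0.001, 0.0), (0.01, 0.0), (0.04, 0.033)]
```

`heapq.merge` is lazy and assumes each input is already sorted, which matches how sensor streams are usually delivered.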

Real-Time Scene Graph
REACT++ by Umea University is the new state-of-the-art model for real-time scene graph generation (SGG): 20% faster, with an average gain of 10% in relation prediction accuracy. Code under MIT.
Review https://t.ly/c12VX
Paper https://arxiv.org/pdf/2603.06386
Repo https://github.com/Maelic/SGG-Benchmark

Holistic 3D Spatial Intelligence
Holi-Spatial is the first fully automated pipeline capable of converting raw video streams into holistic 3D spatial annotations without human intervention. Code/Data announced.
Review https://t.ly/PDpr9
Paper https://lnkd.in/dTbMuZCm
Project https://lnkd.in/d66CYB4q
Repo https://lnkd.in/dAGzShXj

Surface Light Tokenizer
Apple unveils LITO, a novel latent flow matching model that enables high-quality image-to-3D. Its latent representation encodes a surface light field into a compact set of latent vectors. Impressive results, but no code.
Review https://t.ly/xcWNe
Paper https://lnkd.in/dYHwY4YX
Project https://lnkd.in/dtJT8bXy
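
For readers new to flow matching, here is a generic sketch of the training target and sampler, assuming the common straight-line path between Gaussian noise and data. The post does not say which path or network LITO uses, so this illustrates the technique family, not Apple's method: in a real model the velocity comes from a neural network conditioned on the input image, and `x1` would be one of the compact latent vectors. All function names are hypothetical.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of that path, d/dt x_t = x1 - x0: the regression target in training."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the path velocity."""
    v = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, v)) / len(v)


def euler_sample(velocity_fn, x0, steps=10):
    """Generate by integrating dx/dt = velocity_fn(x, t) from t=0 to t=1 (explicit Euler)."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
x1 = [0.5, -1.0, 2.0, 0.0]  # stand-in for one latent vector
x0 = [random.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample


def oracle(x, t):
    # Perfect "network" for this single pair: the straight-path velocity is constant.
    return target_velocity(x0, x1)


sample = euler_sample(oracle, x0)
```

Because the straight-line velocity is constant along each path, the oracle reaches `x1` up to rounding; a trained network only approximates this field.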

OmniStream Backbone
OmniStream is a novel unified streaming visual backbone that perceives, reconstructs, and acts from diverse visual inputs. Repo/models announced.
Review https://t.ly/_zZMO
Paper arxiv.org/pdf/2603.12265
Project go2heart.github.io/omnistream/
Repo github.com/Go2Heart/OmniStream

New SOTA Video Depth
DVD is the new SOTA in video depth estimation, with the full training suite available under Apache 2.0.
Review https://t.ly/gpCkG
Paper https://arxiv.org/pdf/2603.12250
Project https://dvd-project.github.io/
Repo github.com/EnVision-Research/DVD

Physically-Plausible Human
PhysMoDPO is a novel direct preference optimization framework for humanoid motion generation. Repo under MIT.
Review https://t.ly/clf8w
Paper https://arxiv.org/pdf/2603.13228
Project https://mael-zys.github.io/PhysMoDPO/
Repo https://github.com/Mael-zys/PhysMoDPO
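
Direct preference optimization trains a policy on pairs of preferred and rejected samples, measured against a frozen reference model. Below is a minimal sketch of the original DPO loss for one pair, assuming per-sample log-likelihoods are available. The post does not detail how PhysMoDPO builds its preference pairs or whether it uses an adapted objective (likelihood-free generators such as diffusion models need one), so the names and numbers here are purely illustrative.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin), where margin
    compares how much the policy raised the chosen sample's log-likelihood relative
    to the frozen reference model versus the rejected sample's."""
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    return softplus(-beta * margin)  # -log(sigmoid(x)) == softplus(-x)


print(dpo_loss(-5.0, -5.0, -5.0, -5.0))  # policy == reference: log(2), about 0.6931
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))  # chosen up, rejected down: lower loss
```

`beta` controls how far the policy may drift from the reference before the loss saturates.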

10,000× faster SAM-3D
Fast SAM 3D Body achieves up to a 10.9× speedup and over 10,000× faster MHR-to-SMPL conversion, enabling real-time humanoid control from RGB. Repo available.
Review https://t.ly/uHx84
Paper https://arxiv.org/pdf/2603.15603
Project yangtiming.github.io/Fast-SAM-3D-Body-Page/
Repo https://github.com/yangtiming/Fast-SAM-3D-Body

Material-Aware Grouping
Material Magic Wand (Adobe) is a tool for material-aware grouping of parts in untextured 3D meshes. Given one selected part, it automatically retrieves the other parts of the same shape that share its material. Repo announced.
Review https://t.ly/q00SU
Paper https://arxiv.org/pdf/2603.17370
Project umangi-jain.github.io/material-magic-wand/
Repo TBA

OccAny: Universal 3D Occupancy
OccAny by Valeo is a novel unified framework for generalized, unconstrained urban 3D occupancy prediction. Repo under Apache 2.0.
Review https://t.ly/FFiU0
Paper https://arxiv.org/pdf/2603.23502
Project https://valeoai.github.io/OccAny/
Repo https://github.com/valeoai/OccAny

Pose-Appearance-Motion for HOI
PAM is a novel Pose-Appearance-Motion engine for controllable, SOTA hand-object interaction (HOI) video generation. Repo/models available.
Review https://t.ly/JU4MD
Paper arxiv.org/pdf/2603.22193
Project gasaiyu.github.io/PAM.github.io/
Repo https://github.com/GasaiYU/PAM

GaussianGPT 3D GSC
From TUM, GaussianGPT: transformer-based generation of 3D Gaussians via next-token prediction, producing full, complex 3D indoor scenes. Repo announced.
Review https://t.ly/bj-lL
Paper arxiv.org/pdf/2603.26661
Project nicolasvonluetzow.github.io/GaussianGPT/
Repo TBA

HandX: Scaling Hand Motion
HandX is a unified foundation spanning data, annotation, and evaluation, built around a novel large-scale dataset (around 6M frames) of bimanual and dexterous hand motions with fine-grained text annotations. Repo available.
Review https://t.ly/1nGxw
Paper https://arxiv.org/pdf/2603.28766
Project https://handx-project.github.io/
Repo github.com/handx-project/HandX

SOTA Training-Free In-Context Segmentation
INSID3 is the new SOTA training-free approach that segments concepts at varying granularities using only frozen DINOv3 features, given an in-context example. Repo under Apache 2.0.
Review https://t.ly/NVWHN
Paper arxiv.org/pdf/2603.28480
Project visinf.github.io/INSID3/
Repo github.com/visinf/INSID3
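
A common training-free baseline for in-context segmentation is prototype matching on frozen features: average the reference features inside the example mask, then threshold cosine similarity on the target. The sketch below shows only that baseline with toy 2-D vectors; it is not INSID3's method, which the post credits with handling varying granularities, and in practice the features would come per patch from a frozen backbone such as DINOv3. Function names and the 0.5 threshold are assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def masked_prototype(features, mask):
    """Mean of the normalized reference-patch features that fall inside the mask."""
    inside = [normalize(f) for f, m in zip(features, mask) if m]
    return [sum(col) / len(inside) for col in zip(*inside)]


def segment_in_context(ref_features, ref_mask, target_features, threshold=0.5):
    """Label a target patch as foreground when its feature is close to the prototype."""
    proto = masked_prototype(ref_features, ref_mask)
    return [cosine(f, proto) >= threshold for f in target_features]


# Toy 2-D "patch features"; a real pipeline would use frozen backbone features per patch.
ref_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ref_mask = [True, True, False]  # the in-context example's mask
target_features = [[0.8, 0.2], [0.1, 0.9], [1.0, 0.0]]
print(segment_in_context(ref_features, ref_mask, target_features))  # [True, False, True]
```

No weights are updated anywhere: the only "learning" is the prototype computed from the example, which is what makes this family of approaches training-free.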

Camera Raw Image Generation
RawGen by #Samsung is a generative approach that directly learns the complex distribution of raw sensor data, enabling high-fidelity generation from either text descriptions or standard sRGB images across arbitrary camera sensors: generate the linear raw image once, then apply any ISP operation. Repo announced.
Review https://t.ly/_QVKP
Paper https://arxiv.org/pdf/2604.00093
Project https://dy112.github.io/rawgen-page/
Repo TBA
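
The idea of generating the linear raw image once and then applying any ISP can be illustrated with a toy ISP that renders the same linear image twice with different white-balance gains and exposure. This is a minimal sketch assuming already-demosaiced linear RGB values in [0, 1]; the gains are made up, and RawGen's actual ISP operations are not described in the post. `srgb_encode` implements the standard sRGB transfer curve; the other names are hypothetical.

```python
def srgb_encode(c):
    """Standard sRGB transfer function applied to a linear value clipped to [0, 1]."""
    c = min(max(c, 0.0), 1.0)
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def simple_isp(linear_rgb, wb_gains, exposure=1.0):
    """Toy ISP on already-demosaiced linear RGB pixels: white balance, exposure,
    clipping and sRGB encoding. Real ISPs also demosaic, color-correct,
    denoise and tone-map."""
    out = []
    for pixel in linear_rgb:
        scaled = [v * g * exposure for v, g in zip(pixel, wb_gains)]
        out.append(tuple(srgb_encode(v) for v in scaled))
    return out


raw = [(0.5, 0.5, 0.5), (0.1, 0.2, 0.05)]  # the linear image, "generated once"
neutral = simple_isp(raw, wb_gains=(1.0, 1.0, 1.0))
warm = simple_isp(raw, wb_gains=(2.0, 1.0, 1.5), exposure=1.2)  # hypothetical settings
```

Because every ISP step runs after generation, changing the look only re-runs `simple_isp`; the linear image itself never changes.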

If you had to invest $1B TODAY in a frontier technology for the next decade, would you choose space, agentic AI, quantum, or frugal GPUs? Vote here: https://t.ly/hSx6i

Video Object Deletion
Void by Netflix is a novel video object removal framework designed to perform physically plausible inpainting in very complex scenarios. Repo under Apache 2.0.
Review https://t.ly/cMVny
Paper https://arxiv.org/pdf/2604.02296
Project https://void-model.github.io/
Repo https://github.com/Netflix/void-model

Vanast: VTON w/ Human Animation
SNU unveils Vanast, a novel unified framework that generates garment-transferred human animation videos directly from a single human image, a garment image, and a pose guidance clip. Repo announced.
Review https://t.ly/c0t79
Paper arxiv.org/pdf/2604.04934
Project hyunsoocha.github.io/vanast/
Repo github.com/snuvclab/vanast