If you had to invest $1B TODAY in one frontier tech for the next decade, would you invest in space, agentic AI, quantum, or frugal GPUs? Vote here: https://t.ly/hSx6i
Video Object Deletion
Void by Netflix is a novel video object removal framework designed to perform physically plausible inpainting in very complex scenarios. Repo under Apache 2.0.
Review https://t.ly/cMVny
Paper https://arxiv.org/pdf/2604.02296
Project https://void-model.github.io/
Repo https://github.com/Netflix/void-model
Vanast: VTON w/ Human Animation
SNU unveils a novel unified framework that generates garment-transferred human animation videos directly from a single human image, a garment image, and a pose-guidance clip. Repo announced.
Review https://t.ly/c0t79
Paper arxiv.org/pdf/2604.04934
Project hyunsoocha.github.io/vanast/
Repo github.com/snuvclab/vanast
BoxerNet: SOTA 2D→3D Bounding Boxes
Boxer by Meta: a transformer-based network that lifts 2D bounding-box proposals into 3D, followed by multi-view fusion and geometric filtering to produce globally consistent, de-duplicated 3D bounding boxes in metric world space. Repo under A-NC 4.0 International.
Review https://t.ly/mlmV1
Paper https://arxiv.org/pdf/2604.05212
Project facebookresearch.github.io/boxer/
Repo github.com/facebookresearch/boxer
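The fusion-and-filtering step above can be illustrated with a toy sketch. This is not Boxer's actual code — just a minimal greedy de-duplication of 3D boxes by center distance, with made-up thresholds and a hypothetical `deduplicate_boxes` helper, to show what "globally consistent, de-duplicated boxes" means in practice:

```python
import numpy as np

def deduplicate_boxes(boxes, center_thresh=0.5):
    """Greedy de-duplication of 3D boxes by center distance.

    boxes: (N, 7) array of [x, y, z, w, h, d, score] in metric world space.
    A box whose center lies within `center_thresh` meters of an
    already-kept, higher-scoring box is treated as a duplicate.
    """
    order = np.argsort(-boxes[:, 6])  # highest score first
    kept = []
    for i in order:
        c = boxes[i, :3]
        if all(np.linalg.norm(c - boxes[j, :3]) > center_thresh for j in kept):
            kept.append(i)
    return boxes[kept]

# Two near-identical detections of one object, plus a distant second object.
dets = np.array([
    [0.00, 0.0, 0.0, 1.0, 1.0, 1.0, 0.9],
    [0.10, 0.0, 0.0, 1.0, 1.0, 1.0, 0.6],  # duplicate of the first
    [5.00, 0.0, 0.0, 1.0, 1.0, 1.0, 0.8],
])
print(len(deduplicate_boxes(dets)))  # 2 boxes survive
```

Real systems use 3D IoU and learned confidence rather than a raw center-distance cutoff, but the greedy highest-score-first structure is the same.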
Here is the preview; tomorrow, the full clip from the official source :)
1.1M Metric VTON Dataset
Google's Fit-Inclusive Try-on: a large-scale VTO dataset comprising over 1.13M try-on image triplets, each accompanied by precise body and garment measurements. Repo & dataset announced.
Review https://t.ly/cs-pt
Paper arxiv.org/pdf/2604.08526
Project johannakarras.github.io/FIT/
Repo TBA
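To make "triplets accompanied by measurements" concrete, here is a hypothetical record schema. The field names and values are illustrative assumptions, not FIT's actual on-disk format:

```python
from dataclasses import dataclass

@dataclass
class TryOnTriplet:
    """Hypothetical record for one try-on triplet (field names are
    illustrative, not the dataset's actual schema)."""
    person_image: str         # path to the person photo
    garment_image: str        # path to the isolated garment photo
    tryon_image: str          # path to the person wearing the garment
    body_measurements_cm: dict     # e.g. bust, waist, hip
    garment_measurements_cm: dict  # e.g. chest width, length

sample = TryOnTriplet(
    person_image="person_0001.jpg",
    garment_image="garment_0001.jpg",
    tryon_image="tryon_0001.jpg",
    body_measurements_cm={"bust": 92.0, "waist": 74.0, "hip": 98.0},
    garment_measurements_cm={"chest_width": 52.0, "length": 68.0},
)
print(sample.body_measurements_cm["waist"])  # 74.0
```

The metric annotations are what distinguishes this from earlier VTO datasets, which pair images without any size information.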
6D Object Pose w/ Deformation
DeSOPE by Xidian & #MagicLeap is a novel large-scale dataset for 6DoF pose estimation of deformed objects: 665K pose annotations produced via a semi-automatic pipeline. Repo & dataset announced.
Review https://t.ly/M5VgX
Paper https://arxiv.org/pdf/2604.06720
Project https://desope-6d.github.io/
Repo TBA
SOTA 3D Detection in the Wild
WildDet3D is a novel unified geometry-aware architecture for 3D detection that natively accepts text, point, and box prompts, and can incorporate auxiliary depth signals at inference time. New SOTA! Repo, models and iPhone.
Review https://t.ly/8NxBN
Paper arxiv.org/pdf/2604.08626
Project allenai.github.io/WildDet3D/
Repo github.com/allenai/WildDet3D
OmniShow Content Creation
OmniShow is the new SOTA in content creation, with industry-grade performance. Impressive results, best enjoyed with audio. Repo announced.
Review https://t.ly/Pm-7U
Paper arxiv.org/pdf/2604.11804
Project correr-zhou.github.io/OmniShow/
Repo github.com/Correr-Zhou/OmniShow
Interactive Objects from EgoVideo
EgoFun3D by Simon Fraser University is a coordinated task, dataset, and benchmark for modeling interactive 3D objects from egocentric videos. Repo (TBA), demo & dataset.
Review https://t.ly/YhGN7
Paper arxiv.org/pdf/2604.11038
Project 3dlg-hcvc.github.io/EgoFun3D/
Repo github.com/3dlg-hcvc/EgoFun3D
Demo bc79fea884062374b3.gradio.live/
3D Human-Object Contact
Pi-HOC by CMU + NREC is a novel single-pass, instance-aware framework for dense 3D semantic contact prediction across all human-object pairs. Repo announced.
Review https://t.ly/TAgG1
Paper https://arxiv.org/pdf/2604.12923
Project https://pi-hoc.github.io/
Repo https://github.com/SravanChittupalli/Pi-HOC
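"Dense 3D contact prediction" assigns a contact label to every point on the human surface. Pi-HOC learns this from data; a crude geometric stand-in (distance thresholding, with a hypothetical `contact_labels` helper and a made-up threshold) conveys what the output looks like:

```python
import numpy as np

def contact_labels(human_verts, object_pts, thresh=0.02):
    """Label each human vertex as 'in contact' if any object point lies
    within `thresh` meters. A toy geometric proxy for learned dense
    contact prediction, not Pi-HOC's method.

    human_verts: (N, 3), object_pts: (M, 3); returns a boolean (N,) mask.
    """
    # Pairwise distances, shape (N, M)
    d = np.linalg.norm(human_verts[:, None, :] - object_pts[None, :, :], axis=-1)
    return d.min(axis=1) < thresh

hand = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.5]])  # two "vertices"
mug = np.array([[0.0, 0.0, 0.01]])                    # one object point
print(contact_labels(hand, mug))  # first vertex in contact, second not
```

A learned, instance-aware model goes further: it predicts contact even under occlusion and attributes each contact to a specific object instance, which pure distance thresholding cannot do.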
GCT 3D Reconstruction
ANT unveils LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built on a geometric context transformer (GCT) architecture. Repo under A-NC 4.0 International.
Review https://t.ly/ExodA
Paper https://arxiv.org/pdf/2604.14141
Project https://arxiv.org/pdf/2604.14141
Repo github.com/robbyant/lingbot-map
Deformable 3D Hair
Xi'an Jiaotong University unveils a novel method that reconstructs decoupled 3D Gaussian head avatars from a single input image: effortless hairstyle transfer with natural, dynamic hair motion. Code announced.
Review https://t.ly/kWZdd
Paper https://arxiv.org/pdf/2604.14782
Project yuansun-xjtu.github.io/CompHairHead.io/
Repo yuansun-xjtu.github.io/CompHairHead.io/
Mobile Ultra-Detailed Avatars
Given skeletal poses and a virtual camera as input, MUA by the Max Planck Institute produces photorealistic renderings and hyper-detailed geometry of animatable clothed humans. Repo announced.
Review https://t.ly/QPCy6
Paper https://arxiv.org/pdf/2604.18583
Project https://vcai.mpi-inf.mpg.de/projects/MUA/
Repo TBA
Face Anything 4D (SOTA)
A novel unified framework for 4D facial reconstruction and dense tracking from image sequences: new SOTA in facial single-image and mono-video depth estimation, dense 4D reconstruction, and 3D point tracking. Repo & dataset announced.
Review https://t.ly/zItie
Paper https://arxiv.org/pdf/2604.19702
Project kocasariumut.github.io/FaceAnything
Repo TBA
PY4AI 2026: here we are!
The third edition of our conference is official! Speaker list and (free) tickets: https://t.ly/L4_52
Reshoot-Anything is out
Reshoot-Anything re-shoots dynamic monocular videos under novel camera trajectories. Code under Apache 2.0.
Review https://t.ly/MIqAc
Paper https://arxiv.org/pdf/2604.21776
Project adithyaiyer1999.github.io/reshoot-anything/
Repo github.com/morphicfilms/video-to-video
Holistic Shot Boundary Detection
OmniShotCut detects shot changes in videos from diverse sources (anime, vlogs, games, shorts, sports, screen recordings, etc.) and recognizes sudden jumps and transitions (dissolves, fades, wipes, etc.) via a proposed Shot-Query-based Video Transformer. Repo, demo & benchmark.
Review https://t.ly/sTi7N
Paper https://arxiv.org/pdf/2604.24762
Project uva-computer-vision-lab.github.io/OmniShotCut_website/
Repo github.com/UVA-Computer-Vision-Lab/OmniShotCut
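For context, the classic pre-transformer baseline for hard cuts is histogram differencing between consecutive frames. The sketch below (hypothetical `shot_boundaries` helper, made-up threshold) shows that baseline — it catches sudden jumps but misses gradual transitions like dissolves and fades, which is exactly the gap a learned video transformer targets:

```python
import numpy as np

def shot_boundaries(frames, bins=16, thresh=0.5):
    """Classic hard-cut detector: flag frame i when the L1 distance
    between normalized gray-level histograms of frames i-1 and i
    exceeds `thresh`. A baseline only, not OmniShotCut's method.

    frames: iterable of (H, W) grayscale arrays with values in [0, 255].
    """
    hists = [np.histogram(f, bins=bins, range=(0, 255))[0] / f.size
             for f in frames]
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > thresh]

# Synthetic clip: 3 dark frames, then a hard cut to 3 bright frames.
dark = [np.full((32, 32), 20.0)] * 3
bright = [np.full((32, 32), 230.0)] * 3
print(shot_boundaries(dark + bright))  # [3]
```

A slow dissolve spreads the histogram change across many frames, so no single step crosses the threshold; that failure mode motivates query-based models that look at longer temporal windows.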