Here is the preview; the full clip from the official source comes tomorrow :)
1.1M Metric VTON Dataset
Google's Fit-Inclusive Try-on: a large-scale VTO dataset comprising over 1.13M try-on image triplets accompanied by precise body and garment measurements. Repo & dataset announced.
Review https://t.ly/cs-pt
Paper arxiv.org/pdf/2604.08526
Project johannakarras.github.io/FIT/
Repo TBA
6D Object Pose w/ Deformation
DeSOPE by Xidian & #MagicLeap is a novel large-scale dataset for 6DoF pose estimation of deformed objects: 665K pose annotations produced via a semi-automatic pipeline. Repo & dataset announced.
Review https://t.ly/M5VgX
Paper https://arxiv.org/pdf/2604.06720
Project https://desope-6d.github.io/
Repo TBA
SOTA 3D Detection in the wild
WildDet3D is a novel unified geometry-aware architecture for 3D detection that natively accepts text, point, and box prompts and can incorporate auxiliary depth signals at inference time. New SOTA! Repo, models and iPhone.
Review https://t.ly/8NxBN
Paper arxiv.org/pdf/2604.08626
Project allenai.github.io/WildDet3D/
Repo github.com/allenai/WildDet3D
OmniShow Content Creation
OmniShow is the new SOTA in content creation, with industry-grade performance. Impressive results, best with audio. Repo announced.
Review https://t.ly/Pm-7U
Paper arxiv.org/pdf/2604.11804
Project correr-zhou.github.io/OmniShow/
Repo github.com/Correr-Zhou/OmniShow
Interactive Objects from EgoVideo
EgoFun3D by Simon Fraser University is a coordinated task, dataset and benchmark for modeling interactive 3D objects from egocentric videos. Repo (TBA), demo & dataset.
Review https://t.ly/YhGN7
Paper arxiv.org/pdf/2604.11038
Project 3dlg-hcvc.github.io/EgoFun3D/
Repo github.com/3dlg-hcvc/EgoFun3D
Demo bc79fea884062374b3.gradio.live/
3D Human-Object Contact
Pi-HOC by CMU + NREC is a novel single-pass, instance-aware framework for dense 3D semantic contact prediction of all human-object pairs. Repo announced.
Review https://t.ly/TAgG1
Paper https://arxiv.org/pdf/2604.12923
Project https://pi-hoc.github.io/
Repo https://github.com/SravanChittupalli/Pi-HOC
GCT 3D Reconstruction
ANT unveils LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. Repo under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
Review https://t.ly/ExodA
Paper https://arxiv.org/pdf/2604.14141
Project https://arxiv.org/pdf/2604.14141
Repo github.com/robbyant/lingbot-map
Deformable 3D Hair
Xi'an Jiaotong University unveils a novel method that reconstructs decoupled 3D Gaussian head avatars from a single input image: effortless hairstyle transfer with natural dynamic hair motion. Code announced.
Review https://t.ly/kWZdd
Paper https://arxiv.org/pdf/2604.14782
Project yuansun-xjtu.github.io/CompHairHead.io/
Repo yuansun-xjtu.github.io/CompHairHead.io/
Mobile Ultra-detailed Avatars
Given skeletal poses and a virtual camera as inputs, MUA by Max Planck Institute produces photorealistic renderings and hyper-detailed geometry of animatable clothed humans. Repo announced.
Review https://t.ly/QPCy6
Paper https://arxiv.org/pdf/2604.18583
Project https://vcai.mpi-inf.mpg.de/projects/MUA/
Repo TBA
Face Anything 4D (SOTA)
A novel unified method for 4D facial reconstruction and dense tracking from image sequences: new SOTA in facial single-image and mono-video depth estimation, dense 4D reconstruction, and 3D point tracking. Repo & dataset announced.
Review https://t.ly/zItie
Paper https://arxiv.org/pdf/2604.19702
Project kocasariumut.github.io/FaceAnything
Repo TBA
PY4AI 2026: here we are!
The third edition of our conference is official! Speaker list and (free) tickets: https://t.ly/L4_52
Reshoot-Anything is out
Reshoot-Anything reshoots dynamic monocular videos under novel camera trajectories. Code under Apache 2.0.
Review https://t.ly/MIqAc
Paper https://arxiv.org/pdf/2604.21776
Project adithyaiyer1999.github.io/reshoot-anything/
Repo github.com/morphicfilms/video-to-video
Holistic Shot Boundary Detection
OmniShotCut detects shot changes in videos from diverse sources (anime, vlogs, games, shorts, sports, screen recordings, etc.) and recognizes sudden jumps and transitions (dissolve, fade, wipe, etc.) using a Shot-Query-based Video Transformer. Repo, demo & benchmark.
Review https://t.ly/sTi7N
Paper https://arxiv.org/pdf/2604.24762
Project uva-computer-vision-lab.github.io/OmniShotCut_website/
Repo github.com/UVA-Computer-Vision-Lab/OmniShotCut
Syn4D: Multiview Synthetic 4D Dataset
Syn4D is a novel multi-view synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations.
Review https://t.ly/SL1mk
Paper https://arxiv.org/pdf/2605.05207
Project https://jzr99.github.io/Syn4D/
Repo https://github.com/jzr99/Syn4D
Data huggingface.co/datasets/Syn4D/Syn4D_RGBD/tree/main
About the frequency of posting in the channel (anonymous poll):
63%: 1 per day is great
37%: a few posts per day (such as breaking news with fewer details) would be better
Unified Correspondence Transformer
UniCorrn is the first correspondence model with shared weights that unifies 2D-2D, 2D-3D, and 3D-3D geometric matching with a transformer. Licensed under CC BY-NC-SA 4.0.
Review https://t.ly/2OBdq
Paper https://arxiv.org/pdf/2605.04044
Project https://neu-vi.github.io/UniCorrn/
Repo https://github.com/neu-vi/UniCorrn
Count Anything, Any Granularity
Open-world counting recast as multi-grained counting, where visual exemplars specify target appearance and fine-grained text specifies the intended semantic granularity across five explicit levels. Repo & data under Apache license.
Review https://t.ly/nqz80
Paper https://lnkd.in/dp7khTRU
Project https://lnkd.in/d_jfX_Yn
Repo https://lnkd.in/dkTRGZkG
Data https://lnkd.in/dB83jRyT
Latent Decoding Pixel Diffusion
PiD by NVIDIA is a plug-and-play diffusion decoder that replaces VAE/RAE decoders, turning latent representations directly into super-resolved pixels in a single pass. Repo under Apache 2.0.
Review https://t.ly/y19mA
Paper https://lnkd.in/duVC25C2
Project https://lnkd.in/dW6TkzCB
Repo https://lnkd.in/dnGdgKRr