SOTA Full-Head Synthesis
HyPlaneHead is the new SOTA in full-head image synthesis, delivering high-quality results with significantly fewer artifacts than existing 3D-aware models. Repo announced.
Review https://t.ly/WYfP3
Paper arxiv.org/pdf/2509.16748
Project https://lhyfst.github.io/hyplanehead/
Repo github.com/lhyfst/HyPlaneHead
AnyTouch 2 is out
AnyTouch 2 is a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. Repo, model & data available.
Review https://t.ly/fP4dP
Paper https://arxiv.org/pdf/2602.09617
Project gewu-lab.github.io/AnyTouch2/
Repo github.com/GeWu-Lab/AnyTouch2
Vote here please:
https://www.linkedin.com/posts/visionarynet_py4ai-2026-coming-soon-activity-7427290532034265088-y69e
AGENT BANANA (SOTA)
Agent Banana is a novel SOTA agentic system for HD, native-resolution image editing through reasoning-based natural-language interaction, where each edit is context-aware, logically dependent, and locally precise. Code announced.
Review https://t.ly/EXaCH
Paper https://arxiv.org/pdf/2602.09084
Project https://agent-banana.github.io/
Repo https://github.com/taco-group/agent-banana
IndustryShapes 6D Pose
IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel-object 6D pose estimation. Dataset available.
Review https://t.ly/KKcuH
Paper https://arxiv.org/pdf/2602.05555
Project https://pose-lab.github.io/IndustryShapes/
Dataset https://huggingface.co/datasets/POSE-Lab/IndustryShapes
Generalized Human Tracking
Beijing Institute of Technology & Humanoid Robotics Shanghai present a novel learning framework for general humanoid whole-body control, with impressive results in motion imitation.
Review https://t.ly/ucmuB
Paper arxiv.org/pdf/2601.23080
Project zeonsunlightyu.github.io/RGMT.github.io
SurfPhase: 3D Interfacial Dynamics
SurfPhase is a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Repo/dataset announced.
Review https://t.ly/g2P5F
Paper https://arxiv.org/pdf/2602.11154
Project https://yuegao.me/SurfPhase/
Repo github.com/yuegao/SurfPhase
Teaching AI to Draw Illusions
Stroke of Surprise by NYCU is a novel generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages: as strokes are progressively added, the sketch reveals a completely different subject. Code released.
Review https://t.ly/98Oim
Paper https://lnkd.in/dTA7iuce
Project https://lnkd.in/dhTMGw23
Repo https://lnkd.in/deQyDGFu
Conversational Segmentation
CIS grounds abstract, intent-oriented concepts into pixel-accurate masks, reasoning about affordances, physics, and functional properties. Code/demo released.
Review https://t.ly/SsG57
Paper arxiv.org/pdf/2602.13195
Project glab-caltech.github.io/converseg/
Repo github.com/AadSah/ConverSeg
Demo glab-caltech.github.io/converseg/#interactive-demo
Efficient VLMs
CoPE-VideoLM is a codec-aware tokenization framework for VLMs that replaces dense RGB encoding with lightweight structured representations derived from codec primitives: tokens -93%, time-to-first-token -86%! Code announced.
Review https://t.ly/3_GqN
Paper https://arxiv.org/pdf/2602.13191
Project https://sayands.github.io/cope/
Repo TBA
Dex4D: Task-Agnostic Track
Dex4D by CMU is a novel approach that generalizes to unseen objects & poses, scene layouts, backgrounds, and task trajectories. Code under Apache 2.0.
Review https://t.ly/ZGx9T
Paper arxiv.org/pdf/2602.15828
Project dex4d.github.io/
Sim github.com/Dex4D/Dex4D-Simulation
Vision github.com/Dex4D/Dex4D-Vision
HW https://github.com/Dex4D/Dex4D-Hardware
Video Neural Compression
TeCoNeRV adapts INR hypernetworks to compress videos efficiently at higher resolutions. Impressive: +5.35 dB PSNR, -36% bitrate & 1.5-3× faster. Code announced.
Review https://t.ly/0AtCK
Paper arxiv.org/pdf/2602.16711
Project namithap10.github.io/teconerv/
Repo github.com/namithap10/TeCoNeRV/
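To read the quoted PSNR gain concretely: PSNR is a log-scale distortion metric, so a minimal sketch of how it is computed helps interpret the number. This is my own illustration with NumPy (the `psnr` helper and the toy frames are assumptions, not TeCoNeRV code):

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(rec, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy frames: a uniform error of 16 gray levels gives MSE = 16^2 = 256.
ref = np.zeros((4, 4))
rec = ref + 16.0
print(round(psnr(ref, rec), 2))  # 24.05
```

Because the scale is logarithmic, a +5.35 dB gain at a fixed bitrate corresponds to roughly a 10^0.535 ≈ 3.4× reduction in mean squared error.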
New SOTA Planar Tracking
WOFTSAM by the Visual Recognition Group (CTU) is a novel planar tracker that combines robust long-term segmentation by SAM2 with 8-degrees-of-freedom homography pose estimation. Repo under CC BY-NC-SA 4.0.
Review https://t.ly/VUOe5
Paper https://lnkd.in/dZfc_DhQ
Repo https://lnkd.in/dAcneJGn
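The "8 degrees of freedom" above refers to a 3×3 planar homography, which has nine entries minus one overall scale. A minimal sketch of estimating one from point correspondences via the classic DLT algorithm, in NumPy (my own illustration, not the WOFTSAM code; `estimate_homography` and the synthetic points are assumptions):

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate an 8-DoF homography H (dst ~ H @ src, homogeneous)
    from >= 4 point correspondences via the DLT algorithm."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H (flattened) is the right singular vector of the smallest
    # singular value, i.e. the null vector of the constraint matrix.
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale: 9 parameters - 1 = 8 DoF

# Synthetic check: project points with a known H, then recover it.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.05, 0.9, -3.0],
                   [1e-4, 2e-4, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100], [50, 25]], float)
homo = np.hstack([src, np.ones((5, 1))]) @ H_true.T
dst = homo[:, :2] / homo[:, 2:]
H_est = estimate_homography(src, dst)
```

In a real tracker the correspondences would come from matched features inside the segmentation mask, typically with RANSAC on top to reject outliers.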
World-Grounded Hand-Object
WHOLE jointly reconstructs coherent hand and object motion in world space by guiding a generative motion prior. Code announced.
Review https://t.ly/c5w8h
Paper https://arxiv.org/pdf/2602.22209
Project https://judyye.github.io/whole-www/
Repo TBA
Solaris: generative #Minecraft
NYU unveils Solaris, a multiplayer video world model in Minecraft that generates consistent first-person observations for two players simultaneously. Impressive work. Repo & dataset available.
Review https://t.ly/VrcrT
Paper https://arxiv.org/pdf/2602.22208
Project https://solaris-wm.github.io/
Repo https://github.com/solaris-wm/
Geometry-Aware 4D Head
GeoDiff4D is a novel framework that reconstructs animatable 4D head avatars from a single portrait image through geometry-aware diffusion. Code announced.
Review https://t.ly/J9L-t
Paper https://lnkd.in/ddpv-78g
Project https://lnkd.in/d-vhukyj
Repo https://lnkd.in/dzd6mnFv
Fully Offline Mobile-VTON
A novel, high-quality, privacy-preserving framework that enables fully offline virtual try-on on commodity mobile devices using only a single user image and a garment image. Repo announced, to be released.
Review https://t.ly/dsrIn
Paper arxiv.org/pdf/2603.00947
Project zhenchenwan.github.io/Mobile-VTON/
Repo https://github.com/tmllab/2026_CVPR_Mobile-VTON
All Point Clouds, One Encoder
Utonia is a step toward a one-from-all, one-for-all point cloud encoder: it pretrains a single encoder on diverse point cloud data and reuses it as a reliable backbone for downstream tasks. Code under Apache 2.0.
Review https://t.ly/yqSyZ
Paper https://arxiv.org/pdf/2603.03283
Project pointcept.github.io/Utonia/
Repo https://github.com/Pointcept/Utonia
DuoMo: Dual Motion Diffusion
DuoMo by Meta is a novel generative method that recovers human motion in world-space coordinates from unconstrained videos with noisy or incomplete observations. Code announced.
Review https://t.ly/dnA3K
Paper arxiv.org/pdf/2603.03265
Project yufu-wang.github.io/duomo/
Repo TBA
Any Resolution, Any Geometry
Ultra Resolution Geometry Transformer (URGT) handles depth & normal estimation at arbitrary resolutions (e.g. 4K, 6K, 8K). New SOTA. Repo under MIT.
Review https://t.ly/HXg1n
Paper arxiv.org/pdf/2603.03026
Project dreamaker-mrc.github.io/Any-Resolution-Any-Geometry/
Repo github.com/Dreamaker-MrC/Any-Resolution-Any-Geometry