SOTA Full-Head Synthesis
HyPlaneHead is the new SOTA in full-head image synthesis, delivering high-quality results with significantly fewer artifacts than existing 3D-aware models. Repo announced.
Review https://t.ly/WYfP3
Paper arxiv.org/pdf/2509.16748
Project https://lhyfst.github.io/hyplanehead/
Repo github.com/lhyfst/HyPlaneHead
AnyTouch 2 is out
AnyTouch 2 is a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. Repo, Model & Data available.
Review https://t.ly/fP4dP
Paper https://arxiv.org/pdf/2602.09617
Project gewu-lab.github.io/AnyTouch2/
Repo github.com/GeWu-Lab/AnyTouch2
Vote here please:
https://www.linkedin.com/posts/visionarynet_py4ai-2026-coming-soon-activity-7427290532034265088-y69e
AGENT BANANA (SOTA)
Agent Banana is a novel SOTA agentic system for HD, native-resolution image editing through reasoning-based natural-language interaction, where each edit is context-aware, logically dependent, and locally precise. Code announced.
Review https://t.ly/EXaCH
Paper https://arxiv.org/pdf/2602.09084
Project https://agent-banana.github.io/
Repo https://github.com/taco-group/agent-banana
IndustryShapes 6D Pose
IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel-object 6D pose estimation. Dataset available.
Review https://t.ly/KKcuH
Paper https://arxiv.org/pdf/2602.05555
Project https://pose-lab.github.io/IndustryShapes/
Dataset https://huggingface.co/datasets/POSE-Lab/IndustryShapes
Generalized Human Tracking
Beijing Institute of Technology & Humanoid Robotics Shanghai present a novel learning framework for general humanoid whole-body control, with impressive results in motion imitation.
Review https://t.ly/ucmuB
Paper arxiv.org/pdf/2601.23080
Project zeonsunlightyu.github.io/RGMT.github.io
SurfPhase: 3D Interfacial Dynamics
SurfPhase is a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Repo/Dataset announced.
Review https://t.ly/g2P5F
Paper https://arxiv.org/pdf/2602.11154
Project https://yuegao.me/SurfPhase/
Repo github.com/yuegao/SurfPhase
Teaching AI to Draw Illusions
Stroke of Surprise by NYCU is a novel generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. As strokes are progressively added, the sketch reveals a completely different subject. Code released.
Review https://t.ly/98Oim
Paper https://lnkd.in/dTA7iuce
Project https://lnkd.in/dhTMGw23
Repo https://lnkd.in/deQyDGFu
Conversational Segmentation
CIS grounds abstract, intent-oriented concepts into pixel-accurate masks, reasoning about affordances, physics, and functional properties. Code/Demo released.
Review https://t.ly/SsG57
Paper arxiv.org/pdf/2602.13195
Project glab-caltech.github.io/converseg/
Repo github.com/AadSah/ConverSeg
Demo glab-caltech.github.io/converseg/#interactive-demo
Efficient VLMs
CoPE-VideoLM is a codec-aware tokenization framework for VLMs that replaces dense RGB encoding with lightweight structured representations derived from codec primitives. Tokens -93%, time-to-first-token -86%! Code announced.
Review https://t.ly/3_GqN
Paper https://arxiv.org/pdf/2602.13191
Project https://sayands.github.io/cope/
Repo TBA
Dex4D: Task-Agnostic Track
Dex4D by CMU is a novel approach that generalizes to unseen objects and poses, scene layouts, backgrounds, & task trajectories. Code under Apache 2.0.
Review https://t.ly/ZGx9T
Paper arxiv.org/pdf/2602.15828
Project dex4d.github.io/
Sim github.com/Dex4D/Dex4D-Simulation
Vision github.com/Dex4D/Dex4D-Vision
HW https://github.com/Dex4D/Dex4D-Hardware
Video Neural Compression
TeCoNeRV adapts INR hypernetworks to compress videos efficiently at higher resolutions. Impressive: +5.35 dB PSNR, -36% bitrate & 1.5-3× faster. Code announced.
Review https://t.ly/0AtCK
Paper arxiv.org/pdf/2602.16711
Project namithap10.github.io/teconerv/
Repo github.com/namithap10/TeCoNeRV/
New SOTA Planar Tracking
WOFTSAM by the Visual Recognition Group (CTU) is a novel planar tracker that combines robust long-term segmentation by SAM2 with 8-degrees-of-freedom homography pose estimation. Repo under CC BY-NC-SA 4.0.
Review https://t.ly/VUOe5
Paper https://lnkd.in/dZfc_DhQ
Repo https://lnkd.in/dAcneJGn
World-Grounded Hand-Object
WHOLE jointly reconstructs coherent hand and object motion in world space by guiding a generative motion prior. Code announced.
Review https://t.ly/c5w8h
Paper https://arxiv.org/pdf/2602.22209
Project https://judyye.github.io/whole-www/
Repo TBA
Solaris: generative #Minecraft
NYU unveils Solaris, a multiplayer video world model in Minecraft that generates consistent first-person observations for two players simultaneously. Impressive work. Repo & Dataset.
Review https://t.ly/VrcrT
Paper https://arxiv.org/pdf/2602.22208
Project https://solaris-wm.github.io/
Repo https://github.com/solaris-wm/
Geometry-Aware 4D Head
GeoDiff4D is a novel framework that reconstructs animatable 4D head avatars from a single portrait image through geometry-aware diffusion. Code announced.
Review https://t.ly/J9L-t
Paper https://lnkd.in/ddpv-78g
Project https://lnkd.in/d-vhukyj
Repo https://lnkd.in/dzd6mnFv