SOTA Style Transfer
TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model's robust capabilities in content preservation & style customization. Code & Model released.
Review https://t.ly/viVR0
Paper arxiv.org/pdf/2601.20175
Project tele-ai.github.io/TeleStyle/
Repo github.com/Tele-AI/TeleStyle

Metric Anything is out
Metric Anything (Li Auto Inc.) is a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced.
Review https://t.ly/54Ccr
Paper arxiv.org/pdf/2601.22054
Project metric-anything.github.io/metric-anything-io/
Repo github.com/metric-anything/metric-anything

Segment Any Events by Language
SEAL (by NUS) is the first Semantic-aware Segment Any Events framework that addresses Open-Vocabulary Event Instance Segmentation. Code announced.
Review https://t.ly/1ZMF0
Paper https://arxiv.org/pdf/2601.23159
Project https://0nandon.github.io/SEAL/
Repo https://github.com/0nandon/SEAL

RAM prices skyrocketing
Me acting like a rich kid.
Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO

🐮 CoWTracker: Track-Warping 🐮
CoWTracker (VGG + Meta) is a novel dense point tracker that eschews cost volumes in favor of warping. Code/Models under FAIR NC.
Review https://t.ly/6bAn9
Paper https://arxiv.org/pdf/2602.04877
Project https://cowtracker.github.io/
Repo https://github.com/facebookresearch/cowtracker

TrajVG: Trajectory-Geometry
TrajVG is a novel reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating camera-coordinate 3D trajectories. Code announced.
Review https://t.ly/yVi01
Paper arxiv.org/pdf/2602.04439
Project xingy038.github.io/TrajVG/
Repo github.com/xingy038/TrajVG

MOMENTUM #NeurIPS 2025
MOMENTUM by Google (H/T Huguens Jean, Ph.D.) is a production multimodal agent architecture built on the Google ADK. It orchestrates 22 specialized tools, including Gemini for reasoning, Imagen 4.0 for image generation, and Veo 3.1 for synthesis. Code announced.
Review https://t.ly/06h7Q
Paper https://momentum-project-page-232993426383.us-central1.run.app/momentum_paper.pdf
Project https://momentum-project-page-232993426383.us-central1.run.app/
Repo TBA

😶‍🌫️ SOTA Full-Head Synthesis 😶‍🌫️
HyPlaneHead is the new SOTA in full-head image synthesis, delivering high-quality results with significantly fewer artifacts than existing 3D-aware models. Repo announced.
Review https://t.ly/WYfP3
Paper arxiv.org/pdf/2509.16748
Project https://lhyfst.github.io/hyplanehead/
Repo github.com/lhyfst/HyPlaneHead

AnyTouch 2 is out
AnyTouch 2 is a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. Repo, Model & Data released.
Review https://t.ly/fP4dP
Paper https://arxiv.org/pdf/2602.09617
Project gewu-lab.github.io/AnyTouch2/
Repo github.com/GeWu-Lab/AnyTouch2

Vote here please:
https://www.linkedin.com/posts/visionarynet_py4ai-2026-coming-soon-activity-7427290532034265088-y69e

AGENT BANANA (SOTA)
Agent Banana is a novel SOTA agentic system for HD, native-resolution image editing through reasoning-based natural-language interaction, where each edit is context-aware, logically dependent, and locally precise. Code announced.
Review https://t.ly/EXaCH
Paper https://arxiv.org/pdf/2602.09084
Project https://agent-banana.github.io/
Repo https://github.com/taco-group/agent-banana

🛠️ IndustryShapes 6D Pose 🛠️
IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel object 6D pose estimation. Dataset available.
Review https://t.ly/KKcuH
Paper https://arxiv.org/pdf/2602.05555
Project https://pose-lab.github.io/IndustryShapes/
Dataset https://huggingface.co/datasets/POSE-Lab/IndustryShapes

Generalized Human Tracking
Beijing Institute of Technology & Humanoid Robotics Shanghai present a novel learning framework for general humanoid whole-body control. Impressive results in imitation.
Review https://t.ly/ucmuB
Paper arxiv.org/pdf/2601.23080
Project zeonsunlightyu.github.io/RGMT.github.io

🫧 SurfPhase: 3D Interfacial Dynamics 🫧
SurfPhase is a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Repo/Dataset announced.
Review https://t.ly/g2P5F
Paper https://arxiv.org/pdf/2602.11154
Project https://yuegao.me/SurfPhase/
Repo github.com/yuegao/SurfPhase

🪿 Teaching AI to draw illusions 🪿
Stroke of Surprise by NYCU is a novel generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. As strokes are progressively added, the sketch reveals a completely different subject. Code released.
Review https://t.ly/98Oim
Paper https://lnkd.in/dTA7iuce
Project https://lnkd.in/dhTMGw23
Repo https://lnkd.in/deQyDGFu

Conversational Segmentation
CIS grounds abstract, intent-oriented concepts into pixel-accurate masks, reasoning about affordances, physics, and functional properties. Code/Demo released.
Review https://t.ly/SsG57
Paper arxiv.org/pdf/2602.13195
Project glab-caltech.github.io/converseg/
Repo github.com/AadSah/ConverSeg
Demo glab-caltech.github.io/converseg/#interactive-demo

Efficient VLMs
CoPE-VideoLM is a codec-aware tokenization framework for VLMs that replaces dense RGB encoding with lightweight structured representations derived from codec primitives. Tokens -93% / time-to-first-token -86%! Code announced.
Review https://t.ly/3_GqN
Paper https://arxiv.org/pdf/2602.13191
Project https://sayands.github.io/cope/
Repo TBA

Dex4D: Task-Agnostic Track
Dex4D by CMU is a novel approach that generalizes to unseen objects and poses, scene layouts, backgrounds, & task trajectories. Code under Apache 2.0.
Review https://t.ly/ZGx9T
Paper arxiv.org/pdf/2602.15828
Project dex4d.github.io/
Sim github.com/Dex4D/Dex4D-Simulation
Vision github.com/Dex4D/Dex4D-Vision
HW https://github.com/Dex4D/Dex4D-Hardware