MLLMs Fine Segmentation
SimpleSeg equips MLLMs (multimodal large language models) with native pixel-level perception. Repo & model available.
Review: https://t.ly/eVguh
Paper: arxiv.org/pdf/2601.19228
Project: simpleseg.github.io/
Repo: github.com/songtianhui/SimpleSeg
DeepSeek-OCR 2 is out
DeepSeek-AI announced a new version of its powerful SOTA OCR model, based on a new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights available.
Review: https://t.ly/gX4bX
Paper: https://arxiv.org/pdf/2601.20552
Repo: github.com/deepseek-ai/DeepSeek-OCR-2
SOTA Style Transfer
TeleAI unveils TeleStyle, a lightweight yet effective model for image and video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model's robust capabilities in content preservation and style customization. Code & model released.
Review: https://t.ly/viVR0
Paper: arxiv.org/pdf/2601.20175
Project: tele-ai.github.io/TeleStyle/
Repo: github.com/Tele-AI/TeleStyle
Metric Anything is out
Metric Anything (Li Auto Inc.) is a simple, scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced.
Review: https://t.ly/54Ccr
Paper: arxiv.org/pdf/2601.22054
Project: metric-anything.github.io/metric-anything-io/
Repo: github.com/metric-anything/metric-anything
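For context on why metric (rather than relative) depth is useful: given metric depth and known camera intrinsics, a standard pinhole back-projection yields 3D points, and therefore distances, directly in meters. This is a generic sketch, not Metric Anything's code. The intrinsics and pixel values below are hypothetical.

```python
import math

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth (meters) into camera coordinates
    using the standard pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Hypothetical intrinsics for a 640x480 camera (illustrative values only).
FX = FY = 500.0
CX, CY = 320.0, 240.0

# Two hypothetical pixels on the same row, both observed at 2 m depth.
p_a = backproject(420, 240, 2.0, FX, FY, CX, CY)
p_b = backproject(220, 240, 2.0, FX, FY, CX, CY)

# With metric depth, the Euclidean distance comes out in meters.
print(round(math.dist(p_a, p_b), 3))  # 0.8
```

If the depth were known only up to an unknown scale s (relative depth), the same computation would return 0.8 * s. A separate scale-recovery step would then be needed before measuring anything.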
Segment Any Events by Language
SEAL (by NUS) is the first semantic-aware Segment Any Events framework, addressing open-vocabulary event instance segmentation. Code announced.
Review: https://t.ly/1ZMF0
Paper: https://arxiv.org/pdf/2601.23159
Project: https://0nandon.github.io/SEAL/
Repo: https://github.com/0nandon/SEAL
RAM prices skyrocketing
Me acting like a rich kid.
Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO
CoWTracker: Track-Warping
CoWTracker (VGG + Meta) is a novel dense point tracker that eschews cost volumes in favor of warping. Code & models available under the FAIR NC license.
Review: https://t.ly/6bAn9
Paper: https://arxiv.org/pdf/2602.04877
Project: https://cowtracker.github.io/
Repo: https://github.com/facebookresearch/cowtracker
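For context, a cost volume scores a query feature against many candidate displacements. Warping instead resamples the target features at the current flow estimate so that they line up with the source. The sketch below shows only generic backward warping with bilinear sampling on a toy 2D grid. It is not CoWTracker's implementation, and the helper names `bilinear_sample` and `warp` are hypothetical.

```python
import math

def bilinear_sample(feat, x, y):
    """Sample a 2D grid (list of rows) at real-valued (x, y); zero outside."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = x0 + 1, y0 + 1
    ax, ay = x - x0, y - y0  # fractional offsets used as interpolation weights

    def at(xi, yi):
        # Zero padding for out-of-bounds lookups.
        if 0 <= xi < w and 0 <= yi < h:
            return feat[yi][xi]
        return 0.0

    top = (1 - ax) * at(x0, y0) + ax * at(x1, y0)
    bottom = (1 - ax) * at(x0, y1) + ax * at(x1, y1)
    return (1 - ay) * top + ay * bottom

def warp(feat_target, flow):
    """Backward warp (hypothetical helper): out[y][x] = target sampled at (x + u, y + v)."""
    h, w = len(feat_target), len(feat_target[0])
    return [[bilinear_sample(feat_target, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(w)]
            for y in range(h)]

# Toy single-channel "feature map" and a constant one-pixel shift to the right.
feat = [[0.0, 1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0, 7.0]]
flow = [[(1.0, 0.0)] * 4 for _ in range(2)]

print(warp(feat, flow))  # [[1.0, 2.0, 3.0, 0.0], [5.0, 6.0, 7.0, 0.0]]
```

Once features are aligned this way, a refinement step only has to compare them at a single position rather than over a window of displacements. That is the general trade-off behind preferring warping over cost volumes.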
TrajVG: Trajectory-Geometry
TrajVG is a novel reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating 3D trajectories in camera coordinates. Code announced.
Review: https://t.ly/yVi01
Paper: arxiv.org/pdf/2602.04439
Project: xingy038.github.io/TrajVG/
Repo: github.com/xingy038/TrajVG
MOMENTUM #NeurIPS 2025
MOMENTUM by Google (H/T Huguens Jean, Ph.D.) is a production multimodal agent architecture built on the Google ADK (Agent Development Kit). It orchestrates 22 specialized tools, including Gemini for reasoning, Imagen 4.0 for image generation, and Veo 3.1 for video synthesis. Code announced.
Review: https://t.ly/06h7Q
Paper: https://momentum-project-page-232993426383.us-central1.run.app/momentum_paper.pdf
Project: https://momentum-project-page-232993426383.us-central1.run.app/
Repo: TBA
SOTA Full-Head Synthesis
HyPlaneHead is the new SOTA in full-head image synthesis, delivering high-quality results with significantly fewer artifacts than existing 3D-aware models. Repo announced.
Review: https://t.ly/WYfP3
Paper: arxiv.org/pdf/2509.16748
Project: https://lhyfst.github.io/hyplanehead/
Repo: github.com/lhyfst/HyPlaneHead
AnyTouch 2 is out
AnyTouch 2 is a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. Repo, model & data available.
Review: https://t.ly/fP4dP
Paper: https://arxiv.org/pdf/2602.09617
Project: gewu-lab.github.io/AnyTouch2/
Repo: github.com/GeWu-Lab/AnyTouch2
Vote here, please:
https://www.linkedin.com/posts/visionarynet_py4ai-2026-coming-soon-activity-7427290532034265088-y69e
AGENT BANANA (SOTA)
Agent Banana is a new SOTA agentic system for high-definition, native-resolution image editing through reasoning-based natural-language interaction, in which each edit is context-aware, logically dependent, and locally precise. Code announced.
Review: https://t.ly/EXaCH
Paper: https://arxiv.org/pdf/2602.09084
Project: https://agent-banana.github.io/
Repo: https://github.com/taco-group/agent-banana
IndustryShapes 6D Pose
IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel-object 6D pose estimation. Dataset available.
Review: https://t.ly/KKcuH
Paper: https://arxiv.org/pdf/2602.05555
Project: https://pose-lab.github.io/IndustryShapes/
Dataset: https://huggingface.co/datasets/POSE-Lab/IndustryShapes
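A 6D pose is a rigid transform with 3 rotational and 3 translational degrees of freedom, mapping object-model points into the camera frame. The sketch below applies such a pose and scores an estimate with ADD, the mean distance between model points under the ground-truth and estimated poses. ADD is a metric commonly used for 6D pose estimation. Using it here is an illustrative assumption and not necessarily the dataset's official protocol. The model points and poses are hypothetical.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the camera z-axis (angle in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(R, t, p):
    """Map a model-frame point p into the camera frame: R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def add_error(R_gt, t_gt, R_est, t_est, model_points):
    """ADD: mean distance between model points transformed by the two poses."""
    dists = [math.dist(apply_pose(R_gt, t_gt, p), apply_pose(R_est, t_est, p))
             for p in model_points]
    return sum(dists) / len(dists)

# Hypothetical model points (meters) for a small tool-like object.
model = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.0, 0.02, 0.0)]

# Hypothetical ground truth: no rotation, object 0.5 m in front of the camera.
R_gt, t_gt = rot_z(0.0), (0.0, 0.0, 0.5)
# Hypothetical estimate: correct rotation, 1 cm translation error along x.
R_est, t_est = rot_z(0.0), (0.01, 0.0, 0.5)

print(round(add_error(R_gt, t_gt, R_est, t_est, model), 4))  # 0.01
```

With a pure translation error, every model point moves by the same offset, so ADD equals that offset (1 cm here). Rotation errors instead produce larger ADD values for points far from the object's origin.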