MoRo: Human Motion
Masked modeling for human motion Recovery under Occlusions. Given a monocular video captured from a static camera, MoRo (by ETHZ & Meta) robustly reconstructs accurate, physically plausible human motion, even under challenging occlusions. Repo released.
Review https://t.ly/kK_je
Paper arxiv.org/pdf/2601.16079
Project mikeqzy.github.io/MoRo/
Repo github.com/mikeqzy/MoRo

BBoxMaskPose v2 is fire
BBoxMaskPose v2 by ČVUT offers SOTA performance in detection, segmentation & 2D pose in crowded scenes. It enables 3D human reconstruction even in scenes with complex interactions. Code, models & data available.
Review https://t.ly/GkkDl
Paper arxiv.org/pdf/2601.15200
Project https://lnkd.in/dQ_3hxjC
Repo https://lnkd.in/dVqwD3jN

Generalized-Scale Counting
GeCo2 (Ljubljana) is a novel end-to-end SOTA few-shot counting method that explicitly addresses object-scale issues. Repo & demo available.
Review https://t.ly/2_7I8
Paper https://arxiv.org/pdf/2511.08048
Repo https://github.com/jerpelhan/GECO2
Demo huggingface.co/spaces/jerpelhan/GECO2-demo

Super-Hard Poll, folks
This dilemma is driving me crazy. Vote: https://www.linkedin.com/posts/visionarynet_activity-7421974594917588992-YNAG
(and of course comment here)

MLLMs Fine Segmentation
SimpleSeg: MLLMs with native pixel-level perception. Repo & model available.
Review https://t.ly/eVguh
Paper arxiv.org/pdf/2601.19228
Project simpleseg.github.io/
Repo github.com/songtianhui/SimpleSeg

DeepSeek-OCR 2 is out
DeepSeek-AI announced the new version of its powerful SOTA OCR: a new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights released.
Review https://t.ly/gX4bX
Paper https://arxiv.org/pdf/2601.20552
Repo github.com/deepseek-ai/DeepSeek-OCR-2

SOTA Style Transfer
TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model's robust capabilities in content preservation & style customization. Code & model released.
Review https://t.ly/viVR0
Paper arxiv.org/pdf/2601.20175
Project tele-ai.github.io/TeleStyle/
Repo github.com/Tele-AI/TeleStyle

Metric Anything is out
Metric Anything (Li Auto Inc.) is a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced.
Review https://t.ly/54Ccr
Paper arxiv.org/pdf/2601.22054
Project metric-anything.github.io/metric-anything-io/
Repo github.com/metric-anything/metric-anything

Segment Any Events by Language
SEAL (by NUS) is the first Semantic-aware Segment Any Events framework that addresses Open-Vocabulary Event Instance Segmentation. Code announced.
Review https://t.ly/1ZMF0
Paper https://arxiv.org/pdf/2601.23159
Project https://0nandon.github.io/SEAL/
Repo https://github.com/0nandon/SEAL

RAM prices skyrocketing
Me acting like a rich kid.
Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO

CoWTracker: Track-Warping
CoWTracker (VGG + Meta) is a novel dense point tracker that eschews cost volumes in favor of warping. Code & models released under FAIR NC.
Review https://t.ly/6bAn9
Paper https://arxiv.org/pdf/2602.04877
Project https://cowtracker.github.io/
Repo https://github.com/facebookresearch/cowtracker

TrajVG: Trajectory-Geometry
TrajVG is a novel reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating camera-coordinate 3D trajectories. Code announced.
Review https://t.ly/yVi01
Paper arxiv.org/pdf/2602.04439
Project xingy038.github.io/TrajVG/
Repo github.com/xingy038/TrajVG

MOMENTUM #NeurIPS 2025
MOMENTUM by Google (H/T Huguens Jean, Ph.D.) is a production multimodal agent architecture built on the Google ADK. It orchestrates 22 specialized tools, including Gemini for reasoning, Imagen 4.0 for image generation, and Veo 3.1 for video synthesis. Code announced.
Review https://t.ly/06h7Q
Paper https://momentum-project-page-232993426383.us-central1.run.app/momentum_paper.pdf
Project https://momentum-project-page-232993426383.us-central1.run.app/
Repo TBA

SOTA Full-Head Synthesis
HyPlaneHead is the new SOTA in full-head image synthesis, delivering HQ results with significantly fewer artifacts than existing 3D-aware models. Repo announced.
Review https://t.ly/WYfP3
Paper arxiv.org/pdf/2509.16748
Project https://lhyfst.github.io/hyplanehead/
Repo github.com/lhyfst/HyPlaneHead

AnyTouch 2 is out
AnyTouch 2 is a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. Repo, model & data available.
Review https://t.ly/fP4dP
Paper https://arxiv.org/pdf/2602.09617
Project gewu-lab.github.io/AnyTouch2/
Repo github.com/GeWu-Lab/AnyTouch2

Vote here, please:
https://www.linkedin.com/posts/visionarynet_py4ai-2026-coming-soon-activity-7427290532034265088-y69e

AGENT BANANA (SOTA)
Agent Banana is a novel SOTA agentic system for HD, native-resolution image editing through reasoning-based natural-language interaction, where each edit is context-aware, logically dependent, and locally precise. Code announced.
Review https://t.ly/EXaCH
Paper https://arxiv.org/pdf/2602.09084
Project https://agent-banana.github.io/
Repo https://github.com/taco-group/agent-banana

IndustryShapes 6D Pose
IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel-object 6D pose estimation. Dataset available.
Review https://t.ly/KKcuH
Paper https://arxiv.org/pdf/2602.05555
Project https://pose-lab.github.io/IndustryShapes/
Dataset https://huggingface.co/datasets/POSE-Lab/IndustryShapes