SOTA Part-level Generator
FrankenMotion: a novel text-to-motion model that learns to compose complex motions through hierarchical conditioning on part-, action- & sequence-level text, enabling fine-grained control over body parts & timing. Code, models & dataset to be released.
Review: https://t.ly/leB_R
Paper: arxiv.org/pdf/2601.10909
Project: coral79.github.io/frankenmotion/
Repo: github.com/Coral79/FrankenMotion-Code
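The sketch below makes "hierarchical conditioning on part-, action- & sequence-level text" more concrete. It shows how such three-level annotations could be stored and then queried per frame. This is our own illustration and not the FrankenMotion code. The classes `TextSpan` and `HierarchicalCaption`, the method `conditions_at` and the example captions are all hypothetical.

```python
# Hypothetical illustration of three-level (sequence / action / part) text
# annotations; NOT taken from the FrankenMotion repository.
from dataclasses import dataclass, field


@dataclass
class TextSpan:
    """A text description active over the frame interval [start, end)."""
    text: str
    start: int
    end: int

    def active(self, frame: int) -> bool:
        return self.start <= frame < self.end


@dataclass
class HierarchicalCaption:
    sequence: str  # one caption for the whole clip
    actions: list[TextSpan] = field(default_factory=list)  # time-stamped actions
    parts: dict[str, list[TextSpan]] = field(default_factory=dict)  # body part -> spans

    def conditions_at(self, frame: int) -> dict:
        """Gather every text that applies at `frame`, level by level."""
        return {
            "sequence": self.sequence,
            "actions": [s.text for s in self.actions if s.active(frame)],
            "parts": {
                part: [s.text for s in spans if s.active(frame)]
                for part, spans in self.parts.items()
            },
        }


caption = HierarchicalCaption(
    sequence="a person walks forward, then waves",
    actions=[TextSpan("walk forward", 0, 60), TextSpan("wave", 60, 120)],
    parts={
        "legs": [TextSpan("step forward", 0, 60)],
        "right_arm": [TextSpan("raise the hand and wave", 60, 120)],
    },
)
print(caption.conditions_at(30))  # walking phase
print(caption.conditions_at(90))  # waving phase
```

Each frame therefore receives three kinds of text: the sequence caption, the action(s) active at that time, and per-part texts. That is the kind of signal that allows control over individual body parts and timing. How the model actually encodes these texts is described in the paper.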

#META 3D Casual Captures
#META unveils ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Impressive results. Repo under CC BY-NC 4.0.
Review: https://t.ly/j08sJ
Paper: arxiv.org/pdf/2601.11514
Project: facebookresearch.github.io/ShapeR/
Repo: github.com/facebookresearch/ShapeR

Foundation Medical SAM3
Medical SAM3: a foundation model for universal prompt-driven medical image segmentation, built by fully fine-tuning SAM3 on large-scale, heterogeneous 2D/3D medical imaging datasets with paired segmentation masks and text prompts. Repo & demo announced.
Review: https://t.ly/C6jcy
Paper: https://arxiv.org/pdf/2601.10880
Project: chongcongjiang.github.io/MedicalSAM3/#
Repo: github.com/AIM-Research-Lab/Medical-SAM3

Mask-Guided Matting
VideoMaMa is a novel diffusion-based model that converts binary masks into continuous alpha mattes. Repo, dataset & demo available.
Review: https://t.ly/l_0f8
Paper: arxiv.org/pdf/2601.14255
Project: cvlab-kaist.github.io/VideoMaMa
Repo: github.com/cvlab-kaist/VideoMaMa
Demo: huggingface.co/spaces/SammyLim/VideoMaMa
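Some context on why a continuous alpha matte matters. Standard compositing blends foreground F and background B as I = αF + (1 − α)B. A binary mask (α ∈ {0, 1}) therefore cannot represent the fractional coverage found at hair or motion-blurred edges. The NumPy sketch below shows only this compositing equation and is not VideoMaMa's diffusion model. The toy arrays are made up for illustration.

```python
# Generic alpha compositing; NOT the VideoMaMa model itself.
import numpy as np


def composite(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Blend foreground over background: I = alpha * F + (1 - alpha) * B.

    fg, bg: float arrays of shape (H, W, 3) in [0, 1]
    alpha:  float array of shape (H, W) in [0, 1]
    """
    a = np.clip(alpha, 0.0, 1.0)[..., None]  # add a channel axis for broadcasting
    return a * fg + (1.0 - a) * bg


# Toy 1x3 image: white foreground over a black background.
fg = np.ones((1, 3, 3))
bg = np.zeros((1, 3, 3))

binary_mask = np.array([[1.0, 1.0, 0.0]])  # hard edge: pixels are fully in or out
soft_matte = np.array([[1.0, 0.5, 0.0]])   # fractional coverage at the boundary pixel

print(composite(fg, bg, binary_mask)[0, :, 0])
print(composite(fg, bg, soft_matte)[0, :, 0])
```

With the soft matte, the boundary pixel becomes a 50/50 blend instead of a hard step. Recovering that kind of fractional α from a binary mask is the task the post describes.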

MoRo: Human Motion
Masked modeling for human motion Recovery under Occlusions. Given a monocular video captured from a static camera, MoRo (by ETHZ & META) robustly reconstructs accurate, physically plausible human motion, even under challenging occlusions. Repo released.
Review: https://t.ly/kK_je
Paper: arxiv.org/pdf/2601.16079
Project: mikeqzy.github.io/MoRo/
Repo: github.com/mikeqzy/MoRo
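In general, masked modeling hides part of the input and trains a model to reconstruct it. Below is a minimal sketch of that general idea, not MoRo's implementation. It assumes a pose sequence of shape (frames, joints, 3) with a per-joint visibility flag, and it assumes ground truth is available for every joint, as when occlusions are simulated on training data. Occluded joints are always masked, and extra visible joints are masked at random. The loss is computed only on masked positions. The function names, the mask ratio and the zero placeholder are our own choices.

```python
# Generic masked-modeling sketch (assumed layout: T frames x J joints x 3);
# NOT MoRo's actual design.
import numpy as np


def make_mask(visible: np.ndarray, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean (T, J) array, True where the model must reconstruct the joint.

    Occluded joints (visible == False) are always masked; visible joints are
    additionally masked at random with probability `mask_ratio`.
    """
    return ~visible | (rng.random(visible.shape) < mask_ratio)


def apply_mask(poses: np.ndarray, mask: np.ndarray, mask_value: float = 0.0) -> np.ndarray:
    """Overwrite masked joints of a (T, J, 3) pose array with a placeholder value."""
    out = poses.copy()
    out[mask] = mask_value  # boolean (T, J) index selects whole 3-vectors
    return out


def masked_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean per-joint L2 error computed only on masked positions."""
    err = np.linalg.norm(pred - target, axis=-1)  # (T, J)
    return float(err[mask].mean()) if mask.any() else 0.0


rng = np.random.default_rng(0)
T, J = 4, 5
poses = rng.normal(size=(T, J, 3))  # stand-in ground-truth poses
visible = np.ones((T, J), dtype=bool)
visible[1:3, 2] = False  # joint 2 occluded in frames 1-2

mask = make_mask(visible, mask_ratio=0.3, rng=rng)
inputs = apply_mask(poses, mask)
# A real network would predict poses from `inputs`; here we only show the loss plumbing.
print(int(mask.sum()), masked_loss(inputs, poses, mask))
```

In a real model, the placeholder would usually be a learned mask token, and a network conditioned on the unmasked joints would produce the predictions.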

BBoxMaskPose v2 is fire
BBoxMaskPose v2 by ČVUT (Czech Technical University in Prague) offers SOTA performance in detection, segmentation & 2D pose estimation in crowded scenes. It enables 3D human reconstruction even in scenes with complex interactions. Code, models & data available.
Review: https://t.ly/GkkDl
Paper: arxiv.org/pdf/2601.15200
Project: https://lnkd.in/dQ_3hxjC
Repo: https://lnkd.in/dVqwD3jN

Generalized-Scale Counting
GeCo2 (Ljubljana) is a novel end-to-end SOTA few-shot counting method that explicitly addresses object-scale issues. Repo & demo available.
Review: https://t.ly/2_7I8
Paper: https://arxiv.org/pdf/2511.08048
Repo: https://github.com/jerpelhan/GECO2
Demo: huggingface.co/spaces/jerpelhan/GECO2-demo

Super-Hard Poll, folks
This dilemma is driving me crazy. Vote: https://www.linkedin.com/posts/visionarynet_activity-7421974594917588992-YNAG
(and of course comment here)

MLLMs Fine Segmentation
SimpleSeg: MLLMs with native pixel-level perception. Repo & model available.
Review: https://t.ly/eVguh
Paper: arxiv.org/pdf/2601.19228
Project: simpleseg.github.io/
Repo: github.com/songtianhui/SimpleSeg

DeepSeek-OCR 2 is out
DeepSeek-AI has announced a new version of its powerful SOTA OCR model, built on a new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights available.
Review: https://t.ly/gX4bX
Paper: https://arxiv.org/pdf/2601.20552
Repo: github.com/deepseek-ai/DeepSeek-OCR-2

SOTA Style Transfer
TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model's robust capabilities in content preservation & style customization. Code & model released.
Review: https://t.ly/viVR0
Paper: arxiv.org/pdf/2601.20175
Project: tele-ai.github.io/TeleStyle/
Repo: github.com/Tele-AI/TeleStyle

Metric Anything is out
Metric Anything (Li Auto Inc.) is a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced.
Review: https://t.ly/54Ccr
Paper: arxiv.org/pdf/2601.22054
Project: metric-anything.github.io/metric-anything-io/
Repo: github.com/metric-anything/metric-anything
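Here is why metric (rather than relative) depth matters in practice. With depth in meters and known pinhole intrinsics, each pixel can be lifted to a 3D point at real-world scale. The sketch below shows only that standard unprojection. The intrinsics and depth values are made up, and the code says nothing about how Metric Anything itself is trained.

```python
# Standard pinhole unprojection; illustrative values, unrelated to
# Metric Anything's training procedure.
import numpy as np


def unproject(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Lift an (H, W) metric depth map to an (H, W, 3) point map in camera coordinates.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


depth = np.full((2, 2), 2.0)  # a flat surface 2 m in front of the camera
points = unproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(points[0, 0], points[1, 1])
print(np.linalg.norm(points[0, 1] - points[0, 0]))  # distance in meters
```

Doubling the depth map doubles every distance between the lifted points. A relative-depth prediction, defined only up to an unknown scale, cannot give such distances directly.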

Segment Any Events by Language
SEAL (by NUS) is the first semantic-aware Segment Any Events framework, addressing open-vocabulary event instance segmentation. Code announced.
Review: https://t.ly/1ZMF0
Paper: https://arxiv.org/pdf/2601.23159
Project: https://0nandon.github.io/SEAL/
Repo: https://github.com/0nandon/SEAL

RAM prices skyrocketing
Me acting like a rich kid.
Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO

CoWTracker: Track-Warping
CoWTracker (VGG + META) is a novel dense point tracker that eschews cost volumes in favor of warping. Code/models released under the FAIR NC license.
Review: https://t.ly/6bAn9
Paper: https://arxiv.org/pdf/2602.04877
Project: https://cowtracker.github.io/
Repo: https://github.com/facebookresearch/cowtracker
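For readers new to the distinction, a cost volume and warping work differently:

- A cost volume compares each feature in one frame against many candidate locations in another frame.
- Warping directly resamples the second frame's features at the positions given by the current displacement estimate.

Below is a minimal NumPy sketch of the generic operation, backward warping with bilinear interpolation. It is not CoWTracker's architecture, and the function name and toy inputs are our own.

```python
# Generic backward warping with bilinear interpolation; NOT CoWTracker's code.
import numpy as np


def backward_warp(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Sample feat (H, W, C) at (x + dx, y + dy) with bilinear interpolation.

    flow: (H, W, 2) displacement in pixels, flow[..., 0] = dx, flow[..., 1] = dy.
    Sample positions outside the image are clamped to the border.
    """
    h, w, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)

    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (sx - x0)[..., None]  # fractional offsets, broadcast over channels
    wy = (sy - y0)[..., None]

    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom


# Toy example: a 1x4 single-channel feature row sampled half a pixel to the right.
feat = np.arange(4, dtype=float).reshape(1, 4, 1)
flow = np.zeros((1, 4, 2))
flow[..., 0] = 0.5
print(backward_warp(feat, flow)[0, :, 0])
```

Warping costs one lookup per pixel and iteration, whereas a cost volume scales with the number of candidate displacements. This is the usual motivation for avoiding cost volumes.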

TrajVG: Trajectory-Geometry
TrajVG is a novel reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating camera-coordinate 3D trajectories. Code announced.
Review: https://t.ly/yVi01
Paper: arxiv.org/pdf/2602.04439
Project: xingy038.github.io/TrajVG/
Repo: github.com/xingy038/TrajVG
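Here is one way to read "cross-frame 3D correspondence as an explicit prediction". Suppose every tracked point has a 3D position in each frame's camera coordinates. Projecting those positions through the intrinsics then yields 2D correspondences directly, with no separate matching step. Below is a minimal sketch that assumes a pinhole camera. The array layout (N points × T frames × 3) and the intrinsics are hypothetical and are not TrajVG's actual output format.

```python
# Pinhole projection of hypothetical camera-coordinate trajectories;
# layout and values are illustrative assumptions, not TrajVG's outputs.
import numpy as np


def project(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project camera-coordinate points (..., 3) to pixels (..., 2) with intrinsics K (3x3)."""
    uvw = points_cam @ K.T               # homogeneous pixel coordinates
    return uvw[..., :2] / uvw[..., 2:3]  # divide by depth Z


K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# trajectories[n, t] = position (meters) of point n in frame t's camera coordinates.
trajectories = np.array([
    [[0.0, 0.0, 2.0], [0.1, 0.0, 2.0], [0.2, 0.0, 2.0]],     # point 0 drifts right
    [[-0.5, 0.2, 4.0], [-0.5, 0.2, 3.5], [-0.5, 0.2, 3.0]],  # point 1 approaches the camera
])

pixels = project(trajectories, K)  # (N, T, 2)
# Row n is the 2D track of point n: pixels[n, t] in frame t matches pixels[n, t + 1].
print(pixels.round(1))
```

Because each trajectory keeps its index across frames, the correspondence comes for free from the prediction. The 3D positions additionally carry the geometry needed for reconstruction.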

MOMENTUM #NeurIPS 2025
MOMENTUM by Google (H/T Huguens Jean, Ph.D.) is a production multimodal agent architecture built on the Google Agent Development Kit (ADK). It orchestrates 22 specialized tools, including Gemini for reasoning, Imagen 4.0 for image generation, and Veo 3.1 for video synthesis. Code announced.
Review: https://t.ly/06h7Q
Paper: https://momentum-project-page-232993426383.us-central1.run.app/momentum_paper.pdf
Project: https://momentum-project-page-232993426383.us-central1.run.app/
Repo: TBA

SOTA Full-Head Synthesis
HyPlaneHead is the new SOTA in full-head image synthesis, delivering high-quality results with significantly fewer artifacts than existing 3D-aware models. Repo announced.
Review: https://t.ly/WYfP3
Paper: arxiv.org/pdf/2509.16748
Project: https://lhyfst.github.io/hyplanehead/
Repo: github.com/lhyfst/HyPlaneHead