Interactive Humanoid Generation
FlowAct-R1 by ByteDance is a novel framework that enables lifelike, responsive, and high-fidelity humanoid video generation for seamless real-time interaction. No code, but impressive results (see the video with audio).
Review https://t.ly/aQhol
Paper arxiv.org/pdf/2601.10103
Project grisoon.github.io/FlowAct-R1/
3D Human Gen-Seg
CoMoVi takes an input image with a text description and generates a 3D human motion & video sequence synchronously within a single diffusion denoising loop. Repo & dataset to be released.
Review https://t.ly/khSkm
Paper arxiv.org/pdf/2601.10632
Project igl-hkust.github.io/CoMoVi/
Repo github.com/IGL-HKUST/CoMoVi
Data huggingface.co/datasets/AfterJourney/CoMoVi-Dataset
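The "single denoising loop" idea can be pictured with a toy sketch: two latents (motion and video) step through the same diffusion timesteps with a shared denoiser, which is what keeps them synchronized. Everything below is hypothetical stand-in code, not CoMoVi's implementation; the real denoiser is a learned network conditioned on the input image and text.

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_denoiser(motion, video, t):
    # Hypothetical stand-in for a joint network: predicts a noise estimate
    # for both modalities at the shared timestep t.
    return 0.1 * motion, 0.1 * video

# Start both modalities from pure noise.
motion = rng.standard_normal((16, 24))   # e.g. 16 frames x 24 pose params
video = rng.standard_normal((16, 8, 8))  # e.g. 16 tiny latent frames

# One shared loop: motion and video are denoised together, step by step.
for t in range(10, 0, -1):
    eps_m, eps_v = joint_denoiser(motion, video, t)
    motion = motion - eps_m
    video = video - eps_v

print(motion.shape, video.shape)
```

The point of the sketch is only the control flow: a single loop over timesteps touching both latents, rather than two separate generation passes.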
SOTA Part-level Generator
FrankenMotion is a novel text-to-motion model that learns to compose complex motions through hierarchical conditioning on part-, action- & sequence-level text, enabling fine-grained control over body parts & timing. Code, models & dataset to be released.
Review https://t.ly/leB_R
Paper arxiv.org/pdf/2601.10909
Project coral79.github.io/frankenmotion/
Repo github.com/Coral79/FrankenMotion-Code
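A minimal sketch of what "hierarchical conditioning" could look like, assuming a simple additive fusion of the three text levels per body part. The `embed` function and the fusion rule are toy placeholders; the paper's actual encoders and fusion are learned.

```python
import numpy as np

DIM = 8

def embed(text):
    # Deterministic toy "text encoder" (a real system would use a
    # pretrained language model); hypothetical, for illustration only.
    rng = np.random.default_rng(sum(text.encode()))
    return rng.standard_normal(DIM)

# One embedding per level of the hierarchy.
sequence = embed("a person waves while walking forward")
actions = {"walk": embed("walk forward"), "wave": embed("wave right hand")}
parts = {"legs": embed("legs: walking gait"),
         "right_arm": embed("right arm: raised, waving")}

# Toy composition: each part's condition mixes its part-level text, the
# action it belongs to, and the global sequence description.
cond = {
    "legs": parts["legs"] + actions["walk"] + sequence,
    "right_arm": parts["right_arm"] + actions["wave"] + sequence,
}
print({k: v.shape for k, v in cond.items()})
```

The takeaway is the structure: different body parts receive different conditions, all grounded in one shared sequence-level description, which is what enables per-part and per-timing control.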
#META 3D Casual Captures
#META unveils ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Impressive results. Repo under CC BY-NC 4.0.
Review https://t.ly/j08sJ
Paper arxiv.org/pdf/2601.11514
Project facebookresearch.github.io/ShapeR/
Repo github.com/facebookresearch/ShapeR
Foundation Medical SAM3
Medical SAM3: a foundation model for universal prompt-driven medical image segmentation, built by fully fine-tuning SAM3 on large-scale, heterogeneous 2D/3D medical imaging datasets with paired segmentation masks and text prompts. Repo & demo announced.
Review https://t.ly/C6jcy
Paper https://arxiv.org/pdf/2601.10880
Project chongcongjiang.github.io/MedicalSAM3/#
Repo github.com/AIM-Research-Lab/Medical-SAM3
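The post doesn't detail the fine-tuning objective. For intuition, here is the soft Dice loss, a standard mask-supervision term in medical segmentation; treating it as part of Medical SAM3's recipe is an assumption, and the text-prompt conditioning is not shown.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|), computed on a
    # predicted probability map vs. a binary ground-truth mask.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy example: the model predicts 0.9 everywhere on an all-foreground mask.
pred = np.full((64, 64), 0.9)
target = np.ones((64, 64))

loss = dice_loss(pred, target)
print(round(loss, 4))  # -> 0.0526
```

Dice-style losses are popular here because medical foregrounds (tumors, organs) are often tiny, and overlap-based objectives handle that class imbalance better than plain pixel-wise cross-entropy.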
Mask-Guided Matting
VideoMaMa is a novel diffusion-based model that converts binary masks into continuous alpha mattes. Repo, dataset & demo available.
Review https://t.ly/l_0f8
Paper arxiv.org/pdf/2601.14255
Project cvlab-kaist.github.io/VideoMaMa
Repo github.com/cvlab-kaist/VideoMaMa
Demo huggingface.co/spaces/SammyLim/VideoMaMa
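Why a continuous matte beats a binary mask: compositing uses the standard equation C = alpha*F + (1 - alpha)*B per pixel, and only a fractional alpha can represent soft boundaries like hair or motion blur. A toy single-row example (the intensities are made up):

```python
import numpy as np

# alpha in [0, 1] blends foreground F and background B per pixel;
# a binary mask could only ever produce the two extreme values.
alpha = np.array([[0.0, 0.5, 1.0]])  # pure bg, soft edge, pure fg
F = np.full((1, 3), 200.0)           # foreground intensity
B = np.full((1, 3), 20.0)            # background intensity

C = alpha * F + (1.0 - alpha) * B
print(C)  # 20 = pure bg, 110 = half blend, 200 = pure fg
```

VideoMaMa's job, in these terms, is estimating that fractional `alpha` for every pixel of every frame, starting from nothing more than a hard 0/1 mask.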
MoRo: Human Motion
Masked modeling for human motion Recovery under Occlusions. Given a monocular video captured from a static camera, MoRo (by ETHZ & META) robustly reconstructs accurate, physically plausible human motion, even under challenging occlusions. Repo released.
Review https://t.ly/kK_je
Paper arxiv.org/pdf/2601.16079
Project mikeqzy.github.io/MoRo/
Repo github.com/mikeqzy/MoRo
BBoxMaskPose v2 is fire
BBoxMaskPose v2 by ČVUT offers SOTA performance in detection, segmentation & 2D pose estimation in crowded scenes. It enables 3D human reconstruction even in scenes with complex interactions. Code, models & data available.
Review https://t.ly/GkkDl
Paper arxiv.org/pdf/2601.15200
Project https://lnkd.in/dQ_3hxjC
Repo https://lnkd.in/dVqwD3jN
Generalized-Scale Counting
GeCo2 (Ljubljana) is a novel end-to-end SOTA few-shot counting method that explicitly addresses object-scale issues. Repo & demo available.
Review https://t.ly/2_7I8
Paper https://arxiv.org/pdf/2511.08048
Repo https://github.com/jerpelhan/GECO2
Demo huggingface.co/spaces/jerpelhan/GECO2-demo
Super-Hard Poll, folks
This dilemma is driving me crazy. Vote: https://www.linkedin.com/posts/visionarynet_activity-7421974594917588992-YNAG
(and of course comment here)
MLLMs Fine Segmentation
SimpleSeg: MLLMs with native pixel-level perception. Repo & model available.
Review https://t.ly/eVguh
Paper arxiv.org/pdf/2601.19228
Project simpleseg.github.io/
Repo github.com/songtianhui/SimpleSeg
DeepSeek-OCR 2 is out
DeepSeek-AI announced the new version of its powerful SOTA OCR: a new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights available.
Review https://t.ly/gX4bX
Paper https://arxiv.org/pdf/2601.20552
Repo github.com/deepseek-ai/DeepSeek-OCR-2
SOTA Style Transfer
TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model's robust capabilities in content preservation & style customization. Code & model released.
Review https://t.ly/viVR0
Paper arxiv.org/pdf/2601.20175
Project tele-ai.github.io/TeleStyle/
Repo github.com/Tele-AI/TeleStyle
Metric Anything is out
Metric Anything (Li Auto Inc.) is a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced.
Review https://t.ly/54Ccr
Paper arxiv.org/pdf/2601.22054
Project metric-anything.github.io/metric-anything-io/
Repo github.com/metric-anything/metric-anything
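What "metric" buys you: with depth in meters (not just relative ordering) and known camera intrinsics, every pixel lifts to a real-scale 3D point via standard pinhole back-projection. The intrinsics below are a hypothetical camera, not anything from the paper.

```python
import numpy as np

def unproject(u, v, z, fx, fy, cx, cy):
    # Pinhole back-projection: pixel (u, v) with metric depth z maps to
    # a 3D point (x, y, z) in camera coordinates.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Hypothetical intrinsics; a pixel observed at 2 m depth.
p = unproject(u=400.0, v=300.0, z=2.0,
              fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(p)  # x = 0.32 m, y = 0.24 m, z = 2.0 m
```

With only relative depth, the same pixel could be anywhere along its ray; metric depth pins down actual distances, which is what downstream tasks like planning and mapping need.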
Segment Any Events by Language
SEAL (by NUS) is the first semantic-aware Segment Any Events framework that addresses open-vocabulary event instance segmentation. Code announced.
Review https://t.ly/1ZMF0
Paper https://arxiv.org/pdf/2601.23159
Project https://0nandon.github.io/SEAL/
Repo https://github.com/0nandon/SEAL
RAM prices skyrocketing
Me acting like a rich kid.
Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO
CoWTracker: Track-Warping
CoWTracker (VGG + META) is a novel dense point tracker that eschews cost volumes in favor of warping. Code & models under the FAIR NC license.
Review https://t.ly/6bAn9
Paper https://arxiv.org/pdf/2602.04877
Project https://cowtracker.github.io/
Repo https://github.com/facebookresearch/cowtracker
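The cost-volume-free idea in one toy picture: instead of correlating every pair of locations across frames, a tracker can backward-warp one frame's features along a predicted flow. A nearest-neighbor numpy sketch, assuming a known flow field; the real model uses differentiable bilinear sampling on learned features, and this is not CoWTracker's code.

```python
import numpy as np

def warp(img, flow):
    # Backward warping: out[y, x] samples img at (x + flow_x, y + flow_y).
    # Nearest-neighbor for brevity; out-of-bounds samples stay zero.
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            xs = int(round(x + flow[y, x, 0]))
            ys = int(round(y + flow[y, x, 1]))
            if 0 <= xs < w and 0 <= ys < h:
                out[y, x] = img[ys, xs]
    return out

img = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0  # every pixel looks one column to the right

print(warp(img, flow))
```

The appeal of warping over cost volumes is memory: a full cost volume scales with (H*W)^2 location pairs, while a warp touches each pixel once.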
TrajVG: Trajectory-Geometry
TrajVG is a novel reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating camera-coordinate 3D trajectories. Code announced.
Review https://t.ly/yVi01
Paper arxiv.org/pdf/2602.04439
Project xingy038.github.io/TrajVG/
Repo github.com/xingy038/TrajVG