Multi-target SAM3
SAM3-DMS is a novel training-free, decoupled strategy that applies fine-grained memory selection to each individual object, for robust identity preservation and tracking stability. Repo under SAM License.
Review https://t.ly/jJOAr
Paper https://arxiv.org/pdf/2601.09699
Repo https://github.com/FudanCVL/SAM3-DMS
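The post describes SAM3-DMS only at a high level. To show roughly what "decoupled, per-object memory selection" can mean, here is a toy Python sketch. It assumes each object gets a per-frame reliability score, such as a predicted mask quality. The class names, the score and the top-k rule are hypothetical and are not the paper's actual selection criterion. The only point is that each object keeps its own memory, so an occlusion of one object does not evict the good frames of another.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    frame_idx: int
    score: float  # hypothetical per-object reliability (e.g. predicted mask quality)


@dataclass
class PerObjectMemory:
    """Hypothetical decoupled memory: each object keeps its own top-k frames."""
    capacity: int = 3
    banks: dict[int, list[MemoryEntry]] = field(default_factory=dict)

    def add(self, obj_id: int, frame_idx: int, score: float) -> None:
        bank = self.banks.setdefault(obj_id, [])
        bank.append(MemoryEntry(frame_idx, score))
        # Select frames for THIS object only, independently of all other objects.
        bank.sort(key=lambda e: e.score, reverse=True)
        del bank[self.capacity:]

    def frames_for(self, obj_id: int) -> list[int]:
        return sorted(e.frame_idx for e in self.banks.get(obj_id, []))


mem = PerObjectMemory(capacity=2)
# Object 2 is occluded in frames 1-2 (low scores); object 1 stays clearly visible.
for frame, (s1, s2) in enumerate([(0.9, 0.8), (0.95, 0.2), (0.7, 0.1), (0.85, 0.9)]):
    mem.add(1, frame, s1)
    mem.add(2, frame, s2)
print(mem.frames_for(1), mem.frames_for(2))  # prints [0, 1] [0, 3]
```

A single shared memory would pick one set of frames for all objects at once. Here, object 2's occluded frames never enter its memory, and object 1 is unaffected.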

100M Video Action Dataset
Action100M by META is a large-scale dataset of 1.2M instructional videos (14.6 years of total duration), yielding O(100M) temporally localized segments with open-vocabulary action supervision and rich captions. Repo under FAIR NC Research License.
Review https://t.ly/w5KXe
Paper arxiv.org/pdf/2601.10592
Repo github.com/facebookresearch/Action100M
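Here is a quick back-of-the-envelope check of the Action100M figures quoted above (14.6 years, 1.2M videos, O(100M) segments). "O(100M)" is only an order of magnitude, so the 100M used below is an assumption. The results are rough averages, not statistics reported by the paper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

total_seconds = 14.6 * SECONDS_PER_YEAR  # 14.6 years of footage
num_videos = 1.2e6
num_segments = 100e6  # assumption: "O(100M)" taken as exactly 100M

avg_video_min = total_seconds / num_videos / 60
segments_per_video = num_segments / num_videos

print(f"average video length ~ {avg_video_min:.1f} min")      # ~ 6.4 min
print(f"segments per video   ~ {segments_per_video:.0f}")     # ~ 83
```

So a typical video is a few minutes long and contributes on the order of tens of annotated segments.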

Interactive Humanoid Generation
FlowAct-R1 by ByteDance is a novel framework for lifelike, responsive, high-fidelity humanoid video generation, enabling seamless real-time interaction. No code released, but the results are impressive (see the video with audio).
Review https://t.ly/aQhol
Paper arxiv.org/pdf/2601.10103
Project grisoon.github.io/FlowAct-R1/

3D Human Gen-Seg
CoMoVi takes an input image and a text description and synchronously generates 3D human motion and a video sequence within a single diffusion denoising loop. Repo & dataset being released.
Review https://t.ly/khSkm
Paper arxiv.org/pdf/2601.10632
Project igl-hkust.github.io/CoMoVi/
Repo github.com/IGL-HKUST/CoMoVi
Data huggingface.co/datasets/AfterJourney/CoMoVi-Dataset

SOTA Part-level Generator
A novel text-to-motion model (FrankenMotion) that learns to compose complex motions through hierarchical conditioning on part-, action- and sequence-level text, enabling fine-grained control over individual body parts and timing. Code, models & dataset to be released.
Review https://t.ly/leB_R
Paper arxiv.org/pdf/2601.10909
Project coral79.github.io/frankenmotion/
Repo github.com/Coral79/FrankenMotion-Code

#META 3D Casual Captures
#META unveils ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Impressive results. Repo under CC BY-NC 4.0.
Review https://t.ly/j08sJ
Paper arxiv.org/pdf/2601.11514
Project facebookresearch.github.io/ShapeR/
Repo github.com/facebookresearch/ShapeR

Foundation Medical SAM3
Medical SAM3 is a foundation model for universal prompt-driven medical image segmentation, obtained by fully fine-tuning SAM3 on large-scale, heterogeneous 2D/3D medical imaging datasets with paired segmentation masks and text prompts. Repo & demo announced.
Review https://t.ly/C6jcy
Paper https://arxiv.org/pdf/2601.10880
Project chongcongjiang.github.io/MedicalSAM3/#
Repo github.com/AIM-Research-Lab/Medical-SAM3

Mask-Guided Matting
VideoMaMa is a novel diffusion-based model that converts binary masks into continuous alpha mattes. Repo, dataset & demo available.
Review https://t.ly/l_0f8
Paper arxiv.org/pdf/2601.14255
Project cvlab-kaist.github.io/VideoMaMa
Repo github.com/cvlab-kaist/VideoMaMa
Demo huggingface.co/spaces/SammyLim/VideoMaMa
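The key idea behind VideoMaMa is the gap between a binary mask (each pixel is 0 or 1) and a continuous alpha matte (values in [0, 1] that capture hair, motion blur and semi-transparency). The sketch below is NOT VideoMaMa's method, which is diffusion-based. It is a naive box-blur baseline with hypothetical helper names (`box_blur`, `composite`). It only illustrates what "continuous alpha" means and how a matte is used in the standard compositing equation I = αF + (1 - α)B.

```python
def box_blur(mask: list[list[float]], radius: int = 1) -> list[list[float]]:
    """Naive soft alpha: mean over each pixel's neighbourhood, clipped at the borders."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def composite(alpha, fg, bg):
    """Standard alpha compositing I = alpha*F + (1 - alpha)*B, per grayscale pixel."""
    return [[a * f + (1 - a) * b for a, f, b in zip(ra, rf, rb)]
            for ra, rf, rb in zip(alpha, fg, bg)]


binary_mask = [[0, 0, 1, 1]]  # hard 0/1 boundary
alpha = box_blur(binary_mask, radius=1)
print([round(a, 3) for a in alpha[0]])  # [0.0, 0.333, 0.667, 1.0]
```

A learned matting model replaces the blur with predicted alpha values that follow real structures such as hair strands, rather than a uniform smear across the boundary.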

MoRo: Human Motion
Masked modeling for human motion recovery under occlusions. Given a monocular video captured by a static camera, MoRo (by ETHZ & META) robustly reconstructs accurate, physically plausible human motion, even under challenging occlusions. Repo released.
Review https://t.ly/kK_je
Paper arxiv.org/pdf/2601.16079
Project mikeqzy.github.io/MoRo/
Repo github.com/mikeqzy/MoRo

BBoxMaskPose v2 is fire
BBoxMaskPose v2 by ČVUT delivers SOTA performance in detection, segmentation & 2D pose estimation in crowded scenes, and enables 3D human reconstruction even in scenes with complex interactions. Code, models & data available.
Review https://t.ly/GkkDl
Paper arxiv.org/pdf/2601.15200
Project https://lnkd.in/dQ_3hxjC
Repo https://lnkd.in/dVqwD3jN

Generalized-Scale Counting
GeCo2 (Ljubljana) is a novel end-to-end, SOTA few-shot counting method that explicitly addresses object-scale issues. Repo & demo available.
Review https://t.ly/2_7I8
Paper https://arxiv.org/pdf/2511.08048
Repo https://github.com/jerpelhan/GECO2
Demo huggingface.co/spaces/jerpelhan/GECO2-demo

Super-hard poll, folks
This dilemma is driving me crazy. Vote: https://www.linkedin.com/posts/visionarynet_activity-7421974594917588992-YNAG
(and of course comment here)

MLLMs Fine Segmentation
SimpleSeg: MLLMs with native pixel-level perception. Repo & model available.
Review https://t.ly/eVguh
Paper arxiv.org/pdf/2601.19228
Project simpleseg.github.io/
Repo github.com/songtianhui/SimpleSeg

DeepSeek-OCR 2 is out
DeepSeek-AI has announced the new version of its SOTA OCR model, built on a new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights available.
Review https://t.ly/gX4bX
Paper https://arxiv.org/pdf/2601.20552
Repo github.com/deepseek-ai/DeepSeek-OCR-2

SOTA Style Transfer
TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model's robust content-preservation and style-customization capabilities. Code & model released.
Review https://t.ly/viVR0
Paper arxiv.org/pdf/2601.20175
Project tele-ai.github.io/TeleStyle/
Repo github.com/Tele-AI/TeleStyle

Metric Anything is out
Metric Anything (Li Auto Inc.) is a simple, scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced.
Review https://t.ly/54Ccr
Paper arxiv.org/pdf/2601.22054
Project metric-anything.github.io/metric-anything-io/
Repo github.com/metric-anything/metric-anything

Segment Any Events by Language
SEAL (by NUS) is the first semantic-aware Segment Any Events framework, addressing open-vocabulary event instance segmentation. Code announced.
Review https://t.ly/1ZMF0
Paper https://arxiv.org/pdf/2601.23159
Project https://0nandon.github.io/SEAL/
Repo https://github.com/0nandon/SEAL

RAM prices skyrocketing
Me acting like a rich kid.
Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO