AI with Papers - Artificial Intelligence & Deep Learning
17.5K subscribers
152 photos
263 videos
14 files
1.39K links
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT

🎇 Multi-target SAM3 🎇

👉SAM3-DMS is a novel training-free decoupled strategy that utilizes fine-grained memory selection on individual objects. Robust identity preservation and tracking stability. Repo under SAM License💙

👉Review https://t.ly/jJOAr
👉Paper https://arxiv.org/pdf/2601.09699
👉Repo https://github.com/FudanCVL/SAM3-DMS

🍿100M Video Action Dataset🍿

👉Action100M by META is a large-scale dataset w/ 1.2M instructional videos (14.6 years of duration), yielding O(100M) temporally localized segments with open-vocabulary action supervision and rich captions. Repo under FAIR NC Research License💙

👉Review https://t.ly/w5KXe
👉Paper arxiv.org/pdf/2601.10592
👉Repo github.com/facebookresearch/Action100M

💜Interactive Humanoid Generation💜

👉FlowAct-R1 by ByteDance is a novel framework that enables lifelike, responsive, and high-fidelity humanoid video generation for seamless real-time interaction. No code but impressive results (see video with audio) 💙

👉Review https://t.ly/aQhol
👉Paper arxiv.org/pdf/2601.10103
👉Project grisoon.github.io/FlowAct-R1/

💒3D Human Gen-Seg💒

👉CoMoVi takes an input image with a text description and generates 3D human motion & video sequence synchronously within a single diffusion denoising loop. Repo & Dataset releasing💙

👉Review https://t.ly/khSkm
👉Paper arxiv.org/pdf/2601.10632
👉Project igl-hkust.github.io/CoMoVi/
👉Repo github.com/IGL-HKUST/CoMoVi
👉Data huggingface.co/datasets/AfterJourney/CoMoVi-Dataset

👹SOTA Part-level Generator👹

👉FrankenMotion is a novel text-to-motion model that learns to compose complex motions through hierarchical conditioning on part-, action- & sequence-level text, enabling fine-grained control over body parts & timing. Code, models & dataset to be released💙

👉Review https://t.ly/leB_R
👉Paper arxiv.org/pdf/2601.10909
👉Project coral79.github.io/frankenmotion/
👉Repo github.com/Coral79/FrankenMotion-Code

💚 #META 3D Casual Captures 💚

👉#META unveils ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Impressive results. Repo under CC BY-NC 4.0💙

👉Review https://t.ly/j08sJ
👉Paper arxiv.org/pdf/2601.11514
👉Project facebookresearch.github.io/ShapeR/
👉Repo github.com/facebookresearch/ShapeR

💊Foundation Medical SAM3 💊

👉Medical SAM3 is a foundation model for universal prompt-driven medical image segmentation, obtained by fully fine-tuning SAM3 on large-scale, heterogeneous 2D/3D medical imaging datasets with paired segmentation masks and text prompts. Repo & Demo announced💙

👉Review https://t.ly/C6jcy
👉Paper https://arxiv.org/pdf/2601.10880
👉Project chongcongjiang.github.io/MedicalSAM3/#
👉Repo github.com/AIM-Research-Lab/Medical-SAM3

🦧Mask-Guided Matting🦧

👉VideoMaMa is a novel diffusion-based model that converts binary masks into continuous alpha mattes. Repo, Dataset & Demo💙

👉Review https://t.ly/l_0f8
👉Paper arxiv.org/pdf/2601.14255
👉Project cvlab-kaist.github.io/VideoMaMa
👉Repo github.com/cvlab-kaist/VideoMaMa
👉Demo huggingface.co/spaces/SammyLim/VideoMaMa

💜MoRo: Human Motion💜

👉Masked modeling for human motion Recovery under Occlusions. Given a monocular video captured from a static camera, MoRo (by ETHZ & META) robustly reconstructs accurate and physically plausible human motion, even under challenging occlusions. Repo released💙

👉Review https://t.ly/kK_je
👉Paper arxiv.org/pdf/2601.16079
👉Project mikeqzy.github.io/MoRo/
👉Repo github.com/mikeqzy/MoRo

🔥 BBoxMaskPose v2 is fire 🔥

👉BBoxMaskPose v2 by ČVUT offers SOTA performance in detection, segmentation & 2D pose in crowded scenes. It enables 3D human reconstruction even in scenes with complex interactions. Code, Models & Data available💙

👉Review https://t.ly/GkkDl
👉Paper arxiv.org/pdf/2601.15200
👉Project https://lnkd.in/dQ_3hxjC
👉Repo https://lnkd.in/dVqwD3jN

🦠Generalized-Scale Counting🦠

👉GeCo2 (Ljubljana) is a novel e2e SOTA few-shot counting method that explicitly addresses object-scale issues. Repo & Demo 💙

👉Review https://t.ly/2_7I8
👉Paper https://arxiv.org/pdf/2511.08048
👉Repo https://github.com/jerpelhan/GECO2
👉Demo huggingface.co/spaces/jerpelhan/GECO2-demo

🔥🔥Super-Hard Poll folks🔥🔥

👉 This dilemma is driving me crazy. Vote: https://www.linkedin.com/posts/visionarynet_activity-7421974594917588992-YNAG

(and of course comment here)

🌻MLLMs Fine Segmentation🌻

👉SimpleSeg: MLLMs with native pixel-level perception. Repo & Model available💙

👉Review https://t.ly/eVguh
👉Paper arxiv.org/pdf/2601.19228
👉Project simpleseg.github.io/
👉Repo github.com/songtianhui/SimpleSeg

🔥 DeepSeek-OCR 2 is out 🔥

👉DeepSeek-AI announced the new version of its powerful SOTA OCR. A new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights💙

👉Review https://t.ly/gX4bX
👉Paper https://arxiv.org/pdf/2601.20552
👉Repo github.com/deepseek-ai/DeepSeek-OCR-2

📊 SOTA Style Transfer 📊

👉TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model’s robust capabilities in content preservation & style customization. Code & Model released💙

👉Review https://t.ly/viVR0
👉Paper arxiv.org/pdf/2601.20175
👉Project tele-ai.github.io/TeleStyle/
👉Repo github.com/Tele-AI/TeleStyle

Metric Anything is out

👉Metric Anything (Li Auto Inc.) is a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced 💙

👉Review https://t.ly/54Ccr
👉Paper arxiv.org/pdf/2601.22054
👉Project metric-anything.github.io/metric-anything-io/
👉Repo github.com/metric-anything/metric-anything

🌈Segment Any Events by Language🌈

👉SEAL (by NUS) is the first Semantic-aware Segment Any Events framework that addresses Open-Vocabulary Event Instance Segmentation. Code announced💙

👉Review https://t.ly/1ZMF0
👉Paper https://arxiv.org/pdf/2601.23159
👉Project https://0nandon.github.io/SEAL/
👉Repo https://github.com/0nandon/SEAL

👉RAM prices skyrocketing

👉Me acting like a rich kid.

Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO