AI with Papers - Artificial Intelligence & Deep Learning
17.5K subscribers
152 photos
264 videos
14 files
1.39K links
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
⭐TOP 5 Papers you loved in 2025⭐

👉 In 2025, novel architectures redefined efficiency and accuracy, and almost every day brought a new SOTA in image understanding, tracking, and GenAI. It’s been an inspiring ride, and 2026 will be even wilder. This community (LinkedIn + Telegram) now counts 80,000+ people.

𝐏𝐚𝐩𝐞𝐫𝐬 (𝐛𝐲 𝐲𝐨𝐮𝐫 𝐩𝐫𝐞𝐟𝐞𝐫𝐞𝐧𝐜𝐞):
⭐3D LLM https://t.ly/ejr1s
⭐DynOMo https://t.ly/t5pCf
⭐Track Transf. https://t.ly/NPyW4
⭐YOLOv12 https://t.ly/jj1oR
⭐G-Surface Tracking https://t.ly/udpMq

Thank you all💙
🦙 Depth as Neural Implicit 🦙

👉InfiniDepth represents depth as neural implicit fields, enabling "infinite" (i.e., 16K) resolution and fine geometric detail. Repo under Apache 2.0💙

👉Review https://t.ly/4we5t
👉Paper https://lnkd.in/dpiHQExj
👉Project https://lnkd.in/dy3JxKye
👉Repo https://lnkd.in/dAXbnK5z
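A neural implicit depth field stores depth as a continuous function of image coordinates rather than as a fixed pixel grid, so it can be sampled at whatever resolution is requested. The toy sketch below assumes a hypothetical `ImplicitDepth` class made of random Fourier features plus one linear layer with random weights; it only illustrates the "query at any resolution" idea and is not InfiniDepth's architecture or API.

```python
import math
import random


class ImplicitDepth:
    """Toy implicit field mapping continuous (u, v) in [0, 1]^2 to a depth value.

    Hypothetical stand-in for a trained network: random Fourier features
    followed by a single linear layer and a softplus to keep depth positive.
    """

    def __init__(self, num_features=16, seed=0):
        rng = random.Random(seed)
        # Random frequencies and phases for the Fourier feature encoding.
        self.freqs = [(rng.gauss(0, 4), rng.gauss(0, 4)) for _ in range(num_features)]
        self.phases = [rng.uniform(0, 2 * math.pi) for _ in range(num_features)]
        self.weights = [rng.gauss(0, 0.5) for _ in range(num_features)]
        self.bias = 1.0

    def __call__(self, u, v):
        feats = [math.sin(2 * math.pi * (fu * u + fv * v) + p)
                 for (fu, fv), p in zip(self.freqs, self.phases)]
        z = self.bias + sum(w * f for w, f in zip(self.weights, feats))
        return math.log1p(math.exp(z))  # softplus: depth is always > 0


def render_depth(field, width, height):
    """Sample the field at the pixel centres of an arbitrary-resolution grid."""
    return [[field((x + 0.5) / width, (y + 0.5) / height)
             for x in range(width)] for y in range(height)]


field = ImplicitDepth()
low = render_depth(field, 4, 3)     # coarse 4x3 depth map
high = render_depth(field, 16, 12)  # same field on a 4x finer grid, nothing retrained
```

The point of the design is that resolution becomes a property of the query grid, not of the stored representation.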
🔥 Back from Holidays mood 🔥
Label Any Object in 3D

👉LabelAny3D: a novel analysis-by-synthesis framework that reconstructs holistic 3D scenes from 2D images to efficiently produce high-quality 3D bounding-box (BB) annotations. Repo under CC-BY-4.0 license💙

👉Review https://t.ly/bO93j
👉Paper https://lnkd.in/dYb97zWG
👉Project https://lnkd.in/dJ9UKERb
👉Repo https://lnkd.in/d9SxtmiA
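A 3D bounding-box (BB) annotation is commonly parameterized by a center, a size, and a heading angle. The minimal sketch below expands such a box into its 8 corners, assuming a hypothetical `Box3D` format with a yaw-only rotation about the vertical axis; this convention is an assumption for illustration, not necessarily LabelAny3D's output format.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    # Hypothetical annotation format: centre (cx, cy, cz), size, yaw in radians.
    cx: float
    cy: float
    cz: float
    length: float
    width: float
    height: float
    yaw: float  # rotation about the z (up) axis

    def corners(self):
        """Return the 8 box corners as (x, y, z) tuples in world coordinates."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        pts = []
        for dx in (-0.5, 0.5):
            for dy in (-0.5, 0.5):
                for dz in (-0.5, 0.5):
                    # Offset in the box frame, then rotate by yaw and translate.
                    lx, ly, lz = dx * self.length, dy * self.width, dz * self.height
                    pts.append((self.cx + c * lx - s * ly,
                                self.cy + s * lx + c * ly,
                                self.cz + lz))
        return pts

    def volume(self):
        return self.length * self.width * self.height


box = Box3D(1.0, 2.0, 0.5, length=4.0, width=2.0, height=1.0, yaw=math.pi / 2)
print(len(box.corners()), box.volume())  # 8 8.0
```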
🔥 New #AI Startups in 2026? 🔥

In 2026, which area would you focus on?
🤖Agents → workflows, copilots, etc.
🏭Vertical AI → Pharma, Automotive, Energy ...
🧠Infrastructure → MLOps, Security, Cost Control ...
🎨AI for Creators/Media → Video, avatars, content ...

Please help me understand what's next with this poll on LinkedIn :)

https://www.linkedin.com/posts/visionarynet_ai-ai-deeplearning-activity-7415377341779996672-sQO1

LUV U \m/
🔥Orient Anything V2 is out🔥

👉Orient Anything V2 is a foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Repo under CC-BY-4.0💙

👉Review https://t.ly/Ht7Xd
👉Paper arxiv.org/pdf/2601.05573
👉Project orient-anythingv2.github.io/
👉Repo github.com/SpatialVision/Orient-Anything-V2
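Reasoning about rotation from a pair of images amounts to relating two absolute orientations. The minimal sketch below assumes each orientation is available as a 3x3 rotation matrix (an assumption; the model's actual output parameterization may differ): the relative rotation is R_rel = R2 · R1^T, and its angle follows from trace(R) = 1 + 2cos(θ). All helper names are hypothetical.

```python
import math


def rot_z(theta):
    """Rotation matrix about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def relative_rotation(r1, r2):
    """Rotation that takes orientation r1 to orientation r2: R_rel = R2 @ R1^T."""
    return matmul(r2, transpose(r1))


def rotation_angle(r):
    """Geodesic angle of a rotation matrix, using trace(R) = 1 + 2cos(angle)."""
    cos_a = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, cos_a)))  # clamp for numerical safety


r1 = rot_z(math.radians(30))  # orientation seen in image 1
r2 = rot_z(math.radians(75))  # orientation seen in image 2
print(round(math.degrees(rotation_angle(relative_rotation(r1, r2))), 6))  # 45.0
```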
🫛Active Object Reconstruction🫛

👉ObjSplat (Beijing) autonomously plans viewpoints and progressively reconstructs an unknown object into a high-fidelity Gaussian model and a watertight mesh, enabling direct use in physics simulations. Tough paper and repo announced💙

👉Review https://t.ly/au6HE
👉Paper arxiv.org/pdf/2601.06997
👉Project li-yuetao.github.io/ObjSplat-page/
👉Repo https://github.com/Li-Yuetao/ObjSplat
👉Games Workshop (Warhammer) is banning the use of AI in creative and design processes to protect IP and human creativity, a decision that goes against the current hype of widespread AI adoption.

And what about your organization? I need your help👇

Vote: https://www.linkedin.com/posts/visionarynet_ai-activity-7417106327019196417-TpGL
💚Segment Anything Geometry💚

👉3AM (NYCU + #Nvidia) offers cross-view correspondence even under large viewpoint changes, cluttered scenes, and varying capture conditions, enabling robust object tracking from both videos and casual multi-view images. Repo (coming soon) & Demo available💙

👉Review https://t.ly/olZwE
👉Paper https://arxiv.org/pdf/2601.08831
👉Project https://jayisaking.github.io/3AM-Page/
👉Repo https://github.com/jayisaking
👉Demo https://huggingface.co/spaces/nycu-cplab/3AM
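Cross-view correspondence is often established by matching per-point features between two views. The toy sketch below shows mutual-nearest-neighbour matching with cosine similarity, a generic technique assumed here purely for illustration (it is not 3AM's actual method); the function names and feature vectors are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mutual_nn_matches(feats_a, feats_b):
    """Return (i, j) pairs where a[i] and b[j] are each other's best match."""
    sim = [[cosine(fa, fb) for fb in feats_b] for fa in feats_a]
    best_b = [max(range(len(feats_b)), key=lambda j: sim[i][j]) for i in range(len(feats_a))]
    best_a = [max(range(len(feats_a)), key=lambda i: sim[i][j]) for j in range(len(feats_b))]
    # Keep only matches confirmed in both directions.
    return [(i, j) for i, j in enumerate(best_b) if best_a[j] == i]


view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]  # hypothetical features, view A
view_b = [[0.0, 0.9, 0.1], [0.95, 0.05, 0.0]]                # hypothetical features, view B
print(mutual_nn_matches(view_a, view_b))  # [(0, 1), (1, 0)]
```

Point 2 of view A prefers point 1 of view B, but that point prefers point 0, so the ambiguous pair is dropped; the mutual check trades recall for precision.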
🎇 Multi-target SAM3 🎇

👉SAM3-DMS is a novel training-free decoupled strategy that applies fine-grained memory selection to individual objects, yielding robust identity preservation and tracking stability. Repo under SAM License💙

👉Review https://t.ly/jJOAr
👉Paper https://arxiv.org/pdf/2601.09699
👉Repo https://github.com/FudanCVL/SAM3-DMS
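The difference between one shared memory selection for all tracked objects and a decoupled, per-object selection can be illustrated with a toy example. The hypothetical sketch below assumes each past frame carries a per-object quality score and that each object independently keeps its top-k frames; the scores, `k`, and function names are invented, and this is not SAM3-DMS's actual selection rule.

```python
def shared_selection(scores, k):
    """Pick the same top-k frames for every object, ranked by mean score."""
    frames = scores[next(iter(scores))].keys()  # assumes all objects share frame ids
    mean = {f: sum(obj[f] for obj in scores.values()) / len(scores) for f in frames}
    top = sorted(frames, key=lambda f: mean[f], reverse=True)[:k]
    return {obj_id: sorted(top) for obj_id in scores}


def decoupled_selection(scores, k):
    """Pick the top-k frames independently for each object."""
    return {obj_id: sorted(sorted(per_frame, key=per_frame.get, reverse=True)[:k])
            for obj_id, per_frame in scores.items()}


# scores[object_id][frame_idx]: hypothetical per-object memory quality
scores = {
    "person": {0: 0.9, 1: 0.8, 2: 0.3, 3: 0.1},  # mostly occluded in frames 2-3
    "dog":    {0: 0.2, 1: 0.1, 2: 0.9, 3: 0.7},  # mostly occluded in frames 0-1
}
print(shared_selection(scores, k=2))     # {'person': [0, 2], 'dog': [0, 2]}
print(decoupled_selection(scores, k=2))  # {'person': [0, 1], 'dog': [2, 3]}
```

With a shared selection each object inherits a frame in which it is barely visible; selecting per object keeps only each object's clearest views.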
🍿100M Video Action Dataset🍿

👉Action100M by META is a large-scale dataset with 1.2M instructional videos (14.6 years of total duration), yielding O(100M) temporally localized segments with open-vocabulary action supervision and rich captions. Repo under FAIR NC Research License💙

👉Review https://t.ly/w5KXe
👉Paper arxiv.org/pdf/2601.10592
👉Repo github.com/facebookresearch/Action100M
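The dataset figures above imply a few useful averages. The back-of-the-envelope sketch below assumes a 365.25-day year and treats "O(100M)" as exactly 100M segments, so the per-video segment count is an order-of-magnitude estimate only.

```python
HOURS_PER_YEAR = 365.25 * 24  # Julian year, an assumed convention

total_years = 14.6     # reported total duration
num_videos = 1.2e6     # reported number of videos
num_segments = 100e6   # "O(100M)": order of magnitude only

total_hours = total_years * HOURS_PER_YEAR
avg_minutes_per_video = total_hours * 60 / num_videos
segments_per_video = num_segments / num_videos

print(f"{total_hours:,.0f} h total")                    # 127,984 h total
print(f"{avg_minutes_per_video:.1f} min per video")     # 6.4 min per video
print(f"~{segments_per_video:.0f} segments per video")  # ~83 segments per video
```

So the average video runs roughly six and a half minutes and is split into dozens of short localized segments.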
💜Interactive Humanoid Generation💜

👉FlowAct-R1 by ByteDance is a novel framework that enables lifelike, responsive, and high-fidelity humanoid video generation for seamless real-time interaction. No code, but impressive results (see the video with audio)💙

👉Review https://t.ly/aQhol
👉Paper arxiv.org/pdf/2601.10103
👉Project grisoon.github.io/FlowAct-R1/
💢3D Human Gen-Seg💢

👉CoMoVi takes an input image with a text description and synchronously generates a 3D human motion and a video sequence within a single diffusion denoising loop. Repo & Dataset being released💙

👉Review https://t.ly/khSkm
👉Paper arxiv.org/pdf/2601.10632
👉Project igl-hkust.github.io/CoMoVi/
👉Repo github.com/IGL-HKUST/CoMoVi
👉Data huggingface.co/datasets/AfterJourney/CoMoVi-Dataset
👹SOTA Part-level Generator👹

👉A novel text-to-motion model that learns to compose complex motions through hierarchical conditioning on part-, action- & sequence-level text, enabling fine-grained control over body parts & timing. Code, models & Dataset to be released💙

👉Review https://t.ly/leB_R
👉Paper arxiv.org/pdf/2601.10909
👉Project coral79.github.io/frankenmotion/
👉Repo github.com/Coral79/FrankenMotion-Code
💚 #META 3D Casual Captures 💚

👉#META unveils ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Impressive results. Repo under CC BY-NC 4.0💙

👉Review https://t.ly/j08sJ
👉Paper arxiv.org/pdf/2601.11514
👉Project facebookresearch.github.io/ShapeR/
👉Repo github.com/facebookresearch/ShapeR
💊Foundation Medical SAM3 💊

👉Medical SAM3: a foundation model for universal prompt-driven medical image segmentation, obtained by fully fine-tuning SAM3 on large-scale, heterogeneous 2D/3D medical imaging datasets with paired segmentation masks and text prompts. Repo & Demo announced💙

👉Review https://t.ly/C6jcy
👉Paper https://arxiv.org/pdf/2601.10880
👉Project chongcongjiang.github.io/MedicalSAM3/#
👉Repo github.com/AIM-Research-Lab/Medical-SAM3
🦧Mask-Guided Matting🦧

👉VideoMaMa is a novel diffusion-based model that converts binary masks into continuous alpha mattes. Repo, Dataset & Demo💙

👉Review https://t.ly/l_0f8
👉Paper arxiv.org/pdf/2601.14255
👉Project cvlab-kaist.github.io/VideoMaMa
👉Repo github.com/cvlab-kaist/VideoMaMa
👉Demo huggingface.co/spaces/SammyLim/VideoMaMa
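An alpha matte gives each pixel a fractional foreground coverage α in [0, 1], whereas a binary mask only allows 0 or 1; the difference becomes visible when compositing with C = αF + (1 - α)B. The toy sketch below uses invented pixel values on a 1-D strip and a hypothetical `composite` helper; it is not VideoMaMa code.

```python
def composite(fg, bg, alpha):
    """Per-pixel alpha compositing: C = alpha * F + (1 - alpha) * B."""
    return [a * f + (1.0 - a) * b for f, b, a in zip(fg, bg, alpha)]


# A 1-D strip of 5 grayscale pixels crossing a soft boundary (made-up values).
fg = [1.0, 1.0, 1.0, 1.0, 1.0]            # white foreground
bg = [0.0, 0.0, 0.0, 0.0, 0.0]            # black background
binary_mask = [1.0, 1.0, 1.0, 0.0, 0.0]   # hard 0/1 segmentation
alpha_matte = [1.0, 1.0, 0.6, 0.25, 0.0]  # continuous per-pixel coverage

print(composite(fg, bg, binary_mask))  # [1.0, 1.0, 1.0, 0.0, 0.0]
print(composite(fg, bg, alpha_matte))  # [1.0, 1.0, 0.6, 0.25, 0.0]
```

The binary mask produces a hard step at the boundary, while the matte preserves the partial coverage that hair or motion blur would create.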
💜MoRo: Human Motion💜

👉MoRo: Masked modeling for human motion Recovery under Occlusions. Given a monocular video captured from a static camera, MoRo (by ETHZ & META) robustly reconstructs accurate, physically plausible human motion, even under challenging occlusions. Repo released💙

👉Review https://t.ly/kK_je
👉Paper arxiv.org/pdf/2601.16079
👉Project mikeqzy.github.io/MoRo/
👉Repo github.com/mikeqzy/MoRo
🔥 BBoxMaskPose v2 is fire 🔥

👉BBoxMaskPose v2 by ČVUT offers SOTA performance in detection, segmentation & 2D pose estimation in crowded scenes. It enables 3D human reconstruction even in scenes with complex interactions. Code, models & data available💙

👉Review https://t.ly/GkkDl
👉Paper arxiv.org/pdf/2601.15200
👉Project https://lnkd.in/dQ_3hxjC
👉Repo https://lnkd.in/dVqwD3jN