⭐ TOP 5 Papers You Loved in 2025 ⭐
In 2025, novel architectures redefined efficiency and accuracy, and almost every day brought a new SOTA in image understanding, tracking, and GenAI. It's been an inspiring ride, and 2026 will be even wilder. This community (LinkedIn + Telegram) is now 80,000+ people.
Papers (by your preference):
⭐ 3D LLM https://t.ly/ejr1s
⭐ DynOMo https://t.ly/t5pCf
⭐ Track Transf. https://t.ly/NPyW4
⭐ YOLOv12 https://t.ly/jj1oR
⭐ G-Surface Tracking https://t.ly/udpMq
Thank you all!
Depth as Neural Implicit Fields
InfiniDepth represents depth as neural implicit fields, delivering "infinite" (i.e., 16K) resolution and fine geometric detail. Repo under Apache 2.0.
Review: https://t.ly/4we5t
Paper: https://lnkd.in/dpiHQExj
Project: https://lnkd.in/dy3JxKye
Repo: https://lnkd.in/dAXbnK5z
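The core idea of a neural implicit depth field can be made concrete with a toy sketch: a small MLP maps continuous pixel coordinates to a depth value, so "resolution" is just a sampling choice at query time. This is an illustration of the general technique, not InfiniDepth's actual architecture; weights are random stand-ins for a trained model.

```python
import numpy as np

# Toy neural implicit depth field: an MLP maps normalized (x, y) in [0, 1]^2
# to a scalar depth, so depth can be queried at arbitrary resolution without
# storing a dense depth map. Random weights stand in for a trained model.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 1)), np.zeros(1)

def depth_field(coords):
    """coords: (N, 2) array of normalized (x, y); returns (N,) depths."""
    h = np.tanh(coords @ W1 + b1)        # hidden features
    return (h @ W2 + b2).squeeze(-1)     # one depth value per query point

def grid(n):
    """n x n regular sampling of the unit square, shape (n*n, 2)."""
    xs = np.linspace(0.0, 1.0, n)
    return np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)

# The same continuous field sampled coarsely and finely: resolution is
# decided at query time, which is what makes "16K" depth feasible.
print(depth_field(grid(4)).shape, depth_field(grid(16)).shape)
```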
Label Any Object in 3D
LabelAny3D: a novel analysis-by-synthesis framework that reconstructs holistic 3D scenes from 2D images to efficiently produce high-quality 3D bounding-box annotations. Repo under CC-BY-4.0 license.
Review: https://t.ly/bO93j
Paper: https://lnkd.in/dYb97zWG
Project: https://lnkd.in/dJ9UKERb
Repo: https://lnkd.in/d9SxtmiA
New #AI Startups in 2026?
Looking ahead to 2026, the question is no longer "can we build it?" but "where does it actually create durable value?" So, if you were to launch an AI startup in 2026, which area would you focus on?
Agents – workflows, copilots, etc.
Vertical AI – Pharma, Automotive, Energy ...
Infrastructure – MLOps, Security, Cost Control ...
AI for Creators/Media – video, avatars, content ...
Please help me understand what's next with this poll on LinkedIn :)
https://www.linkedin.com/posts/visionarynet_ai-ai-deeplearning-activity-7415377341779996672-sQO1
LUV U \m/
Orient Anything V2 is out
Orient Anything V2 is a foundation model for unified understanding of 3D object orientation and rotation from single or paired images. Repo under CC-BY-4.0.
Review: https://t.ly/Ht7Xd
Paper: arxiv.org/pdf/2601.05573
Project: orient-anythingv2.github.io/
Repo: github.com/SpatialVision/Orient-Anything-V2
Active Object Reconstruction
ObjSplat (Beijing) autonomously plans viewpoints and progressively reconstructs an unknown object into a high-fidelity Gaussian model and watertight mesh, enabling direct use in physics simulations. Tough paper; repo announced.
Review: https://t.ly/au6HE
Paper: arxiv.org/pdf/2601.06997
Project: li-yuetao.github.io/ObjSplat-page/
Repo: https://github.com/Li-Yuetao/ObjSplat
In 2026, who should we keep an eye on?
Vote: https://www.linkedin.com/posts/visionarynet_ai-deeplearning-aiwithpapers-activity-7416886610795077632-qQeP/
Games Workshop (Warhammer) is banning the use of AI in its creative and design processes to protect IP and human creativity, a decision that cuts against the current hype around widespread AI adoption.
And what about your organization? I need your help!
Vote: https://www.linkedin.com/posts/visionarynet_ai-activity-7417106327019196417-TpGL
Segment Anything Geometry
3AM (NYCU + #Nvidia) provides cross-view correspondence even under large viewpoint changes, cluttered scenes, and varying capture conditions, enabling robust object tracking from both videos and casual multi-view images. Repo (coming) & demo available.
Review: https://t.ly/olZwE
Paper: https://arxiv.org/pdf/2601.08831
Project: https://jayisaking.github.io/3AM-Page/
Repo: https://github.com/jayisaking
Demo: https://huggingface.co/spaces/nycu-cplab/3AM
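For intuition, cross-view correspondence is often posed as matching per-patch descriptors between two views; a classic baseline is mutual nearest-neighbor matching under cosine similarity. This is a generic sketch of that baseline, not 3AM's method, and the feature arrays are random stand-ins for real model outputs.

```python
import numpy as np

# Baseline cross-view matching: keep only descriptor pairs that pick each
# other as nearest neighbors (the mutual check filters one-sided matches).
rng = np.random.default_rng(1)
feats_a = rng.normal(size=(50, 128))   # descriptors from view A (stand-in)
feats_b = rng.normal(size=(60, 128))   # descriptors from view B (stand-in)

def mutual_nn_matches(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a @ b.T                      # (50, 60) cosine similarity matrix
    nn_ab = sim.argmax(axis=1)         # best B index for each A descriptor
    nn_ba = sim.argmax(axis=0)         # best A index for each B descriptor
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

matches = mutual_nn_matches(feats_a, feats_b)
print(len(matches), "mutual matches")
```

The globally most-similar pair is always mutual, so the match list is never empty; robustness under viewpoint change then comes entirely from how invariant the learned descriptors are.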
Multi-target SAM3
SAM3-DMS is a novel training-free decoupled strategy that applies fine-grained memory selection to individual objects, yielding robust identity preservation and tracking stability. Repo under SAM License.
Review: https://t.ly/jJOAr
Paper: https://arxiv.org/pdf/2601.09699
Repo: https://github.com/FudanCVL/SAM3-DMS
100M Video Action Dataset
Action100M by META is a large-scale dataset of 1.2M instructional videos (14.6 years of total duration), yielding O(100M) temporally localized segments with open-vocabulary action supervision and rich captions. Repo under FAIR NC Research License.
Review: https://t.ly/w5KXe
Paper: arxiv.org/pdf/2601.10592
Repo: github.com/facebookresearch/Action100M
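To make "temporally localized segments with open-vocabulary supervision" concrete, one plausible record layout pairs a video ID and a time window with a free-text action label and caption. The field names below are illustrative assumptions, not Action100M's actual schema.

```python
from dataclasses import dataclass

# Hypothetical record for one temporally localized action segment:
# a time window inside a source video plus open-vocabulary text supervision.
@dataclass
class ActionSegment:
    video_id: str
    t_start: float   # segment start within the video, in seconds
    t_end: float     # segment end within the video, in seconds
    action: str      # open-vocabulary action label (free text, no fixed taxonomy)
    caption: str     # rich descriptive caption for the segment

    @property
    def duration(self) -> float:
        return self.t_end - self.t_start

seg = ActionSegment("vid_0001", 12.5, 18.0, "whisk eggs",
                    "The cook whisks eggs briskly in a steel bowl.")
print(seg.duration)  # 5.5
```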
Interactive Humanoid Generation
FlowAct-R1 by ByteDance is a novel framework enabling lifelike, responsive, and high-fidelity humanoid video generation for seamless real-time interaction. No code, but impressive results (see the video with audio).
Review: https://t.ly/aQhol
Paper: arxiv.org/pdf/2601.10103
Project: grisoon.github.io/FlowAct-R1/
3D Human Gen-Seg
CoMoVi takes an input image with a text description and synchronously generates a 3D human motion and video sequence within a single diffusion denoising loop. Repo & dataset releasing.
Review: https://t.ly/khSkm
Paper: arxiv.org/pdf/2601.10632
Project: igl-hkust.github.io/CoMoVi/
Repo: github.com/IGL-HKUST/CoMoVi
Data: huggingface.co/datasets/AfterJourney/CoMoVi-Dataset
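The "single denoising loop" phrasing can be sketched schematically: both the motion latent and the video latent are updated at every diffusion step by one shared loop, which is what keeps the two modalities synchronized. This is a structural illustration only, with assumptions throughout; the toy update just shrinks noise and stands in for a real conditioned noise-prediction network.

```python
import numpy as np

# Schematic joint denoising: one loop drives both latents, so motion and
# video are refined in lockstep rather than by two separate pipelines.
rng = np.random.default_rng(2)
motion = rng.normal(size=(32,))    # noisy motion latent (stand-in)
video = rng.normal(size=(64,))     # noisy video latent (stand-in)

def denoise_step(m, v, t):
    # A real model would predict noise with a network conditioned on the
    # input image, the text prompt, and BOTH latents; here we simply shrink
    # toward zero to show the shared-loop structure.
    scale = 1.0 - 1.0 / t
    return m * scale, v * scale

for t in range(11, 1, -1):         # the single loop over diffusion steps
    motion, video = denoise_step(motion, video, t)

print(float(np.abs(motion).mean()), float(np.abs(video).mean()))
```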
SOTA Part-level Generator
FrankenMotion: a novel text-to-motion model that learns to compose complex motions through hierarchical conditioning on part-, action-, and sequence-level text, enabling fine-grained control over body parts and timing. Code, models & dataset to be released.
Review: https://t.ly/leB_R
Paper: arxiv.org/pdf/2601.10909
Project: coral79.github.io/frankenmotion/
Repo: github.com/Coral79/FrankenMotion-Code
#META 3D Casual Captures
#META unveils ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Impressive results. Repo under CC BY-NC 4.0.
Review: https://t.ly/j08sJ
Paper: arxiv.org/pdf/2601.11514
Project: facebookresearch.github.io/ShapeR/
Repo: github.com/facebookresearch/ShapeR
Foundation Medical SAM3
Medical SAM3: a foundation model for universal prompt-driven medical image segmentation, built by fully fine-tuning SAM3 on large-scale, heterogeneous 2D/3D medical imaging datasets with paired segmentation masks and text prompts. Repo & demo announced.
Review: https://t.ly/C6jcy
Paper: https://arxiv.org/pdf/2601.10880
Project: chongcongjiang.github.io/MedicalSAM3/#
Repo: github.com/AIM-Research-Lab/Medical-SAM3
Mask-Guided Matting
VideoMaMa is a novel diffusion-based model that converts binary masks into continuous alpha mattes. Repo, dataset & demo available.
Review: https://t.ly/l_0f8
Paper: arxiv.org/pdf/2601.14255
Project: cvlab-kaist.github.io/VideoMaMa
Repo: github.com/cvlab-kaist/VideoMaMa
Demo: huggingface.co/spaces/SammyLim/VideoMaMa
MoRo: Human Motion
Masked modeling for human motion Recovery under Occlusions: given a monocular video captured from a static camera, MoRo (by ETHZ & META) robustly reconstructs accurate, physically plausible human motion, even under challenging occlusions. Repo released.
Review: https://t.ly/kK_je
Paper: arxiv.org/pdf/2601.16079
Project: mikeqzy.github.io/MoRo/
Repo: github.com/mikeqzy/MoRo
BBoxMaskPose v2 is fire
BBoxMaskPose v2 by ČVUT offers SOTA performance in detection, segmentation & 2D pose in crowded scenes, and enables 3D human reconstruction even in scenes with complex interactions. Code, models & data available.
Review: https://t.ly/GkkDl
Paper: arxiv.org/pdf/2601.15200
Project: https://lnkd.in/dQ_3hxjC
Repo: https://lnkd.in/dVqwD3jN