💷 SOTA Zero-Shot Stereo Matching💷
👉Fast-FoundationStereo by #Nvidia is a novel family of architectures that achieves, for the first time, strong zero-shot generalization at real-time frame rates via divide-&-conquer acceleration. Code & Data announced💙
👉Review https://t.ly/XD6pO
👉Paper https://lnkd.in/d9_YKW2A
👉Project https://lnkd.in/dKDxm7EX
👉Repo https://lnkd.in/dR4-PdsW
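👉A minimal, hypothetical sketch of the divide-&-conquer idea (not the paper's actual pipeline): split a rectified pair into overlapping tiles, match each tile independently, and blend the results back. `match_tile` is a stand-in for the real network.
```python
import numpy as np

def match_tile(left_tile, right_tile):
    # Stand-in for a foundation-stereo forward pass on one tile;
    # returns a dummy zero-disparity map just to keep the sketch runnable.
    return np.zeros(left_tile.shape[:2], dtype=np.float32)

def stereo_divide_and_conquer(left, right, tile=256, overlap=32):
    # Overlapping tiles are matched independently and averaged back,
    # which is what makes real-time rates feasible on large frames.
    H, W = left.shape[:2]
    disp = np.zeros((H, W), np.float32)
    weight = np.zeros((H, W), np.float32)
    for y in range(0, H, tile - overlap):
        for x in range(0, W, tile - overlap):
            ys, xs = slice(y, min(y + tile, H)), slice(x, min(x + tile, W))
            disp[ys, xs] += match_tile(left[ys, xs], right[ys, xs])
            weight[ys, xs] += 1.0
    return disp / np.maximum(weight, 1.0)

left = right = np.zeros((480, 640, 3), np.uint8)
print(stereo_divide_and_conquer(left, right).shape)  # (480, 640)
```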
👀DriverGaze360: Driver Attention SOTA👀
👉DriverGaze360 is a large-scale 360° field-of-view driver-attention dataset containing ∼1M gaze-labeled frames. Code & Dataset announced💙
👉Review https://t.ly/ZcoUw
👉Paper arxiv.org/pdf/2512.14266
👉Project av.dfki.de/drivergaze360/
👉Repo github.com/dfki-av/drivergaze360
👉Data av.dfki.de/drivergaze360/dataset
🫠FlexAvatar: 3D Heads🫠
👉TUM introduces FlexAvatar, a novel method for creating HQ and complete 3D head avatars from a single image. Code announced💙
👉Review https://t.ly/Rkdtd
👉Paper arxiv.org/pdf/2512.15599
👉Project tobias-kirschstein.github.io/flexavatar/
👉Repo TBA
🏜️ Depth Any Panoramas 🏜️
👉DAP is the new SOTA foundation model for panoramic depth estimation with a large-scale dataset. Data & Repo under MIT💙
👉Review https://t.ly/LaUmd
👉Paper arxiv.org/pdf/2512.16913
👉Project https://lnkd.in/dvqNV9jx
👉Repo https://lnkd.in/dmNzhb-7
👉Demo https://lnkd.in/dDwjMF3u
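👉Side note on why panoramic depth is so useful: an equirectangular depth map back-projects straight to a full 360° point cloud. A minimal sketch (standard spherical geometry, not DAP code):
```python
import numpy as np

def equirect_to_points(depth):
    # depth: (H, W) equirectangular depth map, in meters.
    H, W = depth.shape
    lon = (np.arange(W) / W - 0.5) * 2 * np.pi   # longitude per column
    lat = (0.5 - np.arange(H) / H) * np.pi       # latitude per row
    lon, lat = np.meshgrid(lon, lat)
    dirs = np.stack([np.cos(lat) * np.sin(lon),  # unit viewing ray per pixel
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    return dirs * depth[..., None]               # (H, W, 3) point cloud

pts = equirect_to_points(np.ones((256, 512)))
print(pts.shape)  # (256, 512, 3)
```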
🎯Generative Refocusing is out🎯
👉Generative Refocusing is a two-step process that uses DeblurNet to recover all-in-focus images from diverse inputs and BokehNet to synthesize controllable bokeh (trained in semi-supervised mode). Repo under Apache 2.0💙
👉Review https://t.ly/8t7PA
👉Paper arxiv.org/pdf/2512.16923
👉Project generative-refocusing.github.io/
👉Repo github.com/rayray9999/Genfocus
👉Demo huggingface.co/spaces/nycu-cplab/Genfocus-Demo
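👉The two-step flow as a hypothetical sketch (function names are stand-ins, not the repo's API): DeblurNet first recovers an all-in-focus image, then BokehNet re-renders controllable bokeh around a chosen focus point.
```python
import numpy as np

def deblur_net(image):
    # Stand-in: would recover an all-in-focus image from a (partially) blurred input.
    return image

def bokeh_net(all_in_focus, focus_point, aperture):
    # Stand-in: would re-render controllable bokeh around the chosen focus point.
    return all_in_focus

def generative_refocus(image, focus_point=(0.5, 0.5), aperture=2.8):
    sharp = deblur_net(image)                       # step 1: remove defocus
    return bokeh_net(sharp, focus_point, aperture)  # step 2: synthesize bokeh

out = generative_refocus(np.zeros((512, 512, 3), np.uint8))
print(out.shape)  # (512, 512, 3)
```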
⭐TOP 5 Papers you loved in 2025⭐
👉 In 2025, novel architectures redefined efficiency and accuracy, and almost every day brought a new SOTA in image understanding, tracking, and GenAI. It’s been an inspiring ride, and 2026 will be even wilder. This community (LinkedIn + Telegram) is now 80,000+ people strong.
𝐏𝐚𝐩𝐞𝐫𝐬 (𝐛𝐲 𝐲𝐨𝐮𝐫 𝐩𝐫𝐞𝐟𝐞𝐫𝐞𝐧𝐜𝐞):
⭐3D LLM https://t.ly/ejr1s
⭐DynOMo https://t.ly/t5pCf
⭐Track Transf. https://t.ly/NPyW4
⭐YOLOv12 https://t.ly/jj1oR
⭐G-Surface Tracking https://t.ly/udpMq
Thank you all💙
🦙 Depth as Neural Implicit 🦙
👉InfiniDepth represents depth as neural implicit fields, enabling "infinite" (i.e., 16K) resolution and fine geometric detail. Repo under Apache 2.0💙
👉Review https://t.ly/4we5t
👉Paper https://lnkd.in/dpiHQExj
👉Project https://lnkd.in/dy3JxKye
👉Repo https://lnkd.in/dAXbnK5z
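👉The core idea in a hypothetical sketch (not InfiniDepth's code): a field maps continuous pixel coordinates to depth, so the same model can be queried at 1K or 16K without ever storing a fixed-size map. A real system would add positional encoding and image conditioning.
```python
import torch
import torch.nn as nn

class ImplicitDepth(nn.Module):
    # Maps continuous coordinates (u, v) in [0, 1]^2 to a depth value.
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, uv):       # uv: (N, 2)
        return self.mlp(uv)      # (N, 1) depth

field = ImplicitDepth()
u = torch.linspace(0, 1, 16384)          # a 16K-wide scanline
v = torch.full_like(u, 0.5)
depth_line = field(torch.stack([u, v], dim=-1))
print(depth_line.shape)  # torch.Size([16384, 1])
```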
🌍Label Any Object in 3D 🌍
👉LabelAny3D: a novel analysis-by-synthesis framework that reconstructs holistic 3D scenes from 2D images to efficiently produce HQ 3D bounding-box annotations. Repo under CC-BY-4.0 license💙
👉Review https://t.ly/bO93j
👉Paper https://lnkd.in/dYb97zWG
👉Project https://lnkd.in/dJ9UKERb
👉Repo https://lnkd.in/d9SxtmiA
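👉Analysis-by-synthesis in one toy, hypothetical example (nothing here is the repo's API): fit 3D box parameters by minimizing the mismatch between their projection and an observed 2D box.
```python
import numpy as np

def project(box3d):
    # Stand-in: would render/project the 3D box into the image.
    cx, cy, s = box3d
    return np.array([cx - s, cy - s, cx + s, cy + s])  # toy 2D box

def fit_box(obs2d, steps=200, lr=0.01):
    box3d = np.array([0.0, 0.0, 1.0])                  # init: center + size
    for _ in range(steps):
        grad = np.zeros(3)
        for i, eps in enumerate(np.eye(3) * 1e-3):     # finite differences
            hi = np.sum((project(box3d + eps) - obs2d) ** 2)
            lo = np.sum((project(box3d - eps) - obs2d) ** 2)
            grad[i] = (hi - lo) / 2e-3
        box3d -= lr * grad                             # descend the mismatch
    return box3d

print(np.round(fit_box(np.array([-1.0, -0.5, 1.0, 1.5])), 2))  # [0. 0.5 1.]
```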
🔥 New #AI Startups in 2026? 🔥
In 2026, which area would you focus on?
🤖Agents → workflows, copilots, etc.
🏭Vertical AI → Pharma, Automotive, Energy ...
🧠Infrastructure → MLOps, Security, Cost Control ...
🎨AI for Creators/Media → Video, avatars, contents ...
Please help me understand what's next with this poll on LinkedIn :)
https://www.linkedin.com/posts/visionarynet_ai-ai-deeplearning-activity-7415377341779996672-sQO1
LUV U \m/
🔥Orient Anything V2 is out🔥
👉Orient Anything V2 is a foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Repo under CC-BY-4.0💙
👉Review https://t.ly/Ht7Xd
👉Paper arxiv.org/pdf/2601.05573
👉Project orient-anythingv2.github.io/
👉Repo github.com/SpatialVision/Orient-Anything-V2
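👉One thing paired images buy you: once per-image 3D orientation is predicted, the relative rotation between the two views is a one-liner. A minimal sketch (standard rotation algebra, not the repo's API):
```python
import numpy as np

def relative_rotation(R1, R2):
    # R1, R2: (3, 3) object orientations predicted in each image;
    # returns the rotation taking frame 1 to frame 2.
    return R2 @ R1.T

theta = np.pi / 4
R1 = np.eye(3)
R2 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
print(np.round(relative_rotation(R1, R2), 3))  # 45° rotation about z
```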
🫛Active Object Reconstruction🫛
👉ObjSplat (Beijing) autonomously plans viewpoints and progressively reconstructs an unknown object into a Hi-Fi Gaussian model and watertight mesh, enabling direct use in physics simulations. Tough paper and repo announced💙
👉Review https://t.ly/au6HE
👉Paper arxiv.org/pdf/2601.06997
👉Project li-yuetao.github.io/ObjSplat-page/
👉Repo https://github.com/Li-Yuetao/ObjSplat
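👉The active-reconstruction loop as a hypothetical sketch (the scoring and fusion calls are stand-ins, not the repo's API): repeatedly pick the view expected to help most, capture it, and fuse it into the growing model.
```python
def expected_gain(model, view):
    # Stand-in: would score how much a candidate view reduces uncertainty
    # in the current Gaussian model. Dummy: favor views far from what we have.
    return min(abs(view - v) for v in model) if model else 1.0

def reconstruct_actively(candidate_views, budget=5):
    model = []                                  # stand-in for the Gaussian model
    for _ in range(budget):
        best = max(candidate_views, key=lambda v: expected_gain(model, v))
        candidate_views.remove(best)
        model.append(best)                      # "capture" + fuse the new view
    return model                                # then extract a watertight mesh

print(reconstruct_actively(list(range(10))))
```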
In 2026, who should we keep an eye on?
Vote: https://www.linkedin.com/posts/visionarynet_ai-deeplearning-aiwithpapers-activity-7416886610795077632-qQeP/
👉Games Workshop (Warhammer) is banning the use of AI in creative and design processes to protect IP and human creativity, a decision that goes against the current hype of widespread AI adoption.
And what about your organization? I need your help👇
Vote: https://www.linkedin.com/posts/visionarynet_ai-activity-7417106327019196417-TpGL
💚Segment Anything Geometry💚
👉3AM (NYCU + #Nvidia) offers cross-view correspondence even under large viewpoint changes, cluttered scenes, and variations in capture conditions, enabling robust object tracking from both videos & casual multi-view images. Repo (coming) & Demo available💙
👉Review https://t.ly/olZwE
👉Paper https://arxiv.org/pdf/2601.08831
👉Project https://jayisaking.github.io/3AM-Page/
👉Repo https://github.com/jayisaking
👉Demo https://huggingface.co/spaces/nycu-cplab/3AM
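👉The basic primitive behind cross-view correspondence, as a minimal sketch: mutual nearest-neighbor matching on L2-normalized features (3AM's actual matcher is far more sophisticated; this only illustrates the concept).
```python
import numpy as np

def mutual_nn(feat_a, feat_b):
    # feat_a: (Na, D), feat_b: (Nb, D), both L2-normalized.
    sim = feat_a @ feat_b.T
    ab = sim.argmax(axis=1)                      # best match in B for each A
    ba = sim.argmax(axis=0)                      # best match in A for each B
    keep = ba[ab] == np.arange(len(feat_a))      # keep only mutual agreements
    return np.stack([np.arange(len(feat_a))[keep], ab[keep]], axis=1)

rng = np.random.default_rng(0)
a = rng.normal(size=(100, 32)); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = rng.normal(size=(120, 32)); b /= np.linalg.norm(b, axis=1, keepdims=True)
print(mutual_nn(a, b).shape)                     # (num_matches, 2)
```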
🎇 Multi-target SAM3 🎇
👉SAM3-DMS is a novel training-free decoupled strategy that applies fine-grained memory selection to individual objects, delivering robust identity preservation and tracking stability. Repo under SAM License💙
👉Review https://t.ly/jJOAr
👉Paper https://arxiv.org/pdf/2601.09699
👉Repo https://github.com/FudanCVL/SAM3-DMS
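👉The decoupled idea in a hypothetical sketch (names and scoring are illustrative, not the repo's API): one memory bank per object, each keeping only its own top-quality frames, so targets don't contaminate each other.
```python
from collections import defaultdict

class DecoupledMemory:
    # One memory bank per tracked object, pruned by per-frame quality score.
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.banks = defaultdict(list)            # obj_id -> [(score, feature)]

    def update(self, obj_id, feature, score):
        bank = self.banks[obj_id]
        bank.append((score, feature))
        bank.sort(key=lambda e: e[0], reverse=True)   # fine-grained selection
        del bank[self.capacity:]                      # keep top-k per object

    def memory_for(self, obj_id):
        return [feat for _, feat in self.banks[obj_id]]

mem = DecoupledMemory(capacity=2)
for t, s in enumerate([0.9, 0.4, 0.95]):
    mem.update(obj_id=1, feature=f"feat_t{t}", score=s)
print(mem.memory_for(1))  # ['feat_t2', 'feat_t0']
```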
🍿100M Video Action Dataset🍿
👉Action100M by META is a large-scale dataset w/ 1.2M instructional videos (14.6 years of total footage), yielding O(100M) temporally localized segments with open-vocabulary action supervision and rich captions. Repo under FAIR NC Research License💙
👉Review https://t.ly/w5KXe
👉Paper arxiv.org/pdf/2601.10592
👉Repo github.com/facebookresearch/Action100M
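👉What "temporally localized segments with captions" looks like in practice, as a hypothetical sketch (the record layout is an assumption, not Action100M's actual schema):
```python
segments = [
    {"video": "vid_001", "start": 12.4, "end": 18.9, "caption": "whisk the eggs"},
    {"video": "vid_001", "start": 19.0, "end": 25.2, "caption": "pour into the pan"},
]

for seg in segments:
    clip_len = seg["end"] - seg["start"]
    print(f'{seg["video"]}: "{seg["caption"]}" ({clip_len:.1f}s)')
```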
💜Interactive Humanoid Generation💜
👉FlowAct-R1 by ByteDance is a novel framework that enables lifelike, responsive, and high-fidelity humanoid video generation for seamless real-time interaction. No code but impressive results (see video with audio) 💙
👉Review https://t.ly/aQhol
👉Paper arxiv.org/pdf/2601.10103
👉Project grisoon.github.io/FlowAct-R1/
💢3D Human Gen-Seg💢
👉CoMoVi takes an input image with a text description and generates 3D human motion and a video sequence synchronously within a single diffusion denoising loop. Repo & Dataset releasing💙
👉Review https://t.ly/khSkm
👉Paper arxiv.org/pdf/2601.10632
👉Project igl-hkust.github.io/CoMoVi/
👉Repo github.com/IGL-HKUST/CoMoVi
👉Data huggingface.co/datasets/AfterJourney/CoMoVi-Dataset
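👉The "single denoising loop" idea as a hypothetical sketch (the joint denoiser is a stand-in): motion and video latents step through the same loop so the two modalities stay synchronized at every step.
```python
import torch

def joint_denoise_step(motion, video, t):
    # Stand-in for the joint denoiser: in the real model each modality
    # attends to the other at every step, keeping them synchronized.
    return motion * 0.99, video * 0.99

motion = torch.randn(1, 60, 24 * 3)      # 60 frames of 24 joints (xyz)
video = torch.randn(1, 60, 4, 32, 32)    # 60 frames of video latents
for t in reversed(range(50)):            # one loop drives both modalities
    motion, video = joint_denoise_step(motion, video, t)
print(motion.shape, video.shape)
```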
👹SOTA Part-level Generator👹
👉A novel text-to-motion model (FrankenMotion) that learns to compose complex motions through hierarchical conditioning on part-, action- & sequence-level text, enabling fine-grained control over body parts & timing. Code, models & Dataset to be released💙
👉Review https://t.ly/leB_R
👉Paper arxiv.org/pdf/2601.10909
👉Project coral79.github.io/frankenmotion/
👉Repo github.com/Coral79/FrankenMotion-Code
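👉Hierarchical conditioning in a hypothetical sketch (the fusion scheme is an assumption, not the paper's architecture): part-, action- and sequence-level texts are encoded separately, then fused into one vector for the motion generator to condition on.
```python
import torch
import torch.nn as nn

class HierarchicalCondition(nn.Module):
    # Encodes each text level separately, then fuses them into one
    # conditioning vector for a text-to-motion generator.
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, dim)   # toy text-encoder stand-in
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, part_ids, action_ids, seq_ids):
        levels = [self.embed(x) for x in (part_ids, action_ids, seq_ids)]
        return self.fuse(torch.cat(levels, dim=-1))

cond = HierarchicalCondition()
toy = lambda: torch.randint(0, 1000, (1, 4))       # one phrase of 4 token ids
print(cond(toy(), toy(), toy()).shape)             # torch.Size([1, 64])
```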