🔥 New #AI Startups in 2026? 🔥
In 2026, which area would you focus on?
🤖Agents – workflows, copilots, etc.
Vertical AI – Pharma, Automotive, Energy ...
🧠Infrastructure – MLOps, Security, Cost Control ...
🎨AI for Creators/Media – Video, avatars, content ...
Please help me understand what's next with this poll on LinkedIn :)
https://www.linkedin.com/posts/visionarynet_ai-ai-deeplearning-activity-7415377341779996672-sQO1
LUV U \m/
#ai #deeplearning #aiwithpapers #metaverse | Alessandro Ferrari
🔥🔥 New #AI Startups in 2026? 🔥🔥
👉 Looking ahead to 2026, the question in AI is no longer "can we build it?" but "where does it actually create durable value?" So, if you were to launch an AI startup in 2026, which area would you focus on?
🤖Agents…
🔥Orient Anything V2 is out🔥
👉Orient Anything V2 is a foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Repo under CC-BY-4.0💙
👉Review https://t.ly/Ht7Xd
👉Paper arxiv.org/pdf/2601.05573
👉Project orient-anythingv2.github.io/
👉Repo github.com/SpatialVision/Orient-Anything-V2
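"Orientation" here means a 3D rotation. One common representation is axis-angle, convertible to a rotation matrix via Rodrigues' formula; a minimal numpy sketch of that conversion (illustrative only — the function name and the axis-angle choice are assumptions, not Orient Anything V2's actual output format):

```python
import numpy as np

def axis_angle_to_matrix(axis, angle):
    """Rodrigues' formula: rotation matrix from a unit axis and an angle (radians)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])      # cross-product matrix of the axis
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

R = axis_angle_to_matrix([0, 0, 1], np.pi / 2)    # 90 degrees about the z-axis
```

Any valid output is an orthonormal matrix with determinant +1, which is an easy sanity check on a predicted rotation.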
💫Active Object Reconstruction💫
👉ObjSplat (Beijing) autonomously plans viewpoints and progressively reconstructs an unknown object into a high-fidelity Gaussian model and watertight mesh, enabling direct use in physics simulations. Tough paper; repo announced💙
👉Review https://t.ly/au6HE
👉Paper arxiv.org/pdf/2601.06997
👉Project li-yuetao.github.io/ObjSplat-page/
👉Repo https://github.com/Li-Yuetao/ObjSplat
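Active reconstruction of this kind typically alternates between choosing a next-best view and updating the model. A toy greedy sketch of that loop, where sets of surface-element ids stand in for real visibility estimates (ObjSplat's actual planner and Gaussian updates are far more sophisticated):

```python
# Greedily pick the candidate viewpoint that reveals the most still-unseen surface
# elements, then fold its observations into the running reconstruction.
def plan_and_reconstruct(candidate_views, visible, budget):
    """candidate_views: list of view ids; visible: view id -> set of element ids."""
    seen, plan = set(), []
    for _ in range(budget):
        best = max(candidate_views, key=lambda v: len(visible[v] - seen))
        if not visible[best] - seen:      # no view adds information: stop early
            break
        plan.append(best)
        seen |= visible[best]             # "update the reconstruction"
    return plan, seen

views = {"front": {1, 2, 3}, "back": {4, 5}, "top": {3, 4}}
plan, seen = plan_and_reconstruct(list(views), views, budget=3)
```

The early stop is why active methods can terminate before exhausting the view budget: once coverage saturates, extra captures add nothing.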
In 2026, who should we keep an eye on?
Vote: https://www.linkedin.com/posts/visionarynet_ai-deeplearning-aiwithpapers-activity-7416886610795077632-qQeP/
👉Games Workshop (Warhammer) is banning the use of AI in its creative and design processes to protect IP and human creativity, a decision that runs counter to the current wave of widespread AI adoption.
And what about your organization? I need your help💙
Vote: https://www.linkedin.com/posts/visionarynet_ai-activity-7417106327019196417-TpGL
Segment Anything Geometry
👉3AM (NYCU + #Nvidia) offers cross-view correspondence even under large viewpoint changes, cluttered scenes, and variations in capture conditions, enabling robust object tracking from both videos & casual multi-view images. Repo (coming) & Demo available💙
👉Review https://t.ly/olZwE
👉Paper https://arxiv.org/pdf/2601.08831
👉Project https://jayisaking.github.io/3AM-Page/
👉Repo https://github.com/jayisaking
👉Demo https://huggingface.co/spaces/nycu-cplab/3AM
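Cross-view correspondence is usually benchmarked against a simple baseline: mutual nearest neighbours in feature space. A self-contained numpy sketch of that baseline (not 3AM's actual matcher; feature extraction is elided):

```python
import numpy as np

def mutual_nn_matches(feat_a, feat_b):
    """Match features across two views: cosine-similarity mutual nearest neighbours."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T
    ab = sim.argmax(axis=1)          # best match in B for each feature in A
    ba = sim.argmax(axis=0)          # best match in A for each feature in B
    # keep only pairs that agree in both directions (robust to one-sided clutter)
    return [(i, int(j)) for i, j in enumerate(ab) if ba[j] == i]

feat_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
feat_b = np.array([[0.0, 1.0], [1.0, 0.0]])
matches = mutual_nn_matches(feat_a, feat_b)
```

Large-viewpoint-change methods like 3AM aim to make the underlying features stable enough that such matching survives drastic appearance shifts.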
Multi-target SAM3
👉SAM3-DMS is a novel training-free decoupled strategy that applies fine-grained memory selection to individual objects, yielding robust identity preservation and tracking stability. Repo under SAM License💙
👉Review https://t.ly/jJOAr
👉Paper https://arxiv.org/pdf/2601.09699
👉Repo https://github.com/FudanCVL/SAM3-DMS
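The "decoupled" idea: each tracked object keeps its own memory bank, and only its best frames condition the next prediction, so one object's occlusions don't corrupt another's identity. A hypothetical toy structure (SAM3-DMS's real quality scoring and selection differ):

```python
from collections import defaultdict

class DecoupledMemory:
    """Per-object memory banks with top-k quality-based frame selection."""

    def __init__(self, k=3):
        self.k = k
        self.banks = defaultdict(list)    # object id -> [(quality, frame_idx)]

    def add(self, obj_id, frame_idx, quality):
        self.banks[obj_id].append((quality, frame_idx))

    def select(self, obj_id):
        """Fine-grained selection: the k highest-quality frames for this object only."""
        best = sorted(self.banks[obj_id], reverse=True)[: self.k]
        return [frame for _, frame in best]

mem = DecoupledMemory(k=2)
for frame, quality in [(0, 0.9), (1, 0.4), (2, 0.8)]:
    mem.add("person-1", frame, quality)
mem.add("person-2", 0, 0.7)
```

Contrast with a coupled design, where all objects share one memory and a single bad frame degrades every track at once.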
🍿100M Video Action Dataset🍿
👉Action100M by META is a large-scale dataset w/ 1.2M instructional videos (14.6 years of total duration), yielding O(100M) temporally localized segments with open-vocabulary action supervision and rich captions. Repo under FAIR NC Research License💙
👉Review https://t.ly/w5KXe
👉Paper arxiv.org/pdf/2601.10592
👉Repo github.com/facebookresearch/Action100M
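The reported numbers are self-consistent; a quick back-of-envelope check (taking O(100M) as roughly 1e8 segments) gives an average video of about six and a half minutes with on the order of 80 localized segments each:

```python
# Back-of-envelope check of the reported dataset scale.
SECONDS_PER_YEAR = 365 * 24 * 3600
total_hours = 14.6 * SECONDS_PER_YEAR / 3600   # 14.6 years of footage, in hours
avg_minutes = total_hours * 60 / 1.2e6         # average video length across 1.2M videos
segments_per_video = 1e8 / 1.2e6               # segments per video if O(100M) ~ 1e8
```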
Interactive Humanoid Generation
👉FlowAct-R1 by ByteDance is a novel framework that enables lifelike, responsive, high-fidelity humanoid video generation for seamless real-time interaction. No code, but impressive results (see the video with audio)💙
👉Review https://t.ly/aQhol
👉Paper arxiv.org/pdf/2601.10103
👉Project grisoon.github.io/FlowAct-R1/
3D Human Gen-Seg
👉CoMoVi takes an input image with a text description and synchronously generates a 3D human motion & video sequence within a single diffusion denoising loop. Repo & Dataset releasing💙
👉Review https://t.ly/khSkm
👉Paper arxiv.org/pdf/2601.10632
👉Project igl-hkust.github.io/CoMoVi/
👉Repo github.com/IGL-HKUST/CoMoVi
👉Data huggingface.co/datasets/AfterJourney/CoMoVi-Dataset
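The "single denoising loop" idea: the motion latent and the video latent are refined step by step under one shared schedule, rather than in two separate passes that must be aligned afterwards. A toy sketch with placeholder linear "denoisers" standing in for the networks (real diffusion models predict noise and use a proper variance schedule):

```python
import numpy as np

rng = np.random.default_rng(0)
motion = rng.normal(size=4)        # noisy motion latent
video = rng.normal(size=4)         # noisy video latent
motion_target = np.ones(4)         # stand-ins for network predictions
video_target = -np.ones(4)

for _ in range(50):                # one shared schedule, both states step together
    motion += 0.2 * (motion_target - motion)
    video += 0.2 * (video_target - video)
```

Stepping both states inside the same loop is what keeps the generated motion and video temporally synchronized by construction.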
SOTA Part-level Generator
👉A novel text-to-motion model that learns to compose complex motions through hierarchical conditioning on part-, action- & sequence-level text, enabling fine-grained control over body parts & timing. Code, models & Dataset to be released💙
👉Review https://t.ly/leB_R
👉Paper arxiv.org/pdf/2601.10909
👉Project coral79.github.io/frankenmotion/
👉Repo github.com/Coral79/FrankenMotion-Code
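Hierarchical conditioning in miniature: each granularity of description gets its own embedding, and the model receives all of them, so a part-level phrase can steer a limb while a sequence-level phrase steers ordering. A toy sketch where random vectors stand in for a real text encoder (the `embed` function and the concatenation scheme are hypothetical, not FrankenMotion's design):

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(text):
    """Hypothetical text-encoder stand-in: a random 8-d vector per call."""
    return rng.normal(size=8)

# One embedding per level, concatenated into a single conditioning vector.
cond = np.concatenate([
    embed("left arm raises"),                 # part-level
    embed("wave"),                            # action-level
    embed("walk forward, then wave twice"),   # sequence-level
])
```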
#META 3D Casual Captures
👉#META unveils ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Impressive results. Repo under CC BY-NC 4.0💙
👉Review https://t.ly/j08sJ
👉Paper arxiv.org/pdf/2601.11514
👉Project facebookresearch.github.io/ShapeR/
👉Repo github.com/facebookresearch/ShapeR
Foundation Medical SAM3
👉Medical SAM3 is a foundation model for universal prompt-driven medical image segmentation, built by fully fine-tuning SAM3 on large-scale, heterogeneous 2D/3D medical imaging datasets with paired segmentation masks and text prompts. Repo & Demo announced💙
👉Review https://t.ly/C6jcy
👉Paper https://arxiv.org/pdf/2601.10880
👉Project chongcongjiang.github.io/MedicalSAM3/#
👉Repo github.com/AIM-Research-Lab/Medical-SAM3
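"Fully fine-tuning" means updating every pretrained weight on the new domain, as opposed to freezing the backbone and training only a small head or adapter. A one-parameter toy of that idea (nothing here reflects SAM3's real architecture or training recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)     # stand-in "medical-domain" inputs
y = 3.0 * x                  # stand-in supervision (targets)
w = 1.0                      # "pretrained" weight, suboptimal for the new domain

for _ in range(200):         # gradient descent on the MSE loss mean((w*x - y)**2),
    grad = np.mean(2.0 * (w * x - y) * x)   # with ALL parameters (here: w) trainable
    w -= 0.1 * grad
```

The trade-off is the usual one: full fine-tuning adapts best to a shifted domain like medical imaging but risks forgetting the general-purpose behavior of the base model.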
🦧Mask-Guided Matting🦧
👉VideoMaMa is a novel diffusion-based model that converts binary masks into continuous alpha mattes. Repo, Dataset & Demo💙
👉Review https://t.ly/l_0f8
👉Paper arxiv.org/pdf/2601.14255
👉Project cvlab-kaist.github.io/VideoMaMa
👉Repo github.com/cvlab-kaist/VideoMaMa
👉Demo huggingface.co/spaces/SammyLim/VideoMaMa
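The mask-to-matte distinction in one equation: a binary mask is 0/1 per pixel, while an alpha matte is continuous in [0, 1] and drives compositing as out = alpha * fg + (1 - alpha) * bg. A toy 1-D numpy illustration where a box blur stands in for the learned matte prediction (VideoMaMa uses a diffusion model, not a blur):

```python
import numpy as np

def binary_to_soft(mask, radius=1):
    """Soften a hard 0/1 mask edge into fractional coverage (toy matte stand-in)."""
    padded = np.pad(mask.astype(float), radius, mode="edge")
    return np.array([padded[i : i + 2 * radius + 1].mean() for i in range(len(mask))])

def composite(alpha, fg, bg):
    """Standard alpha compositing."""
    return alpha * fg + (1 - alpha) * bg

mask = np.array([0, 0, 1, 1, 1])         # hard binary segmentation
alpha = binary_to_soft(mask)             # edge pixels become fractional
out = composite(alpha, fg=255.0, bg=0.0)
```

Those fractional edge values are exactly what binary segmenters cannot express, and why matting matters for hair, fur, and motion blur.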
MoRo: Human Motion
👉Masked modeling for human motion Recovery under Occlusions. Given a monocular video captured from a static camera, MoRo (by ETHZ & META) robustly reconstructs accurate, physically plausible human motion, even under challenging occlusions. Repo released💙
👉Review https://t.ly/kK_je
👉Paper arxiv.org/pdf/2601.16079
👉Project mikeqzy.github.io/MoRo/
👉Repo github.com/mikeqzy/MoRo
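The masked-modeling intuition: treat occluded frames as masked-out tokens and reconstruct them from visible context. In this toy 1-D sketch, plain linear interpolation stands in for MoRo's learned model (which additionally enforces physical plausibility):

```python
import numpy as np

t = np.arange(10, dtype=float)
traj = 2.0 * t                    # ground-truth 1-D joint trajectory
visible = np.ones(10, dtype=bool)
visible[4:7] = False              # frames 4-6 are occluded ("masked")

# Reconstruct the masked frames from the visible context.
recon = traj.copy()
recon[~visible] = np.interp(t[~visible], t[visible], traj[visible])
```

Interpolation succeeds here only because the toy trajectory is linear; real occluded motion needs a learned prior over how bodies actually move.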
🔥 BBoxMaskPose v2 is fire 🔥
👉BBoxMaskPose v2 by ČVUT offers SOTA performance in detection, segmentation & 2D pose in crowded scenes, and enables 3D human reconstruction even in scenes with complex interactions. Code, models & data available💙
👉Review https://t.ly/GkkDl
👉Paper arxiv.org/pdf/2601.15200
👉Project https://lnkd.in/dQ_3hxjC
👉Repo https://lnkd.in/dVqwD3jN
Generalized-Scale Counting
👉GeCo2 (Ljubljana) is a novel end-to-end SOTA few-shot counting method that explicitly addresses object-scale issues. Repo & Demo💙
👉Review https://t.ly/2_7I8
👉Paper https://arxiv.org/pdf/2511.08048
👉Repo https://github.com/jerpelhan/GECO2
👉Demo huggingface.co/spaces/jerpelhan/GECO2-demo
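Few-shot counters commonly regress a density map whose integral equals the object count; "addressing scale" means matching the density blobs to the exemplars' size. A toy numpy illustration of the density-sum idea (not GeCo2's architecture), where each object contributes a unit-mass Gaussian:

```python
import numpy as np

def gaussian_blob(h, w, cy, cx, sigma):
    """Unit-mass Gaussian blob on an h x w grid, centered at (cy, cx)."""
    y, x = np.mgrid[0:h, 0:w]
    g = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
    return g / g.sum()                  # normalize so each object sums to 1

# Two objects -> a density map that integrates to 2; sigma encodes object scale.
density = gaussian_blob(64, 64, 20, 20, 2.0) + gaussian_blob(64, 64, 40, 45, 2.0)
count = round(density.sum())
```

If the predicted blob scale mismatches the true object scale, overlapping blobs smear together and the integral drifts, which is the failure mode scale-aware methods target.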
🔥🔥Super-Hard Poll folks🔥🔥
👉 This dilemma is driving me crazy. Vote: https://www.linkedin.com/posts/visionarynet_activity-7421974594917588992-YNAG
(and of course comment here)
MLLMs Fine Segmentation
👉SimpleSeg: MLLMs with native pixel-level perception. Repo & Model available💙
👉Review https://t.ly/eVguh
👉Paper arxiv.org/pdf/2601.19228
👉Project simpleseg.github.io/
👉Repo github.com/songtianhui/SimpleSeg
🔥 DeepSeek-OCR 2 is out 🔥
👉DeepSeek-AI announced the new version of its powerful SOTA OCR: a new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights💙
👉Review https://t.ly/gX4bX
👉Paper https://arxiv.org/pdf/2601.20552
👉Repo github.com/deepseek-ai/DeepSeek-OCR-2