AI with Papers - Artificial Intelligence & Deep Learning
17.5K subscribers
156 photos
274 videos
14 files
1.43K links
All the AI with papers. Fresh updates every day on #DeepLearning, #MachineLearning, #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
Would it be useful for you to see a few (verified) AI job postings in this channel?
Anonymous Poll
63%
💚YES, why not?!
37%
❌ NO, only damn AI & Papers

Monocular 3D Clothed Human

👉MultiGO++ is a novel framework for monocular 3D clothed human reconstruction via geometry-texture collaboration. New SOTA but no code announced🥲

👉Review https://t.ly/YKY44
👉Paper arxiv.org/pdf/2603.04993
👉Project 3dagentworld.github.io/multigo++

🎪SOTA Arbitrary Tracking🎪

👉TAPFormer is a novel SOTA transformer-based framework that performs asynchronous, temporally consistent fusion of frames and events for robust, high-frequency point tracking. Repo & dataset under MIT💙

👉Review https://t.ly/-q4wm
👉Paper https://arxiv.org/pdf/2603.04989
👉Project http://tapformer.github.io/
👉Repo https://github.com/ljx1002/TAPFormer

📊Real-Time Scene Graph📊

👉REACT++ by Umeå University is the new state-of-the-art model for real-time scene graph generation (SGG): 20% faster, with an average gain of 10% in relation prediction accuracy. Code under MIT💙

👉Review https://t.ly/c12VX
👉Paper https://arxiv.org/pdf/2603.06386
👉Repo https://github.com/Maelic/SGG-Benchmark

🔥Holistic 3D Spatial Intelligence🔥

👉Holi-Spatial is the first fully automated pipeline capable of converting raw video streams into holistic 3D spatial annotations without human intervention. Code/Data announced💙

👉Review https://t.ly/PDpr9
👉Paper https://lnkd.in/dTbMuZCm
👉Project https://lnkd.in/d66CYB4q
👉Repo https://lnkd.in/dAGzShXj

Surface Light Tokenizer

👉Apple unveils LITO, a novel latent flow matching model that enables HQ image-to-3D. Its latent representation encodes a surface light field into a compact set of latent vectors. Impressive results but no code🥲

👉Review https://t.ly/xcWNe
👉Paper https://lnkd.in/dYHwY4YX
👉Project https://lnkd.in/dtJT8bXy

☄️ OmniStream Backbone ☄️

👉Novel unified streaming visual backbone that effectively perceives, reconstructs, and acts from diverse visual inputs. Repo/Models announced💙

👉Review https://t.ly/_zZMO
👉Paper arxiv.org/pdf/2603.12265
👉Project go2heart.github.io/omnistream/
👉Repo github.com/Go2Heart/OmniStream

🌈 New SOTA Video Depth 🌈

👉DVD is the new SOTA in video depth estimation, with the full training suite available under Apache 2.0💙

👉Review https://t.ly/gpCkG
👉Paper https://arxiv.org/pdf/2603.12250
👉Project https://dvd-project.github.io/
👉Repo github.com/EnVision-Research/DVD

🤖Physically-Plausible Human🤖

👉PhysMoDPO is a novel direct preference optimization framework for humanoid motion generation. Repo under MIT💙

👉Review https://t.ly/clf8w
👉Paper https://arxiv.org/pdf/2603.13228
👉Project https://mael-zys.github.io/PhysMoDPO/
👉Repo https://github.com/Mael-zys/PhysMoDPO

10,000× faster SAM-3D

👉Fast SAM 3D Body achieves up to a 10.9× speedup and over 10,000× faster MHR-to-SMPL conversion -> real-time humanoid control from RGB. Repo available💙

👉Review https://t.ly/uHx84
👉Paper https://arxiv.org/pdf/2603.15603
👉Project yangtiming.github.io/Fast-SAM-3D-Body-Page/
👉Repo https://github.com/yangtiming/Fast-SAM-3D-Body

Material-Aware Grouping

👉Material Magic Wand (Adobe) is a tool for material-aware grouping of parts in untextured 3D meshes. Given one selected part, it automatically retrieves the other parts of the same shape that share its material. Repo announced💙

👉Review https://t.ly/q00SU
👉Paper https://arxiv.org/pdf/2603.17370
👉Project umangi-jain.github.io/material-magic-wand/
👉Repo TBA

🦪OccAny: Universal 3D Occupancy🦪

👉OccAny by Valeo is a novel unified framework for generalized unconstrained urban 3D occupancy prediction. Repo under Apache 2.0💙

👉Review https://t.ly/FFiU0
👉Paper https://arxiv.org/pdf/2603.23502
👉Project https://valeoai.github.io/OccAny/
👉Repo https://github.com/valeoai/OccAny

Pose-Appearance-Motion for HOI

👉PAM is a novel Pose–Appearance–Motion Engine for controllable, SOTA Hand–Object Interaction (HOI) video generation. Repo/models available💙

👉Review https://t.ly/JU4MD
👉Paper arxiv.org/pdf/2603.22193
👉Project gasaiyu.github.io/PAM.github.io/
👉Repo https://github.com/GasaiYU/PAM

💥 GaussianGPT 3D GSC💥

👉From TUM, GaussianGPT: transformer-based generation of 3D Gaussians via next-token prediction -> full, complex 3D indoor scenes. Repo announced💙

👉Review https://t.ly/bj-lL
👉Paper arxiv.org/pdf/2603.26661
👉Project nicolasvonluetzow.github.io/GaussianGPT/
👉Repo TBA

👌HandX: Scaling Hands Motion👌

👉HandX is a unified foundation spanning data, annotation, and evaluation: a novel large-scale dataset of bimanual & dexterous motions with fine-grained textual annotations. Around 6M frames. Repo available💙

👉Review https://t.ly/1nGxw
👉Paper https://arxiv.org/pdf/2603.28766
👉Project https://handx-project.github.io/
👉Repo github.com/handx-project/HandX

🌵SOTA Training-Free In-Context Segmentation🌵

👉INSID3 is the new SOTA training-free approach that segments concepts at varying granularities using only frozen DINOv3 features, given an in-context example. Repo under Apache 2.0💙

👉Review https://t.ly/NVWHN
👉Paper arxiv.org/pdf/2603.28480
👉Project visinf.github.io/INSID3/
👉Repo github.com/visinf/INSID3

🪬Camera Raw Image Generation🪬

👉RawGen by #Samsung is a generative approach that learns the complex distribution of raw sensor data directly, enabling high-fidelity generation from either text descriptions or standard sRGB images across arbitrary camera sensors. Generate the linear raw image once, then apply any ISP operation. Repo announced💙

👉Review https://t.ly/_QVKP
👉Paper https://arxiv.org/pdf/2604.00093
👉Project https://dy112.github.io/rawgen-page/
👉Repo TBA

If you had to invest $1B TODAY in a frontier tech for the next decade, would you pick space, agentic, quantum or frugal GPUs? Vote here: https://t.ly/hSx6i

Video Object Deletion

👉Void by Netflix is a novel video object removal framework designed to perform physically-plausible inpainting in very complex scenarios. Repo under Apache 2.0💙

👉Review https://t.ly/cMVny
👉Paper https://arxiv.org/pdf/2604.02296
👉Project https://void-model.github.io/
👉Repo https://github.com/Netflix/void-model