AI with Papers - Artificial Intelligence & Deep Learning
17.5K subscribers
156 photos
274 videos
14 files
1.43K links
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT

🦜Geometry-Aware 4D Head🦜

👉GeoDiff4D is a novel framework that reconstructs animatable 4D head avatars from a single portrait image through geometry-aware diffusion. Code announced💙

👉Review https://t.ly/J9L-t
👉Paper https://lnkd.in/ddpv-78g
👉Project https://lnkd.in/d-vhukyj
👉Repo https://lnkd.in/dzd6mnFv

Fully Offline Mobile-VTON

👉A novel, HQ, privacy-preserving framework that enables fully offline virtual try-on on commodity mobile devices using only a single user image and a garment image. Repo announced, to be released💙

👉Review https://t.ly/dsrIn
👉Paper arxiv.org/pdf/2603.00947
👉Project zhenchenwan.github.io/Mobile-VTON/
👉Repo https://github.com/tmllab/2026_CVPR_Mobile-VTON

🪿All Point Clouds-One Encoder🪿

👉Utonia is a step toward a one-from-all, one-for-all point cloud encoder. It pretrains a single encoder on diverse point cloud data and reuses it as a reliable backbone for downstream tasks. Code under Apache 2.0💙

👉Review https://t.ly/yqSyZ
👉Paper https://arxiv.org/pdf/2603.03283
👉Project pointcept.github.io/Utonia/
👉Repo https://github.com/Pointcept/Utonia

DuoMo: Dual Motion Diffusion

👉DuoMo by Meta is a novel generative method that recovers human motion in world-space coordinates from unconstrained videos with noisy or incomplete observations. Code announced💙

👉Review https://t.ly/dnA3K
👉Paper arxiv.org/pdf/2603.03265
👉Project yufu-wang.github.io/duomo/
👉Repo TBA

🐙Any Resolution, Any Geometry🐙

👉Ultra Resolution Geometry Transformer (URGT) for depth–normal estimation at arbitrary resolutions (e.g. 4K, 6K, 8K). New SOTA. Repo under MIT💙

👉Review https://t.ly/HXg1n
👉Paper arxiv.org/pdf/2603.03026
👉Project dreamaker-mrc.github.io/Any-Resolution-Any-Geometry/
👉Repo github.com/Dreamaker-MrC/Any-Resolution-Any-Geometry

Would it be useful to see a few (verified) AI job postings in this channel?
Anonymous Poll
63%
💚YES, why not?!
37%
❌ NO, only damn AI & Papers

🍧Monocular 3D Clothed Human🍧

👉MultiGO++ is a novel framework for monocular 3D clothed human reconstruction via geometry-texture collaboration. New SOTA but no code announced🥲

👉Review https://t.ly/YKY44
👉Paper arxiv.org/pdf/2603.04993
👉Project 3dagentworld.github.io/multigo++

🎪SOTA Arbitrary Tracking🎪

👉TAPFormer is a novel SOTA transformer-based framework that performs asynchronous, temporally consistent fusion of frames and events for robust, high-frequency point tracking. Repo & Dataset under MIT💙

👉Review https://t.ly/-q4wm
👉Paper https://arxiv.org/pdf/2603.04989
👉Project http://tapformer.github.io/
👉Repo https://github.com/ljx1002/TAPFormer

📊Real-Time Scene Graph📊

👉REACT++ by Umeå University is the new state-of-the-art model for real-time scene graph generation (SGG): 20% faster, with an average gain of 10% in relation prediction accuracy. Code under MIT💙

👉Review https://t.ly/c12VX
👉Paper https://arxiv.org/pdf/2603.06386
👉Repo https://github.com/Maelic/SGG-Benchmark

🔥Holistic 3D Spatial Intelligence🔥

👉Holi-Spatial is the first fully automated pipeline capable of converting raw video streams into holistic 3D spatial annotations without human intervention. Code/Data announced💙

👉Review https://t.ly/PDpr9
👉Paper https://lnkd.in/dTbMuZCm
👉Project https://lnkd.in/d66CYB4q
👉Repo https://lnkd.in/dAGzShXj

Surface Light Tokenizer

👉Apple unveils LITO, a novel latent flow matching model that enables HQ image-to-3D. Its latent representation encodes a surface light field into a compact set of latent vectors. Impressive results but no code🥲

👉Review https://t.ly/xcWNe
👉Paper https://lnkd.in/dYHwY4YX
👉Project https://lnkd.in/dtJT8bXy

☄️ OmniStream Backbone ☄️

👉Novel unified streaming visual backbone that effectively perceives, reconstructs, and acts from diverse visual inputs. Repo/Models announced💙

👉Review https://t.ly/_zZMO
👉Paper arxiv.org/pdf/2603.12265
👉Project go2heart.github.io/omnistream/
👉Repo github.com/Go2Heart/OmniStream

🌈 New SOTA Video Depth 🌈

👉DVD is the new Video Depth Estimation SOTA, with a full training suite available under Apache 2.0💙

👉Review https://t.ly/gpCkG
👉Paper https://arxiv.org/pdf/2603.12250
👉Project https://dvd-project.github.io/
👉Repo github.com/EnVision-Research/DVD

🤖Physically-Plausible Human🤖

👉PhysMoDPO is a novel direct preference optimization framework for humanoid motion generation. Repo under MIT💙

👉Review https://t.ly/clf8w
👉Paper https://arxiv.org/pdf/2603.13228
👉Project https://mael-zys.github.io/PhysMoDPO/
👉Repo https://github.com/Mael-zys/PhysMoDPO

🍧10,000× faster SAM-3D🍧

👉Fast SAM 3D Body achieves up to 10.9× speedup, over 10,000× faster MHR-to-SMPL conversion -> real-time humanoid control from RGB. Repo available💙

👉Review https://t.ly/uHx84
👉Paper https://arxiv.org/pdf/2603.15603
👉Project yangtiming.github.io/Fast-SAM-3D-Body-Page/
👉Repo https://github.com/yangtiming/Fast-SAM-3D-Body

Material-Aware Grouping

👉Material Magic Wand (Adobe) is a tool for material-aware grouping of parts in untextured 3D meshes. Given one selected part, it automatically retrieves the other parts of the same shape that share its material. Repo announced💙

👉Review https://t.ly/q00SU
👉Paper https://arxiv.org/pdf/2603.17370
👉Project umangi-jain.github.io/material-magic-wand/
👉Repo TBA

🦪OccAny: Universal 3D Occupancy🦪

👉OccAny by Valeo is a novel unified framework for generalized unconstrained urban 3D occupancy prediction. Repo under Apache 2.0💙

👉Review https://t.ly/FFiU0
👉Paper https://arxiv.org/pdf/2603.23502
👉Project https://valeoai.github.io/OccAny/
👉Repo https://github.com/valeoai/OccAny

🐍Pose-Appearance-Motion for HOI🐍

👉PAM is a novel Pose–Appearance–Motion Engine for SOTA controllable Hand–Object Interaction video generation. Repo/models available💙

👉Review https://t.ly/JU4MD
👉Paper arxiv.org/pdf/2603.22193
👉Project gasaiyu.github.io/PAM.github.io/
👉Repo https://github.com/GasaiYU/PAM

💥GaussianGPT 3D GSC💥

👉From TUM, GaussianGPT: transformer-based 3D Gaussians generation via next-token prediction -> full 3D complex indoor scene. Repo announced💙

👉Review https://t.ly/bj-lL
👉Paper arxiv.org/pdf/2603.26661
👉Project nicolasvonluetzow.github.io/GaussianGPT/
👉Repo TBA