AI with Papers - Artificial Intelligence & Deep Learning
17.5K subscribers
156 photos
274 videos
14 files
1.43K links
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
🫧SurfPhase: 3D Interfacial Dynamics🫧

👉SurfPhase is a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Repo/Dataset announced💙

👉Review https://t.ly/g2P5F
👉Paper https://arxiv.org/pdf/2602.11154
👉Project https://yuegao.me/SurfPhase/
👉Repo github.com/yuegao/SurfPhase
🪿Teaching AI to illusions🪿

👉Stroke of Surprise by NYCU is a novel generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. As strokes are progressively added, the sketch reveals a completely different subject. Code released💙

👉Review https://t.ly/98Oim
👉Paper https://lnkd.in/dTA7iuce
👉Project https://lnkd.in/dhTMGw23
👉Repo https://lnkd.in/deQyDGFu
πŸ₯Conversational SegmentationπŸ₯

πŸ‘‰CIS grounds abstract, intent-oriented concepts into pixel-accurate masks, reasoning about affordances, physics, and functional properties. Code/Demo releasedπŸ’™

πŸ‘‰Review https://t.ly/SsG57
πŸ‘‰Paper arxiv.org/pdf/2602.13195
πŸ‘‰Project glab-caltech.github.io/converseg/
πŸ‘‰Repo github.com/AadSah/ConverSeg
πŸ‘‰Demo glab-caltech.github.io/converseg/#interactive-demo
📲 Efficient VLMs 📲

👉CoPE-VideoLM is a codec-aware tokenization framework for VLMs that replaces dense RGB encoding with lightweight structured representations derived from codec primitives. Tokens -93% / time-to-first-token -86%! Code announced💙

👉Review https://t.ly/3_GqN
👉Paper https://arxiv.org/pdf/2602.13191
👉Project https://sayands.github.io/cope/
👉Repo TBA
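To put the CoPE-VideoLM percentages in perspective, here is a minimal sketch of the arithmetic. The baseline figures (10,000 visual tokens, 2.0 s time-to-first-token) and the helper name `after_reduction` are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def after_reduction(baseline: float, percent_cut: float) -> float:
    """Return what remains of `baseline` after cutting `percent_cut` percent."""
    return baseline * (100.0 - percent_cut) / 100.0


# Hypothetical baseline values, NOT reported in the paper.
baseline_tokens = 10_000
baseline_ttft_s = 2.0

# Apply the reductions quoted in the post: -93% tokens, -86% time-to-first-token.
tokens_left = after_reduction(baseline_tokens, 93)
ttft_left = after_reduction(baseline_ttft_s, 86)

print(f"tokens: {baseline_tokens} -> {tokens_left:.0f}")
print(f"time-to-first-token: {baseline_ttft_s:.2f}s -> {ttft_left:.2f}s")
```

Under these assumed baselines, a 93% cut leaves 7% of the tokens and an 86% cut leaves 14% of the latency, which is roughly a 14x and 7x reduction respectively.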
πŸ™Dex4D: Task-Agnostic TrackπŸ™

πŸ‘‰Dex4D by CMU is a novel approach for unseen objects and poses, scene layouts, backgrounds, & task trajectories. Code under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/ZGx9T
πŸ‘‰Paper arxiv.org/pdf/2602.15828
πŸ‘‰Project dex4d.github.io/
πŸ‘‰Sim github.com/Dex4D/Dex4D-Simulation
πŸ‘‰Vision github.com/Dex4D/Dex4D-Vision
πŸ‘‰HW https://github.com/Dex4D/Dex4D-Hardware
🚀Video Neural Compression🚀

👉TeCoNeRV adapts INR hypernetworks to compress videos efficiently at higher resolutions. Impressive: +5.35 dB PSNR, -36% bitrate & 1.5-3× faster. Code announced💙

👉Review https://t.ly/0AtCK
👉Paper arxiv.org/pdf/2602.16711
👉Project namithap10.github.io/teconerv/
👉Repo github.com/namithap10/TeCoNeRV/
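For a sense of what a PSNR gain means: PSNR is 10·log10(MAX²/MSE), so a gain in dB maps to a multiplicative drop in mean squared error. Below is a minimal sketch, assuming the standard PSNR definition for 8-bit images. The function names and the example MSE of 100 are illustrative and do not come from the TeCoNeRV code or paper.

```python
import math


def psnr_db(mse: float, max_value: float = 255.0) -> float:
    """Standard PSNR in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a PSNR gain of `gain_db` dB."""
    return 10.0 ** (gain_db / 10.0)


# The +5.35 dB figure is the one quoted in the post.
ratio = mse_ratio_for_gain(5.35)

# Hypothetical starting error, used only to check the relationship.
mse_before = 100.0
mse_after = mse_before / ratio
gain = psnr_db(mse_after) - psnr_db(mse_before)

print(f"MSE shrinks by a factor of about {ratio:.2f}")
print(f"PSNR gain recovered from the two MSE values: {gain:.2f} dB")
```

Read this way, +5.35 dB corresponds to the reconstruction error dropping to a bit under a third of its previous value, whatever the starting point.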
🔥New SOTA Planar Tracking🔥

👉WOFTSAM by the Visual Recognition Group (CTU) is a novel planar tracker that combines robust long-term segmentation by SAM2 with 8 degrees-of-freedom homography pose estimation. Repo under BY-NC-SA 4.0💙

👉Review https://t.ly/VUOe5
👉Paper https://lnkd.in/dZfc_DhQ
👉Repo https://lnkd.in/dAcneJGn
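An 8 degrees-of-freedom homography is a 3×3 matrix defined only up to scale (9 entries minus 1 for the free scale factor). Here is a minimal sketch of how such a matrix maps planar points, in plain Python. The function name and the example matrices are hypothetical, and this is not WOFTSAM's implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3, points: Sequence[Point]) -> List[Point]:
    """Map 2D points through a 3x3 homography using homogeneous coordinates."""
    mapped = []
    for x, y in points:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if w == 0:
            raise ValueError("point maps to infinity under this homography")
        # Dividing by w is what makes the overall scale of H irrelevant.
        mapped.append((u / w, v / w))
    return mapped


# Hypothetical example: a pure shift by (2, 3), written as a homography.
H_shift = [[1.0, 0.0, 2.0],
           [0.0, 1.0, 3.0],
           [0.0, 0.0, 1.0]]

# The same transform with every entry multiplied by 5.
H_scaled = [[5.0 * value for value in row] for row in H_shift]

corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(apply_homography(H_shift, corners))
print(apply_homography(H_scaled, corners))
```

Both calls print the same four corners, which illustrates why only 8 of the 9 matrix entries are independent.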
🫸 World-Grounded Hand-Obj🫸

👉WHOLE jointly reconstructs coherent hand and object motion in the world space by guiding a generative motion prior. Code announced💙

👉Review https://t.ly/c5w8h
👉Paper https://arxiv.org/pdf/2602.22209
👉Project https://judyye.github.io/whole-www/
👉Repo TBA
🧱Solaris: generative #Minecraft🧱

👉NYU unveils Solaris, a multiplayer video world model in Minecraft that generates consistent first-person observations for two players simultaneously. Impressive work. Repo & Dataset💙

👉Review https://t.ly/VrcrT
👉Paper https://arxiv.org/pdf/2602.22208
👉Project https://solaris-wm.github.io/
👉Repo https://github.com/solaris-wm/
🦜Geometry-Aware 4D Head🦜

👉GeoDiff4D is a novel framework that reconstructs animatable 4D head avatars from a single portrait image through geometry-aware diffusion. Code announced💙

👉Review https://t.ly/J9L-t
👉Paper https://lnkd.in/ddpv-78g
👉Project https://lnkd.in/d-vhukyj
👉Repo https://lnkd.in/dzd6mnFv
πŸ“Fully Offline Mobile-VTONπŸ“

πŸ‘‰A novel, hq, privacy-preserving framework that enables fully offline virtual try-on on commodity mobile devices using only a single user image and a garment image. Repo announced, to be releasedπŸ’™

πŸ‘‰Review https://t.ly/dsrIn
πŸ‘‰Paper arxiv.org/pdf/2603.00947
πŸ‘‰Project zhenchenwan.github.io/Mobile-VTON/
πŸ‘‰Repo https://github.com/tmllab/2026_CVPR_Mobile-VTON
🪿All Point Clouds-One Encoder🪿

👉Utonia is a step toward a one-from-all and one-for-all point cloud encoder. It pretrains a single encoder on diverse point cloud data and reuses it as a reliable backbone for downstream tasks. Code under Apache 2.0💙

👉Review https://t.ly/yqSyZ
👉Paper https://arxiv.org/pdf/2603.03283
👉Project pointcept.github.io/Utonia/
👉Repo https://github.com/Pointcept/Utonia
πŸͺDuoMo: Dual Motion DiffusionπŸͺ

πŸ‘‰DuoMo by META is a novel generative method that recovers human motion in world-space coordinates from unconstrained videos with noisy or incomplete observations. Code announcedπŸ’™

πŸ‘‰Review https://t.ly/dnA3K
πŸ‘‰Paper arxiv.org/pdf/2603.03265
πŸ‘‰Project yufu-wang.github.io/duomo/
πŸ‘‰Repo TBA
πŸ™Any Resolution, Any GeometryπŸ™

πŸ‘‰Ultra Resolution Geometry Transformer (URGT) for arbitrary resolutions (e.g. 4K, 6K, 8K) depth–normal estimation. New SOTA. Repo under MITπŸ’™

πŸ‘‰Review https://t.ly/HXg1n
πŸ‘‰Paper arxiv.org/pdf/2603.03026
πŸ‘‰Project dreamaker-mrc.github.io/Any-Resolution-Any-Geometry/
πŸ‘‰Repo github.com/Dreamaker-MrC/Any-Resolution-Any-Geometry
Would it be useful for you to see a few (verified) AI job postings in this channel?
Anonymous Poll
63%
💚YES, why not?!
37%
❌ NO, only damn AI & Papers
🍧Monocular 3D Clothed Human🍧

👉MultiGO++ is a novel framework for monocular 3D clothed human reconstruction via geometry-texture collaboration. New SOTA but no code announced🥲

👉Review https://t.ly/YKY44
👉Paper arxiv.org/pdf/2603.04993
👉Project 3dagentworld.github.io/multigo++
🎪SOTA Arbitrary Tracking🎪

👉TAPFormer is a novel SOTA transformer-based framework that performs asynchronous, temporally consistent fusion of frames and events for robust, high-frequency point tracking. Repo & Dataset under MIT💙

👉Review https://t.ly/-q4wm
👉Paper https://arxiv.org/pdf/2603.04989
👉Project http://tapformer.github.io/
👉Repo https://github.com/ljx1002/TAPFormer
📊Real-Time Scene Graph📊

👉REACT++ by Umeå University is the new state-of-the-art model for real-time scene graph generation (SGG): 20% faster with a gain of 10% in relation prediction accuracy on average. Code under MIT💙

👉Review https://t.ly/c12VX
👉Paper https://arxiv.org/pdf/2603.06386
👉Repo https://github.com/Maelic/SGG-Benchmark
🔥Holistic 3D Spatial Intelligence🔥

👉Holi-Spatial is the first fully automated pipeline capable of converting raw video streams into holistic 3D spatial annotations without human intervention. Code/Data announced💙

👉Review https://t.ly/PDpr9
👉Paper https://lnkd.in/dTbMuZCm
👉Project https://lnkd.in/d66CYB4q
👉Repo https://lnkd.in/dAGzShXj
πŸ“Surface Light TokenizerπŸ“

πŸ‘‰Apple unveils LITO a novel latent flow matching model enables HQ image-to-3D. Latent representation that encodes a surface light field into a compact set of latent vectors. Impressive results but no codeπŸ₯²

πŸ‘‰Review https://t.ly/xcWNe
πŸ‘‰Paper https://lnkd.in/dYHwY4YX
πŸ‘‰Project https://lnkd.in/dtJT8bXy