AI with Papers - Artificial Intelligence & Deep Learning

🫧SurfPhase: 3D Interfacial Dynamics🫧

👉SurfPhase is a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Repo/Dataset announced💙

👉Review https://t.ly/g2P5F
👉Paper https://arxiv.org/pdf/2602.11154
👉Project https://yuegao.me/SurfPhase/
👉Repo github.com/yuegao/SurfPhase

❤6🔥2👍1🤯1

4.25K views · 09:29

🪿Teaching AI to Draw Illusions🪿

👉Stroke of Surprise by NYCU is a novel generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. As strokes are progressively added, the sketch reveals a completely different subject. Code released💙

👉Review https://t.ly/98Oim
👉Paper https://lnkd.in/dTA7iuce
👉Project https://lnkd.in/dhTMGw23
👉Repo https://lnkd.in/deQyDGFu

❤7👍2👏2

4.34K views · edited 09:13

🥝Conversational Segmentation🥝

👉CIS grounds abstract, intent-oriented concepts into pixel-accurate masks, reasoning about affordances, physics, and functional properties. Code/Demo released💙

👉Review https://t.ly/SsG57
👉Paper arxiv.org/pdf/2602.13195
👉Project glab-caltech.github.io/converseg/
👉Repo github.com/AadSah/ConverSeg
👉Demo glab-caltech.github.io/converseg/#interactive-demo

❤6🔥3👍1👏1

4.52K views · edited 14:31

📲 Efficient VLMs 📲

👉CoPE-VideoLM is a codec-aware tokenization framework for VLMs that replaces dense RGB encoding with lightweight structured representations derived from codec primitives. Tokens -93% / time-to-first-token -86%! Code announced💙

👉Review https://t.ly/3_GqN
👉Paper https://arxiv.org/pdf/2602.13191
👉Project https://sayands.github.io/cope/
👉Repo TBA

🔥11❤5👏1

5.18K views · edited 07:38

🐙Dex4D: Task-Agnostic Track🐙

👉Dex4D by CMU is a novel approach that generalizes to unseen objects and poses, scene layouts, backgrounds, and task trajectories. Code under Apache 2.0💙

👉Review https://t.ly/ZGx9T
👉Paper arxiv.org/pdf/2602.15828
👉Project dex4d.github.io/
👉Sim github.com/Dex4D/Dex4D-Simulation
👉Vision github.com/Dex4D/Dex4D-Vision
👉HW https://github.com/Dex4D/Dex4D-Hardware

❤8🔥1👏1

5.38K views · edited 07:44

🚤Video Neural Compression🚤

👉TeCoNeRV adapts INR hypernetworks to compress videos efficiently at higher resolutions. Impressive: +5.35dB PSNR, -36% bitrate & 1.5-3× faster. Code announced💙

👉Review https://t.ly/0AtCK
👉Paper arxiv.org/pdf/2602.16711
👉Project namithap10.github.io/teconerv/
👉Repo github.com/namithap10/TeCoNeRV/
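
👉To put the +5.35dB PSNR figure in perspective, here is a minimal sketch using the standard PSNR definition (10·log10(MAX²/MSE)). It is not code from the TeCoNeRV repo; the baseline MSE and the 8-bit peak value of 255 are illustrative assumptions. Under that definition, a 5.35dB gain corresponds to roughly 3.4× lower mean squared error.

```python
import math


def psnr(mse: float, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (gain_db / 10.0)


# Hypothetical baseline error, chosen only for illustration (not from the paper).
baseline_mse = 40.0
improved_mse = baseline_mse / mse_ratio_for_gain(5.35)

print(f"MSE reduction factor: {mse_ratio_for_gain(5.35):.2f}x")
print(f"PSNR gain: {psnr(improved_mse) - psnr(baseline_mse):.2f} dB")
```

The reduction factor does not depend on the baseline MSE or the peak value, since both cancel out in the dB difference.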

🔥10❤4👏2👍1

5.66K views · edited 12:44

🔥New SOTA Planar Tracking🔥

👉WOFTSAM by the Visual Recognition Group (CTU) is a novel planar tracker that combines robust long-term segmentation by SAM2 with 8 degrees-of-freedom homography pose estimation. Repo under BY-NC-SA 4.0💙

👉Review https://t.ly/VUOe5
👉Paper https://lnkd.in/dZfc_DhQ
👉Repo https://lnkd.in/dAcneJGn
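
👉Why 8 degrees of freedom? A planar homography is a 3×3 matrix defined only up to scale, so 9 entries minus 1 scale factor leaves 8. The sketch below illustrates that general fact in plain Python; it is not WOFTSAM's code, and the matrix values and helper names are made up for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def apply_homography(H: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # divide out the homogeneous scale


def scale(H: Matrix, s: float) -> Matrix:
    """Multiply every entry of H by a nonzero scalar s."""
    return [[s * h for h in row] for row in H]


# Hypothetical homography, values chosen only for illustration.
H = [
    [1.0, 0.2, 5.0],
    [0.0, 1.1, -3.0],
    [0.001, 0.0, 1.0],
]

p_original = apply_homography(H, 10.0, 20.0)
p_scaled = apply_homography(scale(H, 2.0), 10.0, 20.0)

# Both matrices describe the same mapping: the overall scale is not a free parameter.
print(p_original, p_scaled)
```

Because scaling the matrix leaves every mapped point unchanged, trackers typically fix one entry (or the matrix norm) and estimate the remaining 8 parameters.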

🔥8👍4❤2👏1🤯1🤣1🍾1

4.68K views · 07:06

🫸 World-Grounded Hand-Obj🫸

👉WHOLE jointly reconstructs coherent hand and object motion in the world space by guiding a generative motion prior. Code announced💙

👉Review https://t.ly/c5w8h
👉Paper https://arxiv.org/pdf/2602.22209
👉Project https://judyye.github.io/whole-www/
👉Repo TBA

❤2👍1🔥1👏1😍1

4.71K views · edited 07:26

🧱Solaris: generative #Minecraft🧱

👉NYU unveils Solaris, a multiplayer video world model in Minecraft that generates consistent first-person observations for two players simultaneously. Impressive work. Repo & Dataset💙

👉Review https://t.ly/VrcrT
👉Paper https://arxiv.org/pdf/2602.22208
👉Project https://solaris-wm.github.io/
👉Repo https://github.com/solaris-wm/

🔥6❤2👍2👏1

5.28K views · 10:21

🦜Geometry-Aware 4D Head🦜

👉 GeoDiff4D is a novel framework that reconstructs animatable 4D head avatars from a single portrait image through geometry-aware diffusion. Code announced💙

👉Review https://t.ly/J9L-t
👉Paper https://lnkd.in/ddpv-78g
👉Project https://lnkd.in/d-vhukyj
👉Repo https://lnkd.in/dzd6mnFv

❤5👏3👍1🔥1🤯1🍾1

3.91K views · 15:06

🍓Fully Offline Mobile-VTON🍓

👉A novel, high-quality, privacy-preserving framework that enables fully offline virtual try-on on commodity mobile devices using only a single user image and a garment image. Repo announced, to be released💙

👉Review https://t.ly/dsrIn
👉Paper arxiv.org/pdf/2603.00947
👉Project zhenchenwan.github.io/Mobile-VTON/
👉Repo https://github.com/tmllab/2026_CVPR_Mobile-VTON

❤11🤯3👏2🔥1

4.15K views · 12:57

🪿All Point Clouds-One Encoder🪿

👉Utonia is a step toward a one-from-all, one-for-all point cloud encoder. It pretrains a single encoder on diverse point cloud data and reuses it as a reliable backbone for downstream tasks. Code under Apache 2.0💙

👉Review https://t.ly/yqSyZ
👉Paper https://arxiv.org/pdf/2603.03283
👉Project pointcept.github.io/Utonia/
👉Repo https://github.com/Pointcept/Utonia

❤7🔥2👍1👏1

4K views · edited 08:11

🐪DuoMo: Dual Motion Diffusion🐪

👉DuoMo by META is a novel generative method that recovers human motion in world-space coordinates from unconstrained videos with noisy or incomplete observations. Code announced💙

👉Review https://t.ly/dnA3K
👉Paper arxiv.org/pdf/2603.03265
👉Project yufu-wang.github.io/duomo/
👉Repo TBA

❤7👍2🤯2👏1

4.14K views · edited 13:11

🍙Any Resolution, Any Geometry🍙

👉Ultra Resolution Geometry Transformer (URGT) for arbitrary-resolution (e.g. 4K, 6K, 8K) depth and normal estimation. New SOTA. Repo under MIT💙

👉Review https://t.ly/HXg1n
👉Paper arxiv.org/pdf/2603.03026
👉Project dreamaker-mrc.github.io/Any-Resolution-Any-Geometry/
👉Repo github.com/Dreamaker-MrC/Any-Resolution-Any-Geometry

🔥8❤7👍1👏1

4.61K views · 06:55

Would it be useful for you to see a few (verified) AI job postings in this channel?

Anonymous Poll

63%

💚YES, why not?!

37%

❌ NO, only damn AI & Papers

❤5

358 voters · 4.2K views · 14:09

🍧Monocular 3D Clothed Human🍧

👉MultiGO++ is a novel framework for monocular 3D clothed human reconstruction via geometry-texture collaboration. New SOTA but no code announced🥲

👉Review https://t.ly/YKY44
👉Paper arxiv.org/pdf/2603.04993
👉Project 3dagentworld.github.io/multigo++

❤5👍1👏1

4.52K views · 07:07

🎪SOTA Arbitrary Tracking🎪

👉TAPFormer is a novel SOTA transformer-based framework that performs asynchronous, temporally consistent fusion of frames and events for robust, high-frequency point tracking. Repo & Dataset under MIT💙

👉Review https://t.ly/-q4wm
👉Paper https://arxiv.org/pdf/2603.04989
👉Project http://tapformer.github.io/
👉Repo https://github.com/ljx1002/TAPFormer

❤5👍3🔥3👏2🍾1

5.09K views · edited 08:00

📊Real-Time Scene Graph📊

👉REACT++ by Umeå University is the new state-of-the-art model for real-time scene graph generation (SGG): 20% faster with a gain of 10% in relation prediction accuracy on average. Code under MIT💙

👉Review https://t.ly/c12VX
👉Paper https://arxiv.org/pdf/2603.06386
👉Repo https://github.com/Maelic/SGG-Benchmark

🔥6❤3👏3👍1

4.81K views · 07:51

🔥Holistic 3D Spatial Intelligence🔥

👉Holi-Spatial is the first fully automated pipeline capable of converting raw video streams into holistic 3D spatial annotations without human intervention. Code/Data announced💙

👉Review https://t.ly/PDpr9
👉Paper https://lnkd.in/dTbMuZCm
👉Project https://lnkd.in/d66CYB4q
👉Repo https://lnkd.in/dAGzShXj

❤8🔥7👍2👏1

4.51K views · 07:57

🍓Surface Light Tokenizer🍓

👉Apple unveils LITO, a novel latent flow matching model that enables HQ image-to-3D, built on a latent representation that encodes a surface light field into a compact set of latent vectors. Impressive results but no code🥲

👉Review https://t.ly/xcWNe
👉Paper https://lnkd.in/dYHwY4YX
👉Project https://lnkd.in/dtJT8bXy

❤9👍4🔥2👏2🤯1🍾1

4.57K views · 07:46
