AI with Papers - Artificial Intelligence & Deep Learning
17.5K subscribers
155 photos
266 videos
14 files
1.4K links
All the AI, with papers. Fresh updates every day on #DeepLearning, #MachineLearning, #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
😶‍🌫️ SOTA Full-Head Synthesis 😶‍🌫️

👉HyPlaneHead is the new SOTA in full-head image synthesis, delivering HQ results with significantly fewer artifacts than existing 3D-aware models. Repo announced💙

👉Review https://t.ly/WYfP3
👉Paper arxiv.org/pdf/2509.16748
👉Project https://lhyfst.github.io/hyplanehead/
👉Repo github.com/lhyfst/HyPlaneHead
🍟 AnyTouch 2 is out 🍟

πŸ‘‰AnyTouch 2 is a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. Repo, Model & DataπŸ’™

πŸ‘‰Review https://t.ly/fP4dP
πŸ‘‰Paper https://arxiv.org/pdf/2602.09617
πŸ‘‰Project gewu-lab.github.io/AnyTouch2/
πŸ‘‰Repo github.com/GeWu-Lab/AnyTouch2
🍌 AGENT BANANA (SOTA) 🍌

πŸ‘‰Agent Banana is the novel SOTA agentic system for HD, native-resolution image editing through reasoning-based NL interaction, where each edit is context-aware, logically dependent, and locally precise. Code announcedπŸ’™

πŸ‘‰Review https://t.ly/EXaCH
πŸ‘‰Paper https://arxiv.org/pdf/2602.09084
πŸ‘‰Project https://agent-banana.github.io/
πŸ‘‰Repo https://github.com/taco-group/agent-banana
🛠️ IndustryShapes 6D Pose 🛠️

👉IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel object 6D pose estimation. Dataset available💙

👉Review https://t.ly/KKcuH
👉Paper https://arxiv.org/pdf/2602.05555
👉Project https://pose-lab.github.io/IndustryShapes/
👉Dataset https://huggingface.co/datasets/POSE-Lab/IndustryShapes
🤖Generalized Human Tracking🤖

👉Beijing Institute of Technology & Humanoid Robotics Shanghai present a novel learning framework for general humanoid whole-body control. Impressive results in imitation.

👉Review https://t.ly/ucmuB
👉Paper arxiv.org/pdf/2601.23080
👉Project zeonsunlightyu.github.io/RGMT.github.io
🫧SurfPhase: 3D Interfacial Dynamics🫧

👉SurfPhase is a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Repo/Dataset announced💙

👉Review https://t.ly/g2P5F
👉Paper https://arxiv.org/pdf/2602.11154
👉Project https://yuegao.me/SurfPhase/
👉Repo github.com/yuegao/SurfPhase
🪿Teaching AI to draw illusions🪿

👉Stroke of Surprise by NYCU is a novel generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. As strokes are progressively added, the sketch reveals a completely different subject. Code released💙

👉Review https://t.ly/98Oim
👉Paper https://lnkd.in/dTA7iuce
👉Project https://lnkd.in/dhTMGw23
👉Repo https://lnkd.in/deQyDGFu
Conversational Segmentation

👉CIS grounds abstract, intent-oriented concepts into pixel-accurate masks, reasoning about affordances, physics, and functional properties. Code/Demo released💙

👉Review https://t.ly/SsG57
👉Paper arxiv.org/pdf/2602.13195
👉Project glab-caltech.github.io/converseg/
👉Repo github.com/AadSah/ConverSeg
👉Demo glab-caltech.github.io/converseg/#interactive-demo
📲 Efficient VLMs 📲

👉CoPE-VideoLM is a codec-aware tokenization framework for VLMs that replaces dense RGB encoding w/ lightweight structured representations derived from codec primitives. Tokens -93% / time-to-first-token -86%! Code announced💙

👉Review https://t.ly/3_GqN
👉Paper https://arxiv.org/pdf/2602.13191
👉Project https://sayands.github.io/cope/
👉Repo TBA
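
To put the two reductions quoted in this post in perspective, here is a minimal Python sketch that turns a percentage reduction into an "N times smaller" factor. The helper name `reduction_factor` is hypothetical, and the sketch only does arithmetic on the -93% and -86% figures; it makes no claim about how CoPE-VideoLM achieves them.

```python
def reduction_factor(percent_reduction: float) -> float:
    """Return how many times smaller a quantity is after a given % reduction."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("percent_reduction must be in [0, 100)")
    remaining = 1.0 - percent_reduction / 100.0  # fraction that is left
    return 1.0 / remaining


# Figures quoted in the post above.
tokens = reduction_factor(93)  # token count
ttft = reduction_factor(86)    # time-to-first-token

print(f"tokens: ~{tokens:.1f}x fewer")
print(f"time-to-first-token: ~{ttft:.1f}x lower")
```

Under that reading, -93% tokens means roughly 14x fewer tokens, and -86% time-to-first-token means roughly 7x lower latency before the first token.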
πŸ™Dex4D: Task-Agnostic TrackπŸ™

πŸ‘‰Dex4D by CMU is a novel approach for unseen objects and poses, scene layouts, backgrounds, & task trajectories. Code under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/ZGx9T
πŸ‘‰Paper arxiv.org/pdf/2602.15828
πŸ‘‰Project dex4d.github.io/
πŸ‘‰Sim github.com/Dex4D/Dex4D-Simulation
πŸ‘‰Vision github.com/Dex4D/Dex4D-Vision
πŸ‘‰HW https://github.com/Dex4D/Dex4D-Hardware
🚀Video Neural Compression🚀

👉TeCoNeRV adapts INR hypernetworks to compress videos efficiently at higher resolutions. Impressive: +5.35dB PSNR, -36% bitrate & 1.5-3× faster. Code announced💙

👉Review https://t.ly/0AtCK
👉Paper arxiv.org/pdf/2602.16711
👉Project namithap10.github.io/teconerv/
👉Repo github.com/namithap10/TeCoNeRV/
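
For readers who want a feel for the +5.35dB figure, the sketch below uses the standard PSNR definition, 10·log10(MAX² / MSE), to show what a dB gain implies for mean squared error. The function names and the 8-bit peak value of 255 are assumptions for illustration; the post does not describe the paper's evaluation setup, so this is not TeCoNeRV's code.

```python
import math


def psnr(mse: float, max_value: float = 255.0) -> float:
    """PSNR in dB for a given mean squared error and peak pixel value."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a PSNR gain of gain_db (same peak value)."""
    return 10.0 ** (gain_db / 10.0)


ratio = mse_ratio_for_gain(5.35)
print(f"a +5.35 dB PSNR gain divides the MSE by about {ratio:.2f}")
```

Since PSNR is logarithmic, a gain of about 5 dB corresponds to cutting the squared reconstruction error to less than a third, assuming the same peak value on both sides of the comparison.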
🔥New SOTA Planar Tracking🔥

👉WOFTSAM by the Visual Recognition Group (CTU) is a novel planar tracker that combines robust long-term segmentation by SAM2 with 8 degrees-of-freedom homography pose estimation. Repo under CC BY-NC-SA 4.0💙

👉Review https://t.ly/VUOe5
👉Paper https://lnkd.in/dZfc_DhQ
👉Repo https://lnkd.in/dAcneJGn
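
As background on the "8 degrees-of-freedom" wording: a planar homography is a 3×3 matrix with 9 entries, but it is only defined up to a global scale, which leaves 8 free parameters. The Python sketch below illustrates this with a hypothetical helper, `apply_homography`, and a made-up translation matrix; it is a generic illustration, not WOFTSAM's implementation.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (nested lists, row-major)."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    # Dividing by w is why a global scale on H cancels out.
    return (xh / w, yh / w)


# Illustrative homography: a pure translation by (10, -5).
H = [[1.0, 0.0, 10.0],
     [0.0, 1.0, -5.0],
     [0.0, 0.0, 1.0]]

# The same transform with every entry multiplied by 3.
H_scaled = [[3.0 * v for v in row] for row in H]

corner = (2.0, 4.0)
print(apply_homography(H, corner))
print(apply_homography(H_scaled, corner))
```

Both calls map the point to the same location, so scaling the matrix does not change the transform: 9 entries minus 1 for scale gives the 8 degrees of freedom.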
🫸 World-Grounded Hand-Obj🫸

👉WHOLE jointly reconstructs coherent hand and object motion in the world space by guiding a generative motion prior. Code announced💙

👉Review https://t.ly/c5w8h
👉Paper https://arxiv.org/pdf/2602.22209
👉Project https://judyye.github.io/whole-www/
👉Repo TBA
🧱Solaris: generative #Minecraft🧱

👉NYU unveils Solaris, a multiplayer video world model in Minecraft that generates consistent first-person observations for two players simultaneously. Impressive work. Repo & Dataset💙

👉Review https://t.ly/VrcrT
👉Paper https://arxiv.org/pdf/2602.22208
👉Project https://solaris-wm.github.io/
👉Repo https://github.com/solaris-wm/
🦜Geometry-Aware 4D Head🦜

👉GeoDiff4D is a novel framework that reconstructs animatable 4D head avatars from a single portrait image through geometry-aware diffusion. Code announced💙

👉Review https://t.ly/J9L-t
👉Paper https://lnkd.in/ddpv-78g
👉Project https://lnkd.in/d-vhukyj
👉Repo https://lnkd.in/dzd6mnFv