AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Fresh updates every day on #DeepLearning, #MachineLearning, #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
If you had to invest $1B TODAY in a frontier technology for the next decade, would you pick space, agentic AI, quantum, or frugal GPUs? Vote here: https://t.ly/hSx6i
🍎Video Object Deletion🍎

👉Void by Netflix is a novel video object removal framework designed to perform physically plausible inpainting in highly complex scenarios. Repo under Apache 2.0💙

👉Review https://t.ly/cMVny
👉Paper https://arxiv.org/pdf/2604.02296
👉Project https://void-model.github.io/
👉Repo https://github.com/Netflix/void-model
❀4🀯3πŸ‘1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🔥Vanast: VTON w/ Human Animation🔥

👉SNU unveils a novel unified framework that generates garment-transferred human animation videos directly from a single human image, a garment image, and a pose-guidance clip. Repo announced💙

👉Review https://t.ly/c0t79
👉Paper arxiv.org/pdf/2604.04934
👉Project hyunsoocha.github.io/vanast/
👉Repo github.com/snuvclab/vanast
❀6πŸ‘2πŸ”₯1🀯1🍾1
This media is not supported in your browser
VIEW IN TELEGRAM
🔥BoxerNet: SOTA 2D->3D BBs🔥

👉Boxer by Meta: a transformer-based network that lifts 2D bounding-box (BB) proposals into 3D, followed by multi-view fusion and geometric filtering to produce globally consistent, de-duplicated 3D BBs in metric world space. Repo under CC BY-NC 4.0 (Attribution-NonCommercial 4.0 International)💙

👉Review https://t.ly/mlmV1
👉Paper https://arxiv.org/pdf/2604.05212
👉Project facebookresearch.github.io/boxer/
👉Repo github.com/facebookresearch/boxer
🀯9πŸ‘1πŸ”₯1
Hinton was our guest in Pavia (remotely) 💚😈

Would you like to see a clip of the interview?
πŸ‘12❀6πŸ”₯2😍1
Media is too big
VIEW IN TELEGRAM
Here is the preview; the full clip from the official source comes tomorrow :)
🪞1.1M Metric VTON Dataset🪞

👉Google's Fit-Inclusive Try-on: a large-scale VTO dataset comprising over 1.13M try-on image triplets, accompanied by precise body and garment measurements. Repo & dataset announced💙

👉Review https://t.ly/cs-pt
👉Paper arxiv.org/pdf/2604.08526
👉Project johannakarras.github.io/FIT/
👉Repo TBA
πŸ”₯8❀2πŸ‘1
🐞6D Object Pose w/ Deformation🐞

👉DeSOPE by Xidian University & #MagicLeap is a novel large-scale dataset for 6DoF pose estimation of deformed objects: 665K pose annotations produced via a semi-automatic pipeline. Repo & dataset announced💙

👉Review https://t.ly/M5VgX
👉Paper https://arxiv.org/pdf/2604.06720
👉Project https://desope-6d.github.io/
👉Repo TBA
πŸ”₯8❀3πŸ‘1
🔥SOTA 3D Detection in the Wild🔥

👉WildDet3D is a novel unified, geometry-aware architecture for 3D detection that natively accepts text, point, and box prompts and can incorporate auxiliary depth signals at inference time. New SOTA! Repo, models and iPhone💙

👉Review https://t.ly/8NxBN
👉Paper arxiv.org/pdf/2604.08626
👉Project allenai.github.io/WildDet3D/
👉Repo github.com/allenai/WildDet3D
🧴OmniShow Content Creation🧴

👉OmniShow is the new SOTA in content creation, with industry-grade performance. Impressive results, especially with audio. Repo announced💙

👉Review https://t.ly/Pm-7U
👉Paper arxiv.org/pdf/2604.11804
👉Project correr-zhou.github.io/OmniShow/
👉Repo github.com/Correr-Zhou/OmniShow
πŸ“Interactive Objects from EgoVideoπŸ“

👉EgoFun3D by Simon Fraser University is a coordinated task, dataset, and benchmark for modeling interactive 3D objects from egocentric videos. Repo (TBA), demo & dataset💙

👉Review https://t.ly/YhGN7
👉Paper arxiv.org/pdf/2604.11038
👉Project 3dlg-hcvc.github.io/EgoFun3D/
👉Repo github.com/3dlg-hcvc/EgoFun3D
👉Demo bc79fea884062374b3.gradio.live/
📱3D Human-Object Contact📱

👉Pi-HOC by CMU + NREC is a novel single-pass, instance-aware framework for dense 3D semantic contact prediction across all human-object pairs. Repo announced💙

👉Review https://t.ly/TAgG1
👉Paper https://arxiv.org/pdf/2604.12923
👉Project https://pi-hoc.github.io/
👉Repo https://github.com/SravanChittupalli/Pi-HOC
🐞GCT 3D Reconstruction🐞

👉ANT unveils LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. Repo under CC BY-NC 4.0 (Attribution-NonCommercial 4.0 International)💙

👉Review https://t.ly/ExodA
👉Paper https://arxiv.org/pdf/2604.14141
👉Project https://arxiv.org/pdf/2604.14141
👉Repo github.com/robbyant/lingbot-map
πŸ”₯8❀4πŸ‘3
This media is not supported in your browser
VIEW IN TELEGRAM
👩‍🦰Deformable 3D Hair👩‍🦰

👉Xi’an Jiaotong University unveils a novel method that reconstructs decoupled 3D Gaussian head avatars from a single input image, enabling effortless hairstyle transfer with natural, dynamic hair motion. Code announced💙

👉Review https://t.ly/kWZdd
👉Paper https://arxiv.org/pdf/2604.14782
👉Project yuansun-xjtu.github.io/CompHairHead.io/
👉Repo yuansun-xjtu.github.io/CompHairHead.io/
🌗Mobile Ultra-detailed Avatars🌗

👉Given skeletal poses and a virtual camera as inputs, MUA by the Max Planck Institute for Informatics produces photorealistic renderings and hyper-detailed geometry of animatable clothed humans. Repo announced💙

👉Review https://t.ly/QPCy6
👉Paper https://arxiv.org/pdf/2604.18583
👉Project https://vcai.mpi-inf.mpg.de/projects/MUA/
👉Repo TBA
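How skeletal poses typically drive an animatable body: linear blend skinning (LBS), used for example by SMPL-style body models, poses each vertex as a weighted blend of its bones' rigid transforms. The post does not say whether MUA relies on LBS, so treat this as a generic, hypothetical sketch; the 3x4 `[R | t]` bone format and the toy weights are assumptions.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical bone format: a rigid transform as a 3x4 matrix [R | t], given row by row,
# mapping rest-pose coordinates to posed coordinates.
Affine = Sequence[Sequence[float]]


def transform(M: Affine, p: Vec3) -> Vec3:
    """Apply a 3x4 affine transform to a 3D point."""
    return tuple(M[i][0] * p[0] + M[i][1] * p[1] + M[i][2] * p[2] + M[i][3] for i in range(3))


def linear_blend_skinning(
    vertices: Sequence[Vec3],
    weights: Sequence[Sequence[float]],
    bones: Sequence[Affine],
) -> List[Vec3]:
    """Pose each vertex as the weighted sum of where each bone would move it.

    weights[v][b] is the influence of bone b on vertex v; each row should sum to 1.
    """
    posed: List[Vec3] = []
    for v, w_row in zip(vertices, weights):
        acc = [0.0, 0.0, 0.0]
        for w, M in zip(w_row, bones):
            if w == 0.0:
                continue  # bone has no influence on this vertex
            q = transform(M, v)
            for i in range(3):
                acc[i] += w * q[i]
        posed.append(tuple(acc))
    return posed


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    lift = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]]  # +1 along y
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    w = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    print(linear_blend_skinning(verts, w, [identity, lift]))
    # [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 1.0, 0.0)]
```

The middle vertex, shared half-and-half by the two bones, lands halfway between their targets; that smooth blending is what LBS is for, and its known artifacts (e.g. volume loss at sharp joints) are one reason detailed avatar methods add learned corrections on top.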
🎈Face Anything 4D (SOTA)🎈

👉A novel unified framework for 4D facial reconstruction and dense tracking from image sequences: new SOTA in facial single-image and monocular-video depth estimation, dense 4D reconstruction, and 3D point tracking. Repo & dataset announced💙

👉Review https://t.ly/zItie
👉Paper https://arxiv.org/pdf/2604.19702
👉Project kocasariumut.github.io/FaceAnything
👉Repo TBA
💙 PY4AI 2026: here we are! 💙

👉The third edition of our conference is official! Speaker list and (free) tickets: https://t.ly/L4_52
❀10πŸ‘1🀯1😒1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
🛒 Reshoot-Anything is out 🛒

👉Reshoot-Anything reshoots dynamic monocular videos along novel camera trajectories. Code under Apache 2.0 💙

👉Review https://t.ly/MIqAc
👉Paper https://arxiv.org/pdf/2604.21776
👉Project adithyaiyer1999.github.io/reshoot-anything/
👉Repo github.com/morphicfilms/video-to-video
🧘‍♀️Holistic Shot Boundary Detection🧘‍♀️

👉OmniShotCut detects shot changes in videos from diverse sources (anime, vlogs, games, shorts, sports, screen recordings, etc.) and recognizes both sudden jumps and transitions (dissolve, fade, wipe, etc.) using a proposed Shot-Query-based Video Transformer. Repo, demo & benchmark💙

👉Review https://t.ly/sTi7N
👉Paper https://arxiv.org/pdf/2604.24762
👉Project uva-computer-vision-lab.github.io/OmniShotCut_website/
👉Repo github.com/UVA-Computer-Vision-Lab/OmniShotCut
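For context, the classic baseline for detecting hard cuts (the "Sudden Jump" case) compares intensity histograms of consecutive frames. This is that textbook heuristic, not OmniShotCut's Shot-Query-based Video Transformer; frames are assumed to be flat lists of 0-255 grayscale values, and the 16 bins and 0.5 threshold are arbitrary choices.

```python
from typing import List, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized intensity histogram of a grayscale frame (pixel values 0..255)."""
    hist = [0.0] * bins
    for v in frame:
        # Map 0..255 to a bin index; the clamp keeps any value above 255 in the last bin.
        hist[min(v * bins // 256, bins - 1)] += 1.0
    return [h / len(frame) for h in hist]


def detect_hard_cuts(
    frames: Sequence[Sequence[int]], threshold: float = 0.5, bins: int = 16
) -> List[int]:
    """Return indices i such that a hard cut is declared between frames i-1 and i.

    The score is half the L1 distance between consecutive normalized histograms,
    so it lies in [0, 1]: 0 for identical histograms, 1 for disjoint ones.
    """
    if not frames:
        return []
    cuts: List[int] = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        score = 0.5 * sum(abs(a - b) for a, b in zip(prev, cur))
        if score > threshold:
            cuts.append(i)
        prev = cur
    return cuts


if __name__ == "__main__":
    dark = [20] * 64  # toy 8x8 frame, flattened
    bright = [230] * 64
    frames = [dark, dark, dark, bright, bright, bright]
    print(detect_hard_cuts(frames))  # [3]
```

Gradual transitions (dissolve, fade, wipe) spread the change over many frames, so they tend to slip under a per-frame threshold like this one; recognizing them is where a learned model earns its keep.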