AI with Papers - Artificial Intelligence & Deep Learning
17.3K subscribers
158 photos
276 videos
14 files
1.45K links
All the AI with papers. Fresh updates every day on #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
Here's the preview; the full clip from the official source follows tomorrow :)
🪞1.1M Metric VTON Dataset🪞

👉Google's Fit-Inclusive Try-on: large-scale VTO dataset comprising over 1.13M try-on image triplets accompanied by precise body and garment measurements. Repo & dataset announced💙

👉Review https://t.ly/cs-pt
👉Paper arxiv.org/pdf/2604.08526
👉Project johannakarras.github.io/FIT/
👉Repo TBA
6D Object Pose w/ Deformation

👉DeSOPE by Xidian & #MagicLeap is a novel large-scale dataset for 6DoF deformed objects: 665K pose annotations produced via a semi-automatic pipeline. Repo & Dataset announced💙

👉Review https://t.ly/M5VgX
👉Paper https://arxiv.org/pdf/2604.06720
👉Project https://desope-6d.github.io/
👉Repo TBA
🔥SOTA 3D Detection in the wild🔥

👉WildDet3D is a novel unified geometry-aware architecture for 3D detection that natively accepts text, point, and box prompts and can incorporate auxiliary depth signals at inference time. New SOTA! Repo, models and iPhone 💙

👉Review https://t.ly/8NxBN
👉Paper arxiv.org/pdf/2604.08626
👉Project allenai.github.io/WildDet3D/
👉Repo github.com/allenai/WildDet3D
🧴OmniShow Content Creation🧴

👉OmniShow is the new SOTA in content creation with industry-grade performance. Impressive results, best with audio. Repo announced💙

👉Review https://t.ly/Pm-7U
👉Paper arxiv.org/pdf/2604.11804
👉Project correr-zhou.github.io/OmniShow/
👉Repo github.com/Correr-Zhou/OmniShow
Interactive Objects from EgoVideo

👉EgoFun3D by Simon Fraser University is a coordinated task, dataset and benchmark for modeling interactive 3D objects from egocentric videos. Repo (TBA), demo & dataset💙

👉Review https://t.ly/YhGN7
👉Paper arxiv.org/pdf/2604.11038
👉Project 3dlg-hcvc.github.io/EgoFun3D/
👉Repo github.com/3dlg-hcvc/EgoFun3D
👉Demo bc79fea884062374b3.gradio.live/
📱3D Human-Object Contact📱

👉Pi-HOC by CMU + NREC is a novel single-pass, instance-aware framework for dense 3D semantic contact prediction of all human-object pairs. Repo announced💙

👉Review https://t.ly/TAgG1
👉Paper https://arxiv.org/pdf/2604.12923
👉Project https://pi-hoc.github.io/
👉Repo https://github.com/SravanChittupalli/Pi-HOC
GCT 3D Reconstruction

👉ANT unveils LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. Repo under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)💙

👉Review https://t.ly/ExodA
👉Paper https://arxiv.org/pdf/2604.14141
👉Project https://arxiv.org/pdf/2604.14141
👉Repo github.com/robbyant/lingbot-map
👩‍🦰Deformable 3D Hair👩‍🦰

👉Xi'an Jiaotong University unveils a novel method that reconstructs decoupled 3D Gaussian head avatars from a single input image: effortless hairstyle transfer with natural dynamic hair motion. Code announced💙

👉Review https://t.ly/kWZdd
👉Paper https://arxiv.org/pdf/2604.14782
👉Project yuansun-xjtu.github.io/CompHairHead.io/
👉Repo yuansun-xjtu.github.io/CompHairHead.io/
🌗Mobile Ultra-detailed Avatars🌗

👉Given skeletal poses and a virtual camera as inputs, MUA by Max Planck Institute produces photorealistic renderings and hyper-detailed geometry of animatable clothed humans. Repo announced💙

👉Review https://t.ly/QPCy6
👉Paper https://arxiv.org/pdf/2604.18583
👉Project https://vcai.mpi-inf.mpg.de/projects/MUA/
👉Repo TBA
🎈Face Anything 4D (SOTA)🎈

👉A novel unified method for 4D facial reconstruction and dense tracking from image sequences: new SOTA in facial single-image and mono-video depth estimation, dense 4D reconstruction, and 3D point tracking. Repo & Dataset announced💙

👉Review https://t.ly/zItie
👉Paper https://arxiv.org/pdf/2604.19702
👉Project kocasariumut.github.io/FaceAnything
👉Repo TBA
💙 PY4AI 2026: here we are! 💙

👉The third edition of our conference is official! Speaker list and (free) tickets: https://t.ly/L4_52
🛒 Reshoot-Anything is out 🛒

👉Reshoot-Anything reshoots dynamic monocular videos under novel camera trajectories. Code under Apache 2.0 💙

👉Review https://t.ly/MIqAc
👉Paper https://arxiv.org/pdf/2604.21776
👉Project adithyaiyer1999.github.io/reshoot-anything/
👉Repo github.com/morphicfilms/video-to-video
🧘‍♀️Holistic Shot Boundary Detection🧘‍♀️

👉OmniShotCut detects shot changes in videos from diverse sources (anime, vlog, game, shorts, sports, screen recording, etc.) and recognizes sudden jumps and transitions (dissolve, fade, wipe, etc.) using a Shot-Query-based Video Transformer. Repo, demo & benchmark💙

👉Review https://t.ly/sTi7N
👉Paper https://arxiv.org/pdf/2604.24762
👉Project uva-computer-vision-lab.github.io/OmniShotCut_website/
👉Repo github.com/UVA-Computer-Vision-Lab/OmniShotCut
Syn4D: Multiview Synthetic 4D Dataset

👉Syn4D is a novel multi-view synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations💙

👉Review https://t.ly/SL1mk
👉Paper https://arxiv.org/pdf/2605.05207
👉Project https://jzr99.github.io/Syn4D/
👉Repo https://github.com/jzr99/Syn4D
👉Data huggingface.co/datasets/Syn4D/Syn4D_RGBD/tree/main
🦄Unified Correspondence Transformer🦄

👉UniCorrn is the first correspondence model with shared weights that unifies 2D-2D, 2D-3D, and 3D-3D geometric matching with a transformer. CC BY-NC-SA 4.0💙

👉Review https://t.ly/2OBdq
👉Paper https://arxiv.org/pdf/2605.04044
👉Project https://neu-vi.github.io/UniCorrn/
👉Repo https://github.com/neu-vi/UniCorrn
Count Anything, Any Granularity

👉Open-world counting recast as multi-grained counting: visual exemplars specify the target appearance, and fine-grained text specifies the intended semantic granularity across five explicit levels. Repo/Data under Apache💙

👉Review https://t.ly/nqz80
👉Paper https://lnkd.in/dp7khTRU
👉Project https://lnkd.in/d_jfX_Yn
👉Repo https://lnkd.in/dkTRGZkG
👉Data https://lnkd.in/dB83jRyT
🪔Latent Decoding Pixel Diffusion🪔

👉PiD by Nvidia is a plug-and-play diffusion decoder that replaces VAE/RAE decoders, turning latent representations directly into super-resolved pixels in a single pass. Repo under Apache 2.0💙

👉Review https://t.ly/y19mA
👉Paper https://lnkd.in/duVC25C2
👉Project https://lnkd.in/dW6TkzCB
👉Repo https://lnkd.in/dnGdgKRr