AI with Papers - Artificial Intelligence & Deep Learning
15.4K subscribers
140 photos
253 videos
14 files
1.33K links
All the AI, with papers. Fresh updates every day on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI

🔥 #6D Foundation Pose 🔥

👉#Nvidia unveils FoundationPose, a novel (and unified) foundation model for 6D object pose estimation and tracking.

👉Review https://t.ly/HGd4h
👉Project https://lnkd.in/dPcnBKWm
👉Paper https://lnkd.in/dixn_iHZ
👉Code coming 🩷

🦩 WildRGB-D: Objects in the Wild 🦩

👉#NVIDIA unveils a novel RGB-D object dataset captured in the wild: ~8500 recorded objects, ~20,000 RGBD videos, 46 categories with corresponding masks and 3D point clouds.

👉Review https://t.ly/WCqVz
👉Data github.com/wildrgbd/wildrgbd
👉Paper arxiv.org/pdf/2401.12592.pdf
👉Project wildrgbd.github.io/

🌆 Up to 69x Faster SAM 🌆

👉EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster, source code released. Authors: Tsinghua, MIT & #Nvidia

👉Review https://t.ly/zGiE9
👉Paper arxiv.org/pdf/2402.05008.pdf
👉Code github.com/mit-han-lab/efficientvit

🔌 BodyMAP: human body & pressure 🔌

👉#Nvidia (+CMU) unveils BodyMAP, the new SOTA in predicting body mesh (3D pose & shape) and 3D applied pressure on the human body. Source Code released, Dataset coming 💙

👉Review https://t.ly/8926S
👉Project bodymap3d.github.io/
👉Paper https://lnkd.in/gCxH4ev3
👉Code https://lnkd.in/gaifdy3q

📈Gradient Boosting Reinforcement Learning📈

👉#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees to the RL domain. GBRL adapts the power of Gradient Boosting Trees to the unique challenges of RL environments, including non-stationarity and absence of predefined targets. Code released💙

👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl

🛳️ EVER Ellipsoid Rendering 🛳️

👉UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts and view-dependent density, achieving ~30 FPS at 720p on an #NVIDIA RTX4090.

👉Review https://t.ly/zAfGU
👉Paper arxiv.org/pdf/2410.01804
👉Project half-potato.gitlab.io/posts/ever/

🪞Robo-Emulation via Video Imitation🪞

👉OKAMI (UT & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.

👉Review https://t.ly/_N29-
👉Paper arxiv.org/pdf/2410.11792
👉Project https://lnkd.in/d6bHF_-s

🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥

⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
24% 🤲 Portable Training Workstation
34% ⚛️ Nuclear energy for AI training
33% 🖲️ Cheaper inference-only devices
9% 💰 Cloud-intensive inference-only

🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉 #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image/video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA

🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization. It comes with a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism. Code, model & dataset to be released💙

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master

🥛HAMSTER: Hierarchical VLA Manipulation🥛

👉#Nvidia unveils HAMSTER, a novel Hierarchical VLA architecture that enables robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data. Source Code announced💙

👉Review https://t.ly/2yXaY
👉Paper https://arxiv.org/pdf/2502.05485
👉Project https://hamster-robot.github.io/
👉Repo TBA

🌈Unified Low-Level 4D Vision🌈

👉#Nvidia L4P is a novel feedforward, general-purpose architecture to solve low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with per-task heads that are lightweight and therefore do not require extensive training. One backbone - many SOTAs. Code announced 💙

👉Review https://t.ly/04DGj
👉Paper arxiv.org/pdf/2502.13078
👉Project research.nvidia.com/labs/lpr/l4p/
👉Repo TBA

👽Neural-Free Sparse Voxels Rasterization👽

👉#Nvidia unveils a novel efficient radiance field rendering algorithm that incorporates a rasterization process on adaptive sparse voxels without neural networks or 3D Gaussians. Code released (custom license)💙

👉Review https://t.ly/Nh_ic
👉Paper https://lnkd.in/g8k8Zs6R
👉Project https://lnkd.in/gR-bD4Wx
👉Repo https://lnkd.in/gNHX-w4t

🙀3D MultiModal Memory🙀

👉M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes w/ RGB & foundation model embeddings. Rich spatial & semantic understanding via a novel memory system designed to retain multimodal info through videos.

👉Review https://t.ly/OrXZO
👉Paper arxiv.org/pdf/2503.16413
👉Project https://lnkd.in/dXAZ97KH
👉Repo https://lnkd.in/dWvunCET

🦎 Scaling Vision to 4K 🦎

👉PS3 by #Nvidia (+UC Berkeley) scales up CLIP-style vision pre-training to 4K with *near-constant* cost. It encodes an LR global image and selectively processes only informative HR regions. Impressive work. Code/weights & 🤗 announced💙

👉Review https://t.ly/WN479
👉Paper https://lnkd.in/ddWq8UpX
👉Project https://lnkd.in/dMkTY8-k
👉Repo https://lnkd.in/d9YSB6yv

PartField #3D Part Segmentation

👉#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features, which captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & Models released under Nvidia License💙

👉Review https://t.ly/fGb2O
👉Paper https://lnkd.in/dGeyKSzG
👉Code https://lnkd.in/dbe57XGH
👉Project https://lnkd.in/dhEgf7X2

🦧 #Nvidia Describe Anything 🦧

👉Nvidia unveils Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, Dataset available and live demo on 🤗

👉Review https://t.ly/la4JD
👉Paper https://lnkd.in/dZh82xtV
👉Project https://lnkd.in/dcv9V2ZF
👉Repo https://lnkd.in/dJB9Ehtb
🤗Demo https://lnkd.in/dXDb2MWU

#Nvidia Dynamic Pose

👉Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia license💙

👉Review https://t.ly/wrcb0
👉Paper https://lnkd.in/dycGjAyy
👉Project https://lnkd.in/dDZ2Ej_Q
🤗Data https://lnkd.in/d8yUSB7m

🧞‍♀️GENMO: Generalist Human Motion 🧞‍♀️

👉#Nvidia presents GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. It supports conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment🥲

👉Review https://t.ly/Q5T_Y
👉Paper https://lnkd.in/ds36BY49
👉Project https://lnkd.in/dAYHhuFU

🧤Diffusive Hand from Signs🧤

👉LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from Sign Language Data. Covers motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, Models & Data to be released 💙

👉Review https://t.ly/HonX_
👉Paper https://arxiv.org/pdf/2508.15902
👉Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
👉Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
👉Repo TBA