WildRGB-D: Objects in the Wild
#NVIDIA unveils a novel RGB-D object dataset captured in the wild: ~8,500 recorded objects, ~20,000 RGB-D videos, 46 categories with corresponding masks and 3D point clouds.
Review https://t.ly/WCqVz
Data github.com/wildrgbd/wildrgbd
Paper arxiv.org/pdf/2401.12592.pdf
Project wildrgbd.github.io/
π#NVIDIA unveils a novel RGB-D object dataset captured in the wild: ~8500 recorded objects, ~20,000 RGBD videos, 46 categories with corresponding masks and 3D point clouds.
πReview https://t.ly/WCqVz
πData github.com/wildrgbd/wildrgbd
πPaper arxiv.org/pdf/2401.12592.pdf
πProject wildrgbd.github.io/
π9β€3π₯2π1π€©1π1
This media is not supported in your browser
VIEW IN TELEGRAM

Up to 69x Faster SAM
EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT (conceptual sketch below). Up to 69x faster; source code released. Authors: Tsinghua, MIT & #Nvidia.
Review https://t.ly/zGiE9
Paper arxiv.org/pdf/2402.05008.pdf
Code github.com/mit-han-lab/efficientvit
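
A minimal sketch of the encoder-swap idea, with illustrative module names and shapes (this is not the EfficientViT-SAM code): the prompt encoder and mask decoder stay SAM-style, and only the heavy image encoder is replaced with a lighter backbone.

```python
# Sketch only: module names/shapes are illustrative, not the EfficientViT-SAM API.
import torch
import torch.nn as nn

class LightImageEncoder(nn.Module):          # stand-in for the EfficientViT backbone
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1),
        )
    def forward(self, x):                    # (B, 3, H, W) -> (B, C, H/4, W/4)
        return self.net(x)

class PromptEncoder(nn.Module):              # SAM-style point-prompt embedding
    def __init__(self, dim=256):
        super().__init__()
        self.point_embed = nn.Linear(2, dim)
    def forward(self, points):               # (B, N, 2) normalized point coords
        return self.point_embed(points)

class MaskDecoder(nn.Module):                # fuses image and prompt features
    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(dim, 1)
    def forward(self, img_feats, prompt_feats):
        B, C, H, W = img_feats.shape
        tokens = img_feats.flatten(2).transpose(1, 2)          # (B, H*W, C)
        fused, _ = self.attn(tokens, prompt_feats, prompt_feats)
        return self.head(fused).transpose(1, 2).reshape(B, 1, H, W)

class FastSAMLike(nn.Module):                # same composition as SAM, new encoder
    def __init__(self):
        super().__init__()
        self.image_encoder = LightImageEncoder()   # the only swapped component
        self.prompt_encoder = PromptEncoder()
        self.mask_decoder = MaskDecoder()
    def forward(self, image, points):
        return self.mask_decoder(self.image_encoder(image),
                                 self.prompt_encoder(points))

model = FastSAMLike()
mask_logits = model(torch.randn(1, 3, 256, 256), torch.rand(1, 5, 2))
print(mask_logits.shape)                     # torch.Size([1, 1, 64, 64])
```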

BodyMAP: human body & pressure
#Nvidia (+CMU) unveils BodyMAP, the new SOTA in predicting body mesh (3D pose & shape) and 3D applied pressure on the human body. Source code released, dataset coming.
Review https://t.ly/8926S
Project bodymap3d.github.io/
Paper https://lnkd.in/gCxH4ev3
Code https://lnkd.in/gaifdy3q

Gradient Boosting Reinforcement Learning
#Nvidia unveils GBRL, a framework that brings Gradient Boosting Trees to the RL domain, adapting them to its unique challenges, including non-stationarity and the absence of predefined targets (conceptual sketch below). Code released.
Review https://t.ly/zv9pl
Paper https://arxiv.org/pdf/2407.08250
Code https://github.com/NVlabs/gbrl
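
A toy sketch of the underlying idea, gradient-boosted value estimation against bootstrapped TD targets; class and variable names are invented for illustration and this is not the GBRL library API.

```python
# Toy sketch, not the GBRL API: a value function as an additive ensemble of small
# regression trees, each new tree fit to the current TD residual (functional
# gradient boosting against a target that shifts as the values change).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class BoostedValueFunction:
    def __init__(self, lr=0.1, max_depth=3):
        self.lr, self.max_depth, self.trees = lr, max_depth, []

    def predict(self, states):
        value = np.zeros(len(states))
        for tree in self.trees:                      # sum of shrunken tree outputs
            value += self.lr * tree.predict(states)
        return value

    def boost(self, states, td_targets):
        residual = td_targets - self.predict(states) # fit a new tree to the residual
        tree = DecisionTreeRegressor(max_depth=self.max_depth)
        tree.fit(states, residual)
        self.trees.append(tree)

# toy usage with an invented transition batch and a 1-step bootstrapped target
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 4))
next_states = states + 0.1 * rng.normal(size=(256, 4))
rewards = states[:, 0]                               # invented reward signal
vf = BoostedValueFunction()
for _ in range(20):
    targets = rewards + 0.99 * vf.predict(next_states)
    vf.boost(states, targets)
print(vf.predict(states[:3]))
```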

EVER: Ellipsoid Rendering
UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts and view-dependent density, achieving ~30 FPS at 720p on an #NVIDIA RTX 4090 (background sketch below).
Review https://t.ly/zAfGU
Paper arxiv.org/pdf/2410.01804
Project half-potato.gitlab.io/posts/ever/
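
For context, a minimal sketch of the standard emission-absorption integral that "emission-only volume rendering" refers to, evaluated numerically along one ray; EVER's exact ellipsoid-primitive formulation is not reproduced here.

```python
# Background sketch: the standard emission-absorption rendering integral along a
# single ray (not EVER's ellipsoid-primitive renderer).
import numpy as np

def render_ray(sigma, color, t):
    """Accumulate emitted color weighted by transmittance: C = sum_i T_i * alpha_i * c_i."""
    dt = np.diff(t, append=t[-1] + (t[-1] - t[-2]))                # segment lengths
    alpha = 1.0 - np.exp(-sigma * dt)                              # per-segment opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha)))[:-1]  # T before each segment
    weights = trans * alpha
    return (weights[:, None] * color).sum(axis=0)                  # (3,) RGB

# toy example: a single reddish density "blob" centered along the ray
t = np.linspace(0.0, 1.0, 128)
sigma = 20.0 * np.exp(-((t - 0.5) ** 2) / 0.005)
color = np.tile([1.0, 0.2, 0.2], (len(t), 1))
print(render_ray(sigma, color, t))
```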

Robo-Emulation via Video Imitation
OKAMI (UT & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.
Review https://t.ly/_N29-
Paper arxiv.org/pdf/2410.11792
Project https://lnkd.in/d6bHF_-s

"Nuclear" AI vs. Hyper-Cheap Inference
What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
24% - Portable Training Workstation
34% - Nuclear energy for AI training
33% - Cheaper inference-only devices
9% - Cloud-intensive inference-only

Omni-RGPT: SOTA MLLM Understanding
#NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image/video-based commonsense reasoning.
Review https://t.ly/KHnQ7
Paper arxiv.org/pdf/2501.08326
Project miranheo.github.io/omni-rgpt/
Repo TBA soon

#Nvidia Foundation ZS-Stereo
Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism. Code, model & dataset to be released.
Review https://t.ly/rfBr5
Paper arxiv.org/pdf/2501.09898
Project nvlabs.github.io/FoundationStereo/
Repo github.com/NVlabs/FoundationStereo/tree/master

HAMSTER: Hierarchical VLA Manipulation
#Nvidia unveils HAMSTER, a novel hierarchical VLA architecture enabling robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data. Source code announced.
Review https://t.ly/2yXaY
Paper https://arxiv.org/pdf/2502.05485
Project https://hamster-robot.github.io/
Repo TBA

Unified Low-Level 4D Vision
#Nvidia's L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. It combines a ViT-based backbone with lightweight per-task heads that do not require extensive training (sketch below). One backbone, many SOTAs. Code announced.
Review https://t.ly/04DGj
Paper arxiv.org/pdf/2502.13078
Project research.nvidia.com/labs/lpr/l4p/
Repo TBA
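
A minimal sketch of the shared-backbone-plus-lightweight-heads pattern described above; module names, sizes, and task heads are illustrative assumptions, not the L4P code.

```python
# Sketch of the pattern only; names, sizes, and task heads are invented.
import torch
import torch.nn as nn

class SharedViTBackbone(nn.Module):
    def __init__(self, dim=384, depth=4, patch=16):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
    def forward(self, frames):                       # (B, 3, H, W)
        tokens = self.patchify(frames).flatten(2).transpose(1, 2)
        return self.encoder(tokens)                  # (B, N, dim) shared features

class LightHead(nn.Module):                          # one tiny head per task
    def __init__(self, dim=384, out_ch=1):
        super().__init__()
        self.proj = nn.Linear(dim, out_ch)
    def forward(self, feats):
        return self.proj(feats)

backbone = SharedViTBackbone()
heads = nn.ModuleDict({
    "depth": LightHead(out_ch=1),                    # e.g. per-token depth
    "flow": LightHead(out_ch=2),                     # e.g. per-token 2D motion
})
feats = backbone(torch.randn(2, 3, 224, 224))        # heavy features computed once
outputs = {task: head(feats) for task, head in heads.items()}
print({task: out.shape for task, out in outputs.items()})
```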

Neural-Free Sparse Voxels Rasterization
#Nvidia unveils a novel, efficient radiance field rendering algorithm that incorporates a rasterization process on adaptive sparse voxels, without neural networks or 3D Gaussians. Code released (custom license).
Review https://t.ly/Nh_ic
Paper https://lnkd.in/g8k8Zs6R
Project https://lnkd.in/gR-bD4Wx
Repo https://lnkd.in/gNHX-w4t

3D MultiModal Memory
M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with RGB & foundation model embeddings: rich spatial & semantic understanding via a novel memory system designed to retain multimodal information across videos.
Review https://t.ly/OrXZO
Paper arxiv.org/pdf/2503.16413
Project https://lnkd.in/dXAZ97KH
Repo https://lnkd.in/dWvunCET

Scaling Vision to 4K
PS3 by #Nvidia (+UC Berkeley) scales up CLIP-style vision pre-training to 4K resolution at *near-constant* cost: it encodes a low-res global image and selectively processes only the informative high-res regions (conceptual sketch below). Impressive work. Code, weights & Hugging Face release announced.
Review https://t.ly/WN479
Paper https://lnkd.in/ddWq8UpX
Project https://lnkd.in/dMkTY8-k
Repo https://lnkd.in/d9YSB6yv
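
A conceptual sketch of the selective high-res processing idea, with an invented saliency proxy and crop size; this is not the PS3 implementation, just the pattern of encoding a low-res global view and cropping only the top-k informative high-res regions.

```python
# Conceptual sketch, not the PS3 code: pick the top-k "informative" high-res crops
# from a cheap low-res proxy, so the heavy encoder only sees those.
import torch
import torch.nn.functional as F

def select_topk_regions(image_hr, k=8, crop=256):
    """Return (low-res global view, k selected high-res crops)."""
    B, C, H, W = image_hr.shape
    lowres = F.interpolate(image_hr, size=(H // 8, W // 8),
                           mode="bilinear", align_corners=False)
    # crude saliency proxy: local gradient energy on the low-res image
    gx = lowres[..., :, 1:] - lowres[..., :, :-1]
    gy = lowres[..., 1:, :] - lowres[..., :-1, :]
    energy = gx.abs().mean(1)[..., :-1, :] + gy.abs().mean(1)[..., :, :-1]
    gh, gw = H // crop, W // crop                        # grid of candidate crops
    scores = F.adaptive_avg_pool2d(energy.unsqueeze(1), (gh, gw)).flatten(1)
    top = scores.topk(k, dim=1).indices                  # (B, k) crop indices
    crops = []
    for b in range(B):
        for idx in top[b].tolist():
            r, c = divmod(idx, gw)
            crops.append(image_hr[b:b + 1, :, r * crop:(r + 1) * crop,
                                              c * crop:(c + 1) * crop])
    return lowres, torch.cat(crops)

lowres, crops = select_topk_regions(torch.randn(1, 3, 2048, 2048))
print(lowres.shape, crops.shape)   # only these tensors reach the heavy encoder
```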

PartField: #3D Part Segmentation
#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features that captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & models released under the Nvidia license.
Review https://t.ly/fGb2O
Paper https://lnkd.in/dGeyKSzG
Code https://lnkd.in/dbe57XGH
Project https://lnkd.in/dhEgf7X2

#Nvidia Describe Anything
Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks (sketch of the prompt format below). Repo under Apache, dataset available, and live demo on Hugging Face.
Review https://t.ly/la4JD
Paper https://lnkd.in/dZh82xtV
Project https://lnkd.in/dcv9V2ZF
Repo https://lnkd.in/dJB9Ehtb
Demo https://lnkd.in/dXDb2MWU
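
A minimal sketch of how heterogeneous region prompts (points, boxes, masks) can be normalized into one binary mask before captioning; the function and defaults are hypothetical, not the DAM API.

```python
# Hypothetical helper, not the DAM API: fold points / box / mask prompts into one
# binary region mask that a region-aware captioner could consume with the image.
import numpy as np

def region_prompt_to_mask(h, w, points=None, box=None, mask=None, radius=6):
    """Return an (h, w) uint8 mask from whichever prompt type is provided."""
    out = np.zeros((h, w), dtype=np.uint8)
    if mask is not None:                      # already a mask: binarize and return
        return (np.asarray(mask) > 0).astype(np.uint8)
    if box is not None:                       # box = (x0, y0, x1, y1) in pixels
        x0, y0, x1, y1 = box
        out[y0:y1, x0:x1] = 1
    if points is not None:                    # points = [(x, y), ...], small disks
        yy, xx = np.mgrid[0:h, 0:w]
        for x, y in points:
            out[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 1
    return out

m = region_prompt_to_mask(240, 320, box=(40, 60, 180, 200))
print(m.sum())                                # pixels inside the prompted region
```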

#Nvidia Dynamic Pose
Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under the Nvidia license.
Review https://t.ly/wrcb0
Paper https://lnkd.in/dycGjAyy
Project https://lnkd.in/dDZ2Ej_Q
Data https://lnkd.in/d8yUSB7m

GENMO: Generalist Human Motion
#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment.
Review https://t.ly/Q5T_Y
Paper https://lnkd.in/ds36BY49
Project https://lnkd.in/dAYHhuFU

Diffusive Hand from Signs
LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from sign language data, capturing motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released.
Review https://t.ly/HonX_
Paper https://arxiv.org/pdf/2508.15902
Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
Repo TBA

3D Prompted Vision-LLM
#Nvidia unveils SR-3D, a novel 3D-aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. Flexible region prompting lets users annotate regions with bounding boxes or segmentation masks on any frame, or directly in 3D, without exhaustive multi-frame labeling. Code & dataset announced.
Review https://t.ly/5Y2c5
Paper https://arxiv.org/pdf/2509.13317
Project https://www.anjiecheng.me/sr3d
Repo TBA

A few "leaks" for you from the #Nvidia presentation I'm attending right now in Milan. Impressive stuff.
PS: sorry for the shitty quality of the pics.