AI with Papers - Artificial Intelligence & Deep Learning
15.5K subscribers
145 photos
256 videos
14 files
1.34K links
All the AI with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🦩 WildRGB-D: Objects in the Wild 🦩

👉#NVIDIA unveils a novel RGB-D object dataset captured in the wild: ~8,500 recorded objects across ~20,000 RGB-D videos in 46 categories, with corresponding masks and 3D point clouds.

👉Review https://t.ly/WCqVz
👉Data github.com/wildrgbd/wildrgbd
👉Paper arxiv.org/pdf/2401.12592.pdf
👉Project wildrgbd.github.io/
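RGB-D videos like these become point clouds by unprojecting each depth pixel through the camera intrinsics. A minimal pinhole-model sketch; the intrinsics and depth values below are made up for illustration:

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project one pixel (u, v) with metric depth into camera space."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def depth_to_point_cloud(depth_map, fx, fy, cx, cy):
    """Unproject a dense depth map (row-major list of lists) to 3D points."""
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            if z > 0:  # zero depth = missing measurement, skip it
                points.append(unproject(u, v, z, fx, fy, cx, cy))
    return points

# Toy 2x2 depth map with one hole, hypothetical intrinsics
cloud = depth_to_point_cloud([[1.0, 2.0], [0.0, 4.0]],
                             fx=500.0, fy=500.0, cx=1.0, cy=1.0)
```

Real datasets of this kind ship per-frame intrinsics alongside the depth maps, so the same three-line unprojection recovers the released point clouds.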
πŸ‘9❀3πŸ”₯2πŸ‘1🀩1😍1
This media is not supported in your browser
VIEW IN TELEGRAM
🌆 Up to 69x Faster SAM 🌆

👉EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster; source code released. Authors: Tsinghua, MIT & #Nvidia

👉Review https://t.ly/zGiE9
👉Paper arxiv.org/pdf/2402.05008.pdf
👉Code github.com/mit-han-lab/efficientvit
πŸ”₯19πŸ‘7❀4πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
🔌 BodyMAP: human body & pressure 🔌

👉#Nvidia (+CMU) unveils BodyMAP, the new SOTA in predicting body mesh (3D pose & shape) and the 3D pressure applied to the human body. Source code released, dataset coming 💙

👉Review https://t.ly/8926S
👉Project bodymap3d.github.io/
👉Paper https://lnkd.in/gCxH4ev3
👉Code https://lnkd.in/gaifdy3q
📈 Gradient Boosting Reinforcement Learning 📈

👉#Nvidia unveils GBRL, a framework that brings the advantages of Gradient Boosting Trees to the RL domain, adapting them to the unique challenges of RL environments, including non-stationarity and the absence of predefined targets. Code released 💙

👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl
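The core mechanism being adapted here is classic: each boosting stage fits a weak learner to the residuals (the negative gradient of squared loss) of the current ensemble. A toy supervised sketch with decision stumps, purely to illustrate the stagewise fitting; GBRL's actual objectives are RL losses, not this regression:

```python
def fit_stump(xs, residuals):
    """Best single-threshold stump (two leaf means) for squared loss."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=50, lr=0.3):
    """Additive ensemble: every stump fits the current residuals."""
    ensemble = []
    predict = lambda x: sum(lr * s(x) for s in ensemble)
    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        ensemble.append(fit_stump(xs, residuals))
    return predict

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 1.0, 1.0, 1.0]   # a step function
model = boost(xs, ys)
```

The learning rate shrinks each stage's contribution, so the ensemble converges geometrically on targets the trees can represent.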
πŸ›³οΈ EVER Ellipsoid Rendering πŸ›³οΈ

πŸ‘‰UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS it does not suffer from popping artifacts and view dependent density, achieving ∼30 FPS at 720p on #NVIDIA RTX4090.

πŸ‘‰Review https://t.ly/zAfGU
πŸ‘‰Paper arxiv.org/pdf/2410.01804
πŸ‘‰Project half-potato.gitlab.io/posts/ever/
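For context, emission-absorption compositing along a ray marches front to back through density segments, each contributing its color weighted by (1 - exp(-σ·Δt)) times the accumulated transmittance. A generic numerical sketch with made-up values (EVER's contribution is evaluating this integral exactly per ellipsoid rather than by splatting):

```python
import math

def render_ray(segments):
    """segments: list of (sigma, color, dt) along the ray, front to back.
    Returns (accumulated color, remaining transmittance). Scalar color
    for simplicity; an RGB version just repeats this per channel."""
    T = 1.0        # transmittance accumulated so far
    color = 0.0
    for sigma, c, dt in segments:
        alpha = 1.0 - math.exp(-sigma * dt)  # absorption within segment
        color += T * alpha * c
        T *= 1.0 - alpha                     # equals exp(-sigma * dt)
    return color, T

# A faint segment in front of a nearly opaque one (made-up values)
out, T = render_ray([(0.5, 1.0, 1.0), (5.0, 0.2, 1.0)])
```

Note the conservation property: with unit colors everywhere, accumulated color plus remaining transmittance is exactly 1, which is what makes exact per-primitive integration pop-free as primitives reorder.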
🪞 Robo-Emulation via Video Imitation 🪞

👉OKAMI (UT & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.

👉Review https://t.ly/_N29-
👉Paper arxiv.org/pdf/2410.11792
👉Project https://lnkd.in/d6bHF_-s
πŸ‘4🀯2πŸ”₯1
🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥

⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
24%
🤲 Portable Training Workstation
34%
⚛️ Nuclear energy for AI training
33%
🖲️ Cheaper inference-only devices
9%
💰 Cloud-intensive inference-only
πŸ‘4❀1πŸ”₯1🀯1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ§žβ€β™‚οΈOmni-RGPT: SOTA MLLM UnderstandingπŸ§žβ€β™‚οΈ

πŸ‘‰ #NVIDIA presents Omni-RGPT, MLLM for region-level comprehension for both images & videos. New SOTA on image/video-based commonsense reasoning.

πŸ‘‰Review https://t.ly/KHnQ7
πŸ‘‰Paper arxiv.org/pdf/2501.08326
πŸ‘‰Project miranheo.github.io/omni-rgpt/
πŸ‘‰Repo TBA soon
🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism. Code, model & dataset to be released 💙

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master
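Stereo networks like this predict disparity; metric depth then follows from the standard pinhole relation depth = focal × baseline / disparity. A minimal sketch with made-up calibration values:

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert stereo disparity (pixels) to metric depth (meters)."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity = point at infinity
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline
depth = disparity_to_depth(disparity_px=35.0, focal_px=700.0, baseline_m=0.12)
```

The inverse relation is why disparity error hurts most at long range: a fixed pixel error maps to ever-larger depth error as disparity shrinks.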
🥛 HAMSTER: Hierarchical VLA Manipulation 🥛

👉#Nvidia unveils HAMSTER, a novel hierarchical VLA architecture enabling robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data. Source code announced 💙

👉Review https://t.ly/2yXaY
👉Paper https://arxiv.org/pdf/2502.05485
👉Project https://hamster-robot.github.io/
👉Repo TBA
🌈 Unified Low-Level 4D Vision 🌈

👉#Nvidia's L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with lightweight per-task heads that therefore require no extensive training. One backbone, many SOTAs. Code announced 💙

👉Review https://t.ly/04DGj
👉Paper arxiv.org/pdf/2502.13078
👉Project research.nvidia.com/labs/lpr/l4p/
👉Repo TBA
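The "one backbone, many heads" pattern in miniature: one shared feature extractor (here a trivial stand-in function) feeding several cheap per-task heads, so features are computed once and reused. All names and shapes below are illustrative, not L4P's architecture:

```python
def backbone(x):
    """Stand-in for a heavy shared encoder: maps an input list to features."""
    s = sum(x)
    return [s, s * s, len(x)]  # fake 3-d feature vector

def make_linear_head(weights, bias):
    """Lightweight per-task head: a single dot product over shared features."""
    def head(feats):
        return sum(w * f for w, f in zip(weights, feats)) + bias
    return head

heads = {
    "depth":  make_linear_head([1.0, 0.0, 0.0], 0.0),
    "motion": make_linear_head([0.0, 1.0, 0.0], 1.0),
}

def run_all_tasks(x):
    feats = backbone(x)  # computed once, reused by every head
    return {task: head(feats) for task, head in heads.items()}

outputs = run_all_tasks([1.0, 2.0])
```

Because only the small heads are task-specific, adding a new task means training a few parameters against frozen shared features, which is why such heads "do not require extensive training."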
πŸ”₯5πŸ‘2❀1🀯1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
👽 Neural-Free Sparse Voxel Rasterization 👽

👉#Nvidia unveils a novel, efficient radiance-field rendering algorithm that rasterizes adaptive sparse voxels without neural networks or 3D Gaussians. Code released (custom license) 💙

👉Review https://t.ly/Nh_ic
👉Paper https://lnkd.in/g8k8Zs6R
👉Project https://lnkd.in/gR-bD4Wx
👉Repo https://lnkd.in/gNHX-w4t
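Sparse voxel structures are commonly indexed by a Morton (Z-order) code, which interleaves the x/y/z bits so spatially nearby voxels get nearby keys, letting a plain hash map store only occupied cells. A generic sketch of that idea, not the paper's actual data structure:

```python
def morton3(x, y, z, bits=10):
    """Interleave the low `bits` of x, y, z into one Z-order key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

class SparseVoxels:
    """Occupied voxels only, in a dict keyed by Morton code."""
    def __init__(self):
        self.cells = {}

    def set(self, x, y, z, value):
        self.cells[morton3(x, y, z)] = value

    def get(self, x, y, z, default=None):
        return self.cells.get(morton3(x, y, z), default)

grid = SparseVoxels()
grid.set(1, 0, 0, "density=0.7")
```

Empty space costs nothing here, and sorting by Morton key groups voxels into octree-like blocks, which is what makes adaptive level-of-detail traversal cheap.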
πŸ”₯15πŸ‘4🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
🙀 3D MultiModal Memory 🙀

👉M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with both RGB and foundation-model embeddings, providing rich spatial & semantic understanding via a novel memory system designed to retain multimodal information across videos.

👉Review https://t.ly/OrXZO
👉Paper arxiv.org/pdf/2503.16413
👉Project https://lnkd.in/dXAZ97KH
👉Repo https://lnkd.in/dWvunCET
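The general shape of such a memory can be sketched as a bank of embedding/payload pairs queried by cosine similarity. Everything below (names, vectors, payloads) is illustrative only, not M3's actual memory system:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class MemoryBank:
    """Stores (key embedding, payload) pairs; retrieves the closest payload."""
    def __init__(self):
        self.entries = []

    def add(self, embedding, payload):
        self.entries.append((embedding, payload))

    def query(self, embedding):
        return max(self.entries, key=lambda e: cosine(e[0], embedding))[1]

bank = MemoryBank()
bank.add([1.0, 0.0], "frame 12: red mug on table")
bank.add([0.0, 1.0], "frame 40: door opens")
hit = bank.query([0.9, 0.1])
```

Swapping the payloads for foundation-model features attached to 3D locations is what turns this retrieval pattern into a spatial-semantic memory.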
🦎 Scaling Vision to 4K 🦎

👉PS3, by #Nvidia (+UC Berkeley), scales up CLIP-style vision pre-training to 4K resolution with *near-constant* cost: it encodes a low-resolution global image and selectively processes only the informative high-resolution regions. Impressive work. Code/weights & 🤗 announced 💙

👉Review https://t.ly/WN479
👉Paper https://lnkd.in/ddWq8UpX
👉Project https://lnkd.in/dMkTY8-k
👉Repo https://lnkd.in/d9YSB6yv
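The "process only informative high-res regions" idea can be sketched as scoring low-res tiles with a cheap proxy (here, intensity range) and keeping the top-k for expensive high-res encoding. Purely illustrative; PS3's actual region selection is learned, not this heuristic:

```python
def patch_score(patch):
    """Cheap saliency proxy: intensity range within the tile."""
    flat = [p for row in patch for p in row]
    return max(flat) - min(flat)

def select_top_k(image, patch=2, k=2):
    """Split a 2D image (list of lists) into patch x patch tiles and
    return (row, col) tile indices of the k highest-scoring tiles."""
    scores = {}
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            tile = [row[c:c + patch] for row in image[r:r + patch]]
            scores[(r // patch, c // patch)] = patch_score(tile)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy 4x4 "image": high contrast in the top-right and bottom-right tiles
image = [
    [0, 0, 9, 0],
    [0, 0, 0, 9],
    [1, 2, 5, 8],
    [1, 1, 5, 5],
]
chosen = select_top_k(image)
```

Only the chosen tiles would be fed to the heavy encoder, which is why the total cost stays near-constant as resolution grows.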
🍏 PartField: #3D Part Segmentation 🍏

👉#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features that captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & models released under Nvidia license 💙

👉Review https://t.ly/fGb2O
👉Paper https://lnkd.in/dGeyKSzG
👉Code https://lnkd.in/dbe57XGH
👉Project https://lnkd.in/dhEgf7X2
🦧 #Nvidia Describe Anything 🦧

👉Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache license, dataset available, and live demo on 🤗

👉Review https://t.ly/la4JD
👉Paper https://lnkd.in/dZh82xtV
👉Project https://lnkd.in/dcv9V2ZF
👉Repo https://lnkd.in/dJB9Ehtb
🤗Demo https://lnkd.in/dXDb2MWU
πŸ”₯10πŸ‘5❀1
This media is not supported in your browser
VIEW IN TELEGRAM
🍏 #Nvidia Dynamic Pose 🍏

👉Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia license 💙

👉Review https://t.ly/wrcb0
👉Paper https://lnkd.in/dycGjAyy
👉Project https://lnkd.in/dDZ2Ej_Q
🤗Data https://lnkd.in/d8yUSB7m
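Camera-pose annotations are commonly stored as a unit quaternion plus a translation; consuming them means converting to a rotation matrix. A sketch of the standard quaternion-to-matrix conversion (the storage format here is an assumption for illustration, not DynPose-100K's actual schema):

```python
def quat_to_rotmat(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ]

# Identity quaternion -> identity rotation
R_id = quat_to_rotmat(1.0, 0.0, 0.0, 0.0)
# 180-degree rotation about the Z axis
R_z180 = quat_to_rotmat(0.0, 0.0, 0.0, 1.0)
```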
πŸ”₯4πŸ‘2❀1🀯1😍1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ§žβ€β™€οΈGENMO: Generalist Human Motion πŸ§žβ€β™€οΈ

πŸ‘‰#Nvidia presents GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. Conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the momentπŸ₯²

πŸ‘‰Review https://t.ly/Q5T_Y
πŸ‘‰Paper https://lnkd.in/ds36BY49
πŸ‘‰Project https://lnkd.in/dAYHhuFU
🧤 Diffusive Hand from Signs 🧤

👉LIGM + #NVIDIA unveil a novel generative model of 3D hand motions learned from sign-language data, capturing motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released 💙

👉Review https://t.ly/HonX_
👉Paper https://arxiv.org/pdf/2508.15902
👉Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
👉Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
👉Repo TBA
πŸ›‘οΈ3D Prompted Vision-LLMπŸ›‘οΈ

πŸ‘‰#Nvidia unveils SR-3D, a novel aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. Flexible region prompting, allowing users to annotate regions with bounding boxes, segmentation masks on any frame, or directly in 3D, without the need for exhaustive multi-frame labeling. Code & Dataset announcedπŸ’™

πŸ‘‰Review https://t.ly/5Y2c5
πŸ‘‰Paper https://arxiv.org/pdf/2509.13317
πŸ‘‰Project https://www.anjiecheng.me/sr3d
πŸ‘‰Repo TBA
A few "leaks" for you from the #Nvidia presentation I'm attending right now in Milan. Impressive stuff.

PS: sorry for the shitty quality of the pics ♥️