#6D Foundation Pose
#Nvidia unveils FoundationPose, a novel, unified foundation model for 6D object pose estimation and tracking (a short pose-representation sketch follows the links below).
Review https://t.ly/HGd4h
Project https://lnkd.in/dPcnBKWm
Paper https://lnkd.in/dixn_iHZ
Code coming
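For readers new to the task: a "6D" pose is a 3D rotation plus a 3D translation, i.e. an SE(3) transform from object frame to camera frame. A minimal NumPy sketch of packing and applying one (illustrative only, not the FoundationPose API):

import numpy as np

# A 6D pose = 3D rotation + 3D translation, packed as a 4x4 SE(3) matrix.
def make_pose(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    T = np.eye(4)
    T[:3, :3] = R   # 3x3 rotation, e.g. as regressed by a pose model
    T[:3, 3] = t    # 3-vector translation
    return T

def transform_points(T: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map Nx3 object-frame points into the camera frame."""
    return pts @ T[:3, :3].T + T[:3, 3]

# Toy example: 90-degree yaw, half a meter in front of the camera.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T = make_pose(R, np.array([0.0, 0.0, 0.5]))
print(transform_points(T, np.array([[0.1, 0.0, 0.0]])))  # [[0.  0.1 0.5]]

Tracking then amounts to re-estimating T for every frame of the video.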
WildRGB-D: Objects in the Wild
#NVIDIA unveils a novel RGB-D object dataset captured in the wild: ~8,500 recorded objects, ~20,000 RGB-D videos, 46 categories, with corresponding masks and 3D point clouds (a depth-backprojection sketch follows the links below).
Review https://t.ly/WCqVz
Data github.com/wildrgbd/wildrgbd
Paper arxiv.org/pdf/2401.12592.pdf
Project wildrgbd.github.io/
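A typical first step with RGB-D data is lifting the depth map into the point cloud the dataset also ships. A minimal NumPy sketch of pinhole backprojection, with hypothetical intrinsics (the real calibration comes with each capture):

import numpy as np

def backproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift an HxW depth map (meters) to an Nx3 point cloud via pinhole intrinsics K."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth.ravel()
    valid = z > 0                                    # drop missing-depth pixels
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    return np.stack([x, y, z], axis=1)[valid]

K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
cloud = backproject(np.full((480, 640), 1.5), K)     # flat wall 1.5 m away
print(cloud.shape)                                   # (307200, 3)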
Up to 69x Faster SAM
EfficientViT-SAM is a new family of accelerated Segment Anything Models: it keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT (usage sketch after the links). Up to 69x faster; source code released. Authors: Tsinghua, MIT & #Nvidia.
Review https://t.ly/zGiE9
Paper arxiv.org/pdf/2402.05008.pdf
Code github.com/mit-han-lab/efficientvit
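Because the prompt encoder and mask decoder are unchanged, usage should mirror the original segment_anything predictor flow, shown below as a reference (checkpoint path hypothetical; see the mit-han-lab repo for the EfficientViT-SAM entry points and weights):

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # hypothetical path
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)                       # the heavy image encoder runs once here
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),         # one foreground click
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)

Prompt encoding and mask decoding were already cheap, so the set_image step is where the image-encoder swap buys the acceleration.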
BodyMAP: Human Body & Pressure
#Nvidia (+CMU) unveils BodyMAP, the new SOTA in predicting body mesh (3D pose & shape) and 3D applied pressure on the human body. Source code released, dataset coming.
Review https://t.ly/8926S
Project bodymap3d.github.io/
Paper https://lnkd.in/gCxH4ev3
Code https://lnkd.in/gaifdy3q
Gradient Boosting Reinforcement Learning
#Nvidia unveils GBRL, a framework that brings gradient boosting trees to the RL domain, adapting them to its unique challenges, including non-stationarity and the absence of predefined targets (conceptual sketch after the links). Code released.
Review https://t.ly/zv9pl
Paper https://arxiv.org/pdf/2407.08250
Code https://github.com/NVlabs/gbrl
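To make the idea concrete, here is a conceptual sketch of functional gradient boosting on a value function, with sklearn trees and made-up transition data standing in for the real thing; this is not the NVlabs/gbrl API:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

class BoostedValue:
    """Value function as an additive ensemble of shallow trees."""
    def __init__(self, lr: float = 0.1):
        self.trees, self.lr = [], lr

    def predict(self, s: np.ndarray) -> np.ndarray:
        if not self.trees:
            return np.zeros(len(s))
        return sum(self.lr * t.predict(s) for t in self.trees)

    def boost(self, s: np.ndarray, td_target: np.ndarray) -> None:
        # Fit the next tree to the current TD residual: a functional
        # gradient step on the squared TD error, against targets that
        # move as the value improves (the non-stationarity GBRL tackles).
        residual = td_target - self.predict(s)
        self.trees.append(DecisionTreeRegressor(max_depth=3).fit(s, residual))

V, gamma = BoostedValue(), 0.99
s, r, s2 = np.random.rand(256, 4), np.random.rand(256), np.random.rand(256, 4)
V.boost(s, r + gamma * V.predict(s2))   # one boosting step toward r + gamma*V(s')
print(V.predict(s[:3]))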
EVER: Ellipsoid Rendering
UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering (the integral it targets is written out below). Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ~30 FPS at 720p on an #NVIDIA RTX 4090.
Review https://t.ly/zAfGU
Paper arxiv.org/pdf/2410.01804
Project half-potato.gitlab.io/posts/ever/
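For reference, "emission-only volume rendering" means evaluating the standard rendering integral along each camera ray x(t) = o + t*d (textbook notation, not copied from the paper):

C = \int_{0}^{\infty} T(t)\,\sigma\bigl(x(t)\bigr)\,c\bigl(x(t), d\bigr)\,dt,
\qquad
T(t) = \exp\!\Bigl(-\int_{0}^{t} \sigma\bigl(x(s)\bigr)\,ds\Bigr)

As we read it, EVER's pitch is that constant-density ellipsoid primitives make this integral exactly computable per ray, avoiding the sorted alpha compositing that causes popping in 3DGS.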
Robo-Emulation via Video Imitation
OKAMI (UT & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.
Review https://t.ly/_N29-
Paper arxiv.org/pdf/2410.11792
Project https://lnkd.in/d6bHF_-s
๐ฅ "Nuclear" AI vs. Hyper-Cheap Inference ๐ฅ
โญ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Free to comment :)
Anonymous Poll
Portable Training Workstation: 24%
Nuclear energy for AI training: 34%
Cheaper only-inference devices: 33%
Cloud-intensive only-inference: 9%
Omni-RGPT: SOTA MLLM Understanding
#NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.
Review https://t.ly/KHnQ7
Paper arxiv.org/pdf/2501.08326
Project miranheo.github.io/omni-rgpt/
Repo TBA soon
#Nvidia Foundation ZS-Stereo
Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, alongside a large-scale synthetic training dataset (1M stereo pairs) with high diversity and photorealism. Code, model & dataset to be released (disparity-to-depth sketch after the links).
Review https://t.ly/rfBr5
Paper arxiv.org/pdf/2501.09898
Project nvlabs.github.io/FoundationStereo/
Repo github.com/NVlabs/FoundationStereo/tree/master
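Whatever network predicts the disparity map, metric depth then follows from standard stereo geometry, depth = f * B / disparity. A tiny sketch with hypothetical calibration values:

import numpy as np

def disparity_to_depth(disp: np.ndarray, f: float, baseline: float) -> np.ndarray:
    """f: focal length in pixels, baseline: meters between the two cameras."""
    depth = np.full(disp.shape, np.inf)
    valid = disp > 0                   # zero disparity = invalid / infinitely far
    depth[valid] = f * baseline / disp[valid]
    return depth

disp = np.array([[64.0, 32.0], [0.0, 16.0]])      # disparities in pixels
print(disparity_to_depth(disp, f=700.0, baseline=0.12))
# [[1.3125 2.625 ]
#  [   inf 5.25  ]]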
HAMSTER: Hierarchical VLA Manipulation
#Nvidia unveils HAMSTER, a novel hierarchical VLA architecture that enables robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect off-domain data. Source code announced.
Review https://t.ly/2yXaY
Paper https://arxiv.org/pdf/2502.05485
Project https://hamster-robot.github.io/
Repo TBA
Unified Low-Level 4D Vision
#Nvidia's L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with lightweight per-task heads that do not require extensive training: one backbone, many SOTAs (pattern sketched after the links). Code announced.
Review https://t.ly/04DGj
Paper arxiv.org/pdf/2502.13078
Project research.nvidia.com/labs/lpr/l4p/
Repo TBA
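The shared-backbone, lightweight-heads pattern is easy to picture; a minimal PyTorch sketch with hypothetical task names and shapes (not the actual L4P code, which operates on video tokens):

import torch
import torch.nn as nn

class MultiTaskPerception(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, tasks: dict):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # pretrained encoder stays frozen;
            p.requires_grad = False           # only the small heads get trained
        self.heads = nn.ModuleDict(
            {name: nn.Linear(feat_dim, out_dim) for name, out_dim in tasks.items()}
        )

    def forward(self, x: torch.Tensor) -> dict:
        feats = self.backbone(x)              # shared features, computed once
        return {name: head(feats) for name, head in self.heads.items()}

backbone = nn.Linear(16, 64)                  # stand-in for a ViT video encoder
model = MultiTaskPerception(backbone, 64, {"depth": 1, "flow": 2, "track": 3})
out = model(torch.randn(8, 16))
print({k: v.shape for k, v in out.items()})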
Neural-Free Sparse Voxels Rasterization
#Nvidia unveils a novel, efficient radiance-field rendering algorithm that rasterizes adaptive sparse voxels, with no neural networks or 3D Gaussians involved. Code released (custom license).
Review https://t.ly/Nh_ic
Paper https://lnkd.in/g8k8Zs6R
Project https://lnkd.in/gR-bD4Wx
Repo https://lnkd.in/gNHX-w4t
3D MultiModal Memory
M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with RGB & foundation-model embeddings: rich spatial & semantic understanding via a novel memory system designed to retain multimodal information across videos.
Review https://t.ly/OrXZO
Paper arxiv.org/pdf/2503.16413
Project https://lnkd.in/dXAZ97KH
Repo https://lnkd.in/dWvunCET
Scaling Vision to 4K
PS3, by #Nvidia (+UC Berkeley), scales up CLIP-style vision pre-training to 4K at near-constant cost: it encodes a low-res global image and selectively processes only the informative high-res regions (toy sketch after the links). Impressive work. Code, weights & Hugging Face release announced.
Review https://t.ly/WN479
Paper https://lnkd.in/ddWq8UpX
Project https://lnkd.in/dMkTY8-k
Repo https://lnkd.in/d9YSB6yv
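The selection step can be pictured as scoring coarse tiles from the low-res pass and encoding only the top-k of them at high resolution; a toy PyTorch sketch of that gating (hypothetical scorer, not the released PS3 code):

import torch

def select_topk_regions(saliency: torch.Tensor, k: int):
    """saliency: (H, W) per-tile informativeness -> k (row, col) tile indices."""
    idx = saliency.flatten().topk(k).indices
    w = saliency.shape[1]
    return [(int(i) // w, int(i) % w) for i in idx]

saliency = torch.rand(16, 16)           # e.g. derived from the global low-res pass
regions = select_topk_regions(saliency, k=8)
print(regions)                          # only these tiles get high-res encoding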
PartField: #3D Part Segmentation
#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features that captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & models released under the Nvidia license.
Review https://t.ly/fGb2O
Paper https://lnkd.in/dGeyKSzG
Code https://lnkd.in/dbe57XGH
Project https://lnkd.in/dhEgf7X2
#Nvidia Describe Anything
Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, dataset available, and live demo on Hugging Face.
Review https://t.ly/la4JD
Paper https://lnkd.in/dZh82xtV
Project https://lnkd.in/dcv9V2ZF
Repo https://lnkd.in/dJB9Ehtb
Demo https://lnkd.in/dXDb2MWU
#Nvidia Dynamic Pose
Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under the Nvidia license.
Review https://t.ly/wrcb0
Paper https://lnkd.in/dycGjAyy
Project https://lnkd.in/dDZ2Ej_Q
Data https://lnkd.in/d8yUSB7m
GENMO: Generalist Human Motion
#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment.
Review https://t.ly/Q5T_Y
Paper https://lnkd.in/ds36BY49
Project https://lnkd.in/dAYHhuFU
Diffusive Hand from Signs
LIGM + #NVIDIA unveil a novel generative model of 3D hand motions learned from sign-language data, capturing motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released.
Review https://t.ly/HonX_
Paper https://arxiv.org/pdf/2508.15902
Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
Repo TBA