AI with Papers - Artificial Intelligence & Deep Learning – Telegram

@AI_DeepLearning

15.4K subscribers

140 photos

253 videos

14 files

1.33K links

All about AI, with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI

🔥 #6D Foundation Pose 🔥

👉#Nvidia unveils FoundationPose, a novel (and unified) foundation model for 6D object pose estimation and tracking.

👉Review https://t.ly/HGd4h
👉Project https://lnkd.in/dPcnBKWm
👉Paper https://lnkd.in/dixn_iHZ
👉Code coming 🩷

🔥12❤5👏1🤯1

6.46K views | 12:46

🦩 WildRGB-D: Objects in the Wild 🦩

👉#NVIDIA unveils a novel RGB-D object dataset captured in the wild: ~8500 recorded objects, ~20,000 RGB-D videos, 46 categories with corresponding masks and 3D point clouds.

👉Review https://t.ly/WCqVz
👉Data github.com/wildrgbd/wildrgbd
👉Paper arxiv.org/pdf/2401.12592.pdf
👉Project wildrgbd.github.io/

👍9❤3🔥2👏1🤩1😍1

6.38K views | 10:19

🌆 Up to 69x Faster SAM 🌆

👉EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster, source code released. Authors: Tsinghua, MIT & #Nvidia

👉Review https://t.ly/zGiE9
👉Paper arxiv.org/pdf/2402.05008.pdf
👉Code github.com/mit-han-lab/efficientvit

🔥19👍7❤4🥰1

8.92K views | edited 13:04
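
The encoder swap described in the EfficientViT-SAM post can be pictured with a small toy sketch. This is not the EfficientViT-SAM code: `PromptableSegmenter`, `heavy_encoder`, `light_encoder`, `encode_point` and `decode_mask` are hypothetical stand-ins, and the "encoders" are plain Python functions. It only shows the structural idea: the prompt encoder and mask decoder are reused untouched while the image encoder is replaced.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

Image = List[List[float]]
Embedding = List[float]
Point = Tuple[int, int]


@dataclass(frozen=True)
class PromptableSegmenter:
    """Hypothetical SAM-like pipeline made of three swappable parts."""
    image_encoder: Callable[[Image], Embedding]
    prompt_encoder: Callable[[Point], Embedding]
    mask_decoder: Callable[[Embedding, Embedding], List[int]]

    def segment(self, image: Image, point: Point) -> List[int]:
        # Encode the image and the prompt separately, then decode a mask.
        return self.mask_decoder(self.image_encoder(image), self.prompt_encoder(point))


def heavy_encoder(image: Image) -> Embedding:
    # Stand-in for the original heavy image encoder (toy: one value per row).
    return [sum(row) for row in image]


def light_encoder(image: Image) -> Embedding:
    # Stand-in for a lighter encoder producing embeddings of the same shape.
    return [sum(row) for row in image]


def encode_point(point: Point) -> Embedding:
    # Toy prompt encoder: the point coordinates as floats.
    return [float(point[0]), float(point[1])]


def decode_mask(image_emb: Embedding, prompt_emb: Embedding) -> List[int]:
    # Toy decoder: mark entries at or above a threshold derived from the prompt.
    threshold = sum(prompt_emb)
    return [1 if value >= threshold else 0 for value in image_emb]


baseline = PromptableSegmenter(heavy_encoder, encode_point, decode_mask)
# Swap only the image encoder; the other two components are carried over as-is.
accelerated = replace(baseline, image_encoder=light_encoder)
```

Because only one field changes, any speed-up in such a design comes entirely from the image encoder, which is where the post says the heavy computation sits.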

🔌 BodyMAP: human body & pressure 🔌

👉#Nvidia (+CMU) unveils BodyMAP, the new SOTA in predicting body mesh (3D pose & shape) and 3D applied pressure on the human body. Source Code released, Dataset coming 💙

👉Review https://t.ly/8926S
👉Project bodymap3d.github.io/
👉Paper https://lnkd.in/gCxH4ev3
👉Code https://lnkd.in/gaifdy3q

❤8🤯4⚡1👍1🔥1

7.69K views | 15:07

📈Gradient Boosting Reinforcement Learning📈

👉#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees to the RL domain. GBRL adapts the power of Gradient Boosting Trees to the unique challenges of RL environments, including non-stationarity and absence of predefined targets. Code released💙

👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl

❤7🤯4👍3🔥1🥰1

8.12K views | 06:14

🛳️ EVER Ellipsoid Rendering 🛳️

👉UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ∼30 FPS at 720p on an #NVIDIA RTX 4090.

👉Review https://t.ly/zAfGU
👉Paper arxiv.org/pdf/2410.01804
👉Project half-potato.gitlab.io/posts/ever/

🔥13❤2👍2👏1🤯1😱1🍾1

8.64K views | 14:34

🪞Robo-Emulation via Video Imitation🪞

👉OKAMI (UT & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.

👉Review https://t.ly/_N29-
👉Paper arxiv.org/pdf/2410.11792
👉Project https://lnkd.in/d6bHF_-s

👍4🤯2🔥1

6.84K views | 07:26

🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥

⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)

Anonymous Poll

🤲Portable Training Workstation

⚛️Nuclear energy for AI training

🖲️Cheaper Only-inference devices

💰Cloud-intensive Only-inference

👍4❤1🔥1🤯1🤩1

245 voters | 7.29K views | 13:19

🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉 #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image/video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon

🔥10❤3🍾2⚡1👍1👏1

7.69K views | edited 07:55

🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization. It also comes with a large-scale synthetic training dataset (1M stereo pairs) featuring high diversity and photorealism. Code, model & dataset to be released💙

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master

❤6🔥6🤩1

7.18K views | 07:01

🥛HAMSTER: Hierarchical VLA Manipulation🥛

👉#Nvidia unveils HAMSTER, a novel hierarchical VLA architecture that enables robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data. Source Code announced💙

👉Review https://t.ly/2yXaY
👉Paper https://arxiv.org/pdf/2502.05485
👉Project https://hamster-robot.github.io/
👉Repo TBA

🔥4❤1

8.52K views | 08:06

🌈Unified Low-Level 4D Vision🌈

👉#Nvidia L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with lightweight per-task heads that do not require extensive training. One backbone - many SOTAs. Code announced 💙

👉Review https://t.ly/04DGj
👉Paper arxiv.org/pdf/2502.13078
👉Project research.nvidia.com/labs/lpr/l4p/
👉Repo TBA

🔥5👍2🤯1🤩1

10.3K views | edited 08:27

👽Neural-Free Sparse Voxels Rasterization👽

👉#Nvidia unveils a novel efficient radiance field rendering algorithm that incorporates a rasterization process on adaptive sparse voxels without neural networks or 3D Gaussians. Code released (custom license)💙

👉Review https://t.ly/Nh_ic
👉Paper https://lnkd.in/g8k8Zs6R
👉Project https://lnkd.in/gR-bD4Wx
👉Repo https://lnkd.in/gNHX-w4t

🔥15👍4🤩1

10K views | edited 08:12

🙀3D MultiModal Memory🙀

👉M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes w/ RGB & foundation model embeddings. Rich spatial & semantic understanding via a novel memory system designed to retain multimodal info through videos.

👉Review https://t.ly/OrXZO
👉Paper arxiv.org/pdf/2503.16413
👉Project https://lnkd.in/dXAZ97KH
👉Repo https://lnkd.in/dWvunCET

🔥10❤4👍1👏1

8.1K views | edited 14:52

🦎 Scaling Vision to 4K🦎

👉PS3 by #Nvidia (+UC Berkeley) scales up CLIP-style vision pre-training to 4K at *near-constant* cost. It encodes a low-resolution (LR) global image and selectively processes only the informative high-resolution (HR) regions. Impressive work. Code/weights & 🤗 announced💙

👉Review https://t.ly/WN479
👉Paper https://lnkd.in/ddWq8UpX
👉Project https://lnkd.in/dMkTY8-k
👉Repo https://lnkd.in/d9YSB6yv

🔥14❤4👍2👏1

8.1K views | edited 07:37
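
A toy sketch may help to see why processing a low-resolution global view plus a fixed budget of high-resolution regions keeps cost roughly flat as resolution grows. This is not PS3's implementation: the function names, the per-patch scores and the budget `k` are hypothetical, and the post does not say how PS3 ranks regions. Here the scores are simply given as input.

```python
from typing import List


def select_patches(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring patches, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def tokens_processed(image_side: int, patch_side: int, low_res_side: int, k: int) -> int:
    """Count encoded patches: every low-res patch plus at most k high-res patches."""
    low_res_tokens = (low_res_side // patch_side) ** 2
    high_res_total = (image_side // patch_side) ** 2
    return low_res_tokens + min(k, high_res_total)


# Hypothetical scores for four high-res patches; keep the two most informative.
chosen = select_patches([0.1, 0.9, 0.3, 0.8], k=2)

# With a fixed budget, the count is the same at 1024 px and at 4096 px,
# while encoding every high-res patch would grow with the square of the side.
cost_1k = tokens_processed(image_side=1024, patch_side=16, low_res_side=256, k=100)
cost_4k = tokens_processed(image_side=4096, patch_side=16, low_res_side=256, k=100)
full_4k = (4096 // 16) ** 2
```

In this simplified accounting the budget `k`, not the image resolution, sets the cost, which is one way to read the "near-constant" claim in the post.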

🍏PartField #3D Part Segmentation🍏

👉#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features, which captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & Models released under Nvidia License💙

👉Review https://t.ly/fGb2O
👉Paper https://lnkd.in/dGeyKSzG
👉Code https://lnkd.in/dbe57XGH
👉Project https://lnkd.in/dhEgf7X2

❤2🔥2🤯2

7.37K views | edited 06:50

🦧 #Nvidia Describe Anything 🦧

👉Nvidia unveils Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache license, dataset available and live demo on 🤗

👉Review https://t.ly/la4JD
👉Paper https://lnkd.in/dZh82xtV
👉Project https://lnkd.in/dcv9V2ZF
👉Repo https://lnkd.in/dJB9Ehtb
🤗Demo https://lnkd.in/dXDb2MWU

🔥10👍5❤1

8.18K views | edited 09:56

🍏#Nvidia Dynamic Pose 🍏

👉Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia license💙

👉Review https://t.ly/wrcb0
👉Paper https://lnkd.in/dycGjAyy
👉Project https://lnkd.in/dDZ2Ej_Q
🤗Data https://lnkd.in/d8yUSB7m

🔥4👍2❤1🤯1😍1

9.07K views | edited 07:56

🧞‍♀️GENMO: Generalist Human Motion 🧞‍♀️

👉#Nvidia presents GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. It can be conditioned on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment🥲

👉Review https://t.ly/Q5T_Y
👉Paper https://lnkd.in/ds36BY49
👉Project https://lnkd.in/dAYHhuFU

🔥13❤3👍2😢1😍1

14.8K views | 07:41

🧤Diffusive Hand from Signs🧤

👉LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from Sign Language Data. It covers motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, Models & Data to be released 💙

👉Review https://t.ly/HonX_
👉Paper https://arxiv.org/pdf/2508.15902
👉Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
👉Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
👉Repo TBA

❤4🔥3👍2🤯1

4.43K views | 14:52