3D Prompted Vision-LLM
#Nvidia unveils SR-3D, a novel vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. It supports flexible region prompting: users can annotate regions with bounding boxes or segmentation masks on any frame, or directly in 3D, without exhaustive multi-frame labeling. Code & dataset announced.
Review: https://t.ly/5Y2c5
Paper: https://arxiv.org/pdf/2509.13317
Project: https://www.anjiecheng.me/sr3d
Repo: TBA
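A rough idea of what the region prompting above could look like in practice. The repo is still TBA, so every name here is a hypothetical sketch, not the real API: the point is just that one bounding box on one frame is enough to specify a region query.

```python
# Hypothetical sketch of SR-3D-style region prompting (all names are
# assumptions; no official code has been released yet). A user marks a
# region with a 2D bounding box on a single frame, and the query asks
# the model to reason about that region across the multi-view scene.
from dataclasses import dataclass

@dataclass
class RegionPrompt:
    frame_idx: int    # any single frame; no multi-frame labeling needed
    bbox_2d: tuple    # (x1, y1, x2, y2) in pixel coordinates

def build_query(prompt: RegionPrompt, question: str) -> dict:
    """Pack a region prompt and a natural-language question into a request."""
    return {
        "region": {"frame": prompt.frame_idx, "bbox": prompt.bbox_2d},
        "question": question,
    }

query = build_query(RegionPrompt(frame_idx=12, bbox_2d=(40, 60, 180, 220)),
                    "How far is this object from the camera?")
```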
A few "leaks" for you from the #Nvidia presentation I'm at right now in Milan. Impressive stuff.
PS: sorry for the poor quality of the pics ♥️
Real-time Interactive Video
LONGLIVE by #Nvidia is a frame-level autoregressive framework for real-time, interactive long-video generation. It accepts sequential user prompts and generates the corresponding video in real time. Repo under a non-commercial license.
Review: https://t.ly/jJkdY
Paper: https://arxiv.org/pdf/2509.22622
Project: https://nvlabs.github.io/LongLive/
Repo: https://github.com/NVlabs/LongLive
Model: https://huggingface.co/Efficient-Large-Model/LongLive-1.3B
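The frame-level autoregressive loop with sequential prompts can be sketched as below. This is a minimal toy illustration of the generation pattern, not LONGLIVE's actual code; the function names and the stand-in model are my assumptions.

```python
# Minimal sketch of frame-level autoregressive generation with interactive
# prompt switching, in the spirit of LONGLIVE. All names are illustrative.
def generate_interactive(model, prompt_stream, frames_per_prompt=16):
    """Yield frames one at a time; each new user prompt redirects generation
    while the autoregressive context (frame history) carries over."""
    context = []                            # previously generated frames
    for prompt in prompt_stream:            # sequential user prompts
        for _ in range(frames_per_prompt):
            frame = model(prompt, context)  # next frame, conditioned on history
            context.append(frame)
            yield frame

# Toy stand-in model: a "frame" is just a (prompt, index) tag, so the
# interactive loop is runnable without any real video model.
toy_model = lambda p, ctx: (p, len(ctx))
frames = list(generate_interactive(toy_model, ["a cat", "the cat jumps"], 2))
# → [('a cat', 0), ('a cat', 1), ('the cat jumps', 2), ('the cat jumps', 3)]
```

The key property shown: switching prompts mid-stream does not reset the context, which is what makes the generation interactive rather than clip-by-clip.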
Foundational Humanoid
#NVIDIA unveils SONIC, a novel foundational humanoid model for high-precision teleoperation and interactive control (running, jumping, crawling) with natural, human-like movements. Code announced.
Review: https://t.ly/_3wnt
Paper: https://lnkd.in/dctfShu8
Project: https://lnkd.in/d_inmA2p