SOTA Points Segmentation
VGG Oxford unveils a novel loss for segmenting objects in videos based on their motion and NO other form of supervision! The network is trained using long-term point trajectories as a supervisory signal to complement optical flow (a minimal sketch of the idea follows below). New SOTA!
Review https://t.ly/8Bsbt
Paper https://arxiv.org/pdf/2501.12392
Code https://github.com/karazijal/lrtl
Project www.robots.ox.ac.uk/~vgg/research/lrtl/
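For intuition, here is a minimal, hedged sketch of a trajectory-consistency objective in the spirit of the paper: points lying on the same long-term track should receive the same mask label in every frame where they are visible. All tensor names, shapes, and the exact penalty are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def trajectory_consistency_loss(masks, tracks, vis):
    """masks:  (T, K, H, W) soft segmentation masks per frame
    tracks: (T, N, 2) point coordinates in [-1, 1] for N long-term trajectories
    vis:    (T, N) visibility weights in [0, 1]"""
    T, K, H, W = masks.shape
    # Sample each frame's mask probabilities at the tracked point locations
    grid = tracks.unsqueeze(2)                               # (T, N, 1, 2)
    labels = F.grid_sample(masks, grid, align_corners=True)  # (T, K, N, 1)
    labels = labels.squeeze(-1).permute(0, 2, 1)             # (T, N, K)
    # Visibility-weighted mean label of each trajectory across time
    denom = vis.sum(0).clamp(min=1e-6).unsqueeze(-1)         # (N, 1)
    mean = (labels * vis.unsqueeze(-1)).sum(0) / denom       # (N, K)
    # Penalize per-frame deviation from the trajectory's mean label
    return ((labels - mean.unsqueeze(0)) ** 2 * vis.unsqueeze(-1)).mean()
```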
MatAnyone: Human Matting
MatAnyone is a novel approach for human video matting that supports target assignment. Stable tracking in long videos, even with complex/ambiguous backgrounds. Code & 🤗 demo announced.
Review https://t.ly/NVXsT
Paper arxiv.org/pdf/2501.14677
Project pq-yang.github.io/projects/MatAnyone
Repo TBA
[SOTA] Visual Grounding VOS
ReferDINO is the first end-to-end approach for adapting foundational visual grounding models to referring video object segmentation (RVOS). Code & models to be released soon.
Review https://t.ly/SDFy9
Paper arxiv.org/pdf/2501.14607
Project isee-laboratory.github.io/ReferDINO/
Repo github.com/iSEE-Laboratory/ReferDINO
Relightable Full-Body Avatars
#Meta unveils the first approach to jointly model the relightable appearance of the body, face, and hands of drivable avatars.
Review https://t.ly/kx9gf
Paper arxiv.org/pdf/2501.14726
Project neuralbodies.github.io/RFGCA
Generative Human Mesh Recovery
GenHMR is a novel generative framework that reformulates monocular human mesh recovery (HMR) as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. Impressive results, but no code announced.
Review https://t.ly/Rrzpj
Paper https://arxiv.org/pdf/2412.14444
Project m-usamasaleem.github.io/publication/GenHMR/GenHMR.html
Everyone's social feed is broken because of unnecessary, unwanted opinions about DeepSeek. Your wish:
Anonymous Poll
37%
STOP posting about it!
63%
Keep posting, we want more!
AI-driven Docs Conversion
Docling, by IBM, is the all-in-one, open-source solution for documents, parsing several popular formats into a unified, richly structured representation. Powered by SOTA models for layout (DocLayNet) and table structure (TableFormer), it runs efficiently on low-cost hardware (a usage sketch follows below). Code under MIT.
Review https://t.ly/nSCfT
Paper https://lnkd.in/dc5Kpc2F
Repo https://lnkd.in/d9gvw9bt
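A minimal usage sketch, based on Docling's published quickstart (the exact API may have evolved, so treat this as indicative and check the repo):

```python
# pip install docling
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # a local path or URL to any supported format
converter = DocumentConverter()
result = converter.convert(source)

# Export the unified representation, e.g. as Markdown
print(result.document.export_to_markdown())
```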
SOTA 0-Shot Multi-View
MVGD by #TOYOTA is a SOTA method that generates images and scale-consistent depth maps from novel viewpoints, given an arbitrary number of posed input views. A novel diffusion-based architecture capable of direct pixel-level generation. Code announced.
Review https://t.ly/_ecKl
Paper arxiv.org/pdf/2501.18804
Project mvgd.github.io/
Repo TBA
MambaGlue: SOTA Feature Matching
MambaGlue is a hybrid neural network combining the Mamba and Transformer architectures to match local features (a toy block in that spirit is sketched below). Source code announced, to be released.
Review https://shorturl.at/LxDG1
Paper arxiv.org/pdf/2502.00462
Repo https://lnkd.in/dAujfGZQ
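To make the "hybrid" idea concrete, here is a toy block that alternates a Mamba layer (cheap long-range context over one image's descriptors) with cross-attention to the other image, assuming the real `mamba_ssm` package. This is a guess at the flavor of the architecture, not MambaGlue's actual code.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # official Mamba implementation (pip install mamba-ssm)

class HybridMatcherBlock(nn.Module):
    """Toy Mamba+Transformer block for local feature matching (illustrative only)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.mamba = Mamba(d_model=dim)                      # intra-image context
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, desc_a, desc_b):
        # desc_a, desc_b: (B, N, dim) local descriptors of images A and B
        desc_a = desc_a + self.mamba(self.norm1(desc_a))     # Mamba self-context
        attn, _ = self.cross(self.norm2(desc_a), desc_b, desc_b)
        return desc_a + attn                                 # cross-image exchange
```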
Real-Time Differentiable Tracing
Radiant Foam is a novel scene representation that leverages a decades-old, efficient volumetric mesh ray tracing algorithm, largely overlooked in recent research. It performs like Gaussian Splatting, without the constraints of rasterization (the classic compositing step it builds on is sketched below). Code announced.
Review https://shorturl.at/26U06
Paper https://arxiv.org/pdf/2502.01157
Project https://radfoam.github.io/
Repo https://github.com/theialab/radfoam
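For background, the classic volumetric compositing such a ray tracer performs as a ray crosses successive mesh cells looks roughly like this (a textbook sketch, not the paper's implementation):

```python
import numpy as np

def composite_along_ray(segment_lengths, densities, colors):
    """Front-to-back volume compositing over the cells a ray passes through.
    segment_lengths: per-cell ray segment lengths; densities: per-cell sigma;
    colors: per-cell RGB. Returns accumulated color and final transmittance."""
    T = 1.0                                  # transmittance so far
    rgb = np.zeros(3)
    for dt, sigma, c in zip(segment_lengths, densities, colors):
        alpha = 1.0 - np.exp(-sigma * dt)    # opacity of this segment
        rgb += T * alpha * np.asarray(c, dtype=float)
        T *= 1.0 - alpha
    return rgb, T
```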
VideoJAM: #META's Video Model (SOTA)
#META's VideoJAM: the new SOTA (by a large margin) in motion coherence for video generation, much better than Sora! It adds a strong motion prior to any video-gen model (a toy version of the joint objective is sketched below). Impressive results, no code announced.
Review https://shorturl.at/id7Bt
Paper https://arxiv.org/pdf/2502.02492
Project https://hila-chefer.github.io/videojam-paper.github.io/
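The core idea is a joint appearance-motion objective: the generator is supervised to produce both pixels and motion (e.g., optical flow) from one representation. A toy version, with all names and the plain MSE terms as illustrative assumptions:

```python
import torch.nn.functional as F

def joint_appearance_motion_loss(pred_rgb, pred_flow, gt_rgb, gt_flow, w_motion=1.0):
    """Supervise an appearance head and a motion head from the same latent;
    the motion term is what instills the motion prior (illustrative sketch)."""
    appearance = F.mse_loss(pred_rgb, gt_rgb)
    motion = F.mse_loss(pred_flow, gt_flow)
    return appearance + w_motion * motion
```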
3D Dynamic Garments
UCLA introduces Dress-1-to-3, a novel pipeline that reconstructs physics-plausible, simulation-ready, separated garments with sewing patterns, together with humans, from an in-the-wild image.
Review https://t.ly/qciHV
Paper arxiv.org/pdf/2502.03449
Project dress-1-to-3.github.io
META Human-Robot
#META PARTNR: a novel benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration. The largest benchmark of its kind: 100,000+ natural-language tasks spanning 60 houses and 5,819 unique objects. Code & data (🤗) under MIT (a download sketch follows below).
Review https://t.ly/zcN0K
Paper arxiv.org/pdf/2411.00081
Repo github.com/facebookresearch/partnr-planner
🤗 Data huggingface.co/datasets/ai-habitat/partnr_episodes
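Fetching the episode data from the Hugging Face Hub could look like this (`snapshot_download` is a real `huggingface_hub` API; the dataset's internal file layout is not shown here, so consult the dataset card):

```python
from huggingface_hub import snapshot_download

# Download the PARTNR episodes dataset locally
local_dir = snapshot_download(
    repo_id="ai-habitat/partnr_episodes",
    repo_type="dataset",
)
print("Episodes downloaded to:", local_dir)
```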
HumanDiT: Long-form Human Videos
HumanDiT is a novel pose-guided diffusion model trained on a large in-the-wild dataset with 14,000 hours of HQ video, producing HD videos with fine-grained bodies. Stunning results, but no code announced.
Review https://t.ly/7rTRr
Paper https://arxiv.org/pdf/2502.04847
Project https://agnjason.github.io/HumanDiT-page/
Flow-Based Foundation GenAI
Goku is a novel SOTA family of joint image-and-video generation models leveraging rectified-flow Transformers to achieve industry-leading performance (the standard rectified-flow objective is sketched below). Amazing results! Repo released (currently empty).
Review https://t.ly/dzi0O
Paper http://arxiv.org/pdf/2502.04896
Project saiyan-world.github.io/goku/
Repo github.com/Saiyan-World/goku
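For reference, the standard rectified-flow training objective this family of models builds on: samples are pushed along straight paths between data and noise, and the network regresses the constant velocity. The model signature below is a hypothetical placeholder.

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x0, cond):
    """x0: clean data batch; cond: conditioning (e.g., text embeddings).
    Straight path x_t = (1 - t) * x0 + t * noise has velocity (noise - x0)."""
    b = x0.shape[0]
    t = torch.rand(b, *([1] * (x0.dim() - 1)), device=x0.device)  # one t per sample
    noise = torch.randn_like(x0)
    x_t = (1 - t) * x0 + t * noise
    v_target = noise - x0
    v_pred = model(x_t, t.flatten(), cond)   # hypothetical model signature
    return F.mse_loss(v_pred, v_target)
```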
HAMSTER: Hierarchical VLA Manipulation
#Nvidia unveils HAMSTER: a novel hierarchical VLA architecture enabling robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data. Source code announced.
Review https://t.ly/2yXaY
Paper https://arxiv.org/pdf/2502.05485
Project https://hamster-robot.github.io/
Repo TBA
It's All About Feet
A collection of three works all about the human foot: synthetic foot renders, reconstruction, and surface normals. Repos & datasets available.
Review https://t.ly/GY8mL
Paper (last) arxiv.org/pdf/2502.06367
Projects www.ollieboyne.com/
Repo github.com/OllieBoyne/FOUND
Repo github.com/OllieBoyne/SynFoot
Repo github.com/OllieBoyne/FOCUS (coming)
Make Anything "Rig-Ready"
RigAnything is a novel autoregressive transformer-based model that makes 3D assets rig-ready by probabilistically generating joints and skeleton topologies and assigning skinning weights, all in a template-free manner (what those skinning weights are used for is sketched below). Online demo announced.
Review https://t.ly/bNwxq
Paper arxiv.org/pdf/2502.09615
Project www.liuisabella.com/RigAnything
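For context, the skinning weights such a model assigns feed straight into classic linear blend skinning (LBS): each deformed vertex is a weight-blended combination of its bone transforms. A plain NumPy sketch of LBS (textbook formula, not RigAnything's code):

```python
import numpy as np

def linear_blend_skinning(verts, weights, bone_transforms):
    """verts: (V, 3) rest-pose vertices; weights: (V, J) skinning weights
    (rows sum to 1); bone_transforms: (J, 4, 4) per-bone pose matrices."""
    V = verts.shape[0]
    homog = np.concatenate([verts, np.ones((V, 1))], axis=1)       # (V, 4)
    per_bone = np.einsum("jab,vb->vja", bone_transforms, homog)    # (V, J, 4)
    blended = np.einsum("vj,vja->va", weights, per_bone)           # (V, 4)
    return blended[:, :3]
```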
Hi friends, what other kind of content would you like to *OCCASIONALLY* see in this group?
Anonymous Poll
44%
Job/Research offers
65%
AI tools/news (with NO papers)
32%
Events & Hackathons
3%
Other (comment please)
Animate Anyone 2
The evolution of the first version, enabling character animation with environment affordance. Amazing results, but no code announced.
Review https://t.ly/iNNLB
Paper https://arxiv.org/pdf/2502.06145
Project https://humanaigc.github.io/animate-anyone-2