Two-Hand tracking via GCN
👉 The first-ever GCN for two interacting hands in a single RGB image
Highlights:
✅ Reconstruction by GCN mesh regression
✅ PIFA: pyramid attention for local occlusion
✅ CHA: cross-hand attention for interaction
✅ SOTA + generalization to in-the-wild scenarios
✅ Source code available under GNU 🤯
More: https://bit.ly/3KH5FWO
Video K-Net, SOTA in Segmentation
👉 Simple, strong, and unified framework for fully end-to-end video panoptic segmentation
Highlights:
✅ Learnable kernels from K-Net
✅ K-Net learns to segment & track
✅ Appearance / cross-temporal kernel interaction
✅ New SOTA without bells and whistles 🤷
More: https://bit.ly/3uEEZQR
DeepLabCut: tracking animals in the wild
👉 A toolbox for markerless pose estimation of animals performing various tasks
Highlights:
✅ Multi-animal pose estimation
✅ Datasets for multi-animal pose
✅ Key-points, limbs, animal identity
✅ Optimal key-points without input
More: https://bit.ly/37L1mLE
Neural Articulated Human Body
👉 Novel neural implicit representation for articulated bodies
Highlights:
✅ COmpositional Articulated People
✅ Large variety of shapes & poses
✅ Novel encoder-decoder architecture
More: https://bit.ly/3xvn7dl
2K Resolution Generative #AI
👉 Novel continuous-scale training with variable output resolutions
Highlights:
✅ Mixed-resolution data
✅ Arbitrary scales during training
✅ Generations beyond 1024×1024
✅ Variant of FID metric for scales
✅ Source code under MIT license
More: https://bit.ly/3uNfVY6
DS Unsupervised Video Decomposition
👉 Novel method to extract persistent elements of a scene
Highlights:
✅ Scene element as Deformable Sprite (DS)
✅ Deformable Sprites by video auto-encoder
✅ Canonical texture image for appearance
✅ Non-rigid geometric transformation
More: https://bit.ly/37WV9w1
L-SVPE for Deep Deblurring
👉 L-SVPE to deblur scenes while recovering high-frequency details
Highlights:
✅ Learned Spatially Varying Pixel Exposures
✅ Next-gen focal-plane sensor + DL
✅ Deep conv decoder for motion deblurring
✅ Superior results over non-optimized exposures
More: https://bit.ly/3uRYQMT
Hyper-Fast Instance Segmentation
👉 Novel Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS)
Highlights:
✅ Video instance segmentation transformer
✅ Contextual info at frame/instance level
✅ Nearly convolution-free framework 🤷
✅ The new SOTA for VIS, ~70 FPS!
✅ Code & models under MIT license
More: https://bit.ly/3rCMXIn
Unified Scene Text/Layout Detection
👉 World's first hierarchical scene text dataset + novel detection method
Highlights:
✅ Unified detection & geometric layout
✅ Hierarchical annotations in natural scenes
✅ Word, line, & paragraph level annotations
✅ Source under CC Attribution-ShareAlike 4.0
More: https://bit.ly/3jRpezV
#Oculus' new Hand Tracking
👉 Hands are able to move as naturally and intuitively in the #metaverse as they do in real life
Highlights:
✅ Hands 2.0 powered by CV & ML
✅ Tracking hand-over-hand interactions
✅ Crossing hands, clapping, high-fives
✅ Accurate thumbs-up gesture
More: https://bit.ly/3JXPvY2
New SOTA in #3D human avatars
👉 PHORHUM: photorealistic 3D human from mono-RGB
Highlights:
✅ Pixel-aligned method for 3D geometry
✅ Unshaded surface color + illumination
✅ Patch-based rendering losses for visible parts
✅ Plausible color estimation for non-visible parts
More: https://bit.ly/3MkvBrA
What's in your hands (#3D)?
👉 Reconstructing hand-held objects (from a single RGB image) without knowing their 3D templates 🤷
Highlights:
✅ Hand is highly predictive of object shape
✅ Reconstruction conditioned on hand articulation
✅ Visual features / articulation-aware coordinates
✅ Code and models available!
More: https://bit.ly/3vuYn2a
YODO: You Only Demonstrate Once
👉 Novel category-level manipulation learned in simulation from a single demonstration video 🤯
Highlights:
✅ One-shot IL, model-free 6D pose tracking
✅ Demonstration by a single third-person view
✅ Manipulation including high-precision tasks
✅ Category-level Behavior Cloning
✅ Attention for dynamic coordinate selection
✅ Generalizability to novel unseen objects/environments
More: https://bit.ly/3v0V4R4
Dress Code for Virtual Try-On
👉 UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.
Highlights:
✅ Hi-Res paired front-view / full-body
✅ Pixel-level Semantic-Aware Discriminator
✅ 9 SOTA VTON approaches / 3 baselines
✅ New SOTA considering resolution & garments
More: https://bit.ly/3xKXSUw
πUniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.
ππ’π π‘π₯π’π π‘ππ¬:
β Hi-Res paired front-view / full-body
β Pixel-level Semantic-Aware Discriminator
β 9 SOTA VTON approaches / 3 baselines
β New SOTA considering res. & garments
More: https://bit.ly/3xKXSUw
β€3π3π₯1π€―1
This media is not supported in your browser
VIEW IN TELEGRAM
Deep Equilibrium for Optical Flow
👉 DEQ: converges faster, uses less memory, often more accurate
Highlights:
✅ Novel formulation of optical flow method
✅ Compatible with prior modeling/data-related improvements
✅ Sparse fixed-point correction for stability
✅ Code/models under GNU Affero GPL v3.0
More: https://bit.ly/3v4fZmi
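For readers new to DEQ, the core idea of treating the output as a fixed point z* = f(z*) can be illustrated with plain fixed-point iteration on a scalar toy map. This is only a sketch: `solve_fixed_point` and `toy_update` are hypothetical names invented for the example, and neither the paper's flow estimator nor its solver or sparse correction is reproduced here.

```python
import math


def solve_fixed_point(update, z0, tol=1e-8, max_iters=200):
    """Iterate z <- update(z) until two successive values differ by less than tol.

    Returns the final value and the number of update steps taken.
    """
    z = z0
    for step in range(1, max_iters + 1):
        z_next = update(z)
        if abs(z_next - z) < tol:
            return z_next, step
        z = z_next
    return z, max_iters


def toy_update(z, x=1.0):
    """Toy contractive map (slope at most 0.5), standing in for a learned layer."""
    return 0.5 * math.tanh(z) + x


z_star, steps = solve_fixed_point(toy_update, z0=0.0)
print(f"fixed point ~ {z_star:.6f} after {steps} steps")
```

Because the toy map is a contraction, the iteration settles regardless of the starting value; a real DEQ model would replace `toy_update` with a network layer and typically use a faster root-finding solver.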
Ultra High-Resolution Neural Saliency
👉 A novel ultra high-resolution saliency detector with dataset!
Highlights:
✅ Ultra Hi-Res Saliency Detection
✅ 5,920 pics at 4K-8K resolution
✅ Pyramid Grafting Network
✅ Cross-Model Grafting Module
✅ AGL: Attention Guided Loss
✅ Code/models under MIT
More: https://bit.ly/3MnU1Rf
StyleGAN-Human for fashion
👉 A novel unconditional human generation method based on StyleGAN is out!
Highlights:
✅ 200,000+ labeled samples (pose/texture)
✅ 1024x512 StyleGAN-Human StyleGAN3
✅ 512x256 StyleGAN-Human StyleGAN1
✅ Face model for downstream: InsetGAN
✅ Source code and model available!
More: https://bit.ly/3xMg5B2
OSSO: Skeletal Shape from Outside
👉 Anatomic skeleton of a person from the 3D surface of the body 🦴
Highlights:
✅ Max Planck + IMATI-CNR + INRIA
✅ DXA images to obtain #3D shape
✅ External body to internal skeleton
More: https://bit.ly/3v7Z5TQ
Pix2Seq: object detection by #Google
👉 A novel framework to perform object detection as a language modeling task
Highlights:
✅ Object detection as a language-modeling task
✅ Bounding boxes/labels -> sequence of discrete tokens
✅ Encoder-decoder (one token at a time)
✅ Code under Apache License 2.0
More: https://bit.ly/3F49PX3
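The "boxes/labels -> tokens" idea above can be sketched roughly as follows. Everything specific here is an assumption made for illustration, not a detail confirmed by the post or taken from the released code: the function names, the 1000-bin quantization, the (ymin, xmin, ymax, xmax) ordering, and the class-id offset.

```python
def box_to_tokens(box, label, image_size, num_bins=1000):
    """Quantize an integer pixel box (ymin, xmin, ymax, xmax) plus a class id.

    Returns five integer tokens: four coordinate bins, then the class token.
    """
    height, width = image_size
    ymin, xmin, ymax, xmax = box
    sizes = (height, width, height, width)
    # Integer arithmetic keeps the binning exact for integer pixel coordinates.
    coord_tokens = [
        min(value * num_bins // size, num_bins - 1)
        for value, size in zip((ymin, xmin, ymax, xmax), sizes)
    ]
    # Assumed layout: class ids are placed after the coordinate vocabulary.
    return coord_tokens + [num_bins + label]


def tokens_to_box(tokens, image_size, num_bins=1000):
    """Map five tokens back to an approximate pixel box (bin centers) and class id."""
    height, width = image_size
    sizes = (height, width, height, width)
    box = tuple(
        (token + 0.5) / num_bins * size
        for token, size in zip(tokens[:4], sizes)
    )
    return box, tokens[4] - num_bins


tokens = box_to_tokens((100, 200, 300, 400), label=7, image_size=(500, 1000))
print(tokens)
print(tokens_to_box(tokens, image_size=(500, 1000)))
```

Once every object is a short token sequence, an encoder-decoder can emit detections one token at a time, just as it would emit words; the decoded box is only accurate to within one bin width.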