AI with Papers - Artificial Intelligence & Deep Learning
15.5K subscribers
145 photos
256 videos
14 files
1.34K links
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🧀 Two-Hand tracking via GCN 🧀

👉The first-ever GCN for two interacting hands in a single RGB image

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Reconstruction by GCN mesh regression
✅PIFA: pyramid attention for local occlusion
✅CHA: cross hand attention for interaction
✅SOTA + generalization to in-the-wild scenarios
✅Source code available under GNU 🤯

More: https://bit.ly/3KH5FWO
🕹️Video K-Net, SOTA in Segmentation🕹️

👉Simple, strong, and unified framework for fully end-to-end video panoptic segmentation

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Learnable kernels from K-Net
✅K-Net learns to segment & track
✅Appearance / cross-T kernel interaction
✅New SOTA without bells and whistles 🤷‍♂️

More: https://bit.ly/3uEEZQR
🐭DeepLabCut: tracking animals in the wild🐭

👉A toolbox for markerless pose estimation of animals performing various tasks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Multi-animal pose estimation
✅Datasets for multi-animal pose
✅Key-points, limbs, animal identity
✅Optimal key-points without input

More: https://bit.ly/37L1mLE
🍑Neural Articulated Human Body🍑

👉Novel neural implicit representation for articulated body

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅COmpositional Articulated People
✅Large variety of shapes & poses
✅Novel encoder-decoder architecture

More: https://bit.ly/3xvn7dl
🦚 2K Resolution Generative #AI 🦚

👉Novel continuous-scale training with variable output resolutions

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Mixed-resolution data
✅Arbitrary scales during training
✅Generations beyond 1024×1024
✅Variant of FID metric for scales
✅Source code under MIT license

More: https://bit.ly/3uNfVY6
🐍DS Unsupervised Video Decomposition🐍

👉Novel method to extract persistent elements of a scene

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Scene element as Deformable Sprite (DS)
✅Deformable Sprites by video auto-encoder
✅Canonical texture image for appearance
✅Non-rigid geom. transformation

More: https://bit.ly/37WV9w1
🥓 L-SVPE for Deep Deblurring 🥓

👉L-SVPE to deblur scenes while recovering high-freq details

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Learned Spatially Varying Pixel Exposures
✅Next-gen focal-plane sensor + DL
✅Deep conv decoder for motion deblurring
✅Superior results over non-optimized exp.

More: https://bit.ly/3uRYQMT
🧧Hyper-Fast Instance Segmentation🧧

👉Novel Temporally Efficient Vision Transformer (TeViT) for VIS

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Video instance segmentation transformer
✅Contextual-info at frame/instance level
✅Nearly convolution-free framework 🤷‍♂️
✅The new SOTA for VIS, ~70 FPS!
✅Code & models under MIT license

More: https://bit.ly/3rCMXIn
📗Unified Scene Text/Layout Detection📗

👉World's first hierarchical scene text dataset + novel detection method

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Unified detection & geometric layout
✅Hierarchical annotations in natural scenes
✅Word, line, & paragraph level annotations
✅Source under CC Attribution Share Alike 4.0

More: https://bit.ly/3jRpezV
🙌 #Oculus' new Hand Tracking 🙌

👉Hands can move as naturally and intuitively in the #metaverse as they do in real life

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hands2.0 powered by CV & ML
✅Tracking hand-over-hand interactions
✅Crossing hands, clapping, high-fives
✅Accurate thumbs-up gesture

More: https://bit.ly/3JXPvY2
🎗️New SOTA in #3D human avatar🎗️

👉PHORHUM: photorealistic 3D human from mono-RGB

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Pixel-aligned method for 3D geometry
✅Unshaded surface color + illumination
✅Patch-based rendering losses for visible
✅Plausible color estimation for non-visible

More: https://bit.ly/3MkvBrA
📟 What's in your hands (#3D)? 📟

👉Reconstructing hand-held objects (from single RGB) without knowing their 3D templates 🤷‍♂️

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hand is highly predictive of object shape
✅Conditioned on the articulation
✅Visual feats. / articulation-aware coords.
✅Code and models available!

More: https://bit.ly/3vuYn2a
🔋YODO: You Only Demonstrate Once🔋

👉Novel category-level manipulation learned in sim from a single demonstration video 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅One-shot IL, model-free 6D pose tracking
✅Demonstration by a single 3rd-person view
✅Manipulation including hi-precision tasks
✅Category-level Behavior Cloning
✅Attention for dynamic coords selection
✅Generalizability to novel unseen obj/env

More: https://bit.ly/3v0V4R4
👗 Dress Code for Virtual Try-On 👗

👉UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hi-Res paired front-view / full-body
✅Pixel-level Semantic-Aware Discriminator
✅9 SOTA VTON approaches / 3 baselines
✅New SOTA considering res. & garments

More: https://bit.ly/3xKXSUw
Deep Equilibrium for Optical Flow

👉DEQ: converges faster, uses less memory, often more accurate

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel formulation of optical flow method
✅Compatible with prior modeling/data-related improvements
✅Sparse fixed-point correction for stability
✅Code/models under GNU Affero GPL v3.0

More: https://bit.ly/3v4fZmi
🌳Ultra High-Resolution Neural Saliency🌳

👉A novel ultra high-resolution saliency detector with dataset!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Ultra Hi-Res Saliency Detection
✅5,920 pics at 4K-8K resolution
✅Pyramid Grafting Network
✅Cross-Model Grafting Module
✅AGL: Attention Guided Loss
✅Code/models under MIT

More: https://bit.ly/3MnU1Rf
🪆StyleGAN-Human for fashion 🪆

👉A novel unconditional human generation based on StyleGAN is out!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅200,000+ labeled samples (pose/texture)
✅1024x512 StyleGAN-Human StyleGAN3
✅512x256 StyleGAN-Human StyleGAN1
✅Face model for downstream: InsetGAN
✅Source code and model available!

More: https://bit.ly/3xMg5B2
💀 OSSO: Skeletal Shape from Outside 💀

👉Anatomic skeleton of a person from 3D surface of body 🦴

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Max Planck + IMATI-CNR + INRIA
✅DXA images to obtain #3D shape
✅External body to internal skeleton

More: https://bit.ly/3v7Z5TQ
🎷 Pix2Seq: object detection by #Google 🎷

👉A novel framework to perform object detection as a language modeling task

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Obj. detection as a lang-modeling task
✅BBs/labels -> seq. of discrete tokens
✅Encoder-decoder (one token at a time)
✅Code under Apache License 2.0

More: https://bit.ly/3F49PX3