AI with Papers - Artificial Intelligence & Deep Learning
15.6K subscribers
145 photos
260 videos
14 files
1.36K links
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
🥭3D Point Motion Editing🥭

👉Edit-by-Track enables precise video motion editing via 3D point tracks. By specifying desired 3D trajectories, users can seamlessly control joint camera and object motion, remove objects, and transfer motion between videos. No code announced but relevant💙

👉Review https://t.ly/GJHJ5
👉Paper arxiv.org/pdf/2512.02015
👉Project edit-by-track.github.io/
🔥43🤣1
🦄 Native Unified Multimodal 🦄

👉META unveils TUNA, a novel unified multimodal model (UMM) that builds a unified continuous visual representation by cascading a VAE encoder with a representation encoder. This unified representation space allows SOTA end-to-end processing of images and videos for both understanding and generation. Code under legal review💙

👉Review https://t.ly/7wmKP
👉Paper https://lnkd.in/djT4WGEU
👉Project https://tuna-ai.org/
👉Repo github.com/wren93/tuna
6🔥1
✌️SOTA Generative SLP✌️

👉Stable Signer is a new generative model for sign language production (SLP). It redefines SLP as a hierarchical end-to-end generation task that only includes text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid. Repo with data 💙

👉Review https://t.ly/yKZhn
👉Paper arxiv.org/pdf/2512.04048
👉Project stablesigner.github.io/
👉Data github.com/SignLLM/Prompt2Sign/tree/main/tools-new-2025
5🔥1👏1
🐘TTSC for 3D Generative🐘

👉SpaceControl is the new SOTA training-free test-time method for explicit spatial control of 3D generation. Repo announced💙

👉Review https://t.ly/1zrah
👉Paper https://lnkd.in/dEWh3vep
👉Project https://lnkd.in/dScftUmm
👉Repo TBA
8🔥2👍1👏1
🎷Layered PSD Diffusion🎷

👉OmniPSD produces layered PSD files with transparent alpha channels, separating text, foreground elements, and background into clean RGBA layers that can be edited directly in design tools. Online Demo💙

👉Review https://t.ly/YNRAC
👉Paper arxiv.org/pdf/2512.09247
👉Project showlab.github.io/OmniPSD/
👉Demo https://www.lovart.ai/it
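
To make the "clean RGBA layers" idea concrete, here is a minimal sketch of how separate layers flatten back into a single image using the standard "over" alpha-compositing operator. This is not OmniPSD code: the names `over` and `flatten`, the single-pixel representation, and the sample colors are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One pixel as (r, g, b, a), each channel in [0, 1], with straight
# (non-premultiplied) alpha. Illustrative type, not from OmniPSD.
RGBA = Tuple[float, float, float, float]


def over(top: RGBA, bottom: RGBA) -> RGBA:
    """Composite `top` over `bottom` with the standard 'over' operator."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels are fully transparent.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: List[RGBA]) -> RGBA:
    """Flatten one pixel from a layer stack ordered bottom to top."""
    result: RGBA = (0.0, 0.0, 0.0, 0.0)
    for layer in layers:
        result = over(layer, result)
    return result


# Hypothetical stack: opaque blue background, half-transparent red
# foreground, and a text layer that is empty at this pixel.
background = (0.0, 0.0, 1.0, 1.0)
foreground = (1.0, 0.0, 0.0, 0.5)
text = (0.0, 0.0, 0.0, 0.0)
pixel = flatten([background, foreground, text])
```

Because each layer keeps its own alpha, any one of them can be edited or dropped and the stack re-flattened without touching the others.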
🔥98👍2
🧱Pixel Art Volumetric Rendering🧱

👉Voxify3D is a novel differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Repo announced💙

👉Review https://t.ly/qPyNl
👉Paper https://lnkd.in/du5ikJGN
👉Project https://lnkd.in/dpiAjj5m
👉Repo TBA
7🔥4
🫎 MoCapAnything is out 🫎

👉MoCapAnything is a novel reference-guided, factorized framework that first predicts 3D joint trajectories and then recovers asset-specific rotations via constraint-aware IK fitting. No code announced 🥲

👉Review https://t.ly/_Tw6t
👉Paper arxiv.org/pdf/2512.10881
👉Project animotionlab.github.io/MoCapAnything
11👍4🔥4👏1🤯1😢1
💚 MatAnyone 2 is out! 💚

👉MatAnyone 2 is an advanced human video matting framework that preserves fine details by avoiding segmentation-like boundaries, while also showing enhanced robustness under challenging real-world conditions. Repo & Dataset announced💙

👉Review https://t.ly/vxOBO
👉Paper arxiv.org/pdf/2512.11782
👉Project pq-yang.github.io/projects/MatAnyone2
👉Repo github.com/pq-yang/MatAnyone2
🔥54👍1👏1
💷 SOTA Zero-Shot Stereo Matching💷

👉Fast-FoundationStereo by #Nvidia is a novel family of architectures that achieves, for the first time, strong zero-shot generalization at real-time frame rates via divide-and-conquer acceleration. Code & Data announced💙

👉Review https://t.ly/XD6pO
👉Paper https://lnkd.in/d9_YKW2A
👉Project https://lnkd.in/dKDxm7EX
👉Repo https://lnkd.in/dR4-PdsW
2🔥114
👀DriverGaze360: Driver SOTA👀

👉DriverGaze360 is a large-scale 360° field-of-view driver attention dataset, containing ~1M gaze-labeled frames. Code & Dataset announced💙

👉Review https://t.ly/ZcoUw
👉Paper arxiv.org/pdf/2512.14266
👉Project av.dfki.de/drivergaze360/
👉Repo github.com/dfki-av/drivergaze360
👉Data av.dfki.de/drivergaze360/dataset
🔥114
🫠FlexAvatar: 3D Heads🫠

👉TUM introduces FlexAvatar, a novel method for creating HQ and complete 3D head avatars from a single image. Code announced💙

👉Review https://t.ly/Rkdtd
👉Paper arxiv.org/pdf/2512.15599
👉Project tobias-kirschstein.github.io/flexavatar/
👉Repo TBA
🔥73👍1👏1
🏜️ Depth Any Panoramas 🏜️

👉DAP is the new SOTA foundation model for panoramic depth estimation, released with a large-scale dataset. Data & Repo under MIT💙

👉Review https://t.ly/LaUmd
👉Paper arxiv.org/pdf/2512.16913
👉Project https://lnkd.in/dvqNV9jx
👉Repo https://lnkd.in/dmNzhb-7
👉Demo https://lnkd.in/dDwjMF3u
🔥94👍2👏1
🎯Generative Refocusing is out🎯

👉Generative Refocusing is a two-step process: DeblurNet recovers all-in-focus images from various inputs, and BokehNet then creates controllable bokeh (trained in semi-supervised mode). Repo under Apache 2.0💙

👉Review https://t.ly/8t7PA
👉Paper arxiv.org/pdf/2512.16923
👉Project generative-refocusing.github.io/
👉Repo github.com/rayray9999/Genfocus
👉Demo huggingface.co/spaces/nycu-cplab/Genfocus-Demo
🔥72