AI with Papers - Artificial Intelligence & Deep Learning
17.3K subscribers
158 photos
275 videos
14 files
1.44K links
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
📊 SOTA Style Transfer 📊

👉TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model’s robust capabilities in content preservation & style customization. Code & Model released💙

👉Review https://t.ly/viVR0
👉Paper arxiv.org/pdf/2601.20175
👉Project tele-ai.github.io/TeleStyle/
👉Repo github.com/Tele-AI/TeleStyle

๐Ÿ‘ Metric Anything is out ๐Ÿ‘

๐Ÿ‘‰Metric Anything (Li Auto inc.) is a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/54Ccr
๐Ÿ‘‰Paper arxiv.org/pdf/2601.22054
๐Ÿ‘‰Project metric-anything.github.io/metric-anything-io/
๐Ÿ‘‰Repo github.com/metric-anything/metric-anything
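Metric depth differs from relative depth because predictions are judged in absolute units. A model that is right only up to an unknown scale is therefore still wrong. As a minimal illustration, the sketch below computes two metrics commonly reported on depth benchmarks: absolute relative error (AbsRel) and δ<1.25 accuracy. This is generic evaluation code, not Metric Anything's pipeline. The function names and sample values are hypothetical.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid ground truth (gt > 0)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta_accuracy(pred: Sequence[float], gt: Sequence[float],
                   threshold: float = 1.25) -> float:
    """Share of valid pixels with max(pred/gt, gt/pred) < threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Depths in metres; 0.0 marks a pixel without ground truth.
gt = [2.0, 4.0, 10.0, 0.0]
pred = [2.2, 3.0, 10.5, 7.0]
print(round(abs_rel(pred, gt), 4))         # 0.1333
print(round(delta_accuracy(pred, gt), 4))  # 0.6667
```

Consider a prediction with the correct shape but the wrong scale, such as every depth doubled. Its AbsRel rises to 1.0 and its δ<1.25 accuracy drops to 0.0. That is exactly the failure a metric-depth model has to avoid.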
🌈Segment Any Events by Language🌈

👉SEAL (by NUS) is the first Semantic-aware Segment Any Events framework that addresses Open-Vocabulary Event Instance Segmentation. Code announced💙

👉Review https://t.ly/1ZMF0
👉Paper https://arxiv.org/pdf/2601.23159
👉Project https://0nandon.github.io/SEAL/
👉Repo https://github.com/0nandon/SEAL

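SEAL segments *events*, the asynchronous output of an event camera. Each event records a pixel location, a timestamp and a polarity (whether brightness went up or down). Many event-vision pipelines turn such a stream into an image-like grid before running a network. Below is a minimal sketch of one common representation: a per-pixel polarity histogram over a time window. This is an assumption for illustration only. The `Event` type and `accumulate_events` helper are hypothetical, and SEAL's own input representation may differ.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    x: int         # pixel column
    y: int         # pixel row
    t: float       # timestamp in seconds
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def accumulate_events(events: List[Event], width: int, height: int,
                      t_start: float, t_end: float) -> List[List[int]]:
    """Sum event polarities per pixel for events with t_start <= t < t_end."""
    frame = [[0] * width for _ in range(height)]  # indexed as frame[y][x]
    for e in events:
        in_window = t_start <= e.t < t_end
        in_bounds = 0 <= e.x < width and 0 <= e.y < height
        if in_window and in_bounds:
            frame[e.y][e.x] += e.polarity
    return frame


stream = [
    Event(0, 0, 0.001, +1),
    Event(0, 0, 0.002, +1),
    Event(1, 0, 0.003, -1),
    Event(1, 1, 0.020, +1),  # falls outside the 10 ms window used below
]
print(accumulate_events(stream, width=2, height=2, t_start=0.0, t_end=0.010))
# [[2, -1], [0, 0]]
```

Shorter windows preserve more temporal detail at the cost of sparser frames.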
👉RAM prices skyrocketing

👉Me acting like a rich kid.

Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO

🐮CoWTracker: Track-Warping🐮

👉CoWTracker (VGG + Meta) is a novel dense point tracker that eschews cost volumes in favor of warping. Code/Models under the FAIR NC license💙

👉Review https://t.ly/6bAn9
👉Paper https://arxiv.org/pdf/2602.04877
👉Project https://cowtracker.github.io/
👉Repo https://github.com/facebookresearch/cowtracker

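The CoWTracker post contrasts cost volumes with warping. A cost volume compares each pixel's feature with features at many candidate displacements. Warping instead resamples the other frame at the single position given by the current motion estimate. The sketch below shows generic backward warping with bilinear interpolation on a single-channel grid, in plain Python. It illustrates the operation only and is not CoWTracker's architecture. `bilinear_sample` and `backward_warp` are hypothetical helper names.

```python
import math
from typing import List

Grid = List[List[float]]


def bilinear_sample(img: Grid, x: float, y: float) -> float:
    """Sample img at fractional (x, y); coordinates are clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def backward_warp(target: Grid, flow_x: Grid, flow_y: Grid) -> Grid:
    """For every pixel (r, c), read target at (c + flow_x, r + flow_y)."""
    h, w = len(target), len(target[0])
    return [[bilinear_sample(target, c + flow_x[r][c], r + flow_y[r][c])
             for c in range(w)] for r in range(h)]


target = [[0.0, 10.0, 20.0], [30.0, 40.0, 50.0]]
flow_x = [[0.5] * 3 for _ in range(2)]  # half-pixel shift to the right
flow_y = [[0.0] * 3 for _ in range(2)]
print(backward_warp(target, flow_x, flow_y))
# [[5.0, 15.0, 20.0], [35.0, 45.0, 50.0]]
```

One warp costs a single interpolated lookup per pixel. A cost volume over a (2r+1)×(2r+1) search window needs that many comparisons per pixel, which is one common reason to avoid cost volumes.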
🌈TrajVG Trajectory-Geometry🌈

👉TrajVG is a novel reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating camera-coordinate 3D trajectories. Code announced💙

👉Review https://t.ly/yVi01
👉Paper arxiv.org/pdf/2602.04439
👉Project xingy038.github.io/TrajVG/
👉Repo github.com/xingy038/TrajVG

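TrajVG predicts, for each tracked point, its 3D position in every frame's camera coordinates. Under a standard pinhole camera model, projecting those positions with the intrinsics yields the point's 2D track directly. This is one way to see why such trajectories make cross-frame correspondence explicit. The sketch assumes a pinhole model with made-up intrinsics (`fx`, `fy`, `cx`, `cy`) and uses hypothetical helper names. It is not TrajVG's code.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(point: Point3, fx: float, fy: float, cx: float, cy: float) -> Pixel:
    """Pinhole projection of a camera-coordinate point (X, Y, Z) to pixel (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def trajectory_to_track(traj: List[Point3], fx: float, fy: float,
                        cx: float, cy: float) -> List[Pixel]:
    """Project one point's per-frame camera-coordinate positions to a 2D track."""
    return [project(p, fx, fy, cx, cy) for p in traj]


# One point seen in three frames (metres, in each frame's camera coordinates).
traj = [(0.0, 0.0, 2.0), (0.2, -0.1, 2.0), (0.4, 0.0, 4.0)]
print(trajectory_to_track(traj, fx=500.0, fy=500.0, cx=320.0, cy=240.0))
# [(320.0, 240.0), (370.0, 215.0), (370.0, 240.0)]
```

Moving a point twice as far away halves its image-plane offset from the principal point. That is why the third frame lands at the same u as the second.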
🪙MOMENTUM #NeurIPS 2025 🪙

👉MOMENTUM by Google (H/T Huguens Jean, Ph.D.) is a production multimodal agent architecture built on the Google Agent Development Kit (ADK). It orchestrates 22 specialized tools, including Gemini for reasoning, Imagen 4.0 for image generation, and Veo 3.1 for video synthesis. Code announced💙

👉Review https://t.ly/06h7Q
👉Paper https://momentum-project-page-232993426383.us-central1.run.app/momentum_paper.pdf
👉Project https://momentum-project-page-232993426383.us-central1.run.app/
👉Repo TBA

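MOMENTUM is described as orchestrating 22 specialized tools. The basic pattern behind that kind of orchestration can be sketched as a registry that maps tool names to callables and executes a plan step by step. What follows is a toy, plain-Python illustration under that assumption. It does not use the Google ADK API, the tool names are hypothetical, and the lambdas are stand-ins for real calls to models such as Gemini, Imagen or Veo.

```python
from typing import Callable, Dict, List, Tuple

ToolFn = Callable[[str], str]


class ToolRegistry:
    """Maps tool names to callables and runs a plan of (tool, input) steps."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = fn

    def run_plan(self, plan: List[Tuple[str, str]]) -> List[str]:
        """Execute steps in order; an unknown tool name raises KeyError."""
        outputs = []
        for name, arg in plan:
            if name not in self._tools:
                raise KeyError(f"unknown tool: {name}")
            outputs.append(self._tools[name](arg))
        return outputs


# Hypothetical stand-ins for reasoning, image and video tools.
registry = ToolRegistry()
registry.register("reason", lambda prompt: f"plan for: {prompt}")
registry.register("image", lambda prompt: f"<image of {prompt}>")
registry.register("video", lambda prompt: f"<video of {prompt}>")

print(registry.run_plan([("reason", "launch teaser"), ("image", "a red rocket")]))
# ['plan for: launch teaser', '<image of a red rocket>']
```

In a real agent, the plan would typically be produced by the reasoning model rather than written by hand.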
😶‍🌫️ SOTA Full-Head Synthesis 😶‍🌫️

👉HyPlaneHead is the new SOTA in full-head image synthesis, delivering high-quality results with significantly fewer artifacts than existing 3D-aware models. Repo announced💙

👉Review https://t.ly/WYfP3
👉Paper arxiv.org/pdf/2509.16748
👉Project https://lhyfst.github.io/hyplanehead/
👉Repo github.com/lhyfst/HyPlaneHead

🟠 AnyTouch 2 is out 🟠

👉AnyTouch 2 is a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. Repo, model & data released💙

👉Review https://t.ly/fP4dP
👉Paper https://arxiv.org/pdf/2602.09617
👉Project gewu-lab.github.io/AnyTouch2/
👉Repo github.com/GeWu-Lab/AnyTouch2

🍌 AGENT BANANA (SOTA) 🍌

👉Agent Banana is a novel SOTA agentic system for HD, native-resolution image editing through reasoning-based natural-language interaction, where each edit is context-aware, logically dependent, and locally precise. Code announced💙

👉Review https://t.ly/EXaCH
👉Paper https://arxiv.org/pdf/2602.09084
👉Project https://agent-banana.github.io/
👉Repo https://github.com/taco-group/agent-banana

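The Agent Banana post stresses that edits are logically dependent. When one edit relies on the result of another, the edits must run in an order consistent with those dependencies. For example, an object can only be recolored after it has been added. Finding such an order is a topological sort. A minimal sketch using Python's standard-library `graphlib` (3.9+) follows. The edit names are hypothetical, and this is not Agent Banana's planner.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def order_edits(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every edit comes after the edits it depends on.

    `dependencies` maps each edit to the set of edits that must run first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical multi-step edit request.
edits = {
    "add_lamp": set(),
    "relight_scene": {"add_lamp"},
    "recolor_lamp": {"add_lamp"},
    "sharpen_lamp_detail": {"recolor_lamp", "relight_scene"},
}
print(order_edits(edits))
```

`relight_scene` and `recolor_lamp` may appear in either order because neither depends on the other. Only their placement between `add_lamp` and `sharpen_lamp_detail` is guaranteed.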
🛠️ IndustryShapes 6D Pose 🛠️

👉IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel object 6D pose estimation. Dataset available💙

👉Review https://t.ly/KKcuH
👉Paper https://arxiv.org/pdf/2602.05555
👉Project https://pose-lab.github.io/IndustryShapes/
👉Dataset https://huggingface.co/datasets/POSE-Lab/IndustryShapes

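A 6D pose is a 3D rotation plus a 3D translation. A widely used score for 6D pose estimation is ADD. It transforms the object's model points with both the predicted and the ground-truth pose, then averages the point-to-point distances. A pose is often counted as correct when ADD is below 10% of the object's diameter. The post does not say which metrics IndustryShapes reports, so treat the sketch below as a generic illustration. The helper names and the toy point set are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Rotate point p by the 3x3 matrix R, then translate by t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_metric(points: List[Vec3], R_pred: Mat3, t_pred: Vec3,
               R_gt: Mat3, t_gt: Vec3) -> float:
    """ADD: mean distance between model points under the predicted and true poses."""
    total = sum(math.dist(transform(R_pred, t_pred, p), transform(R_gt, t_gt, p))
                for p in points)
    return total / len(points)


def rot_z(degrees: float) -> List[List[float]]:
    """Rotation matrix about the Z axis."""
    a = math.radians(degrees)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


model = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
identity, origin = rot_z(0.0), (0.0, 0.0, 0.0)
print(add_metric(model, identity, (0.01, 0.0, 0.0), identity, origin))  # ~0.01
print(add_metric(model, rot_z(90.0), origin, identity, origin))         # ~1.4142
```

This toy point set maps onto itself under a 90° rotation about Z, yet ADD reports about 1.41. That is why symmetric objects are usually scored with the closest-point variant, ADD-S.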
🤖Generalized Human Tracking🤖

👉Beijing Institute of Technology & Humanoid Robotics Shanghai present a novel learning framework for general humanoid whole-body control. Impressive imitation results.

👉Review https://t.ly/ucmuB
👉Paper arxiv.org/pdf/2601.23080
👉Project zeonsunlightyu.github.io/RGMT.github.io

🫧SurfPhase: 3D Interfacial Dynamics🫧

👉SurfPhase is a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Repo/Dataset announced💙

👉Review https://t.ly/g2P5F
👉Paper https://arxiv.org/pdf/2602.11154
👉Project https://yuegao.me/SurfPhase/
👉Repo github.com/yuegao/SurfPhase

🪿Teaching AI to Draw Illusions🪿

👉Stroke of Surprise by NYCU is a novel generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. As strokes are progressively added, the sketch reveals a completely different subject. Code released💙

👉Review https://t.ly/98Oim
👉Paper https://lnkd.in/dTA7iuce
👉Project https://lnkd.in/dhTMGw23
👉Repo https://lnkd.in/deQyDGFu

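Stroke of Surprise works with vector strokes that are revealed progressively. Differentiable sketch generators commonly parameterize each stroke as a cubic Bézier curve with four control points. The post does not specify this representation, so it is an assumption here. Under it, the sketch below evaluates a stroke and builds the cumulative stroke set seen at each drawing stage. The optimization that makes each stage read as a different subject is not shown, and all names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = Tuple[Point, Point, Point, Point]  # cubic Bézier control points P0..P3


def bezier_point(stroke: Stroke, t: float) -> Point:
    """Evaluate a cubic Bézier curve at parameter t in [0, 1]."""
    p0, p1, p2, p3 = stroke
    u = 1.0 - t
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3  # Bernstein weights
    return (b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1])


def drawing_stages(strokes: List[Stroke], stage_ends: List[int]) -> List[List[Stroke]]:
    """Cumulative stroke sets: stage k shows strokes[:stage_ends[k]]."""
    return [strokes[:end] for end in stage_ends]


line: Stroke = ((0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0))
arc: Stroke = ((0.0, 0.0), (0.0, 2.0), (3.0, 2.0), (3.0, 0.0))
dot: Stroke = ((1.5, 1.0), (1.5, 1.0), (1.5, 1.0), (1.5, 1.0))

print(bezier_point(line, 0.5))  # (1.5, 0.0)
stages = drawing_stages([line, arc, dot], stage_ends=[1, 3])
print([len(s) for s in stages])  # [1, 3]
```

Because each stage is a prefix of the full stroke list, strokes drawn early must serve both interpretations. That shared prefix is what makes the task an optimization over several targets at once.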
Conversational Segmentation

👉CIS grounds abstract, intent-oriented concepts in pixel-accurate masks by reasoning about affordances, physics, and functional properties. Code/Demo released💙

👉Review https://t.ly/SsG57
👉Paper arxiv.org/pdf/2602.13195
👉Project glab-caltech.github.io/converseg/
👉Repo github.com/AadSah/ConverSeg
👉Demo glab-caltech.github.io/converseg/#interactive-demo

📲 Efficient VLMs 📲

👉CoPE-VideoLM is a codec-aware tokenization framework for video VLMs that replaces dense RGB encoding with lightweight structured representations derived from codec primitives: 93% fewer tokens and 86% lower time-to-first-token! Code announced💙

👉Review https://t.ly/3_GqN
👉Paper https://arxiv.org/pdf/2602.13191
👉Project https://sayands.github.io/cope/
👉Repo TBA

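The CoPE-VideoLM headline numbers are relative reductions. A back-of-the-envelope model shows where savings of that kind can come from. A dense encoder spends the same token budget on every frame. A codec-aware scheme can instead encode occasional keyframes densely and represent the frames in between with far fewer tokens. Every number and the keyframe/delta split below are hypothetical, chosen only to make the arithmetic concrete. They are not CoPE-VideoLM's actual configuration.

```python
def dense_tokens(num_frames: int, tokens_per_frame: int) -> int:
    """Every frame is encoded as a full RGB image."""
    return num_frames * tokens_per_frame


def codec_aware_tokens(num_frames: int, gop_size: int,
                       tokens_per_keyframe: int, tokens_per_delta_frame: int) -> int:
    """Assumed scheme: the first frame of each group of pictures (GOP) is dense,
    and every other frame gets a small fixed token budget."""
    keyframes = -(-num_frames // gop_size)  # ceiling division
    delta_frames = num_frames - keyframes
    return keyframes * tokens_per_keyframe + delta_frames * tokens_per_delta_frame


def reduction(before: int, after: int) -> float:
    """Relative reduction, e.g. 0.93 means 93% fewer tokens."""
    return 1.0 - after / before


dense = dense_tokens(64, 256)                  # 16384
compact = codec_aware_tokens(64, 16, 256, 16)  # 4*256 + 60*16 = 1984
print(f"{reduction(dense, compact):.1%} fewer tokens")  # 87.9% fewer tokens
```

Under these made-up settings the saving is about 88%. The 93% figure in the post comes from the paper's own setup, which this sketch does not attempt to reproduce.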
Dex4D: Task-Agnostic Track

👉Dex4D by CMU is a novel task-agnostic approach that generalizes to unseen objects and poses, scene layouts, backgrounds, and task trajectories. Code under Apache 2.0💙

👉Review https://t.ly/ZGx9T
👉Paper arxiv.org/pdf/2602.15828
👉Project dex4d.github.io/
👉Sim github.com/Dex4D/Dex4D-Simulation
👉Vision github.com/Dex4D/Dex4D-Vision
👉HW https://github.com/Dex4D/Dex4D-Hardware