AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
🌻MLLMs Fine Segmentation🌻

👉SimpleSeg: MLLMs with native pixel-level perception. Repo & Model available💙

👉Review https://t.ly/eVguh
👉Paper arxiv.org/pdf/2601.19228
👉Project simpleseg.github.io/
👉Repo github.com/songtianhui/SimpleSeg
🔥 DeepSeek-OCR 2 is out 🔥

👉DeepSeek-AI announced the new version of its powerful SOTA OCR: a new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights available💙

👉Review https://t.ly/gX4bX
👉Paper https://arxiv.org/pdf/2601.20552
👉Repo github.com/deepseek-ai/DeepSeek-OCR-2
📊 SOTA Style Transfer 📊

👉TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model’s robust capabilities in content preservation & style customization. Code & Model released💙

👉Review https://t.ly/viVR0
👉Paper arxiv.org/pdf/2601.20175
👉Project tele-ai.github.io/TeleStyle/
👉Repo github.com/Tele-AI/TeleStyle
๐Ÿ‘ Metric Anything is out ๐Ÿ‘

๐Ÿ‘‰Metric Anything (Li Auto inc.) is a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/54Ccr
๐Ÿ‘‰Paper arxiv.org/pdf/2601.22054
๐Ÿ‘‰Project metric-anything.github.io/metric-anything-io/
๐Ÿ‘‰Repo github.com/metric-anything/metric-anything
🌈Segment Any Events by Language🌈

👉SEAL (by NUS) is the first Semantic-aware Segment Any Events framework that addresses Open-Vocabulary Event Instance Segmentation. Code announced💙

👉Review https://t.ly/1ZMF0
👉Paper https://arxiv.org/pdf/2601.23159
👉Project https://0nandon.github.io/SEAL/
👉Repo https://github.com/0nandon/SEAL
👉RAM prices skyrocketing

👉Me acting like a rich kid.

Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO
🐮CoWTracker: Track-Warping🐮

👉CoWTracker (VGG + META) is a novel dense point tracker that eschews cost volumes in favor of warping. Code/Models under FAIR NC💙

👉Review https://t.ly/6bAn9
👉Paper https://arxiv.org/pdf/2602.04877
👉Project https://cowtracker.github.io/
👉Repo https://github.com/facebookresearch/cowtracker
🌈TrajVG Trajectory-Geometry🌈

👉TrajVG is a novel reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating camera-coordinate 3D trajectories. Code announced💙

👉Review https://t.ly/yVi01
👉Paper arxiv.org/pdf/2602.04439
👉Project xingy038.github.io/TrajVG/
👉Repo github.com/xingy038/TrajVG
🪙MOMENTUM #NeurIPS 2025 🪙

👉MOMENTUM by Google (H/T Huguens Jean, Ph.D.) is a production multimodal agent architecture built on the Google ADK. It orchestrates 22 specialized tools (Gemini for reasoning, Imagen 4.0 for image generation, and Veo 3.1 for synthesis). Code announced💙

👉Review https://t.ly/06h7Q
👉Paper https://momentum-project-page-232993426383.us-central1.run.app/momentum_paper.pdf
👉Project https://momentum-project-page-232993426383.us-central1.run.app/
👉Repo TBA
😶‍🌫️ SOTA Full-Head Synthesis 😶‍🌫️

👉HyPlaneHead is the new SOTA in full-head image synthesis, delivering HQ results with significantly fewer artifacts than existing 3D-aware models. Repo announced💙

👉Review https://t.ly/WYfP3
👉Paper arxiv.org/pdf/2509.16748
👉Project https://lhyfst.github.io/hyplanehead/
👉Repo github.com/lhyfst/HyPlaneHead
๐ŸŸ AnyTouch 2 is out ๐ŸŸ

๐Ÿ‘‰AnyTouch 2 is a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. Repo, Model & Data๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/fP4dP
๐Ÿ‘‰Paper https://arxiv.org/pdf/2602.09617
๐Ÿ‘‰Project gewu-lab.github.io/AnyTouch2/
๐Ÿ‘‰Repo github.com/GeWu-Lab/AnyTouch2
🍌 AGENT BANANA (SOTA) 🍌

👉Agent Banana is a novel SOTA agentic system for HD, native-resolution image editing through reasoning-based NL interaction, where each edit is context-aware, logically dependent, and locally precise. Code announced💙

👉Review https://t.ly/EXaCH
👉Paper https://arxiv.org/pdf/2602.09084
👉Project https://agent-banana.github.io/
👉Repo https://github.com/taco-group/agent-banana
🛠️ IndustryShapes 6D Pose 🛠️

👉IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel object 6D pose estimation. Dataset available💙

👉Review https://t.ly/KKcuH
👉Paper https://arxiv.org/pdf/2602.05555
👉Project https://pose-lab.github.io/IndustryShapes/
👉Dataset https://huggingface.co/datasets/POSE-Lab/IndustryShapes