AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🤖🧠 Thinking with Camera 2.0: A Powerful Multimodal Model for Camera-Centric Understanding and Generation

🗓️ 14 Oct 2025
📚 AI News & Trends

In the rapidly evolving field of multimodal AI, bridging the gaps between vision, language, and geometry is one of the frontier challenges. Traditional vision-language models excel at describing what is in an image (“a cat on a sofa”, “a red car on the road”) but struggle to reason about how the image was captured: the camera’s ...

#MultimodalAI #CameraCentricUnderstanding #VisionLanguageModels #AIResearch #ComputerVision #GenerativeModels
✨Diversity Has Always Been There in Your Visual Autoregressive Models

📝 Summary:
To combat diversity collapse in Visual Autoregressive models, DiverseVAR modifies feature maps without retraining. This restores generative diversity while maintaining high synthesis quality.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17074
• PDF: https://arxiv.org/pdf/2511.17074

==================================

For more data science resources:
✓ https://xn--r1a.website/DataScienceT

#VisualAI #GenerativeModels #ModelDiversity #MachineLearning #ComputerVision
✨Riemannian Motion Generation: A Unified Framework for Human Motion Representation and Generation via Riemannian Flow Matching

📝 Summary:
RMG is a new framework representing human motion on a product manifold and learning dynamics via Riemannian flow matching. This geometry-aware approach achieves state-of-the-art results on HumanML3D and MotionMillion, showing that modeling non-Euclidean motion geometry leads to more stable and ef...
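
To make "Riemannian flow matching" a little more concrete, here is a minimal sketch of the training target it typically regresses onto. It is not taken from the RMG paper: the summary does not say which manifolds make up the product manifold, so the unit sphere is used purely as an assumed example, and the helper names (`log_map`, `exp_map`, `geodesic_point`, `target_velocity`) are hypothetical. The idea shown is that a sample is interpolated along a geodesic rather than a straight line, and the velocity target lives in the tangent space at the interpolated point.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def log_map(p, q):
    """Tangent vector at p that points toward q; its length is the geodesic distance.
    Assumes p and q are unit vectors and not antipodal (the log map is undefined there)."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    # Component of q orthogonal to p gives the direction of travel.
    v = [qi - c * pi for pi, qi in zip(p, q)]
    n = norm(v)
    if n < 1e-12:
        return [0.0] * len(p)
    return [theta * vi / n for vi in v]

def exp_map(p, v):
    """Walk from p along the great circle with initial tangent vector v."""
    n = norm(v)
    if n < 1e-12:
        return list(p)
    return [math.cos(n) * pi + math.sin(n) * vi / n for pi, vi in zip(p, v)]

def geodesic_point(x0, x1, t):
    """Point at time t in [0, 1] on the geodesic from x0 to x1."""
    return exp_map(x0, [t * vi for vi in log_map(x0, x1)])

def target_velocity(x0, x1, t):
    """Interpolated point x_t and the velocity target log_{x_t}(x1) / (1 - t), for t < 1."""
    xt = geodesic_point(x0, x1, t)
    return xt, [vi / (1.0 - t) for vi in log_map(xt, x1)]

# Illustrative pair: a "noise" point and a "data" point on the sphere.
x0 = [1.0, 0.0, 0.0]
x1 = [0.0, 1.0, 0.0]
xt, u = target_velocity(x0, x1, 0.5)
print("x_t:", xt)
print("u_t:", u)
```

In a full model, a network would be trained to predict `u` from `xt` and `t`. On a product manifold the same construction would be applied per factor, which is an assumption about how such a setup is usually built rather than a statement about RMG itself.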

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15016
• PDF: https://arxiv.org/pdf/2603.15016
• Project Page: https://frank-miao.github.io/RMG-Project-Page

✨ Spaces citing this paper:
• https://huggingface.co/spaces/Frank-miao/RMG

==================================

For more data science resources:
✓ https://xn--r1a.website/DataScienceT

#HumanMotionGeneration #RiemannianGeometry #MachineLearning #AIResearch #GenerativeModels
🔥 Semantic Generative Tuning for Unified Multimodal Models

💡 In unified multimodal models, visual understanding and generation are poorly aligned because they are trained with separate objectives: understanding is optimized through text signals and generation through pixel objectives, which leaves the two in isolated representation spaces. To bridge this gap, the authors propose Semantic Generative Tuning, which uses semantic segmentation as a generative proxy to align the two capabilities and make them reinforce each other.

The method formulates hierarchical visual tasks as generative proxies, focusing on high-level semantic tasks such as image segmentation. The authors find that segmentation provides structural semantics that improve both vision-centric perception and generative layout fidelity. Unlike low-level tasks, segmentation does not distract the model with texture details, which the authors argue makes it an optimal proxy.

According to the results, Semantic Generative Tuning improves the linear separability of features and the allocation of visual-textual attention. Extensive evaluations show consistent gains in both multimodal comprehension and generative fidelity across mainstream benchmarks. The paper also provides a systematic investigation of generative post-training, and the code is made available for further research and development.


📅 Published on May 18

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.18714
• PDF: https://arxiv.org/pdf/2605.18714
• Project Page: https://song2yu.github.io/SGT/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MultimodalLearning #SemanticSegmentation #GenerativeModels #UnifiedMultimodalModels #MultimodalRepresentationLearning
🔥 GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

💡 The paper proposes GenEvolve, a self-evolving image generation framework that improves through iterative learning and reference-based prompting. It starts from the observation that high-quality image generation often requires combining a model's internal generative ability with external resources, and that existing methods handle diverse and demanding requests poorly.

GenEvolve models each generation attempt as a tool-orchestrated trajectory in which the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Instead of relying on image-level scalar rewards, as existing methods do, it compares multiple trajectories for the same request and abstracts the differences between the best and worst ones into structured visual experience.

This visual experience is given to a privileged teacher branch, which distills it into dense token-level supervision for a student branch. The student thereby internalizes better search, knowledge activation, reference selection, and prompt construction. The authors also construct GenEvolve-Data and GenEvolve-Bench to evaluate the framework.
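
As a rough illustration of what "dense token-level supervision" can look like, the sketch below computes a per-token KL divergence between a teacher's and a student's next-token distributions and averages it over the sequence. This is an assumption, not the paper's stated objective: the summary does not name the loss GenEvolve uses, KL is simply a common choice for distillation, and the function names (`log_softmax`, `token_level_kl`) and the toy logits are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def token_level_kl(teacher_logits, student_logits):
    """Mean over token positions of KL(teacher || student).
    Both arguments are lists of per-token logit lists of equal shape."""
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        t_lp = log_softmax(t_row)
        s_lp = log_softmax(s_row)
        # KL = sum_v p_teacher(v) * (log p_teacher(v) - log p_student(v))
        total += sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_lp, s_lp))
    return total / len(teacher_logits)

# Toy example: 2 token positions, vocabulary of 3 (values are made up).
teacher = [[2.0, 0.0, 0.0], [0.0, 1.5, 0.0]]
student = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.2]]
print("distillation loss:", token_level_kl(teacher, student))
```

The point of the sketch is the contrast with a single image-level scalar reward: every token position contributes its own error signal, so the student gets feedback on each step of the trajectory it writes.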

The results show substantial gains over strong baselines and state-of-the-art performance among current image-generation frameworks, both on public benchmarks and on GenEvolve-Bench.


📅 Published on May 20

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2605.21605
• PDF: https://arxiv.org/pdf/2605.21605
• Project Page: https://ephemeral182.github.io/GenEvolve/

🤖 Models citing this paper:
• https://huggingface.co/MeiGen-AI/GenEvolve

📊 Datasets citing this paper:
• https://huggingface.co/datasets/MeiGen-AI/GenEvolve-Data-Bench

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#ComputerVision #ImageGeneration #GenerativeModels #SelfEvolvingSystems #DeepLearning
🔥 Foundations of Large Language Models

💡 The book Foundations of Large Language Models covers the fundamental concepts underlying large language models. It is organized into four main chapters, each devoted to one area: pre-training, generative models, prompting techniques, and alignment methods.

The authors aim to give a solid conceptual foundation rather than comprehensive coverage of all cutting-edge technologies. The book is intended for college students, professionals, and practitioners in natural language processing and related fields, and serves as a reference for anyone interested in large language models.


📅 Published on Jan 16, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2501.09223
• PDF: https://arxiv.org/pdf/2501.09223

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #NaturalLanguageProcessing #PreTrainingMethods #GenerativeModels #LanguageModelAlignment