AI & ML Papers
33K subscribers
7.11K photos
532 videos
24 files
7.78K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔥 BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding

💡 The paper introduces BlockPilot, a method for making speculative decoding more efficient. In speculative decoding, a lightweight model drafts candidate tokens in parallel and a target model then verifies them. Existing methods decode with a fixed block size, which is suboptimal because the best block size varies from one input sample to the next. The authors show that block size is critical to speculative decoding performance and that the optimal value has a local structure: it tends to concentrate around the block size used in training.

To address this issue, the authors propose a sample-adaptive policy that predicts the optimal block size from the prefilling representation, formulating block size selection as a lightweight policy learning problem. The prediction is performed only once, after prefilling, which allows seamless integration with existing models.
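As a rough illustration of predicting a block size once from the prefill representation, here is a minimal sketch. The candidate block sizes, the linear scoring head, and the names `CANDIDATE_BLOCK_SIZES` and `predict_block_size` are assumptions made for this example, not the paper's actual policy.

```python
import math

# Hypothetical action set; the paper's real candidate block sizes are not given here.
CANDIDATE_BLOCK_SIZES = [4, 8, 16, 32]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_block_size(prefill_repr, weights, biases):
    """Score each candidate block size with a linear head and pick the best one.

    prefill_repr: pooled feature vector from the prefilling stage (assumed).
    weights/biases: one row and one bias per candidate block size (assumed).
    """
    logits = [
        sum(w * x for w, x in zip(row, prefill_repr)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CANDIDATE_BLOCK_SIZES[best], probs


# Toy usage: the policy runs once per sample, then decoding uses block_size.
prefill_repr = [0.5, -1.0, 2.0]
weights = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 1.0], [0.3, 0.3, 0.0]]
biases = [0.0, 0.0, 0.0, 0.0]
block_size, probs = predict_block_size(prefill_repr, weights, biases)
```

Because the call happens once per sample, its cost is a single small matrix-vector product on top of prefilling, which is consistent with the "minimal overhead" claim in the summary.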

The authors evaluate their method on several benchmarks and demonstrate that it is plug-and-play, introduces minimal overhead, and consistently improves efficiency. The results show that BlockPilot achieves an acceptance length of 5.92 and a 4.20 times speedup on a specific model, indicating that it can significantly accelerate inference while maintaining accuracy. Overall, the paper contributes to the development of more efficient and adaptive speculative decoding methods, which can be useful for a wide range of natural language processing applications.


📅 Published on Jun 30

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.31315
• PDF: https://arxiv.org/pdf/2606.31315

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#InstanceAdaptivePolicyLearning #DiffusionBasedSpeculativeDecoding #NaturalLanguageProcessing #SpeculativeDecodingTechniques #BlockPilotMethod
🔥 GEAR: Guided End-to-End AutoRegression for Image Synthesis

💡 The paper introduces GEAR, a method for training a vector-quantized tokenizer and an autoregressive generator jointly, end to end, for image synthesis. These models are typically trained in two stages: the tokenizer is trained and frozen, and the generator is then trained on its output. The limitation of this approach is that the tokenizer has no signal about what the generator finds easy to model.

GEAR overcomes this limitation by training the tokenizer and generator jointly, guided by representation alignment. The key challenge is that the output of the tokenizer is non-differentiable, making it difficult to train the tokenizer and generator jointly. To address this, GEAR uses a dual read-out approach, where the tokenizer output is used in two different ways. A hard, one-hot branch is used to train the autoregressive generator, while a differentiable soft branch is used to carry a representation-alignment loss that guides the tokenizer.
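The dual read-out idea can be sketched with a toy codebook. This is only an illustration: the soft branch below is a softmax over negative squared distances with a `temperature` parameter, which is an assumption for the example and not necessarily how GEAR defines its differentiable branch.

```python
import math


def squared_distances(z, codebook):
    """Squared Euclidean distance from feature z to every codebook entry."""
    return [sum((zi - ci) ** 2 for zi, ci in zip(z, code)) for code in codebook]


def hard_readout(z, codebook):
    """One-hot assignment to the nearest code (non-differentiable argmin)."""
    dists = squared_distances(z, codebook)
    nearest = min(range(len(dists)), key=dists.__getitem__)
    return [1.0 if i == nearest else 0.0 for i in range(len(dists))]


def soft_readout(z, codebook, temperature=1.0):
    """Smooth assignment over codes (assumed form: softmax of -distance / T)."""
    logits = [-d / temperature for d in squared_distances(z, codebook)]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
z = [0.9, 1.2]
hard = hard_readout(z, codebook)                    # would feed the AR generator
soft = soft_readout(z, codebook, temperature=0.5)   # would carry the alignment loss
```

In a real training setup both read-outs would come from the same encoder output in an autodiff framework, so gradients from the alignment loss could reach the tokenizer through the soft branch only.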

This approach allows the autoregressive generator to steer the tokenizer towards an index distribution that it can predict more easily. As a result, the tokenizer's features become less complex, while the autoregressive generator's features become more complex and semantic. The paper demonstrates that GEAR speeds up convergence by up to 10 times relative to a strong baseline, and learns better patch-level and spatially-coherent features. Additionally, GEAR generalizes across different quantizers and can be applied to text-to-image generation. Overall, GEAR provides a new approach for training visual generative models, and achieves state-of-the-art results in image synthesis.


📅 Published on Jun 30

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.32039
• PDF: https://arxiv.org/pdf/2606.32039
• Project Page: https://linb203.github.io/gear

🤖 Models citing this paper:
• https://huggingface.co/BinLin203/Warmup-LFQ
• https://huggingface.co/BinLin203/Warmup-IBQ
• https://huggingface.co/BinLin203/GEAR-VQ

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#ImageSynthesis #AutoRegression #VectorQuantization #EndToEndLearning #AutoregressiveGenerators
🔥 Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems

💡 The paper presents a real-time verification system for retrieval-augmented generation that processes long documents while balancing latency constraints against comprehensive answer validation. Verifying generated answers in these systems is difficult because the source materials are large and interactive services must respond quickly. Large language models can check long contexts but are too slow and costly, while lightweight classifiers operate within strict context limits and frequently miss evidence that falls outside truncated passages.

The method proposed is a real-time verification component integrated into a production retrieval-augmented generation pipeline that enables full-document grounding under latency constraints. The system can process documents up to 32K tokens and employs adaptive inference strategies to balance response time and verification coverage across workloads.

The results show that full-context verification substantially improves detection of unsupported responses compared with truncated validation. The evaluation methodology used to deploy the verifier highlights the importance of long-context verification, the limitations of chunk-based checking in real documents, and the impact of latency budgets on model design. The findings provide practical guidance for practitioners building reliable large-scale retrieval-augmented applications, demonstrating that the proposed system can effectively verify generated answers in real-time while maintaining comprehensive coverage of the source materials.


📅 Published on Mar 4

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2603.23508
• PDF: https://arxiv.org/pdf/2603.23508

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#RealTimeVerification #RetrievalAugmentedGeneration #LongDocumentProcessing #AnswerValidationSystems #LatencyConstrainedVerification
🔥 AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

💡 The paper presents AReaL, a large-scale asynchronous reinforcement learning system for training large language models on reasoning tasks. Existing synchronous reinforcement learning systems alternate between generation and training in a batch setting, which causes severe system-level inefficiency and GPU underutilization: generation must wait until the longest output in the batch is complete before the model can be updated.

To address this issue, AReaL decouples generation from training, allowing rollout workers to continuously generate new outputs without waiting, while training workers update the model whenever a batch of data is collected. This asynchronous approach leads to substantially higher GPU utilization. To stabilize reinforcement learning training, AReaL balances the workload of rollout and training workers to control data staleness and adopts a staleness-enhanced PPO variant to better handle outdated training samples.
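To make the notion of data staleness concrete, here is a deterministic toy loop. It is a simplification under stated assumptions: rollout workers refresh their weights only every `rollouts_per_weight_sync` samples, and the trainer simply drops samples older than `max_staleness` versions. AReaL itself is described as balancing workloads and using a modified PPO objective rather than dropping data, so the names and the drop rule here are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    sample_id: int
    policy_version: int  # version of the weights that generated this sample


def train_async(num_rollouts, batch_size, max_staleness, rollouts_per_weight_sync):
    """Simulate decoupled generation and training with a staleness bound.

    Returns (final trainer version, samples used, samples dropped as too stale).
    """
    buffer = deque()
    trainer_version = 0   # bumped after every training step that used data
    rollout_version = 0   # weights currently held by the rollout workers
    used = dropped = 0
    for sample_id in range(num_rollouts):
        # Rollout workers only pick up fresh weights occasionally.
        if sample_id % rollouts_per_weight_sync == 0:
            rollout_version = trainer_version
        buffer.append(Rollout(sample_id, rollout_version))
        # The trainer updates as soon as a full batch is available.
        if len(buffer) >= batch_size:
            batch = [buffer.popleft() for _ in range(batch_size)]
            fresh = [
                r for r in batch
                if trainer_version - r.policy_version <= max_staleness
            ]
            used += len(fresh)
            dropped += len(batch) - len(fresh)
            if fresh:
                trainer_version += 1
    return trainer_version, used, dropped


result = train_async(
    num_rollouts=12, batch_size=2, max_staleness=1, rollouts_per_weight_sync=6
)
```

Raising `max_staleness` wastes fewer samples but trains on more outdated data, which is the trade-off the staleness control in the summary is meant to manage.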

The results show that AReaL achieves up to 2.57 times training speedup compared to the best synchronous systems with the same number of GPUs, while matching or even improving final performance. The system was tested on math and code reasoning benchmarks, demonstrating the effectiveness of the asynchronous approach. The code for AReaL is made available, allowing others to build upon and utilize the system. Overall, AReaL provides a more efficient and scalable solution for training large language models on reasoning tasks using reinforcement learning.


📅 Published on May 30, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2505.24298
• PDF: https://arxiv.org/pdf/2505.24298

🤖 Models citing this paper:
• https://huggingface.co/inclusionAI/AReaL-boba-2-8B
• https://huggingface.co/inclusionAI/AReaL-boba-2-14B
• https://huggingface.co/inclusionAI/AReaL-boba-2-8B-Open

📊 Datasets citing this paper:
• https://huggingface.co/datasets/inclusionAI/AReaL-tau2-data

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/rzvn/Medieval-Village-AI

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#AsynchronousReinforcementLearning #LanguageReasoningTasks #LargeScaleLanguageModels #ReinforcementLearningSystems #DeepLearningForNaturalLanguageProcessing
🔥 Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning

💡 The paper introduces Perceive-to-Reason, a unified framework that improves fine-grained visual reasoning on high-resolution images. The task is challenging for vision-language models, especially when small but critical visual cues are buried in a high-resolution image. Existing approaches typically do not distinguish explicitly between perception and reasoning; they rely on repeated cropping or test-time visual search to bring in local evidence.

The Perceive-to-Reason framework addresses this limitation by formulating fine-grained visual reasoning as a two-stage process. In the first stage, the model localizes question-relevant evidence as a Perceiver, and in the second stage, it answers the question as a Reasoner based on the annotated image and cropped regions. To train the model, the authors introduce a role-aware reinforcement learning strategy called Perception-Reasoning Alternating GRPO, which alternates between perception-focused and reasoning-focused updates using only final-answer supervision.

The Perceive-to-Reason framework is built on top of existing vision-language models, and it consistently improves performance across model scales. The results show that the Perceive-to-Reason framework achieves state-of-the-art performance on several benchmarks, including V-Star, HR-Bench-4K, and HR-Bench-8K. Specifically, the P2R-4B model achieves 93.2 percent on V-Star, 81.9 percent on HR-Bench-4K, and 80.5 percent on HR-Bench-8K, substantially outperforming its corresponding backbone.

The benefits of the Perceive-to-Reason framework extend beyond high-resolution benchmarks to broader multimodal reasoning tasks. The results suggest that explicitly decoupling perception from reasoning provides an effective framework for fine-grained visual reasoning. Overall, the paper contributes a novel framework for fine-grained visual reasoning that improves performance on high-resolution images and has broader implications for multimodal reasoning tasks.


📅 Published on Jul 1

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.01191
• PDF: https://arxiv.org/pdf/2607.01191

🤖 Models citing this paper:
• https://huggingface.co/hongxingli/P2R-4B
• https://huggingface.co/hongxingli/P2R-2B
• https://huggingface.co/hongxingli/P2R-8B

📊 Datasets citing this paper:
• https://huggingface.co/datasets/hongxingli/P2R-10k

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#FineGrainedVisualReasoning #VisualReasoningModels #PerceptionAndReasoning #HighResolutionImageAnalysis #VisionLanguageModels
🔥 Representation Distribution Matching for One-Step Visual Generation

💡 The paper introduces Representation Distribution Matching, a method for one-step visual generation that matches feature distributions under pretrained encoders. The goal is to generate high-quality images by comparing the distributions of generated and reference features. The authors identify two key design axes, how the distributions are compared and which representations they are compared in, and report three main findings from controlled studies.

First, they show that the Maximum Mean Discrepancy, a classical method that was previously ineffective, becomes a strong and scalable objective when estimated correctly. Second, they find that the batch size of the generated images has a significant impact on performance, with an optimum batch size above 2048, which is much larger than typical batch sizes. Third, they demonstrate that using a single representation can be gamed, resulting in low scores despite visibly fake images, and instead propose using a balanced set of encoders and evaluating with a Sliced-Wasserstein distance over 14 encoders.
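For readers unfamiliar with Maximum Mean Discrepancy, the sketch below computes the standard unbiased estimate of squared MMD between two sets of feature vectors with an RBF kernel. The kernel choice, the `bandwidth` value, and the function names are assumptions for illustration; the summary does not say which estimator or kernel the authors use.

```python
import math


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD; within-set sums skip the i == j terms."""
    n, m = len(xs), len(ys)
    k_xx = sum(
        rbf_kernel(xs[i], xs[j], bandwidth)
        for i in range(n) for j in range(n) if i != j
    )
    k_yy = sum(
        rbf_kernel(ys[i], ys[j], bandwidth)
        for i in range(m) for j in range(m) if i != j
    )
    k_xy = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (n * (n - 1)) + k_yy / (m * (m - 1)) - 2.0 * k_xy / (n * m)


# Toy features: "near" overlaps the reference set, "far" does not.
reference = [[0.0], [1.0], [2.0]]
near = [[0.1], [1.1], [2.1]]
far = [[10.0], [11.0], [12.0]]
mmd_near = mmd2_unbiased(reference, near)
mmd_far = mmd2_unbiased(reference, far)
```

The unbiased estimate can come out slightly negative for small, overlapping samples, and its variance shrinks as the sample grows, which is one plausible reason large generated batches help.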

The authors combine these findings to develop an improved Representation Distribution Matching method, which they call iRDM. They evaluate iRDM on the ImageNet dataset and achieve state-of-the-art results, with a Sliced-Wasserstein distance of 1.30. Additionally, they use a human-preference proxy, called PickScore, which shows that iRDM is preferred over the previous best one-step generator on 71.2% of matched samples. They also apply the same method to post-train a four-step generator, called FLUX.2, and achieve better results than the original four-step version, with improved performance on GenEval and PickScore, and requiring only 90 GPU-hours. Overall, the paper presents a new method for one-step visual generation that achieves state-of-the-art results and can be used to improve existing generators.


📅 Published on Jul 2

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.02375
• PDF: https://arxiv.org/pdf/2607.02375
• Project Page: https://alan-lanfeng.github.io/rdm/

🤖 Models citing this paper:
• https://huggingface.co/epfl-vita/flux2-klein-1step-rdm

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/epfl-vita/flux2-klein-1step-demo

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#VisualGeneration #RepresentationLearning #DistributionMatching #ImageSynthesis #DeepLearning
🔥 AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

💡 The paper introduces AgenticSTS, a testbed for studying long-horizon large language model agents. Current agents typically append past observations and reflections to every prompt, which makes it hard to isolate the effect of a single memory component. The authors instead propose a bounded contract: every decision is made from a fresh user message assembled by typed retrieval, with no raw cross-decision transcript appended. This allows memory components to be analyzed in isolation, and the authors report improved performance in complex decision-making tasks.
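The bounded contract can be pictured as a function that builds each prompt from scratch out of the current state and a capped, typed retrieval. The sketch below is hypothetical: the `MemoryItem` fields, the memory kinds, and the per-kind limits are invented for this example and are not taken from the AgenticSTS code.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    kind: str     # hypothetical type label, e.g. "skill" or "episode"
    text: str
    score: float  # hypothetical relevance to the current game state


def assemble_message(state, memory, per_kind_limit):
    """Build a fresh user message; no earlier transcript is ever appended."""
    sections = [f"STATE: {state}"]
    for kind in sorted(per_kind_limit):
        candidates = sorted(
            (item for item in memory if item.kind == kind),
            key=lambda item: -item.score,
        )
        for item in candidates[: per_kind_limit[kind]]:
            sections.append(f"{kind.upper()}: {item.text}")
    return "\n".join(sections)


memory = [
    MemoryItem("skill", "Block before big attacks", 0.9),
    MemoryItem("skill", "Remove curses early", 0.4),
    MemoryItem("episode", "Lost to an elite on floor 6", 0.7),
]
full = assemble_message("Floor 3, 40 HP", memory, {"skill": 1, "episode": 1})
# Ablation: disable the skill layer by setting its budget to zero.
no_skill = assemble_message("Floor 3, 40 HP", memory, {"skill": 0, "episode": 1})
```

Because each memory layer enters the prompt only through its own budget, switching one layer off changes nothing else in the message, which is what makes single-component ablations clean.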

The method involves instantiating this contract in a closed-rule stochastic deck-building game, where runs require hundreds of tactical and strategic decisions. The authors create a testbed, called AgenticSTS, which includes a reproducible environment, frozen memory and skill snapshots, prompt records, and analysis scripts. This testbed allows for the study of how explicit memory layers shape long-horizon LLM-agent decisions.

The results show that the proposed approach leads to improved performance in the game, with a fixed-A0 ablation showing the largest observed difference when triggered strategic skills are enabled. The no-store baseline wins 3 out of 10 games, while adding the skill layer wins 6 out of 10 games. Although the comparison is directional rather than statistically decisive, the results demonstrate the effectiveness of the proposed approach. The authors also release a public online benchmark of frontier LLMs on the same game, which reports zero wins at the lowest difficulty across five configurations, highlighting the challenge of the task. Overall, the paper contributes a new methodology for studying long-horizon LLM agents and demonstrates its effectiveness in a complex decision-making task.


📅 Published on Jul 2

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.02255
• PDF: https://arxiv.org/pdf/2607.02255
• Project Page: https://alayalab.github.io/AgenticSTS/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#AgenticSTS #LongHorizonLLMAgents #BoundedMemoryTestbed #LargeLanguageModelAgents #LLMMemoryComponents
🔥 Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling

💡 The paper proposes MrFlow, a training-free acceleration strategy for text-to-image diffusion models. Existing multi-resolution generation strategies can produce noticeable blurring or artifacts because they upsample in the latent space and selectively modify partial regions.

MrFlow addresses this with a staged low-to-high-resolution pipeline. It first generates the main structure at low resolution, then performs super-resolution in pixel space using a lightweight pretrained model, injects low-strength noise to enable high-frequency resampling, and finally refines the details at high resolution.

The results show that MrFlow achieves a 10x end-to-end acceleration with only a 1 percent gap in performance compared to the original model. It can also be combined with other acceleration strategies, such as timestep distillation, to reach up to 25x acceleration. The key advantage of MrFlow is that it requires no training or runtime modifications, making it a hardware-agnostic and efficient way to accelerate text-to-image diffusion models.


📅 Published on Jul 2

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.01642
• PDF: https://arxiv.org/pdf/2607.01642

🤖 Models citing this paper:
• https://huggingface.co/Xingyu-Zheng/MrFlow

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/Xingyu-Zheng/mrflow-fast-diffusion

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#DiffusionModels #TextToImageSynthesis #MultiResolutionGeneration #StagedSampling #SuperResolutionTechniques
🔥 MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse

💡 MetaSpatial is a framework that uses reinforcement learning to improve 3D spatial reasoning in vision-language models used to generate 3D scenes. Current models lack internalized 3D spatial reasoning, which limits their ability to generate realistic layouts. Traditional supervised fine-tuning is also ineffective for layout generation because perfect ground-truth annotations are not available.

To address these challenges, MetaSpatial introduces a multi-turn, reinforcement learning-based optimization mechanism that integrates physics-aware constraints and rendered-image evaluations. Over multiple turns, the model analyzes the rendered output and refines the spatial arrangement, progressively improving scene coherence so that the generated 3D layouts are coherent, physically plausible, and aesthetically consistent.

The empirical evaluations show that MetaSpatial significantly enhances the spatial consistency and formatting stability of models at various scales. After training, object placements are more realistic, aligned, and functionally coherent, which validates the effectiveness of reinforcement learning for 3D spatial reasoning in applications such as the metaverse, AR/VR, digital twins, and game development.

Overall, the contributions of MetaSpatial are the introduction of a reinforcement learning-based framework that enhances 3D spatial reasoning in vision-language models, and the demonstration of its effectiveness in generating realistic and coherent 3D scenes. The code, data, and training pipeline are publicly available, which can facilitate further research and development in this area.


📅 Published on Mar 24, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2503.18470
• PDF: https://arxiv.org/pdf/2503.18470
• Project Page: https://github.com/PzySeere/MetaSpatial

📊 Datasets citing this paper:
• https://huggingface.co/datasets/johnschaefer/EasyR1-qwen3vl-rl
• https://huggingface.co/datasets/Yuting6/ttrl
• https://huggingface.co/datasets/zhenyupan/3d_layout_reasoning

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#VisionLanguageModels #ReinforcementLearningFor3D #MetaverseArchitecture #3DSpatialReasoning #PhysicsAwareAI