AI & ML Papers – Telegram

AI & ML Papers

33.4K subscribers

7.17K photos

556 videos

24 files

7.87K links

Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho


✨DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

📝 Summary:
DocETL is an agent-based system that optimizes complex document processing pipelines to significantly improve LLM accuracy. It uses logical rewriting and agent-guided evaluation to achieve 1.34 to 4.6 times higher quality outputs than current baselines.

🔹 Publication Date: Published on Oct 16, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.12189
• PDF: https://arxiv.org/pdf/2410.12189
• Github: https://github.com/ucbepic/docetl

==================================

For more data science resources:
✓ https://xn--r1a.website/DataScienceT

#LLM #AI #DocumentProcessing #AgentSystems #NaturalLanguageProcessing

❤2

912 views · 17:03

✨ Explore Data Science 📝 Write your paper

✨The Curious Case of Analogies: Investigating Analogical Reasoning in Large Language Models

📝 Summary:
LLMs can encode high-level relational concepts for analogies but struggle with missing relational information and transfer to new entities. Success depends on strong structural alignment. Their analogical reasoning is emerging but limited compared to humans.

🔹 Publication Date: Published on Nov 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20344
• PDF: https://arxiv.org/pdf/2511.20344


#LLMs #AnalogicalReasoning #AIResearch #NaturalLanguageProcessing #CognitiveAI

285 views · 07:07


✨T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

📝 Summary:
T-pro 2.0 is an open-weight Russian LLM for hybrid reasoning and efficient inference. It uses a Cyrillic-dense tokenizer and EAGLE speculative decoding for low latency. The project releases model weights and benchmarks to foster reproducible research.

🔹 Publication Date: Published on Dec 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10430
• PDF: https://arxiv.org/pdf/2512.10430


#LLM #AI #NaturalLanguageProcessing #HybridReasoning #EfficientInference

463 views · 11:04


✨Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives

📝 Summary:
This study explores syllogistic reasoning in LLMs from two angles: symbolic inference and natural language understanding. Some models achieve perfect symbolic performance, which raises the question of whether LLMs are becoming more like formal reasoning mechanisms.

🔹 Publication Date: Published on Dec 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12620
• PDF: https://arxiv.org/pdf/2512.12620
• Github: https://github.com/XAheli/Logic-in-LLMs


#LLMs #SyllogisticReasoning #NaturalLanguageProcessing #AIResearch #FormalLogic

471 views · 08:03


✨Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis

📝 Summary:
TAPPA explains LLM attention patterns through a unified temporal analysis, classifying them as predictable or unpredictable based on query self-similarity. The framework deepens understanding of these patterns and guides acceleration methods, with reported improvements to KV caching and LLM pruning.
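
One way to picture a classification based on query self-similarity is the sketch below. It is not TAPPA's actual procedure: the use of cosine similarity between consecutive query vectors, the function names, and the 0.8 threshold are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def query_self_similarity(queries):
    """Mean cosine similarity between each query vector and the next one."""
    pairs = list(zip(queries, queries[1:]))
    return sum(cosine(q, nxt) for q, nxt in pairs) / len(pairs)

def classify_head(queries, threshold=0.8):
    # Hypothetical rule: slowly drifting queries count as "predictable".
    score = query_self_similarity(queries)
    return "predictable" if score >= threshold else "unpredictable"

smooth = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]    # queries drift slowly
erratic = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # queries jump around

print(classify_head(smooth))
print(classify_head(erratic))
```

The toy vectors stand in for per-token query states of one attention head; real query vectors would come from the model.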

🔹 Publication Date: Published on Jan 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.21709
• PDF: https://arxiv.org/pdf/2601.21709


#LLM #AttentionMechanism #AIResearch #NaturalLanguageProcessing #MachineLearning

365 views · 10:07


✨The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus

📝 Summary:
λ-RLM replaces open-ended recursive code generation in LLMs with a typed functional runtime based on λ-calculus. This provides formal guarantees and improves long-context reasoning, outperforming standard RLMs in both accuracy and latency.
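
For readers unfamiliar with the combinator in the title, here is a small illustration of recursion expressed through a fixed-point combinator. It is not the λ-RLM runtime: Python evaluates strictly, so the sketch uses the Z variant of Y, and `count_step`, `MAX_LEAF`, and the character-counting task are hypothetical stand-ins for recursive work over a long input.

```python
def Z(f):
    """Fixed-point combinator for strict languages (the Z variant of Y)."""
    return (lambda x: f(lambda *a: x(x)(*a)))(lambda x: f(lambda *a: x(x)(*a)))

MAX_LEAF = 8  # hypothetical size below which a piece is handled directly

def count_step(recurse):
    """One non-recursive step; `recurse` is supplied by the combinator."""
    def count(text, char):
        if len(text) <= MAX_LEAF:
            return text.count(char)
        mid = len(text) // 2
        # Split the long input and combine the results of both halves.
        return recurse(text[:mid], char) + recurse(text[mid:], char)
    return count

count_char = Z(count_step)
total = count_char("banana" * 10, "a")
print(total)
```

`count_step` never refers to itself by name; the combinator ties the recursive knot, which is the property the title alludes to.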

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20105
• PDF: https://arxiv.org/pdf/2603.20105
• Github: https://github.com/lambda-calculus-LLM/lambda-RLM


#LLMs #LambdaCalculus #AI #NaturalLanguageProcessing #DeepLearning

❤1

225 views · 09:40


✨Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs

📝 Summary:
XKD-Dial is a progressive training pipeline for explainable, bilingual English-Hindi knowledge-grounded dialogue. It reports zero hallucination rates through citation grounding and improves explainability through post-hoc analyses.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18911
• PDF: https://arxiv.org/pdf/2603.18911


#LLMs #ExplainableAI #NaturalLanguageProcessing #AIResearch #HallucinationReduction

299 views · 16:12


✨Natural-Language Agent Harnesses

📝 Summary:
Natural-Language Agent Harnesses (NLAHs) and the Intelligent Harness Runtime (IHR) enable portable, executable agent harness design through natural language. This externalizes control logic from code, making harnesses easier to transfer, compare, and study.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25723
• PDF: https://arxiv.org/pdf/2603.25723


#NaturalLanguageProcessing #AI #AIAgents #SoftwareEngineering #CodePortability

330 views · 10:03


🔥 OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

💡 The paper introduces OmniFlatten, an end-to-end GPT-based model for real-time, natural full-duplex spoken dialogue. Achieving low latency and natural interaction in full-duplex systems is hard because human conversation includes interruptions, backchannels, and overlapping speech.

To address this, the authors propose a multi-stage post-training technique that integrates speech and text without altering the original model's architecture. Training proceeds in three stages: modality alignment, half-duplex dialogue learning, and full-duplex dialogue learning. A flattening operation standardizes the data, so the same training method and model architecture apply across modalities and tasks.

OmniFlatten generates text and speech in real time and models the complex behaviors inherent to natural conversation. The results are demonstrated through audio samples of generated dialogues, which are available online. The authors present the approach as a straightforward modeling technique and a promising direction for efficient, natural end-to-end full-duplex spoken dialogue systems, with potential applications such as virtual assistants and customer service.
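
A rough picture of what a flattening operation might look like: chunks from a text token stream and a speech token stream are interleaved into one sequence that a single GPT-style model can consume. This is a sketch under assumptions, not the paper's exact scheme; the function name, the chunk sizes, and the placeholder tokens are all hypothetical.

```python
def flatten_streams(text_tokens, speech_tokens, text_chunk=2, speech_chunk=4):
    """Interleave fixed-size chunks of two token streams into one flat list."""
    flat = []
    t = s = 0
    while t < len(text_tokens) or s < len(speech_tokens):
        flat.extend(text_tokens[t:t + text_chunk])        # next text chunk
        flat.extend(speech_tokens[s:s + speech_chunk])    # next speech chunk
        t += text_chunk
        s += speech_chunk
    return flat

text = ["T0", "T1", "T2"]
speech = ["S0", "S1", "S2", "S3", "S4", "S5"]
flat = flatten_streams(text, speech)
print(flat)
```

Because the result is an ordinary one-dimensional token sequence, no architectural change is needed to train on it, which matches the motivation given in the summary.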

📅 Published on Oct 23, 2024

🔗 Links:
• arXiv: https://arxiv.org/abs/2410.17799
• PDF: https://arxiv.org/pdf/2410.17799
• GitHub: https://github.com/karpathy/nanogpt ⭐ 57.6k

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#GPTModelArchitecture #FullDuplexDialogueSystems #NaturalLanguageProcessing #SpeechRecognitionTechniques #EndToEndConversationalAI


497 views · 14:56

✨ Join Best TG Channels

👋 Join Our WhatsApp Channel

📝 Contact / Collaborate

🔥 Self-Supervised Prompt Optimization

💡 The paper proposes Self-Supervised Prompt Optimization (SPO), a framework that optimizes prompts for large language models without requiring external references. Manually designed prompts require expertise and iterative experimentation, while existing prompt optimization methods rely heavily on external references such as ground truth or human evaluation, which can be costly to obtain.

SPO derives its evaluation and optimization signals purely from output comparisons: a large language model evaluator selects the superior prompt through pairwise output comparisons, and a large language model optimizer aligns outputs with task requirements.

The results show that SPO matches or outperforms state-of-the-art prompt optimization methods at significantly lower cost and with fewer samples. It can optimize prompts for both closed and open-ended tasks and applies to real-world scenarios where external references are unavailable or costly to obtain.
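
The optimize, execute, and compare cycle described above can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `spo_loop` and its parameters are hypothetical, the majority-vote acceptance rule is an assumption, and the three `fake_*` functions are deterministic stand-ins for real LLM calls.

```python
def spo_loop(initial_prompt, questions, execute, optimize, judge, rounds=3):
    """Keep a candidate prompt only if its outputs win the pairwise comparison."""
    best_prompt = initial_prompt
    best_outputs = [execute(best_prompt, q) for q in questions]
    for _ in range(rounds):
        candidate = optimize(best_prompt, best_outputs)
        cand_outputs = [execute(candidate, q) for q in questions]
        # Pairwise comparison of outputs; no ground-truth labels are used.
        wins = sum(judge(q, c, b)
                   for q, c, b in zip(questions, cand_outputs, best_outputs))
        if wins * 2 > len(questions):  # assumed rule: candidate wins a majority
            best_prompt, best_outputs = candidate, cand_outputs
    return best_prompt

# Deterministic stand-ins for the executor, optimizer, and evaluator models.
def fake_execute(prompt, question):
    return {"question": question, "quality": min(prompt.count("step by step"), 2)}

def fake_optimize(prompt, outputs):
    return prompt + " Think step by step."

def fake_judge(question, candidate_output, best_output):
    return candidate_output["quality"] > best_output["quality"]

best = spo_loop("Answer the question.", ["q1", "q2", "q3"],
                fake_execute, fake_optimize, fake_judge)
print(best)
```

With these stubs the third round's candidate brings no further gain, so the loop keeps the prompt from round two.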

📅 Published on Feb 7, 2025

🔗 Links:
• arXiv: https://arxiv.org/abs/2502.06855
• PDF: https://arxiv.org/pdf/2502.06855
• GitHub: https://github.com/geekan/metagpt ⭐ 67.7k

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/XiangJinYu/SPO
• https://huggingface.co/spaces/tang-x/SPO
• https://huggingface.co/spaces/ositamiles/SPO

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#SelfSupervisedLearning #PromptOptimization #LargeLanguageModels #NaturalLanguageProcessing #LanguageModelEvaluation


412 views · 22:57


🔥 Recursive Language Models

💡 The paper introduces Recursive Language Models (RLMs), a general inference strategy that lets large language models process arbitrarily long prompts. Current language models have limited context windows, which restricts their ability to handle long inputs.

RLMs treat the long prompt as part of an external environment: the model programmatically examines and decomposes the prompt, then recursively calls itself over snippets of it. This lets the model handle inputs up to two orders of magnitude beyond its context window.

Across four diverse long-context tasks, RLMs outperform base language models and common long-context scaffolds at comparable or lower cost per query.
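
The recursive idea can be illustrated with a toy in which a "model" that refuses inputs beyond its window still answers a query over a much longer input. This is a simplification under assumptions: in the paper the model itself decides how to examine and decompose the prompt, whereas this sketch hard-codes a binary split, and `recursive_lm`, `toy_llm`, and the window of 8 records are hypothetical.

```python
WINDOW = 8  # hypothetical context limit, measured in records

def toy_llm(query, records):
    """Stand-in for an LLM call: returns the records that mention the query."""
    assert len(records) <= WINDOW, "stub refuses inputs beyond its window"
    return [r for r in records if query in r]

def recursive_lm(query, records, llm, window=WINDOW):
    """Answer over a long input by recursing on halves and merging results."""
    if len(records) <= window:
        return llm(query, records)
    mid = len(records) // 2
    left = recursive_lm(query, records[:mid], llm, window)
    right = recursive_lm(query, records[mid:], llm, window)
    # A final call combines the partial answers, which are now short.
    return llm(query, left + right)

records = [f"log line {i}" for i in range(100)]
records[73] = "log line 73: secret=42"
answer = recursive_lm("secret", records, toy_llm)
print(answer)
```

The stub never sees more than 8 records at once, yet the full 100-record input is covered.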

📅 Published on Dec 31, 2025

🔗 Links:
• arXiv: https://arxiv.org/abs/2512.24601
• PDF: https://arxiv.org/pdf/2512.24601
• Project Page: https://alexzhang13.github.io/blog/2025/rlm/
• GitHub: https://github.com/alexzhang13/rlm ⭐ 4.2k

🤖 Models citing this paper:
• https://huggingface.co/mit-oasys/rlm-qwen3-8b-v0.1
• https://huggingface.co/nightmedia/Qwen3.5-9B-Claude-4.6-Opus-Deckard-V4.2-Uncensored-Heretic-Thinking-qx86-hi-mlx

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/sergiopaniego/repl
• https://huggingface.co/spaces/openenv/repl
• https://huggingface.co/spaces/sergiopaniego/repl-env

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#RecursiveLanguageModels #LargeLanguageModels #LongContextProcessing #LanguageModelArchitectures #NaturalLanguageProcessing


❤3

444 views · 10:57


🔥 Adam's Law: Textual Frequency Law on Large Language Models

💡 The paper proposes a novel framework to improve large language model performance through textual frequency analysis. The authors argue that textual frequency, which is the frequency of certain words or phrases in a language, is relevant to human cognition and can also be applied to large language models. However, this topic has been understudied in the context of large language models.

The proposed framework consists of three main components. First, the authors introduce the Textual Frequency Law, which states that frequent textual data should be preferred for large language models, both for prompting and fine-tuning. Because the training data of many large language models is not public, the authors estimate sentence-level frequency from online resources. They also use an input paraphraser to rewrite the input into a more frequent textual expression.

The second component is Textual Frequency Distillation, which involves querying large language models to conduct story completion by extending sentences in the datasets. The resulting corpora are used to adjust the initial estimation of textual frequency.

The third component is Curriculum Textual Frequency Training, which fine-tunes large language models in increasing order of sentence-level frequency, so training examples are presented from less frequent to more frequent sentences.
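
Ordering training data in increasing order of sentence-level frequency amounts to a sort with a frequency estimate as the key. The sketch below is only an illustration: the word counts are made up, and `toy_frequency` (mean word count per sentence) is a hypothetical estimator, not the one used in the paper.

```python
# Made-up word counts standing in for a real frequency resource.
toy_counts = {"the": 100, "cat": 10, "sat": 8, "zygote": 1, "perambulates": 1}

def toy_frequency(sentence):
    """Hypothetical sentence-level frequency: mean count of its words."""
    words = sentence.lower().split()
    return sum(toy_counts.get(w, 0) for w in words) / len(words)

def curriculum_order(examples, frequency):
    """Return examples sorted in increasing order of estimated frequency."""
    return sorted(examples, key=frequency)

examples = [
    "the cat sat",
    "the zygote perambulates",
    "the cat perambulates",
]
ordered = curriculum_order(examples, toy_frequency)
print(ordered)
```

A fine-tuning loop would then iterate over `ordered` instead of a shuffled dataset.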

The authors conducted experiments on a curated dataset called Textual Frequency Paired Dataset, which covers tasks such as math reasoning, machine translation, commonsense reasoning, and agentic tool calling. The results show that the proposed framework is effective in improving large language model performance.

Overall, the paper contributes to the understanding of textual frequency in large language models and provides a novel framework for improving their performance. The proposed framework has the potential to be applied to various natural language processing tasks and can lead to more efficient and effective large language models.

📅 Published on Apr 2

🔗 Links:
• arXiv: https://arxiv.org/abs/2604.02176
• PDF: https://arxiv.org/pdf/2604.02176
• GitHub: https://github.com/HongyuanLuke/frequencylaw ⭐ 658

📊 Datasets citing this paper:
• https://huggingface.co/datasets/Akaashiiii/TFPD

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#AdamSLaw #TextualFrequencyAnalysis #LargeLanguageModels #NaturalLanguageProcessing #LanguageModelOptimization


❤2

543 views · 05:00


🔥 Fish Audio S2 Technical Report

💡 The paper introduces Fish Audio S2, an open-source text-to-speech system that features multi-speaker and multi-turn generation and instruction-following control through natural-language descriptions.

Training is multi-stage and relies on a staged data pipeline covering video captioning, speech captioning, voice-quality assessment, and reward modeling. This approach allows scalable training and improves overall performance.

The authors release the model weights, fine-tuning code, and a production-ready streaming inference engine. The engine achieves a real-time factor of 0.195 and a time to first audio below 100 milliseconds. Code and weights are available on GitHub and Hugging Face, and custom voices can be tried on the project website.
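
To make the real-time factor concrete: under the usual definition, it is processing time divided by the duration of the audio produced, so values below 1 mean faster than real time. The small calculation below uses the 0.195 figure from the summary; the 10-second clip length is an arbitrary example.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent generating / duration of the generated audio."""
    return processing_seconds / audio_seconds

def processing_time(audio_seconds, rtf):
    """Time needed to generate a clip of the given length at a given RTF."""
    return audio_seconds * rtf

RTF = 0.195                       # figure quoted in the summary
clip = 10.0                       # arbitrary example: a 10-second clip
needed = processing_time(clip, RTF)
speedup = 1.0 / RTF               # how many times faster than real time

print(f"{clip:.0f} s of audio takes about {needed:.2f} s to generate")
print(f"that is roughly {speedup:.1f}x faster than real time")
```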

📅 Published on Mar 9

🔗 Links:
• arXiv: https://arxiv.org/abs/2603.08823
• PDF: https://arxiv.org/pdf/2603.08823
• Project Page: https://fish.audio/
• GitHub: https://github.com/fishaudio/fish-speech ⭐ 30.2k

🤖 Models citing this paper:
• https://huggingface.co/fishaudio/s2-pro
• https://huggingface.co/drbaph/s2-pro-fp8
• https://huggingface.co/mlx-community/fish-audio-s2-pro-bf16

📊 Datasets citing this paper:
• https://huggingface.co/datasets/Izzyzlin/CFSDD

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/artificialguybr/fish-s2-pro-zero
• https://huggingface.co/spaces/fguilleme/fish-s2-pro-zero
• https://huggingface.co/spaces/MAYA-AI/fish-s2-pro-zero

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#TextToSpeechSystems #MultispeakerSynthesis #NaturalLanguageProcessing #SpeechGenerationModels #RealTimeAudioProcessing


❤4👍2

870 views · 01:37


🔥 Transformer Explainer: Interactive Learning of Text-Generative Models

💡 The paper introduces Transformer Explainer, an interactive visualization tool that helps non-experts understand the inner workings of the GPT-2 model. The problem addressed is that Transformers, despite being a revolutionary machine learning technology, are often opaque to those without extensive expertise. To tackle this issue, the authors developed a tool that provides a model overview and allows users to smoothly transition across different abstraction levels of mathematical operations and model structures.

The method used to create the tool involves integrating a live GPT-2 instance that runs locally in the user's browser, enabling users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together to predict the next tokens. This approach allows users to gain hands-on experience and intuition about complex Transformer concepts without requiring installation or special hardware.

The results of this work are a publicly available, open-sourced tool that broadens access to education on modern generative AI techniques. The tool is accessible at a provided website and a video demo is also available, showcasing the tool's capabilities. Overall, the paper contributes to making Transformers more accessible and understandable to a wider audience, including non-experts, by providing an interactive and intuitive learning experience.

📅 Published on Aug 8, 2024

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2408.04619
• PDF: https://arxiv.org/pdf/2408.04619
• Project Page: https://poloclub.github.io/transformer-explainer/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#TransformerModels #GPT2Explained #NaturalLanguageProcessing #TextGenerationModels #ExplainableAI

The AI community building the future. Hugging Face has 458 repositories available. Follow their code on GitHub.

679 views · 01:53


🔥 Foundations of Large Language Models

💡 The book Foundations of Large Language Models covers the fundamental concepts underlying large language models. It is structured into four main chapters, each focusing on one key area: pre-training, generative models, prompting techniques, and alignment methods.

The authors aim to provide a foundational understanding rather than comprehensive coverage of all cutting-edge technologies. The book is intended for college students, professionals, and practitioners in natural language processing and related fields, and serves as a reference for anyone interested in large language models.

📅 Published on Jan 16, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2501.09223
• PDF: https://arxiv.org/pdf/2501.09223

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #NaturalLanguageProcessing #PreTrainingMethods #GenerativeModels #LanguageModelAlignment


❤1

1.08K views · 20:19


🔥 Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models

💡 The paper addresses the challenge of financial sentiment analysis, which is crucial for investment decision-making. Traditional natural language processing models are limited by their size and training data, resulting in poor generalization and effectiveness. Large Language Models, despite their superior performance in various NLP tasks, also face challenges in financial sentiment analysis due to the discrepancy between their pre-training objective and the task of predicting sentiment labels. Additionally, the concise nature of financial news often lacks sufficient context, which can compromise the reliability of Large Language Models' sentiment analysis.

To overcome these challenges, the authors propose a retrieval-augmented Large Language Model framework. This framework consists of two modules: an instruction-tuned Large Language Model module that ensures the model behaves as a predictor of sentiment labels, and a retrieval-augmentation module that retrieves additional context from reliable external sources. This approach enables the model to leverage external context to improve its sentiment analysis capabilities.
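
The two modules can be pictured as a retrieval step followed by prompt construction for the instruction-tuned model. The sketch below is a toy under assumptions: the word-overlap retriever, the prompt wording, the three-way label set, and the sample documents are all hypothetical, and no real model is called.

```python
def retrieve(headline, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the headline."""
    terms = set(headline.lower().split())
    ranked = sorted(corpus,
                    key=lambda doc: len(terms & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(headline, context):
    """Assemble an instruction-style prompt that asks for a sentiment label."""
    lines = ["Classify the sentiment of the headline as negative, neutral, or positive.",
             "Context:"]
    lines += [f"- {doc}" for doc in context]
    lines += [f"Headline: {headline}", "Sentiment:"]
    return "\n".join(lines)

corpus = [
    "Acme Corp beats quarterly earnings estimates",
    "Weather forecast predicts rain in Paris",
    "Acme Corp shares rise after earnings call",
]
headline = "Acme earnings surprise"
context = retrieve(headline, corpus)
prompt = build_prompt(headline, context)
print(prompt)
```

The prompt would then be sent to the sentiment model; the added context is what compensates for the brevity of the headline.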

The authors evaluate their framework against traditional models and other Large Language Models, such as ChatGPT and LLaMA. The results show that their approach achieves a significant performance gain, with improvements in accuracy and F1 score ranging from 15% to 48%. This demonstrates the effectiveness of the proposed retrieval-augmented Large Language Model framework in enhancing financial sentiment analysis. Overall, the paper contributes to the development of more accurate and reliable financial sentiment analysis models, which can inform better investment decisions.

📅 Published on Oct 6, 2023

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2310.04027
• PDF: https://arxiv.org/pdf/2310.04027

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#FinancialSentimentAnalysis #RetrievalAugmentedModels #LargeLanguageModels #NaturalLanguageProcessing #FinancialTextAnalysis


613 views · 15:51


🔥 olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

💡 The paper presents olmOCR, an open-source toolkit that uses a fine-tuned vision language model to extract clean text from PDF documents while preserving their structure. PDFs come in diverse formats and visual layouts, which makes their content hard to extract and represent for language model use.

The method uses a 7-billion-parameter vision language model trained on a sample of 260,000 pages from over 100,000 crawled PDFs with diverse properties. The model is fine-tuned to convert PDFs into clean, linearized plain text in natural reading order, preserving structured content such as sections, tables, lists, and equations.

olmOCR is optimized for large-scale batch processing, scales flexibly to different hardware setups, and can convert a million PDF pages for 190 USD. The toolkit is released as open source, including the vision language model weights, data, training code, and inference code, so the trillions of tokens available in PDF documents can be used to train language models.
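
The quoted cost of 190 USD per million pages is easier to reason about per page or per budget. The arithmetic below uses only that figure; the job size and the budget are arbitrary examples, and the function names are hypothetical.

```python
COST_PER_MILLION_PAGES_USD = 190.0  # figure quoted in the summary

def cost_usd(pages):
    """Estimated conversion cost for a given number of PDF pages."""
    return pages * COST_PER_MILLION_PAGES_USD / 1_000_000

def pages_for_budget(budget_usd):
    """Whole number of pages that fit within a budget."""
    return int(budget_usd * 1_000_000 / COST_PER_MILLION_PAGES_USD)

per_page = cost_usd(1)                 # cost of a single page
small_job = cost_usd(50_000)           # arbitrary example job
affordable = pages_for_budget(1_000)   # arbitrary example budget

print(f"per page: {per_page:.5f} USD")
print(f"50,000 pages: {small_job:.2f} USD")
print(f"1,000 USD covers {affordable:,} pages")
```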

📅 Published on Feb 25, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2502.18443
• PDF: https://arxiv.org/pdf/2502.18443
• Project Page: https://olmocr.allenai.org/

📊 Datasets citing this paper:
• https://huggingface.co/datasets/allenai/olmOCR-bench
• https://huggingface.co/datasets/shhdwi/olmocr-pre-rendered
• https://huggingface.co/datasets/Voxel51/olmOCR_bench

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/davanstrien/benchmark-race
• https://huggingface.co/spaces/OpenEvals/every-leaderboards
• https://huggingface.co/spaces/OpenEvals/leaderboard-watcher

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#VisionLanguageModels #PDFTextExtraction #DocumentLayoutAnalysis #OCRTechniques #NaturalLanguageProcessing


❤3

1.05K views · 11:51


🔥 BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding

💡 The paper introduces BlockPilot, a method for improving the efficiency of speculative decoding in natural language processing tasks. Speculative decoding is a technique that uses a lightweight model to generate candidate tokens in parallel, which are then verified by a target model. Existing methods use a fixed block size for decoding, which can be suboptimal as the optimal block size varies across different input samples. The authors show that the optimal block size is critical to speculative decoding performance and that it exhibits a local structure, meaning that it tends to concentrate around the training block size.

To address this issue, the authors propose a sample-adaptive policy that predicts the optimal block size from the prefilling representation. This is done by formulating block size selection as a lightweight policy learning problem, where the optimal block size is predicted based on the representation of the prefilling stage. The prediction is performed only once after prefilling, allowing for seamless integration with existing models.
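
Why the best block size differs per sample can be seen in a toy model of draft verification. The sketch below is generic speculative decoding with greedy prefix acceptance, not BlockPilot itself: it uses an oracle that compares the draft against the target's tokens, whereas the paper predicts the block size from the prefilling representation. The function names and candidate sizes are hypothetical.

```python
def accepted_length(draft_block, target_tokens):
    """Number of leading draft tokens the target model agrees with."""
    n = 0
    for d, t in zip(draft_block, target_tokens):
        if d != t:
            break
        n += 1
    return n

def oracle_block_size(draft, target, candidates=(2, 4, 8)):
    """Smallest candidate block size that reaches the best accepted length."""
    lengths = {b: accepted_length(draft[:b], target[:b]) for b in candidates}
    best = max(lengths.values())
    return min(b for b, n in lengths.items() if n == best)

# An "easy" sample: the draft is right for six tokens in a row.
easy_draft, easy_target = list("abcdefgh"), list("abcdefXh")
# A "hard" sample: the draft goes wrong at the second token.
hard_draft, hard_target = list("aXcdefgh"), list("abcdefgh")

print(oracle_block_size(easy_draft, easy_target))
print(oracle_block_size(hard_draft, hard_target))
```

For the easy sample a large block pays off, while for the hard sample any drafting beyond the smallest block is wasted, which is the motivation for choosing the size per input.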

The authors evaluate their method on several benchmarks and demonstrate that it is plug-and-play, introduces minimal overhead, and consistently improves efficiency. The results show that BlockPilot achieves an acceptance length of 5.92 and a 4.20 times speedup on a specific model, indicating that it can significantly accelerate inference while maintaining accuracy. Overall, the paper contributes to the development of more efficient and adaptive speculative decoding methods, which can be useful for a wide range of natural language processing applications.

📅 Published on Jun 30

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.31315
• PDF: https://arxiv.org/pdf/2606.31315

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#InstanceAdaptivePolicyLearning #DiffusionBasedSpeculativeDecoding #NaturalLanguageProcessing #SpeculativeDecodingTechniques #BlockPilotMethod


573 views · 19:54


🔥 Vision as Unified Multimodal Generation

💡 The paper introduces SenseNova-Vision, a unified multimodal model that formulates computer vision tasks as generation problems, so a single model can perform a wide range of vision tasks without task-specific architectures. Tasks are specified with natural-language instructions and optional visual prompts, and responses are generated as text, images, or mixed text-and-image outputs.

To support large-scale training, the authors created the SenseNova-Vision Corpus, a computer-vision instruction-response corpus that spans text, image, and mixed targets. The model is trained on this corpus along with auxiliary multimodal data.

The model achieves performance comparable to specialized systems across diverse vision tasks, including detection, OCR, keypoint estimation, segmentation, and camera pose estimation. The authors take this as evidence that unified multimodal generation is a scalable route for integrating computer vision capabilities into general-purpose foundation models. The model and corpus are publicly available.

📅 Published on Jul 7

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.06560
• PDF: https://arxiv.org/pdf/2607.06560

🤖 Models citing this paper:
• https://huggingface.co/sensenova/SenseNova-Vision-7B-MoT

📊 Datasets citing this paper:
• https://huggingface.co/datasets/sensenova/SenseNova-Vision-Corpus-50M
• https://huggingface.co/datasets/sensenova/SenseNova-Vision-Benchmark

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MultimodalGeneration #VisionTasks #NaturalLanguageProcessing #ComputerVision #MultimodalLearning


484 views · 21:55


🔥 HuggingFace's Transformers: State-of-the-art Natural Language Processing

💡 The paper describes Transformers, an open-source library of state-of-the-art Transformer architectures and pretrained models for natural language processing tasks. It addresses the difficulty of putting recent advances in model architecture and pretraining to practical use, and aims to make them accessible to the wider machine learning community.

The library provides a unified API over a range of carefully engineered Transformer architectures, together with a curated collection of pretrained models. It is designed to be extensible for researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available for use and contribution by the community, with the goal of driving further advances in natural language processing.

📅 Published on Oct 9, 2019

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/1910.03771
• PDF: https://arxiv.org/pdf/1910.03771
• Project Page: https://huggingface.co

🤖 Models citing this paper:
• https://huggingface.co/PJMixers-Images/Florence-2-base-Castollux-v0.5
• https://huggingface.co/Ian332/Helper_Bob
• https://huggingface.co/PJMixers-Images/Florence-2-base-Castollux-v0.2

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/dpratapa/bio-seq-lm-explorer
• https://huggingface.co/spaces/itchybeetle3/img_caption_generation
• https://huggingface.co/spaces/PJMixers-Images/Florence-2-base-Castollux-v0.5

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#NaturalLanguageProcessing #TransformerArchitectures #PretrainedModels #StateOfTheArtAI #MachineLearningLibrary


❤2

737 views · 23:56
