AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🤖🧠 The Transformer Architecture: How Attention Revolutionized Deep Learning

🗓️ 11 Nov 2025
📚 AI News & Trends

The field of artificial intelligence has witnessed a remarkable evolution, and at the heart of this transformation lies the Transformer architecture. Introduced by Vaswani et al. in 2017, the paper “Attention Is All You Need” redefined the foundations of natural language processing (NLP) and sequence modeling. Unlike its predecessors – recurrent and convolutional neural networks, ...

#TransformerArchitecture #AttentionMechanism #DeepLearning #NaturalLanguageProcessing #NLP #AIResearch
🤖🧠 BERT: Revolutionizing Natural Language Processing with Bidirectional Transformers

🗓️ 11 Nov 2025
📚 AI News & Trends

In the ever-evolving landscape of artificial intelligence and natural language processing (NLP), BERT (Bidirectional Encoder Representations from Transformers) stands as a monumental breakthrough. Developed by researchers at Google AI in 2018, BERT introduced a new way of understanding the context of language by using deep bidirectional training of the Transformer architecture. Unlike previous models that ...

#BERT #NaturalLanguageProcessing #TransformerArchitecture #BidirectionalLearning #DeepLearning #AIStrategy
🤖🧠 Context Engineering 2.0: Redefining Human–Machine Understanding

🗓️ 16 Nov 2025
📚 AI News & Trends

As artificial intelligence advances, machines are becoming increasingly capable of understanding and responding to human language. Yet one crucial challenge remains: how can machines truly understand the context behind human intentions? This question forms the foundation of context engineering, a discipline that focuses on designing, organizing and managing contextual information so that AI systems can ...

#ContextEngineering #AIEducation #HumanMachineUnderstanding #AIContext #NaturalLanguageProcessing #AIModels
✨Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

📝 Summary:
Think-at-Hard (TaH) improves LLM reasoning by dynamically refining only hard tokens. It uses a neural decider to identify them and LoRA for focused refinement, boosting performance with minimal overhead.

🔹 Publication Date: Published on Nov 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08577
• PDF: https://arxiv.org/pdf/2511.08577
• GitHub: https://github.com/thu-nics/TaH

==================================

For more data science resources:
✓ https://xn--r1a.website/DataScienceT

#LLM #AI #MachineLearning #NaturalLanguageProcessing #Reasoning
✨DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

📝 Summary:
DocETL is an agent-based system that optimizes complex document processing pipelines to significantly improve LLM accuracy. It uses logical rewriting and agent-guided evaluation to achieve 1.34 to 4.6 times higher quality outputs than current baselines.

🔹 Publication Date: Published on Oct 16, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.12189
• PDF: https://arxiv.org/pdf/2410.12189
• GitHub: https://github.com/ucbepic/docetl

#LLM #AI #DocumentProcessing #AgentSystems #NaturalLanguageProcessing
✨The Curious Case of Analogies: Investigating Analogical Reasoning in Large Language Models

📝 Summary:
LLMs can encode high-level relational concepts for analogies but struggle with missing relational information and transfer to new entities. Success depends on strong structural alignment. Their analogical reasoning is emerging but limited compared to humans.

🔹 Publication Date: Published on Nov 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20344
• PDF: https://arxiv.org/pdf/2511.20344

#LLMs #AnalogicalReasoning #AIResearch #NaturalLanguageProcessing #CognitiveAI
✨T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

📝 Summary:
T-pro 2.0 is an open-weight Russian LLM for hybrid reasoning and efficient inference. It uses a Cyrillic-dense tokenizer and EAGLE speculative decoding for low latency. The project releases model weights and benchmarks to foster reproducible research.

🔹 Publication Date: Published on Dec 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10430
• PDF: https://arxiv.org/pdf/2512.10430

#LLM #AI #NaturalLanguageProcessing #HybridReasoning #EfficientInference
✨Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives

📝 Summary:
This study explores syllogistic reasoning in LLMs, examining both symbolic inference and natural language understanding. Some models achieve perfect symbolic performance, leading to questions about whether LLMs are becoming more formal reasoning mechanisms.

🔹 Publication Date: Published on Dec 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12620
• PDF: https://arxiv.org/pdf/2512.12620
• GitHub: https://github.com/XAheli/Logic-in-LLMs

#LLMs #SyllogisticReasoning #NaturalLanguageProcessing #AIResearch #FormalLogic
✨Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis

📝 Summary:
TAPPA unifies LLM attention patterns by temporal analysis, classifying them as predictable or unpredictable based on query self-similarity. This framework deepens understanding and guides acceleration, improving KV cache and LLM pruning.

🔹 Publication Date: Published on Jan 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.21709
• PDF: https://arxiv.org/pdf/2601.21709

#LLM #AttentionMechanism #AIResearch #NaturalLanguageProcessing #MachineLearning
✨The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus

📝 Summary:
λ-RLM replaces open-ended recursive code generation in LLMs with a typed functional runtime based on λ-calculus. This provides formal guarantees and improves long-context reasoning by outperforming standard RLMs in accuracy and latency.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20105
• PDF: https://arxiv.org/pdf/2603.20105
• GitHub: https://github.com/lambda-calculus-LLM/lambda-RLM

#LLMs #LambdaCalculus #AI #NaturalLanguageProcessing #DeepLearning
✨Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs

📝 Summary:
XKD-Dial is a progressive training pipeline for explainable, bilingual English-Hindi knowledge-grounded dialogue. It achieves zero hallucination rates by using citation grounding and improves explainability through post-hoc analyses.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18911
• PDF: https://arxiv.org/pdf/2603.18911

#LLMs #ExplainableAI #NaturalLanguageProcessing #AIResearch #HallucinationReduction
✨Natural-Language Agent Harnesses

📝 Summary:
Natural-Language Agent Harnesses (NLAHs) and the Intelligent Harness Runtime (IHR) enable portable, executable agent harness design through natural language. This externalizes control logic from code, making harnesses easier to transfer, compare, and study.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25723
• PDF: https://arxiv.org/pdf/2603.25723

#NaturalLanguageProcessing #AI #AIAgents #SoftwareEngineering #CodePortability
🔥 OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

💡 The paper introduces OmniFlatten, an end-to-end GPT-based model for real-time, natural full-duplex spoken dialogue. Low latency and natural interaction are hard to achieve in full-duplex systems because human conversation involves interruptions, backchannels, and overlapping speech. To address this, the authors propose a multi-stage post-training technique that integrates speech and text without altering the original model's architecture. Training proceeds in three stages: modality alignment, half-duplex dialogue learning, and full-duplex dialogue learning. A flattening operation standardizes the data, allowing unified training methods and a single model architecture across modalities and tasks. OmniFlatten generates text and speech in real time and models the complex behaviors inherent to natural conversation. The results are demonstrated through audio samples of generated dialogues, which are available online. The authors present the approach as a straightforward modeling technique and a promising research direction for efficient, natural end-to-end full-duplex spoken dialogue systems, with potential applications such as virtual assistants and customer service.


📅 Published on Oct 23, 2024

🔗 Links:
• arXiv: https://arxiv.org/abs/2410.17799
• PDF: https://arxiv.org/pdf/2410.17799
• GitHub: https://github.com/karpathy/nanogpt ⭐ 57.6k

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#GPTModelArchitecture #FullDuplexDialogueSystems #NaturalLanguageProcessing #SpeechRecognitionTechniques #EndToEndConversationalAI
🔥 Self-Supervised Prompt Optimization

💡 The paper proposes Self-Supervised Prompt Optimization, a self-supervised framework that optimizes prompts for large language models without external references. Manually designed prompts require expertise and iterative experimentation, while existing prompt optimization methods rely heavily on external references such as ground truth or human evaluation, which can be costly to obtain. The proposed method derives evaluation and optimization signals purely from output comparisons: a large language model evaluator selects superior prompts through pairwise output comparisons, and a large language model optimizer aligns outputs with task requirements. The results show that the method achieves comparable or superior results to state-of-the-art prompt optimization methods at significantly lower cost and with fewer samples. It can optimize prompts for both closed and open-ended tasks and applies to real-world scenarios where external references are unavailable or costly to obtain.
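
The comparison-driven loop in this summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `execute`, `judge`, and `propose` are hypothetical callables standing in for the model that runs a prompt, the LLM evaluator that compares two outputs, and the LLM optimizer that proposes a revised prompt. The majority-vote acceptance rule and the toy stand-ins are likewise assumptions made so the example runs without any model.

```python
from typing import Callable, List


def optimize_prompt(
    prompt: str,
    questions: List[str],
    execute: Callable[[str, str], str],        # hypothetical: run a prompt on a question
    judge: Callable[[str, str, str], bool],    # hypothetical: True if the new output is better
    propose: Callable[[str, List[str]], str],  # hypothetical: suggest a revised prompt
    rounds: int = 3,
) -> str:
    """Keep a candidate prompt only if its outputs win most pairwise comparisons."""
    best = prompt
    best_outputs = [execute(best, q) for q in questions]
    for _ in range(rounds):
        candidate = propose(best, best_outputs)  # optimizer step
        cand_outputs = [execute(candidate, q) for q in questions]
        # Evaluator step: outputs are compared pairwise; no ground truth is used.
        wins = sum(
            judge(q, old, new)
            for q, old, new in zip(questions, best_outputs, cand_outputs)
        )
        if 2 * wins > len(questions):  # assumed acceptance rule: simple majority
            best, best_outputs = candidate, cand_outputs
    return best


# Toy stand-ins (purely illustrative, no model involved).
def toy_execute(prompt: str, question: str) -> str:
    return f"{prompt} {question}"


def toy_judge(question: str, old: str, new: str) -> bool:
    return len(new) > len(old)  # pretend that a longer output is a better one


def toy_propose(prompt: str, outputs: List[str]) -> str:
    return prompt + " Think step by step."


best_prompt = optimize_prompt(
    "Answer the question.",
    ["What is 2 + 2?", "Name the capital of France."],
    toy_execute,
    toy_judge,
    toy_propose,
    rounds=2,
)
print(best_prompt)
```

In a real setting the three callables would wrap model calls; the point of the sketch is only that every signal comes from comparing outputs, never from reference answers.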


📅 Published on Feb 7, 2025

🔗 Links:
• arXiv: https://arxiv.org/abs/2502.06855
• PDF: https://arxiv.org/pdf/2502.06855
• GitHub: https://github.com/geekan/metagpt ⭐ 67.7k

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/XiangJinYu/SPO
• https://huggingface.co/spaces/tang-x/SPO
• https://huggingface.co/spaces/ositamiles/SPO

━━━━━━━━━━━━━━━━━━━━━━━━
πŸ“’ By: https://xn--r1a.website/PaperNexus

#SelfSupervisedLearning #PromptOptimization #LargeLanguageModels #NaturalLanguageProcessing #LanguageModelEvaluation
🔥 Recursive Language Models

💡 The paper introduces Recursive Language Models, a novel approach to enable large language models to process arbitrarily long prompts. The problem addressed is that current language models have limited context windows, which restricts their ability to handle long inputs. The proposed method treats long prompts as part of an external environment and allows the language model to programmatically examine, decompose, and recursively call itself over snippets of the prompt. This approach enables the model to handle inputs that are up to two orders of magnitude beyond the model context window. The results show that Recursive Language Models successfully handle long inputs and outperform base language models and common long-context scaffolds across four diverse long-context tasks, while having comparable or cheaper cost per query. Overall, the paper contributes a general inference strategy that improves the ability of large language models to process long prompts, making them more effective and efficient.
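
The idea of decomposing a prompt and calling the model recursively over snippets can be illustrated with a small Python sketch. It is a deliberate simplification: the summary describes the model programmatically examining the prompt, whereas this sketch uses a fixed binary split. The `llm` callable, its `(query, context)` signature, the `window` size, and the keyword-matching `toy_llm` are all hypothetical stand-ins, not part of the paper's code.

```python
from typing import Callable, List

# Hypothetical signature for a model call: (query, context) -> answer.
LLM = Callable[[str, str], str]


def recursive_answer(query: str, snippets: List[str], llm: LLM, window: int = 200) -> str:
    """Answer a query over text that may not fit in one context window."""
    context = "\n".join(snippets)
    # Base case: the text fits in the (assumed) window, or cannot be split further.
    if len(context) <= window or len(snippets) == 1:
        return llm(query, context)
    # Recursive case: split the snippets, answer each half, then merge the partial answers.
    mid = len(snippets) // 2
    left = recursive_answer(query, snippets[:mid], llm, window)
    right = recursive_answer(query, snippets[mid:], llm, window)
    return llm(query, left + "\n" + right)


def toy_llm(query: str, context: str) -> str:
    """Stand-in for a model: returns the context lines that mention the query."""
    return "\n".join(line for line in context.splitlines() if query in line)


# A long "prompt" with one relevant line hidden among filler.
document = [f"filler line {i}" for i in range(300)]
document[137] = "the secret code is 42"

answer = recursive_answer("secret", document, toy_llm)
print(answer)
```

Each individual call sees at most a window-sized slice or two short partial answers, which is how the full input can be far longer than any single context.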


📅 Published on Dec 31, 2025

🔗 Links:
• arXiv: https://arxiv.org/abs/2512.24601
• PDF: https://arxiv.org/pdf/2512.24601
• Project Page: https://alexzhang13.github.io/blog/2025/rlm/
• GitHub: https://github.com/alexzhang13/rlm ⭐ 4.2k

🤖 Models citing this paper:
• https://huggingface.co/mit-oasys/rlm-qwen3-8b-v0.1
• https://huggingface.co/nightmedia/Qwen3.5-9B-Claude-4.6-Opus-Deckard-V4.2-Uncensored-Heretic-Thinking-qx86-hi-mlx

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/sergiopaniego/repl
• https://huggingface.co/spaces/openenv/repl
• https://huggingface.co/spaces/sergiopaniego/repl-env

━━━━━━━━━━━━━━━━━━━━━━━━
πŸ“’ By: https://xn--r1a.website/PaperNexus

#RecursiveLanguageModels #LargeLanguageModels #LongContextProcessing #LanguageModelArchitectures #NaturalLanguageProcessing
🔥 Adam's Law: Textual Frequency Law on Large Language Models

💡 The paper proposes a novel framework to improve large language model performance through textual frequency analysis. The authors argue that textual frequency, which is the frequency of certain words or phrases in a language, is relevant to human cognition and can also be applied to large language models. However, this topic has been understudied in the context of large language models.

The proposed framework consists of three main components. First, the authors introduce the Textual Frequency Law, which states that frequent textual data should be preferred for large language models, both for prompting and fine-tuning. To estimate the sentence-level frequency, the authors use online resources, as many large language models are closed-source in their training data. They also utilize an input paraphraser to paraphrase the input into a more frequent textual expression.

The second component is Textual Frequency Distillation, which involves querying large language models to conduct story completion by extending sentences in the datasets. The resulting corpora are used to adjust the initial estimation of textual frequency.

The third component is Curriculum Textual Frequency Training, which fine-tunes large language models in increasing order of sentence-level frequency, that is, moving from lower-frequency to higher-frequency sentences as training proceeds.
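
As a rough illustration of the curriculum step, the sketch below orders training sentences by a frequency score and yields them in batches. It follows the stated "increasing order of sentence-level frequency"; the function name `curriculum_batches`, the batching scheme, and the toy frequency values are assumptions for illustration only, and the paper's actual frequency estimates come from the online resources and distillation step described above.

```python
from typing import Dict, Iterator, List


def curriculum_batches(
    sentences: List[str],
    frequency: Dict[str, float],  # assumed: a precomputed sentence-level frequency score
    batch_size: int,
) -> Iterator[List[str]]:
    """Yield training batches in increasing order of sentence-level frequency."""
    ordered = sorted(sentences, key=lambda s: frequency[s])  # ascending frequency
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Toy scores, invented for this example.
toy_frequency = {
    "The cat sat.": 0.9,
    "Quoth the raven.": 0.1,
    "It is raining.": 0.7,
    "Perchance to dream.": 0.2,
}

schedule = list(curriculum_batches(list(toy_frequency), toy_frequency, batch_size=2))
for step, batch in enumerate(schedule):
    print(step, batch)
```

A fine-tuning loop would consume `schedule` in order, so the ordering itself is the only thing the curriculum changes.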

The authors conducted experiments on a curated dataset called Textual Frequency Paired Dataset, which covers tasks such as math reasoning, machine translation, commonsense reasoning, and agentic tool calling. The results show that the proposed framework is effective in improving large language model performance.

Overall, the paper contributes to the understanding of textual frequency in large language models and provides a novel framework for improving their performance. The proposed framework has the potential to be applied to various natural language processing tasks and can lead to more efficient and effective large language models.


📅 Published on Apr 2

🔗 Links:
• arXiv: https://arxiv.org/abs/2604.02176
• PDF: https://arxiv.org/pdf/2604.02176
• GitHub: https://github.com/HongyuanLuke/frequencylaw ⭐ 658

📊 Datasets citing this paper:
• https://huggingface.co/datasets/Akaashiiii/TFPD

━━━━━━━━━━━━━━━━━━━━━━━━
πŸ“’ By: https://xn--r1a.website/PaperNexus

#AdamSLaw #TextualFrequencyAnalysis #LargeLanguageModels #NaturalLanguageProcessing #LanguageModelOptimization
🔥 Fish Audio S2 Technical Report

💡 The paper introduces Fish Audio S2, an open-source text-to-speech system with multi-speaker capabilities, multi-turn generation, and instruction-following control through natural-language descriptions. The system uses a multi-stage training approach supported by a staged data pipeline covering video captioning, speech captioning, voice quality assessment, and reward modeling, which allows scalable training and improves overall performance. The authors also release their model weights, fine-tuning code, and an inference engine that is production-ready for streaming. The inference engine achieves a real-time factor of 0.195 and a time to first audio below 100 milliseconds. The code and weights are available on GitHub and Hugging Face, and users are encouraged to try custom voices on the website.
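
To make the real-time factor concrete: RTF is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. The short calculation below applies that definition to the reported figure; the 10-second clip length is an arbitrary example, not a number from the report.

```python
def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Processing time implied by a real-time factor (RTF = processing time / audio duration)."""
    return audio_seconds * rtf


RTF = 0.195            # figure reported in the summary above
clip_seconds = 10.0    # arbitrary example duration

seconds_needed = synthesis_time(clip_seconds, RTF)
speedup = 1.0 / RTF    # seconds of audio produced per second of compute

print(f"{clip_seconds:.0f} s of audio takes about {seconds_needed:.2f} s to synthesize")
print(f"about {speedup:.1f}x faster than real time")
```

Under that definition, an RTF of 0.195 corresponds to generating audio roughly five times faster than it plays back.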


📅 Published on Mar 9

🔗 Links:
• arXiv: https://arxiv.org/abs/2603.08823
• PDF: https://arxiv.org/pdf/2603.08823
• Project Page: https://fish.audio/
• GitHub: https://github.com/fishaudio/fish-speech ⭐ 30.2k

🤖 Models citing this paper:
• https://huggingface.co/fishaudio/s2-pro
• https://huggingface.co/drbaph/s2-pro-fp8
• https://huggingface.co/mlx-community/fish-audio-s2-pro-bf16

📊 Datasets citing this paper:
• https://huggingface.co/datasets/Izzyzlin/CFSDD

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/artificialguybr/fish-s2-pro-zero
• https://huggingface.co/spaces/fguilleme/fish-s2-pro-zero
• https://huggingface.co/spaces/MAYA-AI/fish-s2-pro-zero

━━━━━━━━━━━━━━━━━━━━━━━━
πŸ“’ By: https://xn--r1a.website/PaperNexus

#TextToSpeechSystems #MultispeakerSynthesis #NaturalLanguageProcessing #SpeechGenerationModels #RealTimeAudioProcessing
🔥 Transformer Explainer: Interactive Learning of Text-Generative Models

💡 The paper introduces Transformer Explainer, an interactive visualization tool that helps non-experts understand the inner workings of the GPT-2 model. The problem addressed is that Transformers, despite being a revolutionary machine learning technology, are often opaque to those without extensive expertise. To tackle this issue, the authors developed a tool that provides a model overview and allows users to smoothly transition across different abstraction levels of mathematical operations and model structures.

The method used to create the tool involves integrating a live GPT-2 instance that runs locally in the user's browser, enabling users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together to predict the next tokens. This approach allows users to gain hands-on experience and intuition about complex Transformer concepts without requiring installation or special hardware.
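
The last step of next-token prediction, turning the model's output scores (logits) into a probability distribution with a softmax, can be sketched without any model at all. The three-word vocabulary and the logit values below are invented for illustration; GPT-2's real vocabulary has roughly 50,000 tokens, and this snippet is not taken from the tool's code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy vocabulary and scores for the context "The cat sat on the".
vocab = ["cat", "dog", "mat"]
logits = [1.0, 0.5, 3.0]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy choice: the most probable token

for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
print("predicted next token:", next_token)
```

Picking the highest-probability token is only one decoding choice; sampling from `probs` instead gives more varied text.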

The results of this work are a publicly available, open-sourced tool that broadens access to education on modern generative AI techniques. The tool is accessible at a provided website and a video demo is also available, showcasing the tool's capabilities. Overall, the paper contributes to making Transformers more accessible and understandable to a wider audience, including non-experts, by providing an interactive and intuitive learning experience.


📅 Published on Aug 8, 2024

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2408.04619
• PDF: https://arxiv.org/pdf/2408.04619
• Project Page: https://poloclub.github.io/transformer-explainer/

━━━━━━━━━━━━━━━━━━━━━━━━
πŸ“’ By: https://xn--r1a.website/PaperNexus

#TransformerModels #GPT2Explained #NaturalLanguageProcessing #TextGenerationModels #ExplainableAI
🔥 Foundations of Large Language Models

💡 The book Foundations of Large Language Models provides an overview of the fundamental concepts underlying large language models. It is structured into four main chapters, each focusing on a key area: pre-training, generative models, prompting techniques, and alignment methods. The authors aim to provide a foundational understanding rather than comprehensive coverage of all cutting-edge technologies. The book is intended for college students, professionals, and practitioners in natural language processing and related fields, and serves as a reference for anyone interested in large language models.


📅 Published on Jan 16, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2501.09223
• PDF: https://arxiv.org/pdf/2501.09223

━━━━━━━━━━━━━━━━━━━━━━━━
πŸ“’ By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #NaturalLanguageProcessing #PreTrainingMethods #GenerativeModels #LanguageModelAlignment