AI & ML Papers

🔥 LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

💡 The paper introduces LlamaFactory, a unified framework for efficient fine-tuning of large language models. Implementing efficient fine-tuning methods across different models takes non-trivial engineering effort and coding expertise, which is a barrier for many users. LlamaFactory integrates a suite of efficient training methods and lets users customize the fine-tuning of over 100 language models without writing code, through a built-in web UI called LlamaBoard. The authors validate the framework's efficiency and effectiveness on language modeling and text generation tasks. The framework has been released publicly, and at the time of the paper it had over 13,000 stars and 1,600 forks on GitHub.

📅 Published on Mar 20, 2024

🔗 Links:
• arXiv: https://arxiv.org/abs/2403.13372
• PDF: https://arxiv.org/pdf/2403.13372
• Project Page: https://huggingface.co/spaces/hiyouga/LLaMA-Board
• GitHub: https://github.com/hiyouga/LLaMA-Factory ⭐ 70.9k

🤖 Models citing this paper:
• https://huggingface.co/AELLM/Llama-3.2-Chibi-3B
• https://huggingface.co/GXMZU/Qwen3-14B-ai-expert
• https://huggingface.co/Xin-Rui/LLAMA-Fac-NEW-A800

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/hiyouga/LLaMA-Board
• https://huggingface.co/spaces/Justinrune/LLaMA-Factory
• https://huggingface.co/spaces/Darok/Featherless-Feud

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#EfficientFineTuning #LanguageModelOptimization #UnifiedTrainingFrameworks #LargeLanguageModelDevelopment #AutomatedModelCustomization

419 views · 18:56

✨ Join Best TG Channels

👋 Join Our WhatsApp Channel

📝 Contact / Collaborate

🔥 Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs

💡 The paper introduces mmGRPO, a multi-module extension of Group Relative Policy Optimization (GRPO) for modular AI systems that combine multiple language model calls, each with its own prompt. GRPO is effective for post-training a single language model, but it is unclear how best to apply it to programs made of several modules. mmGRPO groups language model calls by module across rollouts and handles variable-length and interrupted trajectories. Composed with automatic prompt optimization, mmGRPO improves accuracy by 11% on average across classification, many-hop search, and privacy-preserving delegation tasks compared to the post-trained language model, and by 5% compared to prompt optimization alone. The authors have open-sourced mmGRPO in DSPy as the dspy.GRPO optimizer.
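
To make the "group by module" idea concrete, here is a minimal sketch, not the dspy.GRPO implementation. It assumes each rollout carries one scalar reward shared by all of its language model calls, and that advantages are normalized within each module's group. The rollout layout and the function name module_level_advantages are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def module_level_advantages(rollouts, eps=1e-6):
    """Compute group-relative advantages with one group per module.

    rollouts: list of {"reward": float, "calls": [(module, prompt, completion), ...]}.
    An interrupted rollout simply has fewer calls, so groups may differ in size.
    """
    groups = defaultdict(list)
    for rollout in rollouts:
        for module, prompt, completion in rollout["calls"]:
            # Every call inherits the reward of the trajectory it belongs to.
            groups[module].append((prompt, completion, rollout["reward"]))

    examples = []
    for module, items in groups.items():
        rewards = [reward for _, _, reward in items]
        mu, sigma = mean(rewards), pstdev(rewards)
        for prompt, completion, reward in items:
            examples.append({
                "module": module,
                "prompt": prompt,
                "completion": completion,
                # Standard GRPO-style normalization, applied inside the module group.
                "advantage": (reward - mu) / (sigma + eps),
            })
    return examples
```

A module that appears in only one rollout gets a zero advantage here, because a group of one has no spread to normalize against.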

📅 Published on Aug 6, 2025

🔗 Links:
• arXiv: https://arxiv.org/abs/2508.04660
• PDF: https://arxiv.org/pdf/2508.04660
• Project Page: https://dspy.ai
• GitHub: https://github.com/stanfordnlp/dspy ⭐ 34.2k

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MultiModuleLearning #LanguageModelOptimization #PolicyGradientMethods #ModularAISystems #PromptOptimizationTechniques

410 views · 05:00

🔥 Adam's Law: Textual Frequency Law on Large Language Models

💡 The paper proposes a novel framework to improve large language model performance through textual frequency analysis. The authors argue that textual frequency, which is the frequency of certain words or phrases in a language, is relevant to human cognition and can also be applied to large language models. However, this topic has been understudied in the context of large language models.

The proposed framework consists of three main components. First, the authors introduce the Textual Frequency Law, which states that frequent textual data should be preferred for large language models, both for prompting and fine-tuning. Because the training data of many large language models is not public, the authors estimate sentence-level frequency from online resources. They also use an input paraphraser to rewrite the input into a more frequent textual expression.
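
As a rough illustration of preferring the more frequent phrasing, the sketch below scores each candidate paraphrase by its average smoothed log word frequency and keeps the highest-scoring one. This scoring rule is an assumption made for illustration, not the paper's estimator, and the word_counts table is a hypothetical stand-in for whatever online frequency resource is used.

```python
import math


def sentence_log_frequency(sentence, word_counts, total):
    """Average smoothed log frequency of the words in a sentence (illustrative score)."""
    words = sentence.lower().split()
    if not words:
        return float("-inf")
    # Add-one smoothing keeps unseen words from producing log(0).
    return sum(math.log((word_counts.get(w, 0) + 1) / (total + 1)) for w in words) / len(words)


def pick_most_frequent(candidates, word_counts):
    """Return the candidate paraphrase with the highest frequency score."""
    total = sum(word_counts.values())
    return max(candidates, key=lambda s: sentence_log_frequency(s, word_counts, total))
```

For example, with counts in which "dog" and "ran" are far more common than "canine" and "sprinted", the function picks "the dog ran" over "the canine sprinted".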

The second component is Textual Frequency Distillation, which involves querying large language models to conduct story completion by extending sentences in the datasets. The resulting corpora are used to adjust the initial estimation of textual frequency.

The third component is Curriculum Textual Frequency Training, which fine-tunes large language models in increasing order of sentence-level frequency: the models are first trained on the least frequent sentences and then gradually moved to more frequent ones.
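
A curriculum of this kind can be sketched as sorting the fine-tuning examples by a frequency score and splitting them into stages. The sketch reads "increasing order" as ascending frequency, which is an assumption, and curriculum_stages and its parameters are hypothetical names rather than anything from the paper's code.

```python
def curriculum_stages(examples, frequency, num_stages=3):
    """Split examples into training stages ordered by ascending frequency.

    frequency: callable mapping an example to its estimated sentence-level frequency.
    """
    ordered = sorted(examples, key=frequency)  # least frequent first
    if not ordered:
        return []
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]
```

Each returned stage would then be passed to the fine-tuning loop in turn; reversing the sort gives the opposite curriculum for comparison.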

The authors conducted experiments on a curated dataset called Textual Frequency Paired Dataset, which covers tasks such as math reasoning, machine translation, commonsense reasoning, and agentic tool calling. The results show that the proposed framework is effective in improving large language model performance.

Overall, the paper contributes to the understanding of textual frequency in large language models and provides a novel framework for improving their performance. The proposed framework has the potential to be applied to various natural language processing tasks and can lead to more efficient and effective large language models.

📅 Published on Apr 2

🔗 Links:
• arXiv: https://arxiv.org/abs/2604.02176
• PDF: https://arxiv.org/pdf/2604.02176
• GitHub: https://github.com/HongyuanLuke/frequencylaw ⭐ 658

📊 Datasets citing this paper:
• https://huggingface.co/datasets/Akaashiiii/TFPD

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#AdamSLaw #TextualFrequencyAnalysis #LargeLanguageModels #NaturalLanguageProcessing #LanguageModelOptimization

543 views · 05:00

🔥 LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

💡 The paper proposes a novel approach to improve the performance of large language models through test-time scaling, which involves allocating additional computation during inference. Existing test-time scaling strategies are typically hand-crafted, relying on manual design and tuning of reasoning patterns and heuristics. This approach leaves much of the computation-allocation space unexplored, resulting in potential inefficiencies.

To address this limitation, the authors introduce AutoTTS, an environment-driven framework that automates the discovery of test-time scaling strategies. Instead of designing individual strategies, researchers can create environments where optimal strategies can be discovered automatically. The key to AutoTTS lies in constructing a discovery environment that provides a tractable control space and frequent, low-cost feedback for strategy search.

The authors formulate test-time scaling as a controller synthesis problem over pre-collected reasoning trajectories and probe signals. In this framework, controllers decide when to branch, continue, probe, prune, or stop, and can be evaluated cheaply without requiring repeated calls to the language model. To make the search tractable, the authors introduce beta parameterization, which enables fine-grained execution trace feedback to improve discovery efficiency.
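
The appeal of this formulation is that a controller can be scored entirely against cached data. The sketch below is a hand-written toy controller, not one discovered by AutoTTS, and it supports only the continue and stop actions: it consumes pre-collected (answer, cost) samples for a question and stops once the leading answer is ahead by a fixed vote margin. The data layout and the parameters margin and max_branches are assumptions.

```python
from collections import Counter


def run_controller(cached, margin=2, max_branches=8):
    """Replay cached samples for one question; stop early on a clear vote lead."""
    votes = Counter()
    spent = 0
    for answer, cost in cached[:max_branches]:
        votes[answer] += 1
        spent += cost
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= margin:
            break  # "stop" action: the leader is far enough ahead
    prediction = votes.most_common(1)[0][0] if votes else None
    return prediction, spent


def evaluate(dataset, **controller_kwargs):
    """Return (accuracy, average cost) over [(cached_samples, gold_answer), ...]."""
    correct = 0
    total_cost = 0
    for cached, gold in dataset:
        prediction, spent = run_controller(cached, **controller_kwargs)
        correct += prediction == gold
        total_cost += spent
    return correct / len(dataset), total_cost / len(dataset)
```

Because evaluate never calls a language model, sweeping margin or max_branches costs only CPU time, which is what makes a search over controllers cheap.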

The proposed approach is evaluated on mathematical reasoning benchmarks, where the discovered strategies demonstrate improved accuracy-cost tradeoffs over strong manually designed baselines. The discovered strategies also generalize to held-out benchmarks and model scales. Notably, the entire discovery process costs only $39.9 and takes 160 minutes, which makes it practical to run.

Overall, the paper contributes a novel framework for automating test-time scaling strategy discovery, which has the potential to improve the performance of large language models while reducing the need for manual design and tuning. The authors also make their data and code available, facilitating further research and development in this area.

📅 Published on May 8

🔗 Links:
• arXiv: https://arxiv.org/abs/2605.08083
• PDF: https://arxiv.org/pdf/2605.08083
• Project Page: https://zhengkid.github.io/AutoTTS-web/
• GitHub: https://github.com/zhengkid/AutoTTS ⭐ 43

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #TestTimeScaling #AgenticDiscovery #AutomatedReasoning #LanguageModelOptimization

545 views · 21:49

🔥 Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

💡 This paper introduces rStar, a self-play mutual reasoning approach that improves the reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. Small language models often struggle with complex reasoning tasks. rStar splits reasoning into a mutual generation-discrimination process: a target SLM augments Monte Carlo Tree Search (MCTS) with a set of human-like reasoning actions to construct reasoning trajectories, and a second SLM of similar capability acts as a discriminator that verifies each trajectory. Trajectories that both models agree on are considered mutually consistent and therefore more likely to be correct. Across diverse reasoning problems, including math and strategy-based tasks, rStar substantially improves accuracy. For example, it raises GSM8K accuracy from 12.51% to 63.91% for LLaMA2-7B and from 36.46% to 81.88% for Mistral-7B.
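
The mutual-agreement filter can be sketched as follows, under simplifying assumptions. The discriminator is modeled as a callable that receives the first few steps of a trajectory and returns its own final answer, and a trajectory is kept only when that answer matches the generator's. The MCTS generation itself is omitted, the final majority vote is a simplification, and all names are hypothetical.

```python
import random
from collections import Counter


def mutually_consistent(trajectories, discriminator, rng=None):
    """Keep trajectories whose answer the discriminator reproduces from a prefix.

    trajectories: list of (steps, answer); discriminator: callable(prefix_steps) -> answer.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    kept = []
    for steps, answer in trajectories:
        # Reveal a random prefix, hiding at least the last step when there are several.
        split = rng.randint(1, max(1, len(steps) - 1))
        if discriminator(steps[:split]) == answer:
            kept.append((steps, answer))
    return kept


def select_answer(kept):
    """Pick the most common answer among the mutually consistent trajectories."""
    counts = Counter(answer for _, answer in kept)
    return counts.most_common(1)[0][0] if counts else None
```

In practice the discriminator would be a second small model prompted to complete the partial reasoning; here any function with that signature works, which keeps the filter easy to test.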

📅 Published on Aug 12, 2024

🔗 Links:
• arXiv: https://arxiv.org/abs/2408.06195
• PDF: https://arxiv.org/pdf/2408.06195
• GitHub: https://github.com/codelion/optillm ⭐ 3.7k

🚀 Spaces citing this paper:
• https://huggingface.co/spaces/algorithmicsuperintelligence/OptiLLM
• https://huggingface.co/spaces/fabiodr/optillm
• https://huggingface.co/spaces/EduuGomes/CachoeiraBot

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#MutualReasoning #LLMProblemSolving #MonteCarloTreeSearch #SelfPlayLearning #LanguageModelOptimization

425 views · 03:50

🔥 DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

💡 The paper introduces DataFlex, a unified framework for dynamic data-centric training of large language models. The problem addressed is that existing approaches to data selection, data mixture optimization, and data reweighting are often developed in isolated codebases, making it difficult to reproduce, compare, and integrate them. DataFlex solves this problem by providing a unified framework that supports three major paradigms of dynamic data optimization: sample selection, domain mixture adjustment, and sample reweighting.

The method involves building DataFlex upon the LLaMA-Factory framework, which allows for extensible trainer abstractions and modular components. This enables a drop-in replacement for standard large language model training and unifies key model-dependent operations such as embedding extraction, inference, and gradient computation. DataFlex is also compatible with large-scale settings, including DeepSpeed ZeRO-3.
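
To show what sample selection and sample reweighting look like at the batch level, here is a minimal sketch that uses per-sample loss as the signal. This is a generic stand-in chosen for illustration: the summary does not say which selectors or weighters DataFlex ships, and neither function below is part of its API.

```python
import math


def select_top_k_by_loss(losses, keep_ratio=0.5):
    """Sample selection: keep the indices of the highest-loss samples in a batch."""
    if not losses:
        return []
    k = max(1, math.ceil(len(losses) * keep_ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:k])


def softmax_weights(losses, temperature=1.0):
    """Sample reweighting: turn a non-empty list of losses into weights that sum to 1."""
    top = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - top) / temperature) for loss in losses]
    norm = sum(exps)
    return [e / norm for e in exps]
```

Selection drops samples from the backward pass entirely, while reweighting keeps every sample and scales its contribution to the batch loss; a lower temperature makes the weights more peaked.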

The results show that DataFlex provides an effective, efficient, and reproducible infrastructure for data-centric dynamic training of large language models. Comprehensive experiments demonstrate that dynamic data selection consistently outperforms static full-data training, and data mixture methods improve both accuracy and perplexity over default proportions. Additionally, DataFlex achieves consistent runtime improvements over original implementations. Overall, the paper contributes a unified framework that enables efficient large-scale deployment of data-centric dynamic training methods for large language models.

📅 Published on Mar 27

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2603.26164
• PDF: https://arxiv.org/pdf/2603.26164
• Project Page: https://opendcai.github.io/DataFlex-Doc/en/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://xn--r1a.website/PaperNexus

#LargeLanguageModels #DataCentricTraining #DynamicTrainingMethods #LanguageModelOptimization #DataDrivenAI

533 views · 03:54
