AI & ML Papers
32.8K subscribers
7.07K photos
523 videos
24 files
7.72K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Download Telegram
🤖🧠 vLLM Semantic Router: The Next Frontier in Intelligent Model Routing for LLMs

🗓️ 11 Nov 2025
📚 AI News & Trends

As large language models (LLMs) continue to evolve, organizations face new challenges in optimizing performance, accuracy and cost across various AI workloads. Running multiple models efficiently – each specialized for specific tasks has become essential for scalable AI deployment. Enter vLLM Semantic Router, an open-source innovation that introduces a new layer of intelligence to the ...

#vLLMSemanticRouter #LargeLanguageModels #AIScaling #ModelRouting #OpenSourceAI #LLMOptimization
🤖🧠 Plandex AI: The Future of Autonomous Coding Agents for Large-Scale Development

🗓️ 11 Nov 2025
📚 AI News & Trends

As software development becomes increasingly complex, developers are turning to AI tools that can manage, understand and automate large portions of the coding workflow. Among the most promising innovations in this space is Plandex AI, an open-source terminal-based coding agent designed for real-world, large-scale projects. Unlike simple AI coding assistants that handle small snippets, Plandex ...

#PlandexAI #AutonomousCoding #LargeScaleDevelopment #AICoding #OpenSourceAI #CodeAutomation
🤖🧠 Bytebot: The Future of AI Desktop Automation

🗓️ 12 Nov 2025
📚 AI News & Trends

In the era of rapid digital transformation, automation is the driving force behind business efficiency and innovation. While most AI agents are limited to browsers or APIs, a groundbreaking open-source project called Bytebot has redefined what AI can achieve. Bytebot introduces a self-hosted AI desktop agent — a virtual computer that performs complex, multi-step tasks ...

#Bytebot #AIDesktopAutomation #SelfHostedAI #OpenSourceAI #AIAgents #TaskAutomation
1
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

📝 Summary:
MiroThinker v1.0 is an open-source research agent introducing 'interactive scaling.' It trains models with reinforcement learning for deeper agent-environment interactions, performing up to 600 tool calls per task. This achieves state-of-the-art performance and establishes interaction depth as a ...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11793
• PDF: https://arxiv.org/pdf/2511.11793
• Project Page: https://dr.miromind.ai/
• Github: https://github.com/MiroMindAI/MiroThinker

🔹 Models citing this paper:
https://huggingface.co/miromind-ai/MiroThinker-v1.0-72B
https://huggingface.co/miromind-ai/MiroThinker-v1.0-8B
https://huggingface.co/miromind-ai/MiroThinker-v1.0-30B

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#MiroThinker #ResearchAgents #ReinforcementLearning #OpenSourceAI #LLM
1
Mobile-Agent-v3: Foundamental Agents for GUI Automation

📝 Summary:
GUI-Owl and Mobile-Agent-v3 are open-source GUI agent models achieving state-of-the-art performance on GUI benchmarks. GUI-Owl introduces large-scale environment infrastructure, diverse agent capabilities, and scalable reinforcement learning, with Mobile-Agent-v3 further improving these results.

🔹 Publication Date: Published on Aug 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15144
• PDF: https://arxiv.org/pdf/2508.15144
• Project Page: https://github.com/X-PLUG/MobileAgent
• Github: https://github.com/X-PLUG/MobileAgent

🔹 Models citing this paper:
https://huggingface.co/mPLUG/GUI-Owl-7B
https://huggingface.co/mPLUG/GUI-Owl-32B
https://huggingface.co/mPLUG/GUI-Owl-7B-Desktop-RL

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#GUIAgent #Automation #ReinforcementLearning #AIResearch #OpenSourceAI
Scaling Open-Ended Reasoning to Predict the Future

📝 Summary:
This work trains language models for open-ended future prediction using a new dataset synthesized from news. Their OpenForecaster 8B model matches larger proprietary models in accuracy, calibration, and consistency. All resources are open-sourced.

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25070
• PDF: https://arxiv.org/pdf/2512.25070
• Project Page: https://www.openforecaster.github.io
• Github: https://github.com/OpenForecaster/scaling-forecasting-training

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#LLMs #FuturePrediction #AI #OpenSourceAI #MachineLearning
BitNet b1.58 2B4T Technical Report

📝 Summary:
BitNet b1.58 2B4T is the first open-source 1-bit Large Language Model with 2 billion parameters. It matches full-precision LLM performance while offering significant improvements in computational efficiency like reduced memory and energy. The model weights are openly released for research.

🔹 Publication Date: Published on Apr 16, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.12285
• PDF: https://arxiv.org/pdf/2504.12285
• Github: https://github.com/microsoft/bitnet

🔹 Models citing this paper:
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16

Spaces citing this paper:
https://huggingface.co/spaces/suayptalha/Chat-with-Bitnet-b1.58-2B-4T
https://huggingface.co/spaces/aizip-dev/SLM-RAG-Arena
https://huggingface.co/spaces/Tonic/Native_1-bit_LLM

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#LLM #AI #Quantization #OpenSourceAI #DeepLearning
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning

📝 Summary:
Chroma 1.0 is the first open-source real-time end-to-end spoken dialogue model with personalized voice cloning. It achieves low-latency interaction and high-fidelity voice synthesis, improving speaker similarity by 10.96% over a human baseline.

🔹 Publication Date: Published on Jan 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11141
• PDF: https://arxiv.org/pdf/2601.11141
• Project Page: https://www.flashlabs.ai/flashai-voice-agents
• Github: https://github.com/FlashLabs-AI-Corp/FlashLabs-Chroma

🔹 Models citing this paper:
https://huggingface.co/FlashLabs/Chroma-4B

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#ConversationalAI #VoiceCloning #RealTimeAI #OpenSourceAI #TTS
dLLM: Simple Diffusion Language Modeling

📝 Summary:
dLLM is an open-source framework standardizing core components of diffusion language modeling. It addresses the issue of scattered, hard-to-reproduce DLM implementations, enabling easy reproduction, customization, and development of both small and large diffusion language models.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.22661
• PDF: https://arxiv.org/pdf/2602.22661
• Project Page: https://github.com/ZHZisZZ/dllm
• Github: https://github.com/ZHZisZZ/dllm

🔹 Models citing this paper:
https://huggingface.co/dllm-hub/ModernBERT-large-chat-v0.1
https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-mdlm-v0.1
https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#DiffusionModels #LanguageModeling #LLMs #OpenSourceAI #AIResearch
EXAONE 4.5 Technical Report

📝 Summary:
EXAONE 4.5 is LG AI Research's first open-weight vision language model, integrating a visual encoder into EXAONE 4.0. It enhances document understanding and general language capabilities through targeted data and extended context, outperforming similar models in document tasks.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08644
• PDF: https://arxiv.org/pdf/2604.08644
• Github: https://github.com/LG-AI-EXAONE/EXAONE-4.5

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#VisionLanguageModel #AI #DocumentUnderstanding #MultimodalAI #OpenSourceAI
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

📝 Summary:
Nemotron 3 Nano Omni is a new efficient, open multimodal AI model. It natively supports audio, text, images, and video inputs, improving accuracy and efficiency over previous versions. It excels in document understanding and long audio-video comprehension.

🔹 Publication Date: Published on Apr 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.24954
• PDF: https://arxiv.org/pdf/2604.24954

🔹 Models citing this paper:
https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4
https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8

Spaces citing this paper:
https://huggingface.co/spaces/akhaliq/Nemotron-3-Nano-Omni
https://huggingface.co/spaces/developerjeremylive/Nemotron-3-Nano-Omni-etheroi

==================================

For more data science resources:
https://xn--r1a.website/DataScienceT

#AI #MultimodalAI #DeepLearning #OpenSourceAI #AIResearch
2