This new research introduces Agyn, an open-source multi-agent platform that models software engineering as a team-based organizational process rather than a monolithic task.
The system configures a team of four specialized agents: a manager, researcher, engineer, and reviewer. Each operates within its own isolated sandbox with role-specific tools, prompts, and language model configurations. The manager agent coordinates dynamically based on intermediate outcomes rather than following a fixed pipeline.
What makes the design interesting?
Different agents use different models depending on their role. The manager and researcher run on GPT-5 for stronger reasoning and broader context. The engineer and reviewer use GPT-5-Codex, a smaller code-specialized model optimized for iterative implementation and debugging. This mirrors how real teams allocate resources based on task requirements.
The workflow follows a GitHub-native process. Agents analyze issues, create pull requests, conduct inline code reviews, and iterate through revision cycles until the reviewer explicitly approves. No human intervention at any point. The number of steps isn't predetermined. It emerges from task complexity.
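A rough sketch of the idea, not Agyn's actual code: only the four roles and the two model names come from the post. The `Role` class, the `run_team` function, and the stubbed reviewer are hypothetical, and the loop simply shows how the number of revision rounds can fall out of reviewer feedback instead of being fixed up front.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Role:
    name: str
    model: str  # model names as quoted in the post


# Role-to-model assignment described in the post.
TEAM: Dict[str, Role] = {
    "manager": Role("manager", "GPT-5"),
    "researcher": Role("researcher", "GPT-5"),
    "engineer": Role("engineer", "GPT-5-Codex"),
    "reviewer": Role("reviewer", "GPT-5-Codex"),
}


def run_team(issue: str, review: Callable[[str, int], bool], max_rounds: int = 10) -> List[str]:
    """Hypothetical manager loop: keep revising until the reviewer approves."""
    log = [f"researcher({TEAM['researcher'].model}): analyze '{issue}'"]
    for round_no in range(1, max_rounds + 1):
        log.append(f"engineer({TEAM['engineer'].model}): push revision {round_no}")
        approved = review(issue, round_no)  # stub standing in for the reviewer agent
        verdict = "approve" if approved else "request changes"
        log.append(f"reviewer({TEAM['reviewer'].model}): {verdict}")
        if approved:
            break
    return log


# Stub reviewer that approves on the third revision.
steps = run_team("fix flaky test", review=lambda issue, n: n >= 3)
```

Swapping the stub for a stricter or looser reviewer changes the length of `steps`, which is the "number of steps isn't predetermined" point in miniature.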
arXiv.org
Agyn: A Multi-Agent System for Team-Based Autonomous Software Engineering
Large language models have demonstrated strong capabilities in individual software engineering tasks, yet most autonomous systems still treat issue resolution as a monolithic or pipeline-based...
🔥3❤2👏2
Stripe launched a preview of machine payments, a way for developers to charge agents directly with a few lines of code.
It launches with support for x402 using USDC stablecoins on Base, with more protocols, payment methods, currencies, and chains to come.
And sales tax, refunds, and reporting just work. (You only need to think about crypto if you want to!)
Stripe also released an open-source CLI called `purl` for you (and your bots) to test machine payments in the terminal, along with Node and Python samples. Yes, payments + curl creatively smushed together.
Stripe
Machine payments
Machine payments allows automated systems and AI agents to make payments on behalf of users.
❤3🔥3👏2
Google is adding a way for consumers to buy things while seeking AI powered answers on search and in its Gemini chatbot — part of a plan to make money more directly from consumers’ AI use.
Bloomberg.com
Google Pushes AI Shopping Features in Search and Gemini Chatbot
Google is adding a way for consumers to buy things while seeking artificial intelligence-powered answers on search and in its Gemini chatbot — part of a plan to make money more directly from consumers’ AI use.
❤2👍2🔥2
Zhipu released GLM-5
The model is open source. It matches Claude Opus 4.5 on coding benchmarks. Beats Gemini 3 Pro on some tests. But the interesting part isn't the benchmarks.
GLM-5 is built for agents. The company designed it for long-running tasks and tool invocation. In the τ²-Bench interactive tool evaluation, it scored 84.7, beating Claude Sonnet 4.5.
Think about what that means. A model designed to work inside coding environments like Claude Code, Kilo Code, and Cline. "Think before you act" mechanisms baked into the architecture. Better planning for complex multi-step tasks.
Zhipu's traffic has jumped five-fold recently. The company had to implement subscription limits to handle demand. Most of that demand is coming from the US and China, followed by India, Japan, and Brazil.
The release pace is accelerating. GLM-4.6 came out in September. GLM-4.7 in January. GLM-5 in February. That's three major versions in six months.
DeepSeek proved that open models can spread fast when they're genuinely good. Zhipu is following the same playbook. Open weights, strong coding performance, agent optimization.
7 of the top 10 AI models on current leaderboards are now Chinese. The competition isn't just about who has the smartest model anymore. It's about who builds the best tools for developers.
👍3🔥2👏2🆒2
The agent economy just got a real marketplace
Moltlaunch is live on Base. Browse specialized AI agents, hire them for real work, and back the ones you believe in.
Every completed job burns tokens and leaves a review onchain through ERC-8004.
Moltlaunch
moltlaunch — hire AI agents, pay with ETH
The agent marketplace and open protocol for agent work. Trustless escrow, permanent reputation, tradeable tokens on Base.
🔥5❤2👏2
Does being a math genius make an AI better at understanding human intentions?
Researchers from Arizona State University and Microsoft Research Asia investigated whether the step-by-step logic used for coding helps AI master Theory of Mind—the ability to sense what others are thinking and feeling.
The results show that more thinking time can actually cause social reasoning to collapse, with advanced reasoning models often being outperformed by simpler ones. Unlike in math or code, these models frequently rely on answer-matching shortcuts rather than true deduction, which suggests that social intelligence requires a distinct approach beyond existing reasoning methods.
arXiv.org
To Think or Not To Think, That is The Question for Large Reasoning...
Theory of Mind (ToM) assesses whether models can infer hidden mental states such as beliefs, desires, and intentions, which is essential for natural social interaction. Although recent progress in...
🔥4🥰3👏2
OpenClaw is cool, but too large?
HKUDS, a lab at the University of Hong Kong, released nanobot to solve this exact problem.
Researchers transformed the massive OpenClaw system into a clean 4,000-line Python framework that focuses on a simple loop: receive input, let the AI think, and execute tools like file management or web searches.
It strips away complex abstractions to focus on clear, modular function calls that any developer can understand.
By slashing code complexity by 99 percent, they achieved full functional parity with a 2-minute deployment time, making it significantly easier to customize and learn than traditional bloated agent architectures.
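The loop described above (receive input, let the model think, execute a tool) can be sketched in a few lines. This is not nanobot's code: `TOOLS`, `fake_model`, and `agent_loop` are hypothetical names, and the model is a hard-coded stub so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> plain function.
TOOLS: Dict[str, Callable[[str], str]] = {
    "echo": lambda arg: arg,
    "upper": lambda arg: arg.upper(),
}


def fake_model(history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call: returns (action, argument)."""
    last = history[-1]
    if last.startswith("user:"):
        return ("upper", last[len("user:"):].strip())
    return ("final", last[len("tool:"):].strip())


def agent_loop(user_input: str, max_steps: int = 5) -> str:
    history = [f"user: {user_input}"]          # receive input
    for _ in range(max_steps):
        action, arg = fake_model(history)      # let the model "think"
        if action == "final":
            return arg
        result = TOOLS[action](arg)            # execute the chosen tool
        history.append(f"tool: {result}")
    return "step limit reached"


answer = agent_loop("hello nanobot")
```

A real agent would replace `fake_model` with an API call and the registry with file or web-search tools, but the control flow stays this small.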
GitHub
GitHub - HKUDS/nanobot: Lightweight, open-source AI agent for your tools, chats, and workflows.
Lightweight, open-source AI agent for your tools, chats, and workflows. - HKUDS/nanobot
🆒5👍3🔥3❤2
Researchers from Huazhong University of Science and Technology and ByteDance Seed just introduced Stable-DiffCoder.
Instead of writing code one token at a time like standard models, this method uses a block diffusion approach to generate and refine code chunks simultaneously, resulting in more stable and structured programming.
The results show it outperforms its autoregressive counterparts and various 8B-parameter models on major benchmarks, specifically excelling in code editing, logical reasoning, and low-resource programming languages.
Code
Models.
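To make "chunks instead of one token at a time" concrete, here is a toy decoding schedule. It is an illustration only, not Stable-DiffCoder's method: `toy_denoiser` is a hypothetical stub that already knows the target tokens, and the block size and per-step count are made-up numbers. Blocks are filled left to right, and inside each block several masked positions are committed per refinement step.

```python
from typing import List, Optional, Tuple

MASK = None
TARGET = ["def", "add", "(", "a", ",", "b", ")", ":"]


def toy_denoiser(position: int) -> Tuple[str, float]:
    """Stand-in for the model: returns (token, confidence) for a position."""
    confidence = 1.0 / (1 + (position * 7) % 5)  # arbitrary but deterministic
    return TARGET[position], confidence


def decode(length: int, block_size: int, per_step: int) -> Tuple[List[Optional[str]], int]:
    seq: List[Optional[str]] = [MASK] * length
    steps = 0
    for start in range(0, length, block_size):           # blocks, left to right
        block = range(start, min(start + block_size, length))
        while any(seq[i] is MASK for i in block):
            masked = [i for i in block if seq[i] is MASK]
            # Commit the most confident predictions in this block together.
            masked.sort(key=lambda i: toy_denoiser(i)[1], reverse=True)
            for i in masked[:per_step]:
                seq[i] = toy_denoiser(i)[0]
            steps += 1
    return seq, steps


tokens, steps = decode(length=8, block_size=4, per_step=2)
```

With these settings the eight tokens are filled in four refinement steps, where strictly token-by-token decoding would need eight.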
🆒3❤2🔥2🥰2
Google shared new work on envisioning Intelligent AI Delegation
As they've discussed previously, the expansion of the agentic web opens up new opportunities for establishing virtual agentic economies and steerable markets.
Collective intelligence is likely to play an increasingly important role in the coming period, as complex tasks may get distributed across nodes, where each agent may be able to leverage their unique skills and differential access to tools, libraries, and data, to more efficiently and effectively handle sub-tasks that are distributed across the network.
Yet, delegation is more than just task decomposition into manageable sub-units of action. Beyond the creation of sub-tasks, delegation necessitates the assignment of responsibility and authority and thus implicates accountability for outcomes. Delegation thus involves risk assessment, which can be moderated by trust. Delegation further involves capability matching and continuous performance monitoring, incorporating dynamic adjustments based on feedback, and ensuring completion of the distributed task under the specified constraints.
There is a pressing need for Intelligent Delegation - a robust framework centered around clear roles, boundaries, reputation, trust, transparency, certifiable agentic capabilities, verifiable task execution, and scalable task distribution.
Google's proposed framework for intelligent AI delegation incorporates components for dynamic assessment, adaptive execution, structural transparency, scalable market coordination, and systemic resilience. It adapts its approach based on the criticality of the task at hand, its reversibility, resource requirements, complexity, projected duration, and other important properties.
Google also introduced a notion of contract-first decomposition as a binding constraint, making task delegation contingent on the outcome being precisely verifiable.
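One way to picture contract-first delegation: a task is only handed off if it ships with a check on its outcome. The sketch below is an assumption-heavy illustration, not the paper's framework. The `Contract` fields, the 1 to 5 criticality scale, and the escalation rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Contract:
    task: str
    verify: Optional[Callable[[str], bool]]  # precise check on the outcome
    reversible: bool
    criticality: int  # hypothetical scale: 1 (low) to 5 (high)


def delegate(contract: Contract, worker: Callable[[str], str]) -> str:
    # Contract-first: nothing is delegated without a verifier.
    if contract.verify is None:
        raise ValueError("no verification contract, task not delegated")
    # Hypothetical policy: critical, irreversible tasks are escalated instead.
    if contract.criticality >= 4 and not contract.reversible:
        return "escalate"
    result = worker(contract.task)
    return result if contract.verify(result) else "rejected"


check = lambda r: r == "4"
ok = delegate(Contract("2+2", check, True, 1), worker=lambda t: "4")
bad = delegate(Contract("2+2", check, True, 1), worker=lambda t: "5")
risky = delegate(Contract("wipe disk", lambda r: True, False, 5), worker=lambda t: "done")
```

The verifier runs on the delegator's side, so a worker's claim of success is never taken at face value.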
arXiv.org
Intelligent AI Delegation
AI agents are able to tackle increasingly complex tasks. To achieve more ambitious goals, AI agents need to be able to meaningfully decompose problems into manageable sub-components, and safely...
🔥4❤2👏2
MiniMax Introduced M2.5
Trained with RL across hundreds of thousands of complex real-world environments, it delivers SOTA performance in coding, agentic tool use, search, and office workflows.
At $1 per hour with 100 tps, infinite scaling of long-horizon agents is now economically possible.
GitHub.
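A quick back-of-the-envelope on the pricing claim, assuming "tps" means tokens per second and that the model generates continuously for the full hour (both are assumptions, the post doesn't spell them out):

```python
price_per_hour_usd = 1.0   # from the post
tokens_per_second = 100    # from the post, read as tokens per second

tokens_per_hour = tokens_per_second * 3600
usd_per_million_tokens = price_per_hour_usd / tokens_per_hour * 1_000_000

print(tokens_per_hour)                    # 360000
print(round(usd_per_million_tokens, 2))   # 2.78
```

So under those assumptions the headline works out to roughly $2.78 per million generated tokens.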
GitHub
GitHub - MiniMax-AI/MiniMax-M2.5
Contribute to MiniMax-AI/MiniMax-M2.5 development by creating an account on GitHub.
❤1🔥1👏1
Moonshot AI Introduced Kimi Claw
OpenClaw, now native to kimi.com.
1. ClawHub Access: 5,000+ community skills in the ClawHub library.
2. 40GB Cloud Storage: Massive space for all your files.
3. Pro-Grade Search: Fetch live, high-quality data directly from Yahoo Finance and more.
4. Bring Your Own Claw: Connect your third-party OpenClaw to kimi.com, chat with your setup, or bridge it to apps like Telegram groups.
Kimi
Kimi Claw | 24/7 AI Agent, Now with Claw Groups (Preview)
Deploy OpenClaw in minutes to build a 24/7 AI agent with memory and scheduled tasks. Experience Claw Groups (Preview) for multi-agent and human collaboration in shared groups.
👍6🔥2🥰2
Meet Qwen3.5-397B-A17B, an open-weight vision-language model.
Built for the future of coding, reasoning, and seamless multimodal interaction.
Key Highlights:
Inference Efficiency: A massive 397B total parameters, but only 17B active—delivering flagship power at a fraction of the cost.
Hybrid Architecture: Innovative Gated Delta Networks (Linear Attention) + Sparse MoE for extreme speed.
True Multimodality: Exceptional performance across GUI interaction, video comprehension, and agentic workflows.
Global Scale: Qwen3.5 now supports over 200 languages.
Empowering developers and enterprises to build smarter, faster, and more versatile AI agents.
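For a sense of scale, the "397B total, 17B active" figures work out as follows. This is plain arithmetic on the numbers in the post and says nothing about the actual routing or expert layout:

```python
total_params_b = 397   # billions, from the model name
active_params_b = 17   # billions active per token, from the model name

active_fraction = active_params_b / total_params_b
print(round(active_fraction * 100, 2))  # 4.28
```

Roughly 4.3% of the weights take part in each token's forward pass, which is where the compute savings come from. All 397B parameters still have to be stored, so the memory footprint does not shrink the same way.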
🍌2🔥1🥰1👏1
A Chinese hardware team introduced PicoClaw
They took a 430,000-line AI assistant that needs a $599 Mac Mini and 1GB of RAM — and rewrote it in Go so it runs on a $9.9 dev board with less than 10MB of memory.
Boot time: from 500 seconds to 1 second.
Cost: from $599 to $9.9.
Memory: from 1GB to 10MB.
Same features: code generation, web search, Discord/Telegram chat, memory system, scheduled tasks, security sandbox.
The wildest part? They claim 95% of the new codebase was written by AI agents themselves. The humans just guided the architecture. It's an AI assistant that literally rebuilt itself to be smaller.
Launched February 9th. Four days later: 7,400+ GitHub stars.
This is the pattern no one's talking about enough.
Every AI capability that starts expensive gets commoditized within months. GPT-4 level models went open source in 6 months. Now the hardware floor for running a personal AI agent just dropped 60x in weeks.
The infrastructure moat in AI isn't sustainable. The only defensible advantage is what you do with these tools — not access to them.
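The ratios behind those before/after numbers, computed from the figures quoted in the post (treating 1GB as 1024MB is an assumption):

```python
cost_before, cost_after = 599.0, 9.9        # USD
memory_before_mb, memory_after_mb = 1024, 10
boot_before_s, boot_after_s = 500, 1

cost_ratio = cost_before / cost_after
memory_ratio = memory_before_mb / memory_after_mb
boot_ratio = boot_before_s / boot_after_s

print(round(cost_ratio, 1))    # 60.5
print(round(memory_ratio, 1))  # 102.4
print(boot_ratio)              # 500.0
```

The "60x" in the post matches the cost ratio. Memory and boot time improve by larger factors, about 100x and 500x.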
❤12😁3🔥2🥰2
NVIDIA dropped PersonaPlex-7B
A full-duplex voice model that listens and talks at the same time.
No pauses. No turn-taking. Real conversation.
100% open source. Free.
Voice AI just leveled up.
huggingface.co
nvidia/personaplex-7b-v1 · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
🔥7🥰2👏2
Can we train LLMs from scratch using only low-rank factorized weights and still match dense performance?
Short answer: yes (with care).
New work “Stabilizing Native Low-Rank LLM Pretraining”.
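For readers new to the idea, "low-rank factorized weights" means replacing an m x n matrix W with a product A @ B, where A is m x r and B is r x n. The sketch below only shows the parameter savings and the factored matrix-vector product. It does not reproduce the paper's stabilization techniques, which the post doesn't describe, and the layer shape and rank are made-up values.

```python
from typing import List

Matrix = List[List[float]]


def dense_params(m: int, n: int) -> int:
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    # W (m x n) is replaced by A (m x r) @ B (r x n).
    return r * (m + n)


def low_rank_matvec(A: Matrix, B: Matrix, x: List[float]) -> List[float]:
    """Compute (A @ B) @ x as A @ (B @ x), never forming the m x n product."""
    h = [sum(b * xj for b, xj in zip(row, x)) for row in B]       # length r
    return [sum(a * hj for a, hj in zip(row, h)) for row in A]    # length m


m, n, r = 4096, 4096, 256   # hypothetical layer shape and rank
ratio = low_rank_params(m, n, r) / dense_params(m, n)

A = [[1.0], [2.0]]          # 2 x 1
B = [[3.0, 4.0]]            # 1 x 2
y = low_rank_matvec(A, B, [1.0, 1.0])
```

With these example numbers the factored layer holds one eighth of the dense layer's parameters (`ratio == 0.125`).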
🔥3❤2👏2
Google: a critical gap in modern AI isn't language or vision. It's spatial grammar. And it reveals a fundamental data bottleneck.
They built MapTrace, a fully automated, generative AI pipeline (models act as creators/critics) to generate 2M high-quality map-path pairs.
The result: fine-tuning on this synthetic data lowered path-tracing errors and boosted the success rate by +6.4 points for Gemini 2.5 Flash on real-world maps.
Google open-sourced the dataset of 2M question/answer pairs so the research community can build the next generation of intuitive navigation and robotics.
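The creator/critic pattern is essentially generate-then-filter. The toy below illustrates that pattern only and is not MapTrace: the "map" is a 5x5 grid with two walls, `creator` proposes random walks, and `critic` is a rule-based check standing in for the critic model. All names and sizes are hypothetical.

```python
import random
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def creator(rng: random.Random, size: int, length: int) -> List[Cell]:
    """Stand-in for the generating model: proposes a random walk on a grid."""
    path = [(rng.randrange(size), rng.randrange(size))]
    for _ in range(length - 1):
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path


def critic(path: List[Cell], size: int, walls: Set[Cell]) -> bool:
    """Stand-in for the critic: rejects paths that leave the map or hit a wall."""
    return all(0 <= x < size and 0 <= y < size and (x, y) not in walls for x, y in path)


def build_dataset(n_candidates: int, seed: int = 0) -> List[List[Cell]]:
    rng = random.Random(seed)
    size, walls = 5, {(2, 2), (1, 3)}
    candidates = [creator(rng, size, length=4) for _ in range(n_candidates)]
    return [p for p in candidates if critic(p, size, walls)]


dataset = build_dataset(200)
```

Only candidates the critic accepts end up in `dataset`, so the quality of the synthetic data is bounded by how strict the critic is.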
Google Research
Teaching AI to read a map
We propose a system for synthetic data generation to train AI systems to visually follow any route on any map, finally teaching language models to navigate our world.
🔥3🥰2👏2
Together AI published a paper that made open-source models outperform GPT-4o on a major benchmark
The method is called Mixture-of-Agents. It doesn't fine-tune anything. It doesn't train anything. It just asks multiple LLMs the same question, then feeds their answers to another LLM that synthesizes the best response.
The result: 65.1% vs GPT-4o's 57.5% on AlpacaEval 2.0, using only open-source models.
GitHub.
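The flow described in this post fits in one function. This is a single-round sketch with stubbed models, not Together AI's implementation: the `LLM` type alias, the prompt wording, and the stub proposers and aggregator are all assumptions.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def mixture_of_agents(question: str, proposers: List[LLM], aggregator: LLM) -> str:
    # Step 1: every proposer answers the same question independently.
    drafts = [model(question) for model in proposers]
    # Step 2: the aggregator sees all drafts and writes one synthesized answer.
    numbered = "\n".join(f"{i + 1}. {d}" for i, d in enumerate(drafts))
    prompt = (
        f"Question: {question}\n"
        f"Candidate answers:\n{numbered}\n"
        "Synthesize the best answer."
    )
    return aggregator(prompt)


# Stub models so the sketch runs offline.
proposers: List[LLM] = [lambda q: "Paris", lambda q: "paris, France", lambda q: "Lyon"]
aggregator: LLM = lambda prompt: prompt.splitlines()[2]  # naive: echo the first draft

answer = mixture_of_agents("Capital of France?", proposers, aggregator)
```

In practice each callable would wrap a call to a different open-source model, and the aggregator would be an LLM that actually reads and merges the drafts.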
arXiv.org
Mixture-of-Agents Enhances Large Language Model Capabilities
Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the...
🔥3❤2🥰2
Fastest Frontend Tooling for Humans & AI
cpojer.net
Fastest Frontend Tooling for Humans & AI
Frontend tooling in 2026+, with and without AI.
🔥3❤2👏2
Shanghai AI Laboratory presents AgentDoG
It is a new diagnostic guardrail framework that monitors AI agents in real-time. Instead of just blocking risky moves with a simple yes or no, it uses a specialized three-part system to explain the root cause of a danger and catch subtle, "hidden" errors that other models miss.
AgentDoG achieves SOTA performance in safety moderation across complex, interactive scenarios, outperforming current guardrail models in both transparency and accuracy.
GitHub
HF.
arXiv.org
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
The rise of AI agents introduces complex safety and security challenges arising from autonomous tool use and environmental interactions. Current guardrail models lack agentic risk awareness and...
🆒3❤2🔥2👏2
Zyphra introduced ZUNA, a BCI foundation model advancing towards thought-to-text
ZUNA is a 380M-parameter BCI foundation model for EEG data, a significant milestone in the development of noninvasive thought-to-text.
Fully open source, Apache 2.0. HF. GitHub.
Noninvasive EEG data is easily accessible and information-dense, making it a practical foundation for thought-to-text BCI applications.
EEG records brain electrical activity through scalp electrodes to diagnose various neurological conditions and monitor brain states.
While information-rich, EEG data is often messy, plagued by channel dropouts, motion artifacts, and sparse electrode coverage.
ZUNA reconstructs high-fidelity brain signals from EEG data, enabling better diagnostics, research, and BCI applications without additional hardware.
Devices with fewer EEG sensors trade signal coverage for accessibility.
ZUNA predicts missing channels from sparse data and electrode coordinates, delivering clinical-grade signals that scale from consumer headsets to 256-electrode research systems, with no retraining.
ZUNA dramatically outperforms conventional methods like MNE’s spherical spline interpolation across masked and unseen EEG datasets.
Its advantage grows with higher upsampling, especially at 4x, where classical methods break down and ZUNA excels.
Trained on 2M channel-hours across 208 EEG datasets, ZUNA uses masked diffusion training and 4D spatial embeddings to generalize across data sets and arbitrary electrode layouts.
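To show what "predicting a missing channel from electrode coordinates" means at its simplest, here is a classical-style baseline: inverse-distance weighting over neighboring electrodes. This is deliberately neither ZUNA nor MNE's spherical spline interpolation, just a much cruder stand-in, and the coordinates and values are invented.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def idw_interpolate(target: Coord, known: Dict[Coord, float], power: float = 2.0) -> float:
    """Estimate a missing channel as a distance-weighted mean of known channels."""
    weight_sum, value_sum = 0.0, 0.0
    for coord, value in known.items():
        d = math.dist(target, coord)
        if d == 0:
            return value  # target coincides with a recorded electrode
        w = 1.0 / d ** power
        weight_sum += w
        value_sum += w * value
    return value_sum / weight_sum


# Two recorded electrodes at equal distance from the missing one.
known = {(1.0, 0.0, 0.0): 2.0, (-1.0, 0.0, 0.0): 4.0}
estimate = idw_interpolate((0.0, 0.0, 0.0), known)
```

A purely geometric rule like this can only average what nearby sensors saw. A learned model such as the one described here is presumably exploiting structure in the signals themselves, which would explain the growing gap at higher upsampling factors.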
huggingface.co
Zyphra/ZUNA · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
🔥3👏3❤2