Meet RoboMonkey, a framework for synthetic data generation and scaling test-time compute for vision-language-action models (VLAs).
It turns out that generation (via repeated sampling) and verification (via training a verifier on synthetic data) work well for robotics too.
GitHub.
Datasets and models.
Serving engine.
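To make the recipe concrete, here is a minimal sketch of the generate-then-verify loop, assuming a generic VLA policy and a learned verifier; `policy.sample` and `verifier.score` are placeholder interfaces, not RoboMonkey's actual API.
```python
# Hedged sketch of repeated sampling + verification for a VLA policy.
# `policy` and `verifier` are assumed placeholder objects passed in by the caller.
from dataclasses import dataclass

@dataclass
class Candidate:
    action: list[float]   # e.g. a 7-DoF end-effector command
    score: float = 0.0

def sample_then_verify(policy, verifier, observation, instruction, n_samples=16):
    # 1) Generation: draw N candidate actions via repeated sampling.
    candidates = [Candidate(policy.sample(observation, instruction))
                  for _ in range(n_samples)]
    # 2) Verification: a verifier trained on synthetic preference data scores
    #    each candidate; the highest-scoring action is the one to execute.
    for c in candidates:
        c.score = verifier.score(observation, instruction, c.action)
    return max(candidates, key=lambda c: c.score).action
```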
Salesforce introduced CoAct-1, a hybrid agent that elevates coding to a first-class action alongside GUI manipulation.
On OSWorld, CoAct-1 achieves a new SOTA score of 60.76%, becoming the first computer-using agent (CUA) to cross the 60-point mark.
Takeaways:
- Treat code as an action, not just a tool call.
- Hybrid action space (code + GUI) reduces error accumulation and boosts reliability.
- New SOTA on OSWorld with better efficiency and broader applicability.
Paper.
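To make the "code as an action" idea concrete, here is a minimal sketch of a hybrid action space and dispatcher; the class and method names are assumptions for illustration, not CoAct-1's actual interface.
```python
# Illustrative hybrid action space: code is a first-class action next to GUI events.
# `gui_backend.perform` is an assumed placeholder, not CoAct-1's API.
from dataclasses import dataclass
import subprocess

@dataclass
class GuiAction:
    kind: str          # "click", "type", "scroll", ...
    target: str        # e.g. an accessibility-tree node id
    text: str = ""

@dataclass
class CodeAction:
    language: str      # "python" or "bash"
    source: str        # one script can replace a long chain of GUI steps

def execute(action, gui_backend):
    if isinstance(action, CodeAction):
        cmd = (["python", "-c", action.source] if action.language == "python"
               else ["bash", "-c", action.source])
        # Running code directly avoids accumulating errors across many GUI clicks.
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    return gui_backend.perform(action.kind, action.target, action.text)
```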
linxins.net
CoAct-1: Computer-using Agents with Coding as Actions
Google introduced DeepPolisher, a new open-source method to improve genome assembly accuracy. It reduces indel errors by 70% and total assembly errors by 50%.
research.google
Highly accurate genome polishing with DeepPolisher: Enhancing the foundation of genomic research
Microsoft presented Agent Lightning.
It enables seamless agent optimization for any existing agent framework (e.g. LangChain) with any optimization framework (e.g. DSPy), without any modifications to the agent code.
Paper.
Repo.
Microsoft Research
Agent Lightning - Microsoft Research
Databricks released Agent Bricks, a new product that helps enterprises develop SOTA domain-specific agents.
Agent Learning from Human Feedback (ALHF) is a new paradigm where agents learn directly from minimal natural language feedback, not just labels or numeric rewards.
Google released brilliant research: a new active learning method for curating high-quality data that reduces training-data requirements for fine-tuning LLMs by orders of magnitude.
You get the same or better model quality with 250 to 450 expert labels instead of 100K, thanks to focusing expert effort only on confusing boundary cases.
The team uses an LLM as a scout to sweep a huge pool of ads, then asks experts to judge only the handful of examples that truly confuse the model.
Those expert decisions train the next model iteration, and the loop repeats until model-expert agreement stops improving. It is classic active learning, but adapted to large, noisy, imbalanced traffic like ads, where only about 1% of items are actually problematic.
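A hedged sketch of that curation loop, assuming a binary classifier that exposes per-item probabilities; `ask_experts`, `fine_tune`, and `evaluate_agreement` are placeholder callables, not Google's pipeline.
```python
# Sketch of the scout-and-expert active learning loop described above.
def uncertainty(prob_positive: float) -> float:
    # Items near the decision boundary (p close to 0.5) are the "confusing" ones.
    return 1.0 - abs(prob_positive - 0.5) * 2.0

def curation_loop(model, unlabeled_pool, ask_experts, fine_tune, evaluate_agreement,
                  batch_size=50, max_rounds=10):
    labeled, prev_agreement = [], -1.0
    for _ in range(max_rounds):
        # 1) Scout: the current model scores the entire pool.
        scored = [(x, model.predict_proba(x)) for x in unlabeled_pool]
        # 2) Send only the most confusing boundary cases to human experts.
        batch = sorted(scored, key=lambda s: uncertainty(s[1]), reverse=True)[:batch_size]
        labeled += ask_experts([x for x, _ in batch])
        # 3) Retrain on the expert decisions and check model-expert agreement.
        model = fine_tune(model, labeled)
        agreement = evaluate_agreement(model, labeled)
        if agreement <= prev_agreement:   # stop once agreement stops improving
            break
        prev_agreement = agreement
    return model, labeled
```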
research.google
Achieving 10,000x training data reduction with high-fidelity labels
Big AI security issue: a ChatGPT Connectors vulnerability can leak your API keys and "memory" with zero clicks.
AgentFlayer, demoed at Black Hat, shows that an injected prompt hidden in a document can force an image to render, exfiltrating data through a malicious URL.
Zenity Labs
AgentFlayer: ChatGPT Connectors 0click Attack
Tencent AI Lab introduced R-Zero, a framework enabling LLMs to self-evolve their reasoning capabilities from zero human-curated data, through an autonomous Challenger-Solver loop.
R-Zero learns from scratch, with the Challenger proposing tasks at the edge of the Solver's ability.
This co-evolution boosts Qwen3-4B-Base by +6.49 on math and +7.54 on general reasoning.
GitHub.
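A rough sketch of one Challenger-Solver round, with placeholder `propose`, `attempt`, and `update` callables; the paper's actual reward shaping and RL updates are more involved.
```python
# Conceptual sketch of a Challenger-Solver co-evolution round, in the spirit of R-Zero.
from collections import Counter

def r_zero_round(challenger, solver, propose, attempt, update,
                 n_tasks=64, n_attempts=8):
    for _ in range(n_tasks):
        task = propose(challenger)                       # Challenger invents a task
        answers = [attempt(solver, task) for _ in range(n_attempts)]  # e.g. answer strings
        # With no human labels, agreement among the Solver's own attempts acts as a
        # pseudo-signal: the majority answer is treated as "correct".
        _majority, count = Counter(answers).most_common(1)[0]
        consistency = count / n_attempts
        # Challenger is pushed toward tasks at the edge of the Solver's ability
        # (neither trivial nor hopeless); Solver is rewarded for matching the majority.
        challenger = update(challenger, task, reward=1.0 - abs(consistency - 0.5) * 2.0)
        solver = update(solver, task, reward=consistency)
    return challenger, solver
```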
huggingface.co
Paper page - R-Zero: Self-Evolving Reasoning LLM from Zero Data
Renmin University of China and Huawei presented a comprehensive survey of memory mechanisms in LLM-based agents:
• What memory is & why it matters
• How to design & evaluate it
• Key applications & use cases
• Limitations & future directions
A roadmap for building smarter, longer-lived AI agents.
GitHub.
arXiv.org
A Survey on the Memory Mechanism of Large Language Model based Agents
The top AI agents by revenue @alwebbci
The AI agent market is expected to more than double this year ($5B to $13B). Half of the top 20 were founded in the last 3 years.
Customer service AI agents command 127x revenue multiples vs. a 52x average.
Alibaba introduced Memp, a new framework that gives LLM agents learnable, updatable procedural memory.
This leads to steadily higher success rates and greater efficiency on complex tasks.
Memp distills past agent trajectories into both fine-grained instructions and high-level abstractions, continuously improving with new experience. It's even transferable to weaker models.
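As a rough illustration (not Memp's implementation), procedural memory can be pictured as a store of distilled procedures that is updated from trajectories and queried for new tasks; the distillation step and the keyword-overlap retrieval below are placeholders.
```python
# Illustrative procedural memory: distilled trajectories stored as reusable procedures.
from dataclasses import dataclass, field

@dataclass
class Procedure:
    task_pattern: str        # what kind of task this procedure applies to
    steps: list[str]         # fine-grained instructions distilled from a trajectory
    abstraction: str = ""    # high-level summary of the strategy

@dataclass
class ProceduralMemory:
    entries: list[Procedure] = field(default_factory=list)

    def add(self, trajectory, distill):
        # `distill` (e.g. an LLM call) turns a raw trajectory into a Procedure.
        self.entries.append(distill(trajectory))

    def retrieve(self, task: str, k: int = 3) -> list[Procedure]:
        # Crude keyword-overlap retrieval, just for the sketch.
        words = set(task.lower().split())
        scored = [(len(words & set(p.task_pattern.lower().split())), p)
                  for p in self.entries]
        return [p for s, p in sorted(scored, key=lambda t: t[0], reverse=True)[:k] if s > 0]
```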
huggingface.co
Paper page - Memp: Exploring Agent Procedural Memory
Circle has announced the launch of Arc, an open Layer-1 blockchain designed to provide enterprise-grade infrastructure for stablecoin payments, foreign exchange, and capital markets applications.
The network is EVM-compatible and uses USDC as its native gas token. Arc is expected to launch its public testnet later this fall.
The Block
Circle to launch Layer 1 blockchain Arc using USDC stablecoin as native gas token
Microsoft_Administering_and_Governing_Agents__1755003891.pdf
569.1 KB
Microsoft has released a 30-page guide on #AIAgent governance to help secure and manage agents in #Microsoft365 environments.
Anthropic: Claude Sonnet 4 now supports 1 million tokens of context on the Anthropic API, a 5x increase.
Process over 75,000 lines of code or hundreds of documents in a single request.
Long context support is in public beta for API users with Tier 4 and custom rate limits.
Broader availability will be rolling out over the coming weeks. Available in Amazon Bedrock, and coming soon to Google Cloud's Vertex AI.
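A sketch of using the long-context beta with the Python SDK; the beta flag and model id below are assumptions to verify against Anthropic's documentation.
```python
# Sketch: sending a very large context to Claude Sonnet 4 via the 1M-token beta.
# The beta flag and model id are assumptions; confirm them in the Anthropic docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("whole_codebase.txt") as f:
    codebase = f.read()         # can be on the order of 75,000+ lines of code

response = client.messages.create(
    model="claude-sonnet-4-20250514",                           # assumed model id
    max_tokens=4096,
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},  # assumed beta flag
    messages=[{
        "role": "user",
        "content": f"Here is the repository:\n{codebase}\n\nSummarize its architecture.",
    }],
)
print(response.content[0].text)
```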
Claude
Claude Sonnet 4 now supports 1M tokens of context | Claude
Microsoft introduced Dion, a new AI model optimization method that boosts scalability and performance over existing leading methods by orthonormalizing only a top-rank subset of singular vectors, enabling more efficient training of large models such as LLaMA-3 with reduced overhead.
Orthonormal updates appear to roughly double transformer training convergence, and Dion makes them tractable at the largest scales.
Code.
Paper.
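Stripped of the distributed machinery, the core step is to orthonormalize only a low-rank approximation of the update; a conceptual PyTorch sketch (not Microsoft's implementation) follows.
```python
# Conceptual sketch of an orthonormalized low-rank update, in the spirit of Dion.
import torch

def orthonormal_lowrank_step(param: torch.Tensor, momentum: torch.Tensor,
                             rank: int, lr: float) -> None:
    # Keep only the top-`rank` singular directions of the momentum matrix and
    # replace their singular values with 1, yielding an orthonormal update U V^T.
    U, S, V = torch.svd_lowrank(momentum, q=rank)
    update = U @ V.T
    param.data.add_(update, alpha=-lr)
```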
Microsoft Research
Dion: Distributed orthonormal update revolution
Matrix-Game 2.0: the first open-source, real-time, long-sequence interactive world model
Last week, DeepMind's Genie 3 shook the AI world with real-time interactive world models.
But... it wasn't open-sourced.
Matrix-Game 2.0 is Skywork's next-gen interactive world model:
- Real-time: 25FPS generation
- Long-sequence: Minutes of continuous video
- Interactive: Move, rotate, explore
- Multi-scene: City, wild, TempleRun, GTA.
It's the foundation for:
- Game engines
- Embodied AI
- Virtual humans
- Spatial intelligence.
The Tech Stack:
- Data: 1,350 hrs of interactive videos from Unreal Engine + GTA5
- Control: Frame-level keyboard & mouse input
- Model: 1.3B autoregressive diffusion with action control
- Speed: Single GPU → 25FPS
- 3D Causal VAE for space-time compression
- Diffusion Transformer with action conditioning
- KV-Cache for infinite video generation
- DMD training to avoid error accumulation
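To illustrate what frame-level action control means in practice, here is a heavily simplified rollout loop with placeholder `model`, `read_keyboard_mouse`, and `display` interfaces (not Skywork's code).
```python
# Simplified autoregressive generation loop with per-frame action control.
def interactive_rollout(model, read_keyboard_mouse, display, first_frame,
                        n_frames=25 * 60):
    frame, cache = first_frame, None   # `cache` plays the role of the KV-cache for long rollouts
    for _ in range(n_frames):          # e.g. one minute of video at 25 FPS
        action = read_keyboard_mouse()              # movement keys, mouse deltas
        # Each frame is denoised conditioned on past latents (via the cache) and on
        # the current per-frame action, so user input takes effect immediately.
        frame, cache = model.next_frame(frame, action, cache)
        display(frame)                              # target: real-time playback on a single GPU
```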
huggingface.co
Skywork/Matrix-Game-2.0 · Hugging Face
Now you can run and benchmark evolutionary coding agents on 100+ algorithm optimization tasks from algotune.io
Google is rolling out their version of memory for Gemini today. It is called 'personal context.'
If you want to disable this, toggle off Personal Context in settings.
This works for 2.5 Pro only, not Flash.
It will be interesting to see what effect Gemini's monster context window will have on the implementation.
Google
Gemini adds Temporary Chats and new personalization features