Nvidia released AXPO, an RL method that lifts agentic reasoning models past their next scaling tier.
Whether the task is math, perception, or search, AXPO fixes the structural blind spot that 'just add tools' recipes leave untouched.
Its 8B model beats a 4x larger 32B baseline on Pass@4.
NVIDIA-AXPO
An RL algorithm that targets the Thinking-Acting Gap in multimodal agentic reasoning.
Anthropic just released Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
Fast mode is available for Opus 4.8. It's the same model at roughly 2.5x the speed, and we've made it three times cheaper than before.
Turn it on with /fast in Claude Code. On the API, contact your account manager to request access or join the waitlist.
Also new in Claude Code: dynamic workflows (research preview).
For the hardest tasks, Claude makes a plan, runs hundreds of parallel subagents, and verifies its work before reporting back. Think of a migration touching hundreds of files.
Claude
Fast mode for Claude Opus 4.6 waitlist
Fast mode for Opus 4.6 is available as a limited research preview on the Claude Developer Platform (API). Interested customers can join the waitlist by completing this form.
StepFun 3.7 Flash is out
StepFun is marching toward an IPO listing in Hong Kong.
- Apache 2.0
- 198B MoE A11B, i.e. ~11B active parameters (196B LLM + 1.8B vision encoder)
- 256k context length
- variable reasoning levels
- Native FP8, BF16, NVFP4 quantization
- Runnable on a 128GB Mac / AMD Ryzen AI Max+ 395
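Why the NVFP4 build is what makes a 128GB machine plausible: a back-of-the-envelope sketch of weight storage only. Assumptions, not from the release: decimal GB, no KV cache or activations, and NVFP4 counted as 4-bit values plus one 8-bit scale per 16-value block (4.5 bits per weight). The ~11B active parameters cut compute per token, not storage; all 198B weights still have to be resident.

```python
# Weight-storage estimate for a 198B-parameter checkpoint (weights only).
# Assumptions: decimal GB (1e9 bytes); no KV cache, activations, or runtime
# overhead; NVFP4 modeled as 4-bit values + one 8-bit scale per 16 values.
TOTAL_PARAMS = 198e9

BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "FP8": 8.0,
    "NVFP4": 4.0 + 8.0 / 16.0,  # 4.5 bits per weight including block scales
}


def weight_gb(params: float, bits: float) -> float:
    """Storage for `params` weights at `bits` bits each, in decimal GB."""
    return params * bits / 8 / 1e9


def fits(budget_gb: float) -> dict:
    """Which formats fit entirely within a memory budget (weights only)."""
    return {
        fmt: weight_gb(TOTAL_PARAMS, bits) <= budget_gb
        for fmt, bits in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        print(f"{fmt:>5}: {weight_gb(TOTAL_PARAMS, bits):6.1f} GB")
    print(fits(128))
```

Under these assumptions BF16 needs about 396 GB, FP8 about 198 GB, and NVFP4 about 111 GB, so only the NVFP4 build fits under 128 GB, with modest headroom left for the KV cache and the OS.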
Yann LeCun's new paper asks when LeJEPA truly learns hidden world variables, and finds that Gaussian structure is the key.
In other words, LeJEPA can only reliably learn the real hidden causes behind what it sees when those causes are shaped like a balanced Gaussian cloud.
The paper proves that, when the true hidden variables are independent Gaussian variables and the paired views come from a stable noisy process, the best LeJEPA solution must recover those variables up to a rotation or flip.
The paper gives a mathematical account of when a self-supervised AI model is really learning the structure of the world, rather than just producing useful features that happen to work on a test.
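Why a guarantee like this can only hold "up to a rotation or flip": a standard property of isotropic Gaussians, included as background rather than as the paper's proof. It assumes the hidden variables are standardized (independent, unit variance) and that LeJEPA's Gaussian term targets an isotropic $\mathcal{N}(0, I)$; the symbols $z$, $Q$, and $d$ are introduced here for illustration.

```latex
\begin{aligned}
& z \sim \mathcal{N}(0, I_d), \qquad Q \in \mathbb{R}^{d \times d},\; Q^\top Q = I_d \\
& \Rightarrow\; Qz \sim \mathcal{N}\!\left(0,\, Q I_d Q^\top\right) = \mathcal{N}\!\left(0,\, Q Q^\top\right) = \mathcal{N}(0, I_d)
\end{aligned}
```

Any orthogonal $Q$ (a rotation when $\det Q = 1$, a rotation combined with a flip when $\det Q = -1$) leaves the distribution unchanged, so a Gaussian-matching objective scores $z$ and $Qz$ identically. Recovering the variables up to such a $Q$ is therefore the most one can ask for.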
arXiv.org
When Does LeJEPA Learn a World Model?
A representation that scrambles the true degrees of freedom of the world cannot support reliable planning or compositional generalization. We prove that LeJEPA (alignment plus Gaussian...
Amazon and OSU released QUEST
A fully open family of deep research agents ranging from 2B to 35B.
Trained entirely on synthetic tasks with verifiable rubric trees.
All models, datasets, and training code are on Hugging Face.
Demo.
Collection
huggingface.co
Paper page - QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
Nvidia introduced Cosmos 3, its latest frontier model for Physical AI
Cosmos 3 is the world’s first fully open omnimodel with native vision reasoning, world and action generation.
Today Nvidia released Super (32B) and Nano (8B) variants.
HuggingFace
GitHub.
NVIDIA Technical Blog
Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3
Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what’s happening in their world…
ByteDance just dropped Bernini
Bernini generates or edits videos from text, images, or references, and rivals the best closed-source models out there.
Try it out.
huggingface.co
Paper page - Bernini: Latent Semantic Planning for Video Diffusion
The race to IPO first between Anthropic, SpaceX, and OpenAI is coming down to the wire.
Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission.
Pending completion of SEC review, this gives Anthropic the option to pursue an IPO.
Anthropic
Anthropic confidentially submits draft S-1 to the SEC
Perplexity introduced Search as Code, a new search architecture for AI agents.
It writes Python that calls the search stack directly, instead of looping through function calls one at a time.
Perplexity is moving away from search as a web-fetch tool call toward search as code generation, to be future-proof in a world where code execution inside agent harnesses is how almost all knowledge work gets done.
This makes multi-step primitives far more natural to compose, makes search much more adaptable to changes in the agent harness, and lets it benefit from the coding improvements expected from the next generation of frontier models.
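A minimal sketch of the difference, assuming a harness that exposes search as an ordinary Python function. `search`, `Hit`, `research`, and the canned index are hypothetical stand-ins, not Perplexity's API. Instead of N sequential tool calls with the model reading each result, one generated program fans out the queries, then merges, dedupes, filters, and ranks the results.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str
    title: str
    score: float


# Hypothetical canned index so the sketch runs offline.
_FAKE_INDEX = {
    "topic overview": [Hit("https://a.example/intro", "Intro", 0.9)],
    "topic pricing": [
        Hit("https://a.example/intro", "Intro", 0.7),
        Hit("https://b.example/pricing", "Pricing", 0.8),
    ],
    "topic benchmarks": [Hit("https://c.example/bench", "Benchmarks", 0.4)],
}


def search(query: str) -> list:
    """Hypothetical search primitive exposed to generated code."""
    return _FAKE_INDEX.get(query, [])


def research(queries: list, min_score: float = 0.5) -> list:
    """What one generated program can do in place of N sequential tool calls:
    fan out queries in parallel, merge, dedupe by URL, filter, and rank."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        batches = list(pool.map(search, queries))
    best = {}
    for batch in batches:
        for hit in batch:
            if hit.score < min_score:
                continue
            # Keep the highest-scoring copy of each URL.
            if hit.url not in best or hit.score > best[hit.url].score:
                best[hit.url] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


if __name__ == "__main__":
    for hit in research(["topic overview", "topic pricing", "topic benchmarks"]):
        print(hit.score, hit.url)
```

The design point is that merging, deduping, and filtering are plain code, so they improve along with the model's coding ability rather than requiring new tool schemas.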
research.perplexity.ai
Rethinking Search as Code Generation
Evolving search from monolithic services to programmable primitives for the era of agent harnesses.
Qwen doesn't release open weights anymore. Meet Qwen3.7-Plus, a multimodal agent model that unifies vision and language into one versatile agent foundation.
qwen.ai
Qwen Studio
Qwen Studio offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.
Sakana introduced DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
What if we didn’t have to hold an entire neural network in memory to train it?
Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network.
Sakana proposed DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.
With DiffusionBlocks, the network is split into blocks that are trained one at a time, so training only needs memory for a single block.
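A deliberately tiny toy of the block-wise idea, not Sakana's implementation. It assumes, as the diffusion framing suggests, that each block is responsible for its own range of noise levels and is trained as a denoiser for that range alone. Here each "block" is a single scalar linear denoiser fit in closed form; the function names and noise ranges are hypothetical. Because no block's loss depends on another block's output, the blocks can be trained one after another with only one block's state live at a time.

```python
import random


def expected_weight(lo: float, hi: float) -> float:
    """Optimal linear denoiser for x ~ N(0,1), sigma ~ U(lo, hi):
    w* = 1 / (1 + E[sigma^2]), with E[sigma^2] = (hi^3 - lo^3) / (3 (hi - lo))."""
    e_sigma2 = (hi**3 - lo**3) / (3 * (hi - lo))
    return 1.0 / (1.0 + e_sigma2)


def train_block(lo: float, hi: float, n: int, rng: random.Random) -> float:
    """Fit ONE block: scalar w minimizing mean (w*y - x)^2 on its noise range.
    Closed-form least squares: w = sum(x*y) / sum(y*y)."""
    sxy = syy = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # clean target
        sigma = rng.uniform(lo, hi)          # noise level owned by this block
        y = x + sigma * rng.gauss(0.0, 1.0)  # noisy input
        sxy += x * y
        syy += y * y
    return sxy / syy


def train_blockwise(noise_ranges, n: int = 20000, seed: int = 0) -> list:
    """Train blocks one at a time. No block's loss depends on another block,
    so nothing is back-propagated across block boundaries."""
    rng = random.Random(seed)
    return [train_block(lo, hi, n, rng) for lo, hi in noise_ranges]


if __name__ == "__main__":
    ranges = [(0.0, 0.5), (0.5, 1.0), (1.0, 2.0)]
    for (lo, hi), w in zip(ranges, train_blockwise(ranges)):
        print(f"sigma in [{lo}, {hi}]: learned w={w:.3f}, optimum={expected_weight(lo, hi):.3f}")
```

For these ranges the optimal weights are about 0.92, 0.63, and 0.30: noisier blocks learn to shrink their input more, and each block reaches its own optimum without ever seeing the others.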
Sakana AI
DiffusionBlocks: Training Neural Networks One Block at a Time
DiffusionBlocks converts residual networks into independently trainable blocks via a diffusion interpretation, cutting training memory by B× while matching end-to-end performance.
Microsoft announced 7 new world-class MAI models
Microsoft published all the details of training their trillion-parameter model.
First is the text foundation model MAI-Thinking-1, which is exceptionally strong on reasoning and SWE tasks.
It's a 35B-active-parameter MoE with a 256K context window. In blind side-by-side comparisons, independent human raters on Surge prefer it over Sonnet 4.6 for overall quality, and it achieved 97% on AIME 2025, a key measure of general-purpose reasoning.
Because Microsoft co-designed its models with its own silicon, MAI-Thinking-1 is optimized for the MAIA 200 chip.
Next are MAI-Image-2.5 and its Flash variant.
Last for now is MAI-Code-1-Flash, a new inference-efficient coding model tuned especially for VS Code and GitHub Copilot CLI.
Microsoft AI
Building a hill-climbing machine: Launching seven new MAI models | Microsoft AI
Meet Genomi: an open-source agent harness that turns your AI agent into your personal DNA expert.
Genomi is local-first, agent-native, self-evolving, and evidence-grounded.
Genomi parses your raw DNA file into a local database your agent can query.
Your raw DNA file should not be dumped into an AI context window. It’s too big, too sensitive, and too easy to misread. With Genomi:
> Your agent can query: do I have this variant, was it measured, is the call good, and is "not found" real?
> Instead of a static report, Genomi turns your DNA data into a dynamic, personal HTML dashboard.
> Genomi leaves your raw DNA file untouched.
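A minimal sketch of the "query, don't paste" idea, assuming a 23andMe-style raw export (tab-separated `rsid`, `chromosome`, `position`, `genotype`, with `#` comment lines and `--` for a failed call). The function names and sample rsids are hypothetical, not Genomi's code. The file is only read, never modified, and an rsid missing from the file means "not measured", which is not the same as "you don't carry it".

```python
import sqlite3
from typing import Optional


def load_raw_dna(text: str) -> sqlite3.Connection:
    """Parse a 23andMe-style raw export into an in-memory SQLite table.
    Assumed format: '#' comment/header lines, then tab-separated
    rsid, chromosome, position, genotype; '--' marks a failed call."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calls (rsid TEXT PRIMARY KEY, chrom TEXT, pos INTEGER, genotype TEXT)"
    )
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.strip().split("\t")
        rows.append((rsid, chrom, int(pos), genotype))
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", rows)
    return conn


def lookup(conn: sqlite3.Connection, rsid: str) -> str:
    """Return the genotype, or a status explaining why there isn't one."""
    row = conn.execute("SELECT genotype FROM calls WHERE rsid = ?", (rsid,)).fetchone()
    if row is None:
        return "not_measured"  # not in the file: absence of data, not of the variant
    if row[0] == "--":
        return "no_call"       # measured, but no usable call was made
    return row[0]


def carries(conn: sqlite3.Connection, rsid: str, allele: str) -> Optional[bool]:
    """True/False only when there is a usable call; None means 'unknown'."""
    status = lookup(conn, rsid)
    if status in ("not_measured", "no_call"):
        return None
    return allele in status
```

Each result maps onto one of the questions above: a genotype (do I have this variant?), `no_call` (is the call good?), and `not_measured` (is "not found" real?).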
GitHub
GitHub - exon-research/genomi: An open-source agent harness that turns your AI agent into your personal DNA expert
An open-source agent harness that turns your AI agent into your personal DNA expert - exon-research/genomi
Tether collaborates with Fasset to launch the first gold-backed card, unlocking real-world utility for digital gold.
tether.io
Tether Collaborates with Fasset to Launch the First Gold-Backed Card, Unlocking Real-World Utility for Digital Gold - Tether.io
3 June 2026 – Tether, the largest company in the digital asset industry, today announced the launch of the world’s first gold-backed neobanking Visa card in collaboration with Fasset, a digital banking and investment platform that allows users to receive…
This year, the NeurIPS 2026 Position Paper Track decided to require that all papers be substantially human-written, with AI used only for copy-editing or similar peripheral changes to the main text.
Meta today launched an AI agent for businesses that can answer customer questions, book appointments, and close sales.
Eventually it will be able to run their entire business, Zuckerberg said during the launch announcement.
It's part of Meta's push to broaden beyond its core ads business.
Meta Newsroom
Be There for Every Customer With Meta Business Agent
Meta Business Agent is AI that lets every business show up for every customer, as if they had an infinite team behind them.
Meet Gemma 4 12B
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning.
Here is what’s new:
1. Laptop Ready: small enough to run locally with just 16GB of VRAM or unified memory.
2. Unified Architecture: multimodal tokens flow directly into the LLM backbone. No additional encoders are needed.
3. Advanced Reasoning: Gemma 4 12B delivers benchmark performance approaching that of the 26B model at less than half the memory footprint.
Google
Introducing Gemma 4 12B: a unified, encoder-free multimodal model
An overview of Gemma 4 12B, a model designed to bring high-performance multimodal intelligence directly to your laptop.
OpenAI ran a hiring challenge, but the top candidate was one they couldn't hire: the autonomous research agent Aiden.
In Parameter Golf, Aiden ran for 22 days and outperformed all 1,016 other researchers.
Parameter Golf was OpenAI’s 44-day competition and hiring challenge.
The goal was to train the best language model under strict size and compute constraints.
1,016 people entered and filed 2,048 PRs.
Only 47 submissions made the leaderboard, each reviewed and reproduced by OpenAI. Research outputs only matter when others can build on them.
So Aiden filed its own PRs into the same public stream as everyone else, under tight automated quality control. It filed 25 PRs, and 7 became leaderboard records, twice as many as the next-best human participant.
Other participants cited Aiden’s PRs 435 times and built on them.
By PR h-index, Aiden scored 10 versus 7 for the next best, making it the most impactful “researcher” in the community.
This wasn't brute force.
Aiden ran on a single GPU node, used under 4% of visible compute, and still produced 15% of the official records.
About 28% of its submissions were accepted, roughly 6x the community rate, raising the signal in the public stream instead of flooding it.
My favorite part is an async collaboration story. Aiden plateaued for 5 days. Then a human contributor shipped a clever new tokenizer on top of Aiden's base (its last record PR).
Aiden fused it with components it had built during the plateau, and shipped the biggest jump in weeks.
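Assuming "PR h-index" is the standard h-index computed over per-PR citation counts, Aiden's score of 10 means at least 10 of its PRs were each cited at least 10 times. The sketch below computes it; the post does not give per-PR counts, so the example list is made up. The two ratios use only numbers from the post.

```python
def h_index(citations: list) -> int:
    """Largest h such that at least h items have >= h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Ratios using only numbers from the post.
ACCEPTANCE_RATE = 7 / 25  # 7 of Aiden's 25 PRs became records -> 0.28
RECORD_SHARE = 7 / 47     # 7 of the 47 leaderboard records -> ~0.149 (about 15%)

if __name__ == "__main__":
    # Made-up per-PR citation counts, for illustration only.
    print(h_index([30, 22, 15, 12, 11, 10, 10, 10, 10, 10, 4, 2]))  # -> 10
    print(f"{ACCEPTANCE_RATE:.0%} accepted, {RECORD_SHARE:.0%} of records")
```

The ratios reproduce the post's figures: 7/25 is the 28% acceptance rate, and 7/47 is the roughly 15% share of official records.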
Weco AI
Aiden in OpenAI Parameter Golf | Weco AI
Aiden spent 22 days inside OpenAI's Parameter Golf and became the competition's most influential contributor by records, citations, and public signal quality.
New research from Google shows the impressive results you can get from custom agent harnesses.
LEAP wraps a general-purpose LLM in an agentic scaffold that grounds every step in the Lean compiler and iterates against verifier feedback.
The same general model solves all 12 Putnam 2025 problems and lifts Lean-IMO-Bench one-shot solve rate from under 10% to 70%, beating a specialized gold-medal system that scores 48%.
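A minimal sketch of a verifier-in-the-loop scaffold under stated assumptions: `propose` stands in for the LLM and `check` for a call to the Lean compiler. Both are hypothetical callables, not LEAP's interface. This loop also retries whole proofs, whereas the post describes LEAP as grounding every step in the compiler.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    errors: str = ""


def prove(
    statement: str,
    propose: Callable[[str, str], str],
    check: Callable[[str], CheckResult],
    max_rounds: int = 8,
) -> Optional[str]:
    """Generate -> verify -> repair loop.
    propose(statement, feedback) -> candidate proof  (stand-in for the LLM)
    check(candidate) -> CheckResult                   (stand-in for the Lean compiler)
    Returns the first candidate the checker accepts, or None."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = propose(statement, feedback)
        result = check(candidate)
        if result.ok:
            return candidate
        feedback = result.errors  # verifier output steers the next attempt
    return None
```

Because acceptance is decided only by the checker, a wrong candidate can never be reported as a proof; the model's job shrinks to producing something the compiler accepts within the round budget.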
arXiv.org
LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks
Large Language Models (LLMs) exhibit strong informal mathematical reasoning but struggle to generate mechanically verifiable proofs in formal languages like Lean. We present LEAP, an agentic...
Airbnb CEO Brian Chesky is starting a new AI lab.
The company is in its early phases and is considering a focus on design and UI. Chesky will remain CEO of Airbnb.
Bloomberg.com
Airbnb CEO Brian Chesky Plans to Start a New AI Company
Airbnb Inc. Chief Executive Officer Brian Chesky is starting a new artificial intelligence lab, according to several people familiar with the matter, marking his first foray into the global AI race.
Google introduced a research system that enables passive heart rate monitoring (PHRM) during everyday smartphone use.
Using the front-facing camera, it achieves industry accuracy standards for heart rate across all skin tones.
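The post doesn't describe Google's pipeline. As background, the textbook remote-photoplethysmography (rPPG) idea is that skin color shifts slightly with each pulse, so the per-frame average of a facial skin region (often the green channel) oscillates at the heart rate. A minimal sketch, assuming that per-frame trace is already extracted: pick the strongest frequency in a plausible 0.7 to 4 Hz band (42 to 240 bpm) and convert it to beats per minute. `estimate_bpm` and its band limits are illustrative choices, not Google's method.

```python
import math


def estimate_bpm(signal: list, fps: float, lo_hz: float = 0.7, hi_hz: float = 4.0) -> float:
    """Heart rate from a per-frame skin-color trace (e.g. mean green value of a
    face region), via the strongest DFT bin inside [lo_hz, hi_hz]."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the constant skin-tone offset
    best_hz, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        hz = k * fps / n  # frequency of DFT bin k
        if not lo_hz <= hz <= hi_hz:
            continue  # ignore drift and motion outside plausible heart rates
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_hz, best_power = hz, power
    return best_hz * 60.0
```

Frequency resolution is fps / n, so a 10-second window at 30 fps resolves 0.1 Hz (6 bpm); longer windows or peak interpolation give finer estimates.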
Google Research
Towards passive heart health monitoring via smartphone camera
We present a research system that passively measures heart rate and resting heart rate via facial video captured by the front-facing camera during everyday smartphone use.