All about AI, Web 3.0, BCI
Another top-notch open-source model at OpenAI/Meta/Google levels from MiniMax AI (Chinese lab, ex-SenseTime, $850m raised). Massive MoE similar to DeepSeek. Excels on long context (4m tokens!), which is really interesting, need to dig into their lighting…
Chinese lab MiniMax introduced this week:
1. Open-sourced LLM MiniMax-M1, setting new standards in long-context reasoning.
- World’s longest context window: 1M-token input, 80k-token output
- State-of-the-art agentic use among open-source models
- RL at unmatched efficiency: trained with just $534,700.
HF.
GitHub.
Tech Report.
2. Hailuo 02, World-Class Quality, Record-Breaking Cost Efficiency
- Best-in-class instruction following
- Handles extreme physics
- Native 1080p
3. MiniMax Audio:
- Any prompt, any voice, any emotion
- Fully customizable and multilingual.
4. Hailuo Video Agent in Beta, Vibe Videoing with Zero-touch.
MiniMax plans to deliver an end-to-end Hailuo Video Agent in 3 stages:
Stage 1: Prebuilt video Agent templates for high-quality creative videos. Users simply follow instructions and input text or images — with one click, a polished video is generated.
Stage 2: Semi-customizable video Agent. Users gain the flexibility to edit any part of the video creation process, from script to visuals to voiceover.
Stage 3: Fully autonomous, end-to-end video Agent. A complete, intelligent workflow that turns creative input into final-cut video with minimal manual effort.
This summer, the team plans to gradually roll out Stage 2 of the Agent creation tools.
5. MiniMax Agent, a general intelligent agent designed to tackle long-horizon, complex tasks.
From expert-level multi-step planning to flexible task breakdown and end-to-end execution — it’s designed to act like a reliable teammate, with strengths in:
- Programming & tool use
- Multimodal understanding & generation
- Seamless MCP integration
www.minimax.io
Building AGI with our mission Intelligence with Everyone. Global leader in multi-modal models and AI-native products with over 200 million users.
New AI for rare disease diagnosis: SHEPHERD shows how simulation + knowledge-grounded AI = deep learning for ultra‑low label domains
SHEPHERD is a few‑shot learning model powered by a phenotypic knowledge graph to tackle over 7,000 rare diseases with just a handful (or zero) diagnosed cases.
Sakana AI introduced Reinforcement-Learned Teachers (RLTs), transforming how we teach LLMs to reason with reinforcement learning (RL).
Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and constitutes a key step in making student AI systems ultimately acquire reasoning capabilities via distillation and cold-starting.
RLTs are a new class of models prompted with not only a problem’s question but also its solution, and directly trained to generate clear, step-by-step “explanations” to teach their students.
Remarkably, an RLT with only 7B parameters produces superior results when distilling and cold-starting students in competitive and graduate-level reasoning tasks than orders-of-magnitude larger LLMs.
RLTs are as effective even when distilling 32B students, much larger than the teacher itself—unlocking a new standard for efficiency in developing reasoning language models with RL.
Paper.
Code.
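The teacher setup described above can be sketched roughly as follows. This is a minimal illustration, not Sakana AI's code: the prompt template, the function names `build_teacher_prompt` and `toy_teacher_reward`, and the reward itself (the mean log-probability a student assigns to the solution tokens after reading the explanation) are all assumptions made for the example.

```python
from statistics import mean


def build_teacher_prompt(question: str, solution: str) -> str:
    """Give the teacher both the question and its known solution.

    The template wording is hypothetical; the post only says the teacher
    sees both parts and is trained to produce a step-by-step explanation.
    """
    return (
        f"Question:\n{question}\n\n"
        f"Solution:\n{solution}\n\n"
        "Explain step by step how to reach this solution."
    )


def toy_teacher_reward(student_logprobs: list[float]) -> float:
    """Assumed reward: mean log-probability the student gives the solution tokens."""
    return mean(student_logprobs)


prompt = build_teacher_prompt("What is 12 * 13?", "156")

# Made-up student log-probabilities for two candidate explanations.
clear = toy_teacher_reward([-0.1, -0.2, -0.1])
vague = toy_teacher_reward([-1.5, -2.0, -1.0])
print(clear > vague)  # True: the clearer explanation scores higher
```

The point of the sketch is the shift in objective: the teacher is never asked to find the answer, only to explain it well enough that a student can recover it.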
sakana.ai
Sakana AI
Reinforcement Learning Teachers of Test Time Scaling
Future of Work with AI Agents. Stanford's new report analyzes what 1500 workers think about working with AI Agents.
The audit proposes a large-scale framework for understanding where AI agents should automate or augment human labor.
The authors build the WORKBank, a database combining worker desires and expert assessments across 844 tasks and 104 occupations, and introduce the Human Agency Scale to quantify desired human involvement in AI-agent-supported work.
A substantial portion of current AI investment, such as YC-funded companies, targets tasks in the “Red Light” Zone (high technical feasibility but low worker desire).
This raises concerns about pushing automation where it's socially or ethically unwelcome.
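The zone idea can be sketched as a simple two-axis lookup. This is only an illustration under stated assumptions: the post names just the “Red Light” zone, so the other three labels, the 1-5 rating scale, and the 3.0 cut-off in the hypothetical `classify_zone` helper are placeholders, not values taken from the report.

```python
def classify_zone(worker_desire: float, ai_capability: float,
                  threshold: float = 3.0) -> str:
    """Place a task in a desire/capability quadrant.

    Both scores are assumed to be on a 1-5 scale; the threshold is a
    placeholder. Only the "Red Light" label comes from the post.
    """
    high_desire = worker_desire >= threshold
    high_capability = ai_capability >= threshold
    if high_capability and not high_desire:
        return "Red Light"        # feasible to automate, but workers do not want it
    if high_capability and high_desire:
        return "Green Light"      # assumed label
    if high_desire:
        return "R&D Opportunity"  # assumed label
    return "Low Priority"         # assumed label


# Hypothetical task scores, for illustration only.
print(classify_zone(worker_desire=1.5, ai_capability=4.5))  # Red Light
```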
Interpersonal skills are becoming more valuable
Tasks rated as needing HAS 5 (essential human involvement) were strongly associated with interpersonal communication and domain expertise.
These include editing, education, and some engineering tasks, where AI lacks the nuance or trustworthiness to operate alone.
Some High-Wage Skills May Decline in Value
The results above reveal that skills like analyzing data or updating knowledge, which currently command high wages, are less associated with high HAS tasks, implying future declines in their labor market value as AI spreads.
Role-based AI Support
From transcript analysis, the most common vision for human–AI collaboration was role-based support, where workers imagine AI tools acting as analysts, assistants, or specialists with clearly bounded responsibilities, not general-purpose agents.
Lots of other findings in this one.
BountyBench evaluates AI agents on 25 real-world, complex systems and 40 bug bounties (worth up to $30,000+), covering 9 OWASP Top 10 categories.
Key insights:
– AI agents solved bug bounty tasks worth tens of thousands of dollars
– Codex CLI & Claude Code excelled in patching (90% / 87.5%) but were weaker in exploitation (32.5% / 57.5%)
– Custom agents performed more evenly across both: Exploit (40–67.5%), Patch (45–60%)
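To make the percentages concrete, here is a quick conversion into solved-task counts. It assumes each rate is measured over all 40 bounties, which the post does not state explicitly, though every quoted rate does map to a whole number of tasks out of 40.

```python
TOTAL_BOUNTIES = 40  # from the post; assumed to be the denominator for every rate

rates = {
    "Codex CLI": {"patch": 90.0, "exploit": 32.5},
    "Claude Code": {"patch": 87.5, "exploit": 57.5},
}


def solved_count(rate_percent: float, total: int = TOTAL_BOUNTIES) -> int:
    """Convert a success rate in percent into a number of solved tasks."""
    return round(rate_percent / 100 * total)


counts = {
    agent: {task: solved_count(rate) for task, rate in tasks.items()}
    for agent, tasks in rates.items()
}

for agent, tasks in counts.items():
    print(agent, tasks)
# Codex CLI {'patch': 36, 'exploit': 13}
# Claude Code {'patch': 35, 'exploit': 23}
```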
⚡️❗️ Breaking Ground in BCI: Science (Neuralink's Competitor) Unveils Revolutionary Biohybrid Neural Technology Science, a neurotechnology company founded by former Neuralink President Max Hodak, has revealed a revolutionary approach to brain-computer interfaces…
⚡️ Today, Science submitted a full CE mark application for marketing approval in Europe for the PRIMA retinal prosthesis.
With this key step, Science are moving closer to bringing to market the first brain-computer interface technology to restore functional form vision to patients blinded with late-stage age-related macular degeneration (AMD).
Science Corporation
Science Submits CE Mark Application for PRIMA Retinal Implant – A Critical Step Towards Making It Available To Patients | Science…
Science Corporation is a clinical-stage medical technology company.
Salesforce introduced Agentforce 3.0 + MCP
Connect Agents to any system, tool, or data source — securely, reliably, and at scale.
Salesforce
Agentforce: The AI Agent Platform
Build and customize autonomous AI agents to support your employees and customers 24/7, including full integration with the Salesforce ecosystem.
Microsoft dropped a micro-sized, task-specific, on-device language model called Mu
It is offloaded fully onto NPUs on Copilot+ devices and powers real-time interactions, like the new Settings AI agent inside Windows 11.
Windows Experience Blog
Introducing Mu language model and how it enabled the agent in Windows Settings
We are excited to introduce our newest on-device small language model, Mu. This model addresses scenarios that require inferring complex input-output relationships and has been designed to operate efficiently, delivering high performance while runnin
Last month Cursor overtook GitHub Copilot in business spend, Ramp’s data shows.
Both continue adding users + spend, more than enough to go around in this market.
But it goes to show that being the first mover != market dominance. Not included: Claude Code, small but growing.
Unreal Labs is Hiring - Member of Technical Staff. Just raised from Sequoia & First Round Capital 🔥
Building AI to replace performance marketing teams. Looking for Python engineers to work on:
• Creative AI: Turn briefs into finished images/videos using latest models (Runway, Sora, etc.)
• Data Pipeline: Crawl & process social media ad data at scale
What you need?
1. Strong Python skills
2. Interest in generative AI
3. Builder mindset
What you get?
- London-based (help with relocation)
- Good salary + equity
- Unlimited GPU/API budget
- Small team with big-tech experience.
When a Sequoia-backed startup offers unlimited GPU budget, you listen 👀
Unreal Labs on Notion
Senior AI-native Creative Producer, Unreal Labs | Notion
The role
Google DeepMind introduced Gemini Robotics On-Device, a VLA model that helps make robots faster, highly efficient, and adaptable to new tasks and environments, without needing a constant internet connection.
Key takeaways:
1. It has the generality and dexterity of Gemini Robotics - but it can run locally on the device
2. It can handle a wide variety of complex, two-handed tasks out of the box
3. It can learn new skills with as few as 50-100 demonstrations.
The model supports multiple embodiments, from humanoids to industrial bi-arm robots, even though it was pre-trained on ALOHA, and it follows instructions from humans.
Also launched the Gemini Robotics software development kit (SDK) to help developers fine-tune the model for their own applications, including by testing it in the MuJoCo physics simulator.
Google DeepMind
Gemini Robotics On-Device brings AI to local robotic devices
We’re introducing an efficient, on-device robotics model with general-purpose dexterity and fast task adaptation.
Google launched Gemini CLI, a powerful open-source AI agent built for the terminal.
- Built on Gemini 2.5 Pro
- 1 million token context window
- Free tier with 60 requests per minute and 1,000 per day
- Google Search grounding for real-time context
- Script and plugin support
- Non-interactive mode for automation
- Support for Model Context Protocol (MCP)
- Integrated with Gemini Code Assist in VS Code
- Fully open-source under Apache 2.0
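A quick back-of-the-envelope check on the free tier above: at the full per-minute rate, the daily cap is reached fast. The sketch below uses only the two limits quoted in the list; the helper names and the 8-hour working window are illustrative assumptions, not part of Gemini CLI.

```python
REQUESTS_PER_MINUTE = 60   # from the post
REQUESTS_PER_DAY = 1000    # from the post


def minutes_to_exhaust_daily_quota(rpm: int = REQUESTS_PER_MINUTE,
                                   rpd: int = REQUESTS_PER_DAY) -> float:
    """Minutes until the daily cap is hit when sending at the per-minute limit."""
    return rpd / rpm


def min_spacing_seconds(hours_active: float,
                        rpd: int = REQUESTS_PER_DAY) -> float:
    """Average gap between requests that spreads the daily cap over a work window."""
    return hours_active * 3600 / rpd


print(f"{minutes_to_exhaust_daily_quota():.1f}")  # 16.7
print(f"{min_spacing_seconds(8.0):.1f}")          # 28.8
```

So for automation in non-interactive mode, the daily limit rather than the per-minute limit is the one to plan around: over an 8-hour window it averages out to roughly one request every 29 seconds.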
Google
Gemini CLI: your open-source AI agent
Free and open source, Gemini CLI brings Gemini directly into developers’ terminals — with unmatched access for individuals.
Ai2 introduced OMEGA, a new math benchmark that pushes LLMs beyond pattern-matching to test true mathematical reasoning.
Paper
Code.
allenai.org
OMEGA: Can LLMs reason outside the box in math? | Ai2
Discover how OMEGA is being used to evaluate large language models' ability to generalize in math through exploratory, compositional, and transformative reasoning
Anthropic launches AI-powered apps through Claude artifacts
Over 500M artifacts created since launch, and now they're letting you embed Claude's intelligence directly into those creations. Pretty wild move.
The mechanic is clean: instead of building static outputs, you can now create apps that think and adapt in real-time.
Your flashcard generator becomes a flashcard app that responds to any topic users throw at it.
Cost structure makes sense too - when someone uses your AI-powered artifact, it burns their Claude credits, not yours.
No API keys, no billing complexity. Just build and share.
This isn't really "no-code" anymore.
It's more like "no-barriers-to-building-intelligent-things." The gap between having an idea and having a working AI app just collapsed.
Early to tell how this plays out, but the direction feels significant. They're not trying to compete with app stores - they're making it trivial to turn conversations into tools that other people can actually use.
The internal name "Claude in Claude" captures it perfectly. It's recursion all the way down.
Worth watching what people build with this.
claude.ai
Talk with Claude, an AI assistant from Anthropic
ALST is a set of modular, open-source techniques that enable training on sequences up to 15 million tokens on 4 H100 nodes, all using Hugging Face Transformers and DeepSpeed, with no custom modeling code required.
ALST makes long-sequence training fast, efficient, and accessible on GPU nodes or even single GPUs.
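For a sense of scale, here is the per-GPU share of a 15M-token sequence. This is rough arithmetic under two assumptions that are not in the post: each H100 node has 8 GPUs, and the sequence is sharded evenly across all GPUs (as in sequence-parallel training).

```python
TOTAL_TOKENS = 15_000_000  # from the post
NODES = 4                  # from the post
GPUS_PER_NODE = 8          # assumption: typical H100 node size


def tokens_per_gpu(total_tokens: int, nodes: int, gpus_per_node: int) -> int:
    """Tokens each GPU holds if the sequence is split evenly across all GPUs."""
    return total_tokens // (nodes * gpus_per_node)


shard = tokens_per_gpu(TOTAL_TOKENS, NODES, GPUS_PER_NODE)
print(shard)  # 468750
```

Even with an even split, each GPU would still hold a shard of several hundred thousand tokens, which is why memory-saving techniques matter alongside the sharding itself.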
Meet ROSETTA, a framework that's revolutionizing how AI agents learn from human feedback.
Unlike traditional AI systems that need extensive offline training or task-specific fine-tuning, ROSETTA lets AI agents adapt to your preferences in real-time using natural language.
Three-Step Pipeline:
1. Grounding - Transforms vague, colloquial language into concrete understanding
2. Planning - Creates multi-stage reward strategies
3. Code Generation - Converts plans into executable reward functions
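To give a feel for what the last step might produce, here is a hand-written sketch of a staged, code-based reward. It is not ROSETTA output: the preference ("bring the cup to the table, but go slowly near the person"), the state keys, and all thresholds are hypothetical and chosen only for illustration.

```python
import math


def staged_reward(state: dict) -> float:
    """Hypothetical two-stage reward with one preference penalty.

    Expected keys (all illustrative): gripper_pos, cup_pos, table_pos,
    person_pos as (x, y) tuples, holding_cup as bool, speed as float.
    """
    # Stage 1: before the cup is grasped, reward closeness to the cup.
    if not state["holding_cup"]:
        return -math.dist(state["gripper_pos"], state["cup_pos"])

    # Stage 2: bonus of 1.0 for finishing stage 1, minus distance cup -> table.
    reward = 1.0 - math.dist(state["cup_pos"], state["table_pos"])

    # Preference term: penalize moving fast while close to the person.
    near_person = math.dist(state["gripper_pos"], state["person_pos"]) < 0.5
    if near_person and state["speed"] > 0.2:
        reward -= 0.5
    return reward


state = {
    "gripper_pos": (0.0, 0.0), "cup_pos": (3.0, 4.0), "table_pos": (1.0, 0.0),
    "person_pos": (5.0, 5.0), "holding_cup": False, "speed": 0.1,
}
print(staged_reward(state))  # -5.0
```

A dense function like this gives an RL agent feedback at every step, instead of only at the end of the task.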
Mind-Blowing Results:
- 87% average success rate
- 86% human satisfaction
- Tested across 116 diverse preferences
- Handles contradictory and evolving preferences seamlessly
sanjanasrivastava.github.io
ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
ROSETTA extracts signal from complex, unconstrained human language preference and generates dense code-based reward for RL agents in a single shot.
Sora Neuroscience received FDA clearance for its Cirrus Resting State fMRI Brain Mapping Software.
This AI-based software uses resting-state functional magnetic resonance imaging (rs-fMRI) to map critical brain networks, aiding neurosurgeons in planning surgeries for conditions like brain tumors and epilepsy.
Key points about Cirrus:
- Functionality: It generates maps of eloquent cortex areas (e.g., those controlling speech, vision, and movement) in as little as 12 minutes by analyzing patterns of brain activity during rest, using blood-oxygen-level-dependent (BOLD) signals. This is faster and more reliable than traditional task-based fMRI, which requires patient compliance and can take up to an hour.
- Accessibility: Unlike task-based fMRI, which is challenging for patients like children, those with cognitive impairments, or non-English speakers, Cirrus works with patients who are sedated or unable to follow instructions, expanding its applicability.
- Integration: The software integrates with existing surgical navigation platforms and has a non-exclusive distribution agreement with Prism Clinical Imaging, Inc., enhancing its compatibility with tools like fMRI and diffusion tensor imaging (DTI) for comprehensive brain mapping.
- Technology: Built on decades of WashU research, Cirrus uses a multi-layer perceptron (MLP) algorithm for supervised classification of rs-fMRI data, offering high sensitivity compared to intraoperative cortical stimulation, the gold standard for mapping.
EIN Presswire
Sora Neuroscience Announces FDA Clearance of Cirrus Resting State fMRI Brain Mapping Software
Sora's FDA clearance of Cirrus simplifies fMRI generation of eloquent cortex maps for surgical planning.
Big open AI day!
1. Black Forest Labs just released FLUX.1 Kontext [dev], the best open-weights image editing model, comparable to GPT-4o.
HF.
Code.
2. Google released Gemma 3n. The E4B version is the first model under 10B parameters to break a score of 1300 on LMArena.
fal.ai Blog | Generative AI Model Releases & Tutorials
Announcing FLUX.1 Kontext [dev] Inference & Training
Open-weights, Fast Inference and LoRA Support
Following last month's successful launch of FLUX.1 Kontext [pro] and [max] models, we're excited to announce the release of BFL's FLUX.1 Kontext [dev] with open weights. This new version delivers exceptional…