BountyBench evaluates AI agents on 25 real-world, complex systems and 40 bug bounties (worth up to $30,000+), covering 9 OWASP Top 10 categories.
Key insights:
– AI agents solved bug bounty tasks worth tens of thousands of dollars
– Codex CLI & Claude Code were far stronger at patching (90% / 87.5%) than at exploitation (32.5% / 57.5%)
– Custom agents performed more evenly across both: Exploit (40–67.5%), Patch (45–60%)
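To put those percentages in absolute terms, here is a small sketch that converts each success rate into a task count. It assumes every rate is measured over the full set of 40 bounties, which the summary does not state explicitly; the names `TOTAL_BOUNTIES` and `solved` are illustrative only.

```python
# Assumption: each success rate is computed over all 40 bounties.
TOTAL_BOUNTIES = 40


def solved(rate_percent: float, total: int = TOTAL_BOUNTIES) -> int:
    """Convert a success rate in percent into a whole number of tasks."""
    return round(rate_percent * total / 100)


# Success rates quoted in the post, in percent.
rates = {
    "Codex CLI": {"patch": 90.0, "exploit": 32.5},
    "Claude Code": {"patch": 87.5, "exploit": 57.5},
}

counts = {
    agent: {task: solved(rate) for task, rate in by_task.items()}
    for agent, by_task in rates.items()
}

for agent, by_task in counts.items():
    print(
        f"{agent}: {by_task['patch']} patched, "
        f"{by_task['exploit']} exploited (of {TOTAL_BOUNTIES})"
    )
```

Under the same assumption, the custom agents' ranges work out to 16 to 27 exploits and 18 to 24 patches.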
🔥5
All about AI, Web 3.0, BCI
⚡️❗️ Breaking Ground in BCI: Science (Neuralink's Competitor) Unveils Revolutionary Biohybrid Neural Technology Science, a neurotechnology company founded by former Neuralink President Max Hodak, has revealed a revolutionary approach to brain-computer interfaces…
⚡️ Today Science submitted a full CE mark application for marketing approval in Europe for its PRIMA retinal prosthesis.
With this key step, Science are moving closer to bringing to market the first brain-computer interface technology to restore functional form vision to patients blinded with late-stage age-related macular degeneration (AMD).
Science Corporation
Science Submits CE Mark Application for PRIMA Retinal Implant – A Critical Step Towards Making It Available To Patients | Science…
Science Corporation is a clinical-stage medical technology company.
🔥4
Salesforce introduced Agentforce 3.0 + MCP
Connect Agents to any system, tool, or data source — securely, reliably, and at scale.
Salesforce
Agentforce: The AI Agent Platform
Build and customize autonomous AI agents to support your employees and customers 24/7, including full integration with the Salesforce ecosystem.
🔥4
Microsoft dropped a micro-sized, task-specific, on-device language model called Mu
It is offloaded fully onto NPUs on Copilot+ devices and powers real-time interactions, like the new Settings AI agent inside Windows 11.
Windows Experience Blog
Introducing Mu language model and how it enabled the agent in Windows Settings
We are excited to introduce our newest on-device small language model, Mu. This model addresses scenarios that require inferring complex input-output relationships and has been designed to operate efficiently, delivering high performance while runnin
🔥4
Last month Cursor overtook GitHub Copilot in business spend, Ramp’s data shows.
Both continue adding users + spend, more than enough to go around in this market.
But it goes to show that being a first mover != market dominance. Not included: Claude Code, which is small but growing.
🔥4
Unreal Labs is Hiring - Member of Technical Staff. Just raised from Sequoia & First Round Capital 🔥
Building AI to replace performance marketing teams. Looking for Python engineers to work on:
• Creative AI: Turn briefs into finished images/videos using latest models (Runway, Sora, etc.)
• Data Pipeline: Crawl & process social media ad data at scale
What you need?
1. Strong Python skills
2. Interest in generative AI
3. Builder mindset
What you get?
- London-based (help with relocation)
- Good salary + equity
- Unlimited GPU/API budget
- Small team with big-tech experience.
When a Sequoia-backed startup offers unlimited GPU budget, you listen 👀
Unreal Labs on Notion
Senior AI-native Creative Producer, Unreal Labs | Notion
The role
🔥6😱1
Google DeepMind introduced Gemini Robotics On-Device, a VLA (vision-language-action) model that helps make robots faster, highly efficient, and adaptable to new tasks and environments, without needing a constant internet connection.
Key takeaways:
1. It has the generality and dexterity of Gemini Robotics - but it can run locally on the device
2. It can handle a wide variety of complex, two-handed tasks out of the box
3. It can learn new skills with as few as 50-100 demonstrations.
From humanoids to industrial bi-arm robots, the model supports multiple embodiments, even though it was pre-trained on ALOHA - while following instructions from humans.
Google DeepMind also launched the Gemini Robotics software development kit (SDK) to help developers fine-tune the model for their own applications, including by testing it in the MuJoCo physics simulator.
Google DeepMind
Gemini Robotics On-Device brings AI to local robotic devices
We’re introducing an efficient, on-device robotics model with general-purpose dexterity and fast task adaptation.
❤4
Google launched Gemini CLI, a powerful open-source AI agent built for the terminal.
- Built on Gemini 2.5 Pro
- 1 million token context window
- Free tier with 60 requests per minute and 1,000 per day
- Google Search grounding for real-time context
- Script and plugin support
- Non-interactive mode for automation
- Support for Model Context Protocol (MCP)
- Integrated with Gemini Code Assist in VS Code
- Fully open-source under Apache 2.0
Google
Gemini CLI: your open-source AI agent
Free and open source, Gemini CLI brings Gemini directly into developers’ terminals — with unmatched access for individuals.
🔥8
Ai2 introduced OMEGA, a new math benchmark that pushes LLMs beyond pattern-matching to test true mathematical reasoning.
Paper
Code.
allenai.org
OMEGA: Can LLMs reason outside the box in math? | Ai2
Discover how OMEGA is being used to evaluate large language models' ability to generalize in math through exploratory, compositional, and transformative reasoning
🥰3
Anthropic launches AI-powered apps through Claude artifacts
Over 500M artifacts created since launch, and now they're letting you embed Claude's intelligence directly into those creations. Pretty wild move.
The mechanic is clean: instead of building static outputs, you can now create apps that think and adapt in real-time.
Your flashcard generator becomes a flashcard app that responds to any topic users throw at it.
Cost structure makes sense too - when someone uses your AI-powered artifact, it burns their Claude credits, not yours.
No API keys, no billing complexity. Just build and share.
This isn't really "no-code" anymore.
It's more like "no-barriers-to-building-intelligent-things." The gap between having an idea and having a working AI app just collapsed.
Too early to tell how this plays out, but the direction feels significant. They're not trying to compete with app stores - they're making it trivial to turn conversations into tools that other people can actually use.
The internal name "Claude in Claude" captures it perfectly. It's recursion all the way down.
Worth watching what people build with this.
claude.ai
Talk with Claude, an AI assistant from Anthropic
🥰3
ALST is a set of modular, open-source techniques that enable training on sequences up to 15 million tokens on 4 H100 nodes, all using Hugging Face Transformers and DeepSpeed, with no custom modeling code required.
ALST makes long-sequence training fast, efficient, and accessible on GPU nodes or even single GPUs.
🔥3🆒3
Meet ROSETTA, a framework that's revolutionizing how AI agents learn from human feedback.
Unlike traditional AI systems that need extensive offline training or task-specific fine-tuning, ROSETTA lets AI agents adapt to your preferences in real-time using natural language.
Three-Step Pipeline:
1. Grounding - Transforms vague, colloquial language into concrete understanding
2. Planning - Creates multi-stage reward strategies
3. Code Generation - Converts plans into executable reward functions
Mind-Blowing Results:
- 87% average success rate
- 86% human satisfaction
- Tested across 116 diverse preferences
- Handles contradictory and evolving preferences seamlessly
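To make step 3 of the pipeline concrete, here is what a staged, code-based reward could look like. This is a hypothetical sketch, not ROSETTA's actual output: the preference ("reach the cup, but keep the gripper low"), the `State` fields, and the thresholds are all invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical robot state: 3D positions of the gripper and the cup."""
    gripper_xyz: tuple
    cup_xyz: tuple


# Invented thresholds for this example (metres).
MAX_HEIGHT = 0.30    # "keep the gripper low"
REACH_RADIUS = 0.05  # close enough to count as reaching the cup


def reward(state: State) -> float:
    """Multi-stage reward for the preference 'reach the cup, but stay low'."""
    dist = math.dist(state.gripper_xyz, state.cup_xyz)

    # Stage 1: dense shaping term that grows as the gripper nears the cup.
    r = -dist

    # Stage 2: penalty proportional to how far the gripper exceeds the height limit.
    height = state.gripper_xyz[2]
    if height > MAX_HEIGHT:
        r -= 10.0 * (height - MAX_HEIGHT)

    # Stage 3: bonus once the gripper is within reach of the cup.
    if dist <= REACH_RADIUS:
        r += 1.0

    return r
```

An RL agent would call `reward(state)` at every step. If the user later revises the preference, only this function would need to be regenerated.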
sanjanasrivastava.github.io
ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
ROSETTA extracts signal from complex, unconstrained human language preference and generates dense code-based reward for RL agents in a single shot.
🔥4
Sora Neuroscience received FDA clearance for its Cirrus Resting State fMRI Brain Mapping Software.
This AI-based software uses resting-state functional magnetic resonance imaging (rs-fMRI) to map critical brain networks, aiding neurosurgeons in planning surgeries for conditions like brain tumors and epilepsy.
Key points about Cirrus:
- Functionality: It generates maps of eloquent cortex areas (e.g., those controlling speech, vision, and movement) in as little as 12 minutes by analyzing patterns of brain activity during rest, using blood-oxygen-level-dependent (BOLD) signals. This is faster and more reliable than traditional task-based fMRI, which requires patient compliance and can take up to an hour.
- Accessibility: Unlike task-based fMRI, which is challenging for patients like children, those with cognitive impairments, or non-English speakers, Cirrus works with patients who are sedated or unable to follow instructions, expanding its applicability.
- Integration: The software integrates with existing surgical navigation platforms and has a non-exclusive distribution agreement with Prism Clinical Imaging, Inc., enhancing its compatibility with tools like fMRI and diffusion tensor imaging (DTI) for comprehensive brain mapping.
- Technology: Built on decades of WashU research, Cirrus uses a multi-layer perceptron (MLP) algorithm for supervised classification of rs-fMRI data, offering high sensitivity compared to intraoperative cortical stimulation, the gold standard for mapping.
EIN Presswire
Sora Neuroscience Announces FDA Clearance of Cirrus Resting State fMRI Brain Mapping Software
Sora's FDA clearance of Cirrus simplifies fMRI generation of eloquent cortex maps for surgical planning.
🔥4💯2
Big open AI day!
1. Black Forest Labs just released the best open-weights image editing model, comparable to GPT-4o.
HF.
Code.
2. Google released Gemma 3n. The E4B version is the first model under 10B parameters to break a score of 1300 on LMArena.
fal.ai Blog | Generative AI Model Releases & Tutorials
Announcing FLUX.1 Kontext [dev] Inference & Training
Open-weights, Fast Inference and LoRA Support
Following last month's successful launch of FLUX.1 Kontext [pro] and [max] models, we're excited to announce the release of BFL's FLUX.1 Kontext [dev] with open weights. This new version delivers exceptional…
🔥6🦄3
Google introduced Doppl, a new mobile app that lets you upload a photo or screenshot of an outfit and then creates a video of you wearing the clothes to help you find your
Available on iOS and Android in the US.
Google
Try on looks and discover your style with Doppl
Doppl, a new Google Labs app, uses AI to create personalized outfit try-on images and videos.
🔥3
How Transformers Learn to Navigate: Episodic Memory as a Computational Workspace
A new study reveals surprising insights into how transformers achieve rapid in-context learning, with implications that bridge AI and neuroscience.
Key findings:
1. Internal Map Formation
The researchers discovered that transformers don't just memorize solutions—they actually build internal representations of spatial environments. These models learn to:
- Create cognitive maps of gridworld and tree maze environments
- Align representations across different contexts with similar structures
- Use these maps for efficient navigation in novel scenarios.
2. Novel Decision-Making Strategy. Surprisingly, the models don't use traditional reinforcement learning approaches like value estimation or explicit path planning. Instead, they employ a geometric strategy:
- Align representations to Euclidean space using in-context experience
- Calculate angles from current state to goal in this space
- Select actions based on these angular computations
This approach is both elegant and computationally efficient.
3. Memory as Computational Workspace. Perhaps most intriguingly, the study reveals that episodic memory tokens serve as more than just storage—they become an active computational workspace where intermediate calculations are cached and processed.
This work challenges our understanding of in-context learning in several ways:
- Beyond simple pattern matching: Transformers are developing sophisticated algorithmic strategies
- Neuroscience connections: The mechanisms mirror hippocampal-entorhinal cortex computations in biological brains
- Architectural insights: Memory systems can serve dual roles as both storage and computation
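The geometric strategy from finding 2 can be illustrated with a hand-written toy. This is a minimal sketch, assuming states have already been aligned to 2D Euclidean coordinates. In the study that alignment happens inside the model's representations, so the `ACTIONS` table and function names below are illustrative and not taken from the paper.

```python
import math

# Hypothetical action set for a 2D gridworld: name -> unit displacement.
ACTIONS = {
    "right": (1.0, 0.0),
    "up": (0.0, 1.0),
    "left": (-1.0, 0.0),
    "down": (0.0, -1.0),
}


def angle_to_goal(state, goal):
    """Angle in radians of the vector pointing from state to goal."""
    return math.atan2(goal[1] - state[1], goal[0] - state[0])


def select_action(state, goal):
    """Pick the action whose heading is angularly closest to the goal direction."""
    target = angle_to_goal(state, goal)

    def angular_gap(direction):
        heading = math.atan2(direction[1], direction[0])
        diff = target - heading
        # Wrap the difference into [-pi, pi] before taking its magnitude.
        return abs(math.atan2(math.sin(diff), math.cos(diff)))

    return min(ACTIONS, key=lambda name: angular_gap(ACTIONS[name]))


print(select_action((0.0, 0.0), (3.0, 0.5)))
```

No value function or path search is involved: the choice depends only on the angle between the current state and the goal, which is what sets this strategy apart from value estimation or explicit planning.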
The latest update from Neuralink. Using brainwaves to:
1. Play Call of Duty
2. Control robotic hands
3. Have a neural Mario Kart party
4. Regain one’s own natural voice
🔥4
Huge drop from Baidu: Ernie 4.5
From 0.3B to 424B
This is a very impressive family of open models by Baidu, competitive with Qwen3 and the latest DeepSeek V3. They also open-sourced the training code.
GitHub.
Hf.
GitHub
GitHub - PaddlePaddle/Paddle: PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框…
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署) - PaddlePaddle/Paddle
🔥4
OpenAI acquired the team behind Crossing Minds, a startup focused on AI recommendations for e-commerce
They will now focus on agents and information retrieval, covering how OpenAI systems learn, reason, and retrieve knowledge at scale, in real-time
🔥4