E11 Bio announced PRISM, a new, scalable technology for mapping brain circuits.
PRISM uses molecular ID codes and AI to help neurons trace themselves.
Researchers discovered a new cell-barcoding approach that exceeds comparable methods by more than 750x.
This is the heart of PRISM. Researchers integrated this capability with microscopy and AI image analysis to automatically trace neurons at high resolution and annotate them with molecular features.
This is a key advance towards economically viable brain mapping - 95% of costs stem from neuron tracing. It is also an important step towards democratizing neuron tracing for everyday neuroscience.
Solving these problems is critical for curing brain disorders, building safer and human-like AI, and even simulating brain function.
In a first pilot study, researchers acquired a unique dataset in mouse hippocampus. Barcodes improved the accuracy of tracing genetically labelled neurons by 8x, with a clear path to 100x or more.
They also permit tracing across spatial gaps, which is essential for mitigating tissue-section loss when scaling to whole brains.
Addgene constructs.
Volara.
Open data.
E11 Bio
PRISM | E11 Bio
Thinking Machines, the lab founded by an ex-OpenAI team, introduced Tinker: a flexible API for fine-tuning language models.
You write training loops in Python on your laptop; Tinker runs them on distributed GPUs.
Private beta starts today.
Thinking Machines Lab
Tinker
Tinker is a training API for researchers and developers.
Microsoft introduced Agent Framework
You can build, orchestrate, and scale multi-agent systems in Azure AI Foundry using this framework.
Microsoft Azure Blog
Introducing Microsoft Agent Framework | Microsoft Azure Blog
Find out how Microsoft Agent Framework can help simplify the orchestration of multi-agent systems and keep developers in flow.
Meta Superintelligence Labs introduced MENLO: From Preferences to Proficiency.
The team introduced a framework and dataset for evaluating and modeling native-like LLM response quality across 47 languages, inspired by audience-design principles.
Data.
arXiv.org
MENLO: From Preferences to Proficiency -- Evaluating and Modeling...
Ensuring native-like quality of large language model (LLM) responses across many languages is challenging. To address this, we introduce MENLO, a framework that operationalizes the evaluation of...
Sholto Douglas, Anthropic:
"Over the last year, RL has finally allow[ed] us to take a feedback loop and turn it into a model that is at least as good as the best humans at a given thing in a narrow domain.
And you're seeing that with mathematics and competition code, which are the two domains most amenable to this - where rapidly the models are becoming incredibly competent competition mathematicians and competition coders.
There's nothing intrinsically different about competition code and math. It's just that they're really [more] amenable to RL than any other domain. But importantly, they demonstrate there's no intellectual ceiling on the models.
They're capable of doing really tough reasoning given the right feedback loop. So, we think that same approach generalizes to basically all other domains of human intellectual endeavor where given the right feedback loop, these models will [become] at least as good as the best humans at a given thing. And then once you have something that is at least as good as the best humans at a thing, you can just run 1,000 of them in parallel or 100x faster and you have something that's even just with that condition substantially smarter than any given human. And this is completely throwing aside whether or not it's possible to make something that is smarter than a human.
The implications of this are pretty staggering, right? In the next 2 or 3 years, given the right feedback loops, given the right compute, etc., we think that we as the AI industry as a whole [are] on track to create something that is at least as capable as most humans on most computer-facing tasks, possibly as good as many of our best scientists at their fields. It'll be sharp and spiky; there'll be examples of things it can't [do]. But the world will change.
... I think this is worth crying from the rooftops a little bit - guys, anything that we can measure seems to be improving really rapidly. Where does that get us in 2 or 3 years? I can't say for certain. But I think it's worth building into worldviews that there's a pretty serious chance that we get AGI."
YouTube
Sonnet 4.5 & the AI Plateau Myth - Sholto Douglas (Anthropic)
Sholto Douglas, a key researcher at Anthropic, reveals the breakthroughs behind Claude Sonnet 4.5, the world's leading coding model, and why we might be just 2-3 years from AI matching human-level performance on most computer-facing tasks.
You'll discover…
IBM released Granite 4.0 as open source, with a new hybrid Mamba/transformer architecture that reduces memory requirements without much loss of accuracy.
This set of models is well suited to agentic workflows like tool calling, document analysis, and RAG, especially in enterprise settings.
The "Micro" (3.4B) model can even run 100% locally in your browser on WebGPU, powered by TransformersJS.
Full model collection.
huggingface.co
Granite-4.0 WebGPU - a Hugging Face Space by ibm-granite
Run Granite-4.0-Micro 100% locally in your browser on WebGPU
A great milestone for open-source robotics: π0 & π0.5 by Physical Intelligence are now on Hugging Face.
As described by Physical Intelligence, π0.5 is a Vision-Language-Action model that represents a significant evolution from π0, addressing a big challenge in robotics: open-world generalization.
While robots can perform impressive tasks in controlled environments, π0.5 is designed to generalize to entirely new environments and situations never seen during training.
Generalization must occur at multiple levels:
- Physical Level: Understanding how to pick up a spoon (by the handle) or plate (by the edge), even with unseen objects in cluttered environments
- Semantic Level: Understanding task semantics, where to put clothes and shoes (laundry hamper, not on the bed), and what tools are appropriate for cleaning spills
- Environmental Level: Adapting to "messy" real-world environments like homes, grocery stores, offices, and hospitals
The breakthrough innovation in π0.5 is co-training on heterogeneous data sources. The model learns from:
- Multimodal Web Data: Image captioning, visual question answering, object detection
- Verbal Instructions: Humans coaching robots through complex tasks step-by-step
- Subtask Commands: High-level semantic behavior labels (e.g., "pick up the pillow" for an unmade bed)
- Cross-Embodiment Robot Data: Data from various robot platforms with different capabilities
- Multi-Environment Data: Static robots deployed across many different homes
- Mobile Manipulation Data: ~400 hours of mobile robot demonstrations
huggingface.co
lerobot/pi05_base Β· Hugging Face
We're on a journey to advance and democratize artificial intelligence through open source and open science.
OpenAI is planning to announce Agent Builder at DevDay.
Agent Builder will let users build agentic workflows and connect MCP servers, ChatKit widgets, and other tools.
TestingCatalog
OpenAI prepares to release Agent Builder during DevDay on October 6
Agent builder will let users build their agentic workflows, connect MCPs, ChatKit widgets and other tools. This is one of the smoothest Agent builder canvases I've used so far.
Harmonic, the company started by Robinhood's founder, shared how it won a gold medal at IMO 2025, the elite math contest.
Only four teams have done this.
Unlike OpenAI and DeepMind, Harmonic's Aristotle uses formal Lean-based search methods and a geometry solver, similar to ByteDance's SeedProver.
alphaXiv
Aristotle: IMO-level Automated Theorem Proving | alphaXiv
View recent discussion. Abstract: We introduce Aristotle, an AI system that combines formal verification with informal reasoning, achieving gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. Aristotle integrates three…
Google introduced CodeMender: a new AI agent that uses Gemini Deep Think to automatically patch critical software vulnerabilities.
It checks that its patches are functionally correct, fix the root cause, and don't break anything else. This ensures that only high-quality solutions are sent to humans for review.
CodeMender has already created and submitted 72 high-quality fixes for serious security issues in major open-source projects.
It can instantly patch new flaws as well as rewrite old code to eliminate entire classes of vulnerabilities, saving developers significant time.
Google DeepMind
Introducing CodeMender: an AI agent for code security
CodeMender is a new AI-powered agent that improves code security automatically. It instantly patches new software vulnerabilities, and rewrites and secures existing code, eliminating entire...
Live from OpenAI's DevDay.
YouTube
OpenAI DevDay 2025: Opening Keynote with Sam Altman
Sam Altman kicks off DevDay 2025 with a keynote to explore ideas that will challenge how you think about building. Join us for announcements, live demos, and a vision of how developers are reshaping the future with AI.
OpenAI introduced AgentKit: build a high-quality agent for any vertical with a visual builder, evals, guardrails, and other tools.
Live demo: building a working agent in 8 minutes.
The gap between open and closed models is narrowing, and this trend looks set to continue.
As foundation models become commoditized globally, the most interesting directions, both in research and commercially, lie not in developing them but in finding new ways to use them.
On the Terminal-Bench Hard evaluation for agentic coding and terminal use, open-weights models such as DeepSeek V3.2 Exp, Kimi K2 0905, and GLM-4.6 have made large strides, with DeepSeek surpassing Gemini 2.5 Pro.
These advances reflect significantly higher capability for use in coding and other agent use cases, and developers have a wider range of model options than ever for these applications.
Fantastic paper: "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning".
RL has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs.
Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space?
YES. Researchers propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES).
By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning.
GitHub.
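The paper's core loop is easy to sketch. In ES you never backpropagate: you sample Gaussian perturbations of the full parameter vector, score each perturbed copy with a reward, and move the parameters along the reward-weighted average of the noise. The toy below is my own stand-in, not the authors' code: a 5-dimensional vector replaces LLM weights, a synthetic reward replaces the reward model, and the hyperparameters are illustrative.

```python
import numpy as np

# Toy Evolution Strategies loop: gradient-free, explores directly in
# parameter space. A 5-dim vector stands in for LLM weights, and the
# negative squared distance to a target stands in for a reward model.

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5, 3.0, -1.0])

def reward(theta):
    return -np.sum((theta - target) ** 2)

theta = np.zeros(5)                  # "pre-trained" starting parameters
sigma, lr, pop = 0.1, 0.05, 50       # noise scale, step size, population

for _ in range(300):
    eps = rng.standard_normal((pop, 5))                 # perturbations
    rewards = np.array([reward(theta + sigma * e) for e in eps])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # ES update: reward-weighted average of the noise directions
    theta += lr / (pop * sigma) * (adv[:, None] * eps).sum(axis=0)

print(reward(theta))  # far above the starting reward of about -15.25
```

Scaling this update to billions of parameters is the paper's contribution; the rule itself stays this simple, which is why it sidesteps the memory cost of storing gradients.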
arXiv.org
Evolution Strategies at Scale: LLM Fine-Tuning Beyond...
Fine-tuning pre-trained large language models (LLMs) for down-stream tasks is a critical step in the AI deployment pipeline. Reinforcement learning (RL) is arguably the most prominent fine-tuning...
OpenAI DevDay 2025.
Highlights:
OpenAI grew from 2 million weekly developers and 100 million weekly ChatGPT users in 2023 to 4 million developers and 800M+ weekly ChatGPT users in 2025.
The platform now processes over 6 billion tokens per minute on the API, up from 300 million tokens per minute in 2023.
Apps inside ChatGPT
- OpenAI launched the Apps SDK in preview, built on Model Context Protocol, enabling developers to build real apps inside ChatGPT that are interactive, adaptive, and personalized. Docs.
- Launch partners include Booking, Canva, Coursera, Expedia, Figma, Spotify, and Zillow, with apps available today to all logged-in ChatGPT users outside of the EU on Free, Go, Plus and Pro plans
- OpenAI will support many ways to monetize including the new Agentic Commerce Protocol that offers instant checkout right inside ChatGPT
- Later this year, OpenAI will begin accepting app submissions for review and publication, launch a dedicated directory where users can browse and search for apps, and launch apps to ChatGPT Business, Enterprise and Edu (OpenAI expects to bring apps to EU users soon).
Building agents
- AgentKit includes Agent Builder (visual canvas for creating multi-agent workflows with drag-and-drop nodes, available in beta),
- ChatKit (toolkit for embedding customizable chat-based agent experiences, generally available starting today)
- expanded Evals capabilities (datasets, trace grading, automated prompt optimization, third-party model support)
- Connector Registry (beginning beta rollout to some API, ChatGPT Enterprise and Edu customers with a Global Admin Console) consolidates data sources into a single admin panel across ChatGPT and the API, including pre-built connectors like Dropbox, Google Drive, SharePoint, Microsoft Teams, and third-party MCP servers
- Guardrails is an open-source, modular safety layer that helps protect agents against unintended or malicious behavior, available to mask or flag PII, detect jailbreaks, and apply other safeguards
Writing code
- Codex is officially out of research preview and into general availability with new Slack integration, Codex SDK, and admin tools including environment controls, monitoring, and analytics dashboards
- Starting October 20, Codex cloud tasks will begin counting towards usage limits (Plus: 30-150 local messages or 5-40 cloud tasks every 5 hours, Pro: 300-1,500 local messages or 50-400 cloud tasks every 5 hours, with code review not counting toward limits for a limited time).
API updates
- gpt-5-pro (gpt-5-pro-2025-10-06) is now available in the API ($15 per 1M input tokens, $120 per 1M output tokens) for tasks in domains like finance, legal, and healthcare where you need high accuracy and depth of reasoning
- gpt-realtime-mini (gpt-realtime-mini-2025-10-06 - $0.60 per 1M text input tokens, $2.40 per 1M text output tokens, $10 per 1M audio input tokens, $20 per 1M audio output tokens) is 70% cheaper than the advanced voice model with the same voice quality and expressiveness
- gpt-audio-mini (gpt-audio-mini-2025-10-06 - $0.60 per 1M text input tokens, $2.40 per 1M text output tokens, $10 per 1M audio input tokens, $20 per 1M audio output tokens) provides cost-efficient audio processing
- sora-2 ($0.10 per second for 720x1280 or 1280x720) and sora-2-pro ($0.30 per second for 720x1280 or 1280x720, $0.50 per second for 1024x1792 or 1792x1024) are available in preview in the API with the ability to pair sound with visuals including rich soundscapes, ambient audio, and synchronized effects, plus control over video length, aspect ratio, resolution, and the ability to easily remix videos
- gpt-image-1-mini ($2 per 1M text input tokens, $2.50 per 1M image input tokens, $8 per 1M image output tokens, $0.005-$0.015 per image depending on quality and size) is 80% less expensive than the large model
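The per-token and per-second prices above translate into request costs with simple arithmetic. The sketch below hard-codes a few of the listed rates (model names and prices copied from the announcement; the helper functions are my own, not an OpenAI SDK API) to estimate the dollar cost of a call or a clip.

```python
# Cost estimators built from the DevDay price list above.
# Token prices are USD per 1M tokens; Sora prices are USD per second.

TOKEN_PRICES = {
    "gpt-5-pro":         {"input": 15.00, "output": 120.00},
    "gpt-realtime-mini": {"input": 0.60,  "output": 2.40},   # text tokens
    "gpt-image-1-mini":  {"input": 2.00,  "output": 8.00},   # text tokens
}
VIDEO_PRICES = {"sora-2": 0.10, "sora-2-pro": 0.30}          # 720p rates

def request_cost(model, input_tokens, output_tokens):
    p = TOKEN_PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def video_cost(model, seconds):
    return VIDEO_PRICES[model] * seconds

# A gpt-5-pro call with a 10k-token prompt and a 2k-token answer:
print(f"${request_cost('gpt-5-pro', 10_000, 2_000):.2f}")  # $0.39
# A 10-second 720p sora-2 clip:
print(f"${video_cost('sora-2', 10):.2f}")                  # $1.00
```

Note how output tokens dominate on gpt-5-pro: at 8x the input rate, a long answer costs far more than a long prompt.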
OpenAI
OpenAI DevDay 2025
Explore all the announcements from OpenAI DevDay 2025, including apps in ChatGPT, AgentKit, Sora 2, and more. Access blogs, docs, and resources to help you build with the latest tools.
Excel Add-in with Claude AI integration
Take actions in Excel - Build financial models, Analyze customer behavior, Transform messy data.
Now available for Max plan users.
pivot.claude.ai
Claude Excel Add-in
Excel Add-in with Claude AI integration
Google expanded access to Opal, its no-code AI mini-app builder, to 15 new countries so more people can build AI-powered mini-apps with no code required.
It also launched new features like advanced debugging and a faster building experience.
Google
Expanding access to Opal, our no-code AI mini-app builder
We're bringing Opal to 15 new countries and making it even easier to build.
Wow! Researchers introduced a new RL algorithm to train agents that can build other agents:
Weak-for-Strong (W4S): Training a Weak Meta-Agent to Harness Strong Executors.
With it, small language models become powerful meta-agents that manage frontier LLMs across diverse agentic tasks.
Code.
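The W4S idea can be caricatured in a few lines: a weak meta-agent proposes a workflow, a strong executor carries out each step, and execution feedback drives the next revision. Everything in this sketch is an invented stand-in (stub functions, a toy success test), not the paper's code; it only makes the division of labor concrete.

```python
# Toy Weak-for-Strong (W4S) loop: the weak model designs and revises the
# workflow; the strong model only executes steps. Stubs replace both LLMs.

def strong_executor(step, task):
    # Stand-in for a frontier-LLM call executing one workflow step.
    return f"{task}/{step}:done"

def weak_meta_agent(task, feedback):
    # Stand-in for a small LM proposing a workflow, revised on failure.
    workflow = ["draft", "verify", "finalize"]
    if feedback == "verify_failed":
        workflow.insert(1, "gather_context")
    return workflow

def w4s(task, max_rounds=2):
    feedback = None
    for _ in range(max_rounds):
        plan = weak_meta_agent(task, feedback)
        results = [strong_executor(step, task) for step in plan]
        if len(plan) >= 4:            # toy stand-in for a real reward check
            return results
        feedback = "verify_failed"    # tell the meta-agent to revise
    return results

print(w4s("summarize_report"))  # 4 steps after one revision
```

The point of the split is economic: the small model only ever emits short workflow descriptions, so it is cheap to RL-train, while the expensive frontier model is called but never fine-tuned.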
arXiv.org
Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
Efficiently leveraging the capabilities of contemporary large language models (LLMs) is increasingly challenging, particularly when direct fine-tuning is expensive and often impractical....
Anthropic is preparing to release Claude Code in its mobile app.
It now runs on Anthropic's infrastructure, no longer just via GitHub.
Users will be able to connect the Claude app to GitHub and run coding prompts on the go.
TestingCatalog
Anthropic prepares Claude Code release for mobile apps
Anthropic prepares a Code section on web and mobile with GitHub integration, repository browsing, and Claude Code tasks tailored to developers.