#DeepSeek introduced DeepSeek-V3.2-Exp — latest experimental model
Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long contexts.
Now live on App, Web, and API.
API prices cut by 50%+
DSA achieves fine-grained sparse attention with minimal impact on output quality — boosting long-context performance & reducing compute cost.
Benchmarks show V3.2-Exp performs on par with V3.1-Terminus.
DeepSeek API prices drop 50%+, effective immediately.
For comparison testing, V3.1-Terminus remains available via a temporary API until October 15, 2025, 15:59 (UTC).
Key GPU kernels are provided in TileLang & CUDA (use TileLang for rapid research prototyping).
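As a toy illustration of the general idea behind fine-grained sparse attention (each query attends to only a small subset of keys), here is a minimal top-k sketch in PyTorch. This is an assumption-laden illustration, not DeepSeek's DSA algorithm or its kernels:

```python
# Toy sketch of fine-grained sparse attention via per-query top-k key selection.
# NOT DeepSeek's DSA: real kernels select indices cheaply and never materialize
# the full score matrix; this only illustrates "attend to a few tokens per query".
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_top=64):
    # q, k, v: [seq_len, dim]
    scores = q @ k.T / (q.shape[-1] ** 0.5)             # dense scores (toy only)
    top = scores.topk(min(k_top, k.shape[0]), dim=-1)    # keep k_top keys per query
    sparse = torch.full_like(scores, float("-inf"))
    sparse.scatter_(-1, top.indices, top.values)         # mask out everything else
    return F.softmax(sparse, dim=-1) @ v

q, k, v = (torch.randn(256, 64) for _ in range(3))
print(topk_sparse_attention(q, k, v, k_top=16).shape)    # torch.Size([256, 64])
```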
Hugging Face.
GitHub
DeepSeek-V3.2-Exp/DeepSeek_V3_2.pdf at main · deepseek-ai/DeepSeek-V3.2-Exp
Microsoft is launching the next stage of vibe coding this morning with what it is calling 'Vibe Working' in Copilot. It includes:
- Agent Mode in Excel
- Agent Mode in Word
- Office Agent in chat (this one is powered by Claude)
Microsoft News
Vibe working: Introducing Agent Mode and Office Agent in Microsoft 365 Copilot
Microsoft Copilot introduces Agent Mode in Office apps, enabling smarter document creation, analysis, and collaboration across Excel, Word, and PowerPoint.
OpenAI introduced Instant Checkout in ChatGPT with Etsy and Shopify, and is open-sourcing the Agentic Commerce Protocol that powers it, built with Stripe, so more merchants and developers can integrate agentic checkout.
Instant Checkout is now rolling out to US ChatGPT Pro, Plus and Free logged-in users buying from US Etsy sellers, with over 1 million Shopify merchants coming soon.
Merchants interested in joining can learn more and apply here.
ChatGPT
Instant Checkout for merchants in ChatGPT
Join ChatGPT’s Instant Checkout to sell directly in conversations—reach millions of shoppers, boost sales, and keep full control of your customer relationships.
Google presented ReasoningBank: memory for self-evolving LLM agents
• Distills strategies from both successes & failures (rough sketch after this list)
• Enables agents to learn, reuse, and improve over time
• Outperforms prior memory methods on web & SWE tasks (+34.2% effectiveness, –16% interaction steps)
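For illustration only, here is a rough sketch of the memory-bank idea under simple assumptions (keyword overlap instead of embeddings, a stub instead of an LLM-written lesson); it is not the paper's implementation:

```python
# Rough sketch: distill a short strategy note from each finished episode,
# whether it succeeded or failed, and retrieve the most relevant notes later.
from dataclasses import dataclass

@dataclass
class MemoryItem:
    task: str
    strategy: str       # distilled lesson, e.g. "check pagination before scraping"
    from_success: bool

class ReasoningBankSketch:
    def __init__(self):
        self.items: list[MemoryItem] = []

    def distill(self, task: str, trajectory: str, succeeded: bool) -> None:
        # In the paper an LLM writes the lesson; here we just store a stub.
        lesson = f"{'do' if succeeded else 'avoid'}: {trajectory[:80]}"
        self.items.append(MemoryItem(task, lesson, succeeded))

    def retrieve(self, new_task: str, top_k: int = 3) -> list[str]:
        # Toy relevance score: word overlap. A real system would use embeddings.
        def score(item: MemoryItem) -> int:
            return len(set(item.task.lower().split()) & set(new_task.lower().split()))
        ranked = sorted(self.items, key=score, reverse=True)
        return [it.strategy for it in ranked[:top_k]]

bank = ReasoningBankSketch()
bank.distill("book a flight on example.com", "opened wrong tab, recovered by ...", succeeded=False)
print(bank.retrieve("book a hotel on example.com"))
```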
Stripe released the Agentic Commerce Protocol, co-developed with OpenAI.
Stripe also launched an API for agentic payments, called Shared Payment Tokens.
Agentic Commerce Protocol
An open standard that enables programmatic commerce flows between AI agents and businesses.
OpenAI launched a new app called Sora. This is a combination of a new model called Sora 2, and a new product that makes it easy to create, share, and view videos.
Sora 2 can do things that are exceptionally difficult for prior video generation models.
It’s more physically accurate and realistic than prior systems, and a big leap forward in controllability. It also comes with synchronized audio.
There are two ways to access & use Sora 2:
1. The Sora App
The Sora iOS app is available to download now but access is invite-only.
You can sign up in-app for a push notification when access opens for your account.
2. The web: once you have access to the Sora app, you’ll also be able to access Sora 2 through sora.com.
The initial rollout starts in the U.S. and Canada today, with the intent to expand to additional countries.
Android users will be able to access Sora 2 via sora.com once you have an invite code from someone who already has access.
OpenAI also plans to release Sora 2 in the API.
Sora 1 Turbo will remain available, and everything you’ve created will continue to live in your sora.com library.
OpenAI
Sora
Turn your ideas into videos with hyperreal motion and sound.
New from Anthropic: context engineering for AI agents
Anthropic recently published a technical overview of context engineering - managing what information gets fed to language models during execution. This shifts focus from pure prompt design to thinking holistically about the entire information state available to an agent.
The core problem
Language models have finite attention budgets. As you add more tokens to the context window, retrieval and reasoning performance gradually degrades. This happens because attention relates every token to every other token: an n-token context creates on the order of n² pairwise relationships, and as context grows the model's capacity to maintain them gets stretched thin.
Context is a limited resource with diminishing returns.
Key principles
System prompts: Clear and specific, but not so prescriptive they hardcode brittle logic. Find the right level of abstraction between vague guidance and micromanagement.
Tools: Self-contained with minimal overlap. If you can't definitively say which tool applies in a situation, the agent won't do better.
Examples: Curate a small set of diverse examples rather than exhaustively listing edge cases. Most token-efficient way to communicate expected behavior.
General rule: Find the minimal set of high-signal tokens that maximize likelihood of your desired outcome.
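To make the "self-contained tools with minimal overlap" point concrete, here is a sketch of a narrowly scoped tool definition. The name, fields, and schema shape are illustrative assumptions (loosely modeled on common LLM tool-use schemas), not taken from Anthropic's post:

```python
# Illustrative only: a self-contained, narrowly scoped tool definition.
# The tool name, description, and schema are hypothetical.
search_orders_tool = {
    "name": "search_orders",
    "description": (
        "Search customer orders by order ID or email. "
        "Use this ONLY for order lookups; billing questions use a different tool."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Order ID or customer email"},
            "limit": {"type": "integer", "description": "Max results to return", "default": 5},
        },
        "required": ["query"],
    },
}
```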
Just-in-time retrieval
Instead of pre-loading all potentially relevant data, modern agents maintain lightweight references (file paths, queries, URLs) and dynamically load information at runtime using tools.
This mirrors human cognition - we create indexing systems and retrieve on demand rather than memorizing everything. The tradeoff is speed versus context efficiency. Many effective agents use hybrid approaches.
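A minimal sketch of the just-in-time pattern, assuming a plain file-backed store and a hypothetical read_reference tool; the point is that only the lightweight references live in context until the agent asks for the content:

```python
# Sketch: the agent's context holds only lightweight references (file paths here),
# and content is loaded on demand through a tool call instead of being pre-loaded.
from pathlib import Path

# Cheap references kept in context (a few tokens each).
references = {
    "sales_q3": "data/q3_sales.csv",
    "style_guide": "docs/style_guide.md",
}

def read_reference(name: str, max_chars: int = 2000) -> str:
    """Tool the agent calls when (and only when) a reference becomes relevant."""
    return Path(references[name]).read_text()[:max_chars]

# At runtime the model decides it needs the style guide and issues a tool call:
# read_reference("style_guide")  -> only then does the file enter the context.
```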
Long-horizon techniques
When tasks exceed the context window:
Compaction: Summarize conversation history and start fresh. The challenge is choosing what to keep versus discard.
Structured note-taking: Agent maintains persistent notes outside the context window, retrieving them as needed. Works like keeping a TODO list.
Sub-agent architectures: Specialized agents handle focused tasks in clean context windows, returning condensed summaries to a coordinating agent.
Choice depends on task characteristics. Compaction maintains conversational flow. Note-taking suits iterative development. Multi-agent works for complex research requiring parallel exploration.
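For illustration, a rough compaction sketch: the summarize callable is a hypothetical stand-in for an LLM summarization call, and the token count is a crude approximation; this is not Anthropic's implementation.

```python
# Sketch of compaction: when history nears the budget, replace old turns
# with a summary and keep only the most recent turns verbatim.
MAX_TOKENS = 100_000

def approx_tokens(messages):
    return sum(len(m["content"]) // 4 for m in messages)  # crude ~4 chars/token

def compact(messages, summarize):
    if approx_tokens(messages) < MAX_TOKENS:
        return messages
    old, recent = messages[:-10], messages[-10:]           # keep the last few turns
    summary = summarize(old)                                # decide what to keep vs. discard
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent
```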
Anthropic
Effective context engineering for AI agents
Epoch AI introduced a new AI Companies Data Hub
Researchers collected key data on frontier AI companies, including revenue run rates, funding, staff, usage rates, and compute spend.
Revenue:
The combined revenue run rates of OpenAI and Anthropic have grown around 10x since early 2024.
OpenAI’s annualized revenue reached $13 billion in August 2025, up from $5B at the start of the year.
Anthropic’s revenue has exploded this year, from $1B to $5B by July.
Funding:
OpenAI, Anthropic, and xAI have attracted massive investor interest.
OpenAI last raised at a $300 billion valuation, with a $500B valuation under discussion.
Collectively, the frontier AI companies in the dataset have raised ~$100B in equity and debt funding.
Usage:
The user bases for leading chat applications like ChatGPT and Gemini have continued to grow rapidly.
ChatGPT alone surpassed 700 million weekly active users by August 2025, processing over 3 billion daily messages.
Staff:
OpenAI and Anthropic have both expanded from small startups to thousands of full-time staff, though they are still well behind Google’s flagship AI effort, Google DeepMind.
Compute spend:
Compute for research, training, and inference is expensive: OpenAI’s cloud compute bill for 2025 will exceed $15 billion!
The extensive large-scale data center buildouts underway suggest this rapid growth could continue in the coming years.
Epoch AI
Data on AI Companies
Our database of AI company data, with data on revenue, funding, staff, and compute for many of the key players in frontier AI.
E11Bio announced PRISM, a new, scalable technology for mapping brain circuits
PRISM uses molecular ID codes and AI to help neurons trace themselves.
Researchers discovered a new cell barcoding approach exceeding comparable methods by more than 750x.
This is the heart of PRISM. Researchers integrated this capability with microscopy and AI image analysis to automatically trace neurons at high resolution and annotate them with molecular features.
This is a key advance towards economically viable brain mapping - 95% of costs stem from neuron tracing. It is also an important step towards democratizing neuron tracing for everyday neuroscience.
Solving these problems is critical for curing brain disorders, building safer and human-like AI, and even simulating brain function.
In a first pilot study, researchers acquired a unique dataset in mouse hippocampus. Barcodes improved the accuracy of tracing genetically labelled neurons by 8x – with a clear path to 100x or more.
They also permit tracing across spatial gaps – essential for mitigating tissue section loss in whole-brain scaling.
Addgene constructs.
Volara.
Open data.
E11 Bio
PRISM | E11 Bio
Thinking Machines, a lab founded by an ex-OpenAI team, introduced Tinker: a flexible API for fine-tuning language models.
Write training loops in Python on your laptop; Tinker will run them on distributed GPUs.
Private beta starts today.
Thinking Machines Lab
Tinker
Tinker is a training API for researchers and developers.
Microsoft introduced Agent Framework
You can build, orchestrate, and scale multi-agent systems in Azure AI Foundry using this framework.
Microsoft Azure Blog
Introducing Microsoft Agent Framework | Microsoft Azure Blog
Find out how Microsoft Agent Framework can help simplify the orchestration of multi-agent systems and keep developers in flow.
Meta Superintelligence Labs introduced MENLO: From Preferences to Proficiency
The team introduced a framework + dataset for evaluating and modeling native-like LLM response quality across 47 languages, inspired by audience design principles.
Data.
arXiv.org
MENLO: From Preferences to Proficiency -- Evaluating and Modeling...
Ensuring native-like quality of large language model (LLM) responses across many languages is challenging. To address this, we introduce MENLO, a framework that operationalizes the evaluation of...
Sholto Douglas, Anthropic:
"Over the last year, RL has finally allow[ed] us to take a feedback loop and turn it into a model that is at least as good as the best humans at a given thing in a narrow domain.
And you're seeing that with mathematics and competition code, which are the two domains most amenable to this - where rapidly the models are becoming incredibly competent competition mathematicians and competition coders.
There's nothing intrinsically different about competition code and math. It's just that they're really [more] amenable to RL than any other domain. But importantly, they demonstrate there's no intellectual ceiling on the models.
They're capable of doing really tough reasoning given the right feedback loop. So, we think that same approach generalizes to basically all other domains of human intellectual endeavor where given the right feedback loop, these models will [become] at least as good as the best humans at a given thing. And then once you have something that is at least as good as the best humans at a thing, you can just run 1,000 of them in parallel or 100x faster and you have something that's even just with that condition substantially smarter than any given human. And this is completely throwing aside whether or not it's possible to make something that is smarter than a human.
The implications of this are pretty staggering, right? In the next 2 or 3 years given the right feedback loops, given the right compute, etc., we think that we as the AI industry as a whole [are] on track to create something that is at least as capable as most humans on most computer-facing tasks, possibly as good as many of our best scientists at their fields. It'll be sharp and spiky, there'll be examples of things it can't [do]. But the world will change.
... I think this is worth crying from the rooftops a little bit - guys, anything that we can measure seems to be improving really rapidly. Where does that get us in 2 or 3 years? I can't say for certain. But I think it's worth building into worldviews that there's a pretty serious chance that we get AGI."
"Over the last year, RL has finally allow[ed] us to take a feedback loop and turn it into a model that is at least as good as the best humans at a given thing in a narrow domain.
And you're seeing that with mathematics and competition code, which are the two domains most amendable to this - where rapidly the models are becoming incredibly competent competition mathematicians and competition coders.
There's nothing intrinsically different about competition code and math. It's just that they're really [more] amenable to RL than any other domain. But importantly, they demonstrate there's no intellectual ceiling on the models.
They're capable of doing really tough reasoning given the right feedback loop. So, we think that same approach generalizes to basically all other domains of human intellectual endeavor where given the right feedback loop, these models will [become] at least as good as the best humans at a given thing. And then once you have something that is at least as good as the best humans at a thing, you can just run 1,000 of them in parallel or 100x faster and you have something that's even just with that condition substantially smarter than any given human. And this is completely throwing aside whether or not it's possible to make something that is smarter than a human.
The implications of this are pretty staggering, right? In the next 2 or 3 years given the right feedback loops, given the right compute, etc., we think that we as the AI industry as a whole on track to create something that is at least as capable as most humans on most computer-facing tasks possibly as good as many of our best scientists at their fields. It'll be sharp and spiky, there'll be examples of things it can't [do]. But the world will change.
... I think this is worth crying from the rooftops a little bit - guys, anything that we can measure seems to be improving really rapidly. Where does that get us in 2 or 3 years? I can't say for certain. But I think it's it's worth building into worldviews that there's a pretty serious chance that we get AGI."
YouTube
Sonnet 4.5 & the AI Plateau Myth — Sholto Douglas (Anthropic)
Sholto Douglas, a key researcher at Anthropic, reveals the breakthroughs behind Claude Sonnet 4.5—the world's leading coding model—and why we might be just 2-3 years from AI matching human-level performance on most computer-facing tasks.
IBM released Granite 4.0 as open source, with a new hybrid Mamba/transformer architecture that reduces memory requirements without much loss in accuracy.
This set of models is well suited to agentic workflows such as tool calling, document analysis, and RAG, especially in an enterprise setup.
The "Micro" (3.4B) model can even run 100% locally in your browser on WebGPU, powered by TransformersJS.
Full model collection.
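For reference, a minimal local-inference sketch using the Hugging Face transformers library; the exact checkpoint name is an assumption, so check the Granite 4.0 collection on Hugging Face for the real model IDs:

```python
# Minimal local-inference sketch with Hugging Face transformers.
# The model ID below is an assumption; verify it against the ibm-granite collection.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-micro"  # hypothetical ID for the 3.4B "Micro" model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs accelerate

messages = [{"role": "user", "content": "Summarize the key obligations in this clause: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```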
huggingface.co
Granite-4.0 WebGPU - a Hugging Face Space by ibm-granite
Run Granite-4.0-Micro 100% locally in your browser on WebGPU
A great milestone for open-source robotics: π₀ & π₀.₅ by Physical Intelligence are now on Hugging Face.
As described by Physical Intelligence, π₀.₅ is a Vision-Language-Action model which represents a significant evolution from π₀ to address a big challenge in robotics: open-world generalization.
While robots can perform impressive tasks in controlled environments, π₀.₅ is designed to generalize to entirely new environments and situations that were never seen during training.
Generalization must occur at multiple levels:
- Physical Level: Understanding how to pick up a spoon (by the handle) or plate (by the edge), even with unseen objects in cluttered environments
- Semantic Level: Understanding task semantics, where to put clothes and shoes (laundry hamper, not on the bed), and what tools are appropriate for cleaning spills
- Environmental Level: Adapting to "messy" real-world environments like homes, grocery stores, offices, and hospitals
The breakthrough innovation in π₀.₅ is co-training on heterogeneous data sources. The model learns from:
- Multimodal Web Data: Image captioning, visual question answering, object detection
- Verbal Instructions: Humans coaching robots through complex tasks step-by-step
- Subtask Commands: High-level semantic behavior labels (e.g., "pick up the pillow" for an unmade bed)
- Cross-Embodiment Robot Data: Data from various robot platforms with different capabilities
- Multi-Environment Data: Static robots deployed across many different homes
- Mobile Manipulation Data: ~400 hours of mobile robot demonstrations
OpenAI is planning to announce Agent Builder on DevDay.
Agent Builder will let users build their agentic workflows and connect MCPs, ChatKit widgets, and other tools.
TestingCatalog
OpenAI prepares to release Agent Builder during DevDay on October 6
Agent builder will let users build their agentic workflows, connect MCPs, ChatKit widgets and other tools. This is one of the smoothest Agent builder canvases I've used so far.
Harmonic, the AI lab founded by Robinhood's founder, shared how it earned a gold medal at IMO 2025, the elite math contest.
Four teams have achieved this.
Unlike OpenAI and DeepMind, Harmonic's Aristotle uses formal Lean-based search methods and a geometry solver, similar to ByteDance's SeedProver.
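To give a flavor of what "formal Lean-based" means: a statement only counts once Lean's proof checker verifies it, so a prover's output can be machine-checked rather than trusted. A toy Lean 4 example, purely illustrative and unrelated to Aristotle's actual proofs:

```lean
-- Trivial Lean 4 theorem: accepted only because the kernel checks the proof.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```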
alphaXiv
Aristotle: IMO-level Automated Theorem Proving | alphaXiv
View recent discussion. Abstract: We introduce Aristotle, an AI system that combines formal verification with informal reasoning, achieving gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. Aristotle integrates three…
Google introduced CodeMender: a new AI agent that uses Gemini Deep Think to automatically patch critical software vulnerabilities.
It checks that its patches are functionally correct, fix the root cause, and don't break anything else. This ensures that only high-quality solutions are sent to humans for review.
CodeMender has already created and submitted 72 high-quality fixes for serious security issues in major open-source projects.
It can instantly patch new flaws as well as rewrite old code to eliminate entire classes of vulnerabilities – saving developers significant time.
Google DeepMind
Introducing CodeMender: an AI agent for code security
CodeMender is a new AI-powered agent that improves code security automatically. It instantly patches new software vulnerabilities, and rewrites and secures existing code, eliminating entire...
Live from OpenAI’s DevDay
YouTube
OpenAI DevDay 2025: Opening Keynote with Sam Altman
Sam Altman kicks off DevDay 2025 with a keynote to explore ideas that will challenge how you think about building. Join us for announcements, live demos, and a vision of how developers are reshaping the future with AI.
OpenAI introduced AgentKit: build a high-quality agent for any vertical with a visual builder, evals, guardrails, and other tools.
A live demo showed building a working agent in 8 minutes.