How Transformers Learn to Navigate: Episodic Memory as a Computational Workspace
A new study reveals surprising insights into how transformers achieve rapid in-context learning, with implications that bridge AI and neuroscience.
Key findings:
1. Internal Map Formation
The researchers discovered that transformers don't just memorize solutions—they actually build internal representations of spatial environments. These models learn to:
- Create cognitive maps of gridworld and tree maze environments
- Align representations across different contexts with similar structures
- Use these maps for efficient navigation in novel scenarios.
2. Novel Decision-Making Strategy. Surprisingly, the models don't use traditional reinforcement learning approaches like value estimation or explicit path planning. Instead, they employ a geometric strategy:
- Align representations to Euclidean space using in-context experience
- Calculate angles from the current state to the goal in this space
- Select actions based on these angular computations
This approach is both elegant and computationally efficient.
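A minimal sketch of that geometric strategy, assuming a hand-assigned 2D embedding and a four-action gridworld (in the study such coordinates would emerge inside the transformer; everything here is illustrative, not the paper's implementation):

```python
import math

# Hypothetical action set for a gridworld, as unit direction vectors.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def pick_action(current, goal):
    """Choose the action whose direction best matches the angle to the goal."""
    dx, dy = goal[0] - current[0], goal[1] - current[1]
    goal_angle = math.atan2(dy, dx)
    best, best_diff = None, float("inf")
    for name, (ax, ay) in ACTIONS.items():
        diff = abs(math.atan2(ay, ax) - goal_angle)
        diff = min(diff, 2 * math.pi - diff)  # wrap around the circle
        if diff < best_diff:
            best, best_diff = name, diff
    return best

print(pick_action((0, 0), (3, 1)))  # goal is up and to the right
```

Once states are aligned to Euclidean coordinates in context, action selection reduces to one angle comparison per step, with no value function or search tree.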
3. Memory as Computational Workspace. Perhaps most intriguingly, the study reveals that episodic memory tokens serve as more than just storage—they become an active computational workspace where intermediate calculations are cached and processed.
This work challenges our understanding of in-context learning in several ways:
Beyond simple pattern matching: Transformers are developing sophisticated algorithmic strategies
Neuroscience connections: The mechanisms mirror hippocampal-entorhinal cortex computations in biological brains
Architectural insights: Memory systems can serve dual roles as both storage and computation
The latest update from Neuralink: using brainwaves to:
1. Play Call of Duty
2. Control robotic hands
3. Have a neural Mario Kart party
4. Regain one’s own natural voice
Huge drop from Baidu: Ernie 4.5
From 0.3B to 424B
This is a very impressive family of open models by Baidu, competitive with Qwen3 and the latest DeepSeek V3, and they open-sourced the training code as well.
GitHub.
Hugging Face.
GitHub - PaddlePaddle/Paddle: PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
OpenAI acquired the team behind Crossing Minds, a startup focused on AI recommendations for e-commerce
They will now focus on agents and information retrieval, covering how OpenAI systems learn, reason, and retrieve knowledge at scale and in real time.
A new study from multiple institutions introduces CoreCognition, a benchmark that systematically evaluates whether large language models possess "core knowledge" - fundamental cognitive abilities that humans develop in early childhood.
Key findings from testing 230 models:
Reversed developmental trajectory: MLLMs excel at complex tasks (formal reasoning, math) but fail at basic ones that 2-year-olds master, such as object permanence and spatial understanding. Performance on higher-level abilities doesn't correlate with mastery of foundational ones.
Scaling doesn't help: Increasing model size improves performance on complex tasks but shows minimal or negative impact on basic cognitive abilities. Some abilities, like perspective-taking, actually decline with scale.
Reasoning models show no advantage: Models with chain-of-thought reasoning (GPT-o1, QVQ-72B) perform no better on core knowledge tasks than standard models, suggesting the deficit is architectural, not procedural.
Shortcut learning vs. genuine understanding: Through "Concept Hacking" - manipulating images to invert correct answers - researchers found models rely on learned patterns rather than genuine conceptual understanding.
The benchmark tests 12 core abilities across three developmental stages:
1. Sensorimotor (0-2 years): boundary detection, object permanence, continuity, spatiality
2. Concrete operations (7-11 years): conservation, intuitive physics, perspective-taking
3. Formal operations (11+ years): intentionality understanding, mechanical reasoning, tool use.
williamium3000.github.io
Home - Core Cognition
Core Knowledge Deficits in Multi-Modal Language Models
Stablecoin issuer Circle has applied to the U.S. OCC to establish “First National Digital Currency Bank, N.A.”
If approved, the charter would allow Circle to self-custody USDC reserves and offer digital asset custody services to institutions, excluding deposit-taking and lending.
Sakana AI introduced Inference-Time Scaling and Collective Intelligence for Frontier AI
AB-MCTS, a new inference-time scaling algorithm that enables multiple frontier AI models to cooperate, achieving promising initial results on the ARC-AGI-2 benchmark.
The AB-MCTS combination of o4-mini, Gemini-2.5-Pro, and DeepSeek-R1-0528, current frontier AI models, achieves strong performance on the ARC-AGI-2 benchmark, outperforming the individual o4-mini, Gemini-2.5-Pro, and DeepSeek-R1-0528 models by a large margin.
Many ARC-AGI-2 examples that were unsolvable by any single LLM were solved by combining multiple LLMs. In some cases, an initially incorrect attempt by o4-mini is used by R1-0528 and Gemini-2.5-Pro as a hint to get to the correct solution.
ARC-AGI-2 code.
Implementation of AB-MCTS on GitHub.
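The core idea, adaptive branching, is that at each search step the algorithm decides whether to "go wider" (sample a fresh candidate) or "go deeper" (refine a promising one, as when R1-0528 builds on o4-mini's failed attempt). A toy sketch of that control loop, with stub scorers and numeric "models" standing in for real LLMs and verifiers (all names and the 0.5 explore rate are hypothetical, not Sakana AI's implementation):

```python
import random

random.seed(0)

# Stub "models": each proposes a candidate answer, optionally from a hint.
def model_a(hint=None):
    return (hint or 0) + random.randint(0, 3)

def model_b(hint=None):
    return (hint or 0) + random.randint(0, 5)

def score(candidate, target=10):
    """Higher is better; a real system would verify against the task."""
    return -abs(candidate - target)

def ab_search(models, steps=20):
    """Adaptive branching: widen with a new sample or deepen from the best so far."""
    pool = []  # (score, candidate) pairs
    for _ in range(steps):
        model = random.choice(models)
        if not pool or random.random() < 0.5:
            cand = model()             # widen: fresh attempt
        else:
            best = max(pool)[1]
            cand = model(hint=best)    # deepen: refine the best candidate
        pool.append((score(cand), cand))
    return max(pool)[1]

print(ab_search([model_a, model_b]))
```

The deepen branch is what lets one model's partial progress seed another model's next attempt, which is where the reported cross-model gains come from.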
sakana.ai
Sakana AI
Inference-Time Scaling and Collective Intelligence for Frontier AI
A noninvasive brain-computer interface that enables humans to control a robotic hand at the level of individual fingers—just by thinking
This advance moves robotic BCI control from the arm level to the finger level, using only scalp EEG.
With the help of #AI and #deeplearning, researchers were able to extract extremely weak brain signals reflecting a user’s mental intention and use them for real-time, finger-level robotic control.
In this study, 21 human participants learned to control individual fingers of a robotic hand with ~80% accuracy for two distinct fingers on the same hand.
EEG-based BCI is safe, noninvasive, and economical, offering the potential for widespread use—not just for patients, but possibly the general public as well.
Despite challenges in reading brain signals through the scalp, AI-assisted signal decoding made this breakthrough possible.
Where should consumer AI founders build next?
From the Menlo Ventures consumer survey of 5k+ Americans, these were the activities with high participation but lowest AI penetration today
Small Language Models are the Future of Agentic AI
This position paper argues that small language models (SLMs), defined pragmatically as those runnable on consumer-grade hardware, are not only sufficient but superior for many agentic AI applications, especially when tasks are narrow, repetitive, or tool-oriented.
The authors propose that shifting from LLM-first to SLM-first architectures will yield major gains in efficiency, modularity, and sustainability.
SLMs are already capable of commonsense reasoning, instruction following, and code/tool interaction at levels comparable to 30–70B models, with orders of magnitude better throughput.
Examples include Phi-3, Hymba-1.5B, DeepSeek-R1-Distill, and RETRO-7.5B.
arXiv.org
Small Language Models are the Future of Agentic AI
Large language models (LLMs) are often praised for exhibiting near-human performance on a wide range of tasks and valued for their ability to hold a general conversation. The rise of agentic AI...
Amazon announced DeepFleet, an AI that routes warehouse bots 10% faster to trim costs and shorten delivery times
Andy Jassy likened it to "an intelligent traffic management system" that coordinates robots’ movements to find optimal paths.
HuggingFace announced a new open-source challenge in collaboration with Proxima Fusion: unlocking fusion with AI
The "Bringing Fusion Down to Earth: ML for Stellarator Optimization" project is an initiative by Hugging Face in collaboration with Proxima Fusion, a spin-out from the Max Planck Institute for Plasma Physics, aimed at accelerating fusion energy research through ML applied to stellarator design.
The initiative focuses on using ML to optimize stellarator designs, addressing the computational complexity of simulating and designing these devices. Key goals include:
- Accelerating Design Processes: Traditional stellarator design, like that of W7-X, required massive computational effort and iterative, hand-tuned processes. ML aims to streamline this by developing surrogate models that predict outcomes of complex simulations (e.g., VMEC++ simulations) and key plasma properties from input parameters. These models could replace expensive simulations, enabling faster design iterations and differentiable optimization loops.
- Open Collaboration: The project opens fusion research to the broader ML community, encouraging global participation to tackle one of the hardest scientific challenges. It includes a live leaderboard where researchers can submit optimized stellarator designs and compare performance on standard metrics.
- Advancing Fusion Energy: By optimizing stellarators, the project aims to make fusion a viable, zero-carbon, fuel-abundant, and safe energy source, capable of transforming the global energy system without the drawbacks of fossil fuels, nuclear fission, or intermittent renewables.
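The surrogate-model idea in the first goal can be sketched as fitting a cheap regressor that maps design parameters to a simulated plasma property, then querying it in place of the expensive physics code. A toy version with synthetic data and a linear least-squares fit (the simulator function, parameter names, and fit are all stand-ins; real surrogates for VMEC++ outputs would be far richer):

```python
import numpy as np

rng = np.random.default_rng(42)

def expensive_simulation(params):
    """Stand-in for a costly physics code: a smooth function of the design."""
    return 2.0 * params[:, 0] - 0.5 * params[:, 1] + 0.1

# Run the "simulation" on a small design-of-experiments set.
X = rng.uniform(-1, 1, size=(200, 2))   # design parameters (e.g., coil coefficients)
y = expensive_simulation(X)

# Fit a linear surrogate by least squares: y ≈ X_aug @ w
X_aug = np.hstack([X, np.ones((len(X), 1))])  # add a bias column
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

# Query the surrogate instead of the simulator for a new design.
new_design = np.array([[0.3, -0.2]])
pred = np.hstack([new_design, [[1.0]]]) @ w
print(float(pred[0]))  # close to 2.0*0.3 - 0.5*(-0.2) + 0.1 = 0.8
```

Because the surrogate is differentiable, a design-optimization loop can follow its gradients instead of repeatedly invoking the simulator, which is the speedup the project is after.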
huggingface.co
Bringing Fusion Down to Earth: ML for Stellarator Optimization
A Blog post by Georgia Channing on Hugging Face
NeurIPS is seeking additional ethics reviewers this year; consider signing up if you are able and willing to participate in the review process.
Together AI introduced DeepSWE, a new SOTA open-source software engineering model trained entirely using reinforcement learning, based on Qwen3-32B.
DeepSWE is trained with rLLM, Agentica’s modular RL post-training framework for agents.
rLLM makes it easy to build, train, and deploy RL-tuned agents on real-world workloads — from software engineering to web navigation and beyond.
Train DeepSWE yourself. Extend it. Build your own local agents.
www.together.ai
DeepSWE: Training a Fully Open-sourced, State-of-the-Art Coding Agent by Scaling RL
Meta introduced NaturalThoughts
Data curation for general reasoning capabilities is still relatively underexplored.
Researchers systematically compare different metrics for selecting high-quality and diverse reasoning traces in terms of data efficiency in the distillation setting.
Researchers find that diversity in reasoning strategies matters more than topic diversity, and that challenging questions are more sample-efficient for distilling reasoning capabilities.
They find that the Less-Is-More approach is not sufficient for solving general reasoning tasks, while scaling up data quantity brings consistent gains.
They find that NaturalThoughts outperforms state-of-the-art reasoning datasets such as OpenThoughts3, LIMO, and S1k on general STEM domains.
They also find that distillation based on reasoning difficulty can improve the Pareto frontier of the student model's inference efficiency.
Training with a mix of full reasoning traces and condensed answers enables efficient hybrid reasoning in the student model, which adaptively switches between long chain-of-thought thinking and directly answering.
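The selection criteria above (prefer hard questions, then cover diverse reasoning strategies) can be sketched as a greedy filter. The trace fields, solve-rate scores, and budget here are hypothetical illustrations, not the paper's actual metrics:

```python
# Hypothetical reasoning traces: (question_id, strategy, solve_rate).
# Lower solve_rate = harder question; distinct strategies add diversity.
traces = [
    ("q1", "case-split", 0.9),
    ("q1", "backtracking", 0.9),
    ("q2", "case-split", 0.3),
    ("q3", "verification", 0.2),
    ("q3", "case-split", 0.2),
    ("q4", "backtracking", 0.7),
]

def select(traces, budget=3):
    """Greedy pick: hardest questions first, one trace per reasoning strategy."""
    chosen, seen_strategies = [], set()
    for qid, strategy, rate in sorted(traces, key=lambda t: t[2]):
        if strategy in seen_strategies:
            continue  # already have a trace demonstrating this strategy
        chosen.append((qid, strategy))
        seen_strategies.add(strategy)
        if len(chosen) == budget:
            break
    return chosen

print(select(traces))
```

A filter like this keeps the distillation set small while still exposing the student to each distinct reasoning strategy at its hardest difficulty.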
arXiv.org
NaturalThoughts: Selecting and Distilling Reasoning Traces for...
Recent work has shown that distilling reasoning traces from a larger teacher model via supervised finetuning outperforms reinforcement learning with the smaller student model alone (Guo et al....
Meta introduced research on embodied AI agents that can perceive, learn, act and interact in the virtual and physical worlds.
HeyGen launched a new Video Agent that handles content production end-to-end
Using just a doc, some footage, or even a sentence, it can find a story, write the script, select shots/generate new footage, and edit everything for final release.
HeyGen
AI Video Agent | Create and Automate Videos with AI | HeyGen
Meet HeyGen’s AI Video Agent. Instantly generate scripts, voiceovers, avatars, and translations to transform any idea into a compelling video. No credit card required.
Genspark just launched AI Docs, completing their suite with AI Slides and Sheets.
It's similar to the Gemini integration in Google Docs except with a much better UX, where the AI acts more like a creative partner than just a generative tool: you get to iterate together on the output instead of just prompting once and editing the result. And it has markdown support.
The Hong Kong Stablecoin Ordinance will officially take effect on August 1 this year
The Hong Kong Monetary Authority will open the license application. Only a single-digit number of licenses is expected to be issued, but more than 40 companies are currently preparing to apply.
The applicants are basically the largest financial institutions and Internet companies in China.
OpenAI published "Working with 400,000 teachers to shape the future of AI in schools"
OpenAI is joining the American Federation of Teachers as the founding partner to launch the National Academy for AI Instruction, a five-year initiative to equip 400,000 K-12 educators. OpenAI is contributing $10 million over five years ($8 million in direct funding and $2 million in in-kind resources), joining the United Federation of Teachers, Microsoft, and Anthropic in supporting the initiative.
OpenAI
Working with 400,000 teachers to shape the future of AI in schools
OpenAI joins the American Federation of Teachers to launch the National Academy for AI Instruction.