Propose, Solve, Verify (PSV): Self-play for code with proofs, not tests
Most AI coding systems learn from unit tests: they write code, run it on a few examples, and get rewarded if it passes. But tests are incomplete. Code can pass all tests and still be wrong on rare inputs. So the AI can learn “cheap tricks,” and those errors can spread during training.
PSV replaces this with a much stricter judge: formal verification. Instead of checking a handful of examples, a verifier tries to prove mathematically that the program meets the specification for all possible inputs.
The PSV loop
1. Propose: write the “what”
The Proposer invents a new task by writing a specification (a precise description of what the program must do).
Here is the crucial idea: the proposer does not need to be a genius who can foresee what will stump a superhuman solver (the equivalent of designing a benchmark for Terence Tao). PSV relies on a simpler asymmetry:
- It's often easy to state constraints.
- It's often harder to satisfy them (and prove you did).
A proposer can cheaply say: “Sort this list,” or “Sort it and keep it stable,” or “Sort it, return an index mapping, and prove nothing was lost or duplicated.” Stacking constraints like this is far easier than implementing them and proving the implementation correct.
2. Solve: do the “how”
The Solver tries to write the program (and the proof-style annotations the verifier needs). It samples many attempts (like trying many “mutations”).
3. Verify: harsh selection
A formal verifier checks each attempt. Only solutions that are provably correct count as wins. This is the key difference from unit tests: passing doesn't mean “it worked on a few examples”; it means “it is correct for every input.”
4. Learn: keep the survivors
The solver then trains on those verified wins, becoming more likely to produce correct solutions next time.
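A minimal sketch of one round of this loop, assuming hypothetical callables for the proposer, solver, verifier, and training step (this illustrates the mechanism; it is not the paper's code):

```python
from typing import Callable, List, Tuple

def psv_round(
    propose_spec: Callable[[], str],               # proposer LLM (assumed interface)
    attempt_solution: Callable[[str], str],        # solver LLM: code plus proof annotations
    formally_verify: Callable[[str, str], bool],   # formal verifier (e.g. a proof checker)
    fine_tune: Callable[[List[Tuple[str, str]]], None],
    n_specs: int = 1000,
    attempts_per_spec: int = 8,
) -> List[Tuple[str, str]]:
    """One Propose-Solve-Verify round: only provably correct solutions are kept."""
    survivors: List[Tuple[str, str]] = []
    for _ in range(n_specs):
        spec = propose_spec()                      # 1. Propose: the "what"
        for _ in range(attempts_per_spec):
            program = attempt_solution(spec)       # 2. Solve: the "how"
            if formally_verify(spec, program):     # 3. Verify: correct for all inputs, or it doesn't count
                survivors.append((spec, program))
    fine_tune(survivors)                           # 4. Learn: train on the verified wins
    return survivors
```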
How problems get harder without a smarter proposer: PSV makes “hard” relative, not absolute. Instead of the proposer guessing difficulty in advance, the system measures it empirically:
- If the solver's attempts at a spec verify 90% of the time, the spec is easy.
- If they verify only 10% of the time, it's hard.
- If none of them verify, it's too hard (for now).
The proposer is shown examples labeled EASY / MEDIUM / HARD and asked to generate new specs at a target difficulty. If the solver starts succeeding too often, the system nudges the proposer toward harder specs (more conditions, tighter guarantees). If nothing succeeds, it nudges back.
So the proposer doesn’t need to “outthink” the solver. It just needs to generate many candidate specs, while the system uses feedback (pass rates) to keep the difficulty near the frontier (like a teacher who adjusts homework based on how the student actually performs).
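A rough sketch of how that empirical difficulty labeling could work; the thresholds and the retargeting prompt below are illustrative, not the paper's exact values:

```python
def difficulty_label(n_verified: int, n_attempts: int) -> str:
    """Bucket a spec by the solver's empirical pass rate (thresholds are illustrative)."""
    if n_attempts == 0:
        return "UNKNOWN"
    rate = n_verified / n_attempts
    if rate == 0.0:
        return "TOO_HARD"   # nothing verified: shelve it for now
    if rate >= 0.5:
        return "EASY"
    if rate >= 0.1:
        return "MEDIUM"
    return "HARD"

# The labeled specs then go back into the proposer's prompt, e.g.:
# "Here are recent EASY / MEDIUM / HARD specs. Write new specs likely to land
#  in the HARD bucket: add conditions and tighten the guarantees."
```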
Where does the intelligence increase come from? PSV is basically evolution with a very strict referee:
- Variation: many proposed problems, many attempted solutions.
- Selection: only solutions that pass the verifier survive.
- Inheritance: the solver trains on the survivors.
- Moving frontier: as the solver improves, yesterday’s “hard” becomes today’s “medium,” so the system keeps pushing forward.
That’s why self-play works here: the verifier prevents the loop from “learning lies,” and the proposer and feedback mechanism keep generating fresh challenges just beyond the current capability.
A sign it scales: In the paper, generating more proposed tasks per round helped. Increasing from 1,000 to 32,000 proposed questions raised MBPP pass@1 from 22.3% to 44.3%. This is consistent with the idea that more self-generated practice plus strict verification produces real capability gains.
Paper: https://arxiv.org/abs/2512.18160
Epiplexity
How can next-token prediction on human text lead to superhuman skills? How can synthetic data sometimes beat “real” data? And how did AlphaZero learn so much from nothing but the rules of chess? Classic information theory seems to say this shouldn’t happen. Yet it clearly does.
The problem is that traditional information theory assumes an observer with unlimited computing power. An unbounded observer can crack any code and reverse any function instantly. To such an observer, a cryptographically encrypted message is "simple": they can recover the seed that generated it and easily distinguish it from pure random noise. If you ignore time, ciphertext isn't "random"; it's the output of a short recipe plus a key. But if you can't afford the computation, it behaves like noise.
But AI systems don't have infinite compute. They’re bounded. And once time and compute matter, a new distinction appears:
- Time-Bounded Entropy (Randomness): Data that is computationally hard to predict. This includes true noise, but also things like encryption keys or complex hashes that look random to a neural network.
- Epiplexity (Structure): Patterns, abstractions, and rules that a model can actually learn and use to compress the data within a reasonable time.
The paper formalizes it roughly like this (see the sketch just below):
1. Find the smallest model that can predict the data within a time limit.
2. The size of that model is epiplexity. Whatever remains unpredictable is time-bounded entropy.
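A rough MDL-flavored way to read those two steps, in my own notation rather than the paper's:

```latex
% A rough sketch of the two-step definition above (my paraphrase, not the
% paper's exact notation). \mathcal{M}_t = models whose prediction of the
% data x runs within time budget t.
M^{*} \;=\; \operatorname*{arg\,min}_{M \in \mathcal{M}_t}
  \Big( \underbrace{|M|}_{\text{model size}}
        \;+\; \underbrace{-\log p_{M}(x)}_{\text{residual surprise}} \Big),
\qquad
\underbrace{|M^{*}|}_{\text{epiplexity (structure)}}
\quad\text{and}\quad
\underbrace{-\log p_{M^{*}}(x)}_{\text{time-bounded entropy}}
```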
This solves the paradox. Random noise has high entropy but low epiplexity because no amount of computing power helps you find a pattern, so the model learns nothing. Meanwhile, a strategy game or a textbook has high epiplexity. It forces the model to build complex internal circuits (shortcuts and concepts) to predict the data efficiently.
A neat example from the paper: training a model to predict chess moves is standard. But training it to predict the game in reverse (inferring moves from the final board) is computationally harder. This difficulty forces the model to learn deeper representations of the board state (higher epiplexity), which actually improves its performance on new, unseen chess puzzles. The computation "created" information by converting the implicit consequences of the rules into explicit, usable structures (epiplexity) that the model can now use to play well.
In summary:
The value of data isn’t just about how unpredictable it is. It’s about how much reusable structure it induces in a learner that has real-world limits.
Epiplexity is the amount of structure a model finds worth learning because it reduces prediction error enough to justify the added complexity under a time limit.
Read the paper: https://arxiv.org/abs/2601.03220
Spatiotemporal abstractions
Imagine trying to teach a robot to navigate a complex maze.
Traditional training uses trial and error. The robot tries random movements and gets a reward if it succeeds. The problem is that the AI model controlling the robot isn't deciding on meaningful steps like "walk to the door." Instead, it chooses tiny motor commands, similar to individual muscle twitches.
If the robot has to guess the correct sequence of millions of muscle twitches to solve a maze by random chance, it will fail every time. It flails around, never reaches the goal, and learns nothing.
To solve this, Google researchers first taught the robot simply by having it watch experts. The robot learned to predict the expert's next split-second movement.
Surprisingly, the researchers found that while the robot was learning these tiny movements, it was secretly building a map of the bigger picture. To predict the next twitch accurately, the robot internally needed to know "I am currently walking toward the red door."
The lead researcher explains this difference using coffee. Making a cup of coffee involves tiny, split-second hand movements. But it also involves massive, long-term goals (like driving to the store to buy beans). Traditional robots get stuck optimizing the hand movements. This new approach allows them to "plan the trip."
The researchers created a way to tap into these hidden internal plans. Instead of letting the AI decide every single muscle movement 100 times a second, they built a "steering wheel" that forces the AI to pick one of those high-level intentions (like "go to the red door") and stick with it for a while.
This works for the same reason humans don't plan their day by focusing on individual footsteps. Instead of searching through trillions of twitch combinations, the AI only has to choose between a few high-level plans. Because each choice lasts longer and does something useful, the robot stops flailing and actually reaches the goal, allowing it to finally learn from its success.
The researchers believe this architecture mimics human biology. The main AI model acts like the Cortex (constantly predicting what happens next based on what it sees), while the new "steering wheel" mechanism acts like the Basal Ganglia (nudging the cortex toward rewarding goals and habits).
In summary: Think of the “steering wheel” as a filter. Without it, the robot considers every possible muscle twitch at every millisecond (a search space so vast it is effectively infinite). By locking in a high-level intention, the steering wheel prunes the search space. It forces the robot to ignore the billions of random twitch combinations that don’t help reach the current sub-goal, making low-level actions goal-directed rather than random.
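A generic sketch of the "lock in an intention, hold it, then re-plan" pattern described above. This is standard temporal abstraction, not the paper's architecture; `encoder`, `intention_scorer`, `low_level_policy`, and the `env` interface are all stand-ins:

```python
def run_episode(env, encoder, intention_scorer, low_level_policy,
                n_intentions=16, hold_steps=50, max_steps=2000):
    """Pick a high-level intention, hold it for many control steps, then re-plan (illustrative)."""
    obs = env.reset()
    total_reward, t = 0.0, 0
    while t < max_steps:
        state = encoder(obs)                                   # internal representation built by prediction
        intention = max(range(n_intentions),                   # the "steering wheel": commit to one plan
                        key=lambda k: intention_scorer(state, k))
        for _ in range(hold_steps):                            # hold the intention instead of re-deciding
            action = low_level_policy(state, intention)        # per-timestep motor command
            obs, reward, done = env.step(action)
            state = encoder(obs)
            total_reward += reward
            t += 1
            if done or t >= max_steps:
                return total_reward
    return total_reward
```

Because the intention is chosen once every `hold_steps` control ticks, the search happens over a handful of plans rather than over every motor command, which is exactly the pruning described above.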
Paper: https://arxiv.org/abs/2512.20605
Talk: https://www.youtube.com/watch?v=cx_MIhvAOYM
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
What happens when you let an LLM write tiny computer programs that compete for control of a virtual computer's memory?
Setup:
- The Environment: A chaotic “toy computer” where programs live inside shared memory. Because the program’s instructions are stored in the same place as its working data, programs can overwrite (sometimes even copy) themselves, and they can try to corrupt their opponents.
- The Algorithm: A self-play algorithm inspired by the Red Queen hypothesis in evolutionary biology, which suggests organisms must constantly adapt just to maintain their relative fitness against evolving competitors.
- The Process: Instead of training against a static objective, the algorithm evolves a lineage of warriors. In each round, an LLM generates a new warrior specifically designed to defeat all previous versions in the lineage.
Outcome: As the evolutionary arms race progresses, the warriors become increasingly robust and general-purpose, capable of defeating human-designed strategies they were never explicitly trained against.
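A minimal sketch of that lineage loop, with hypothetical helpers `generate_warrior` (the LLM call) and `battle` (the Core War simulator); it is not Sakana's actual implementation:

```python
def red_queen_evolution(generate_warrior, battle, n_rounds=50, max_tries=20):
    """Evolve a lineage of warriors; each new one must beat all previous versions (sketch)."""
    lineage = []
    for _ in range(n_rounds):
        for _ in range(max_tries):
            candidate = generate_warrior(lineage)               # LLM writes Redcode targeting the lineage
            if all(battle(candidate, old) for old in lineage):  # must defeat every earlier champion
                lineage.append(candidate)
                break
    return lineage
```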
Read the paper: https://sakana.ai/drq/
Raptors gliding through a cloud of helium-filled soap bubbles, revealing their wingtip and tail vortices.
Paper: High aerodynamic lift from the tail reduces drag in gliding raptors https://journals.biologists.com/jeb/article/223/3/jeb214809/223686/High-aerodynamic-lift-from-the-tail-reduces-drag
Links for 2026-01-09
AI
1. ChatGPT for Healthcare: “Over the past two years, we’ve partnered with a global network of more than 260 licensed physicians across 60 countries of practice to evaluate model performance using real clinical scenarios. To date, this group has reviewed more than 600,000 model outputs spanning 30 areas of focus. Their continuous feedback has directly informed model training, safety mitigations, and product iteration. ChatGPT for Healthcare went through multiple rounds of physician-led red teaming to tune model behavior, trustworthy information retrieval, and other evaluations.” https://openai.com/index/openai-for-healthcare/
2. AI now predicts 130 diseases from 1 night of sleep https://www.nature.com/articles/s41591-025-04133-4
3. Scaling Open-Ended Reasoning To Predict the Future https://openforecaster.github.io/
4. Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space https://arxiv.org/abs/2512.24617
5. LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings https://arxiv.org/abs/2510.08338v3
6. Why LLMs Aren’t Scientists Yet. https://www.lesswrong.com/posts/y7TpjDtKFcJSGzunm/why-llms-aren-t-scientists-yet
7. Anthropic’s new update makes coding agents self-healing https://venturebeat.com/orchestration/claude-code-2-1-0-arrives-with-smoother-workflows-and-smarter-agents
8. Claude Code and What Comes Next https://www.oneusefulthing.org/p/claude-code-and-what-comes-next
9. An AI revolution in drugmaking is under way https://www.economist.com/science-and-technology/2026/01/05/an-ai-revolution-in-drugmaking-is-under-way [no paywall: https://archive.is/Si71E]
10. JPMorgan is cutting all ties with proxy advisory firms and replacing them with AI to help cast shareholder votes https://www.wsj.com/finance/banking/jpmorgan-cuts-all-ties-with-proxy-advisers-in-industry-first-78c43d5f [no paywall: https://archive.is/Ttc8z]
11. How Judges Are Using AI to Help Decide Your Legal Dispute https://www.wsj.com/tech/ai/how-ai-could-help-decide-your-next-legal-dispute-9cb12517 [no paywall: https://archive.is/nmelA]
12. Stack Overflow’s forum is dead thanks to AI, but the company’s still kicking... thanks to AI https://sherwood.news/tech/stack-overflow-forum-dead-thanks-ai-but-companys-still-kicking-ai/
13. First theoretical physics paper to credit an AI assistant https://arxiv.org/abs/2601.02484
14. Real poetry by AI https://gwern.net/fiction/lab-animals
Neuro(tech)
1. These Hearing Aids Will Tune in to Your Brain https://spectrum.ieee.org/hearing-aids-biosignals
2. If an event is more likely to occur at a certain point in time, the brain tracks the time until it occurs more precisely https://www.mpg.de/25980090/brain-estimates-probabilities-of-events
3. A brain-inspired approach to scientific computing https://newsreleases.sandia.gov/nature-inspired-computers-are-shockingly-good-at-math/
4. The End of Privacy: Tracking Technology is Everywhere Now https://www.youtube.com/watch?v=UYWjgceclS4
Miscellaneous
1. Explaining Cloud-9: A Celestial Object Like No Other https://www.centauri-dreams.org/2026/01/07/explaining-cloud-9-a-celestial-object-like-no-other/
2. A tiny number of hyper-prolific individuals are responsible for a massive percentage of public complaints, effectively "monopolizing" government resources and taxpayer money. https://marginalrevolution.com/marginalrevolution/2026/01/the-tyranny-of-the-complainers.html
3. In “Being Nicer than Clippy,” Joe Carlsmith argues that our approach to AI alignment should be guided by “niceness” (a specific human virtue of respecting the preferences and boundaries of others) rather than just a competitive “battle of the utility functions.” https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy
Lots of people are now sold on this idea, including Musk, Bezos, and Sundar Pichai.
https://x.com/paulg/status/2009686627506065779
Linus Torvalds is now vibe-coding: https://github.com/torvalds/AudioNoise
For those who don't know him, he's been the creator and lead developer of the Linux kernel since 1991.
STACK: An open-source artificial intelligence model designed to simulate how human cells behave under different conditions.
Scientists want to know how every type of cell in the human body reacts to every possible drug or disease. However, testing every combination physically in a lab would take years and cost millions of dollars. STACK acts as a "virtual cell model" that can predict these reactions digitally, even for scenarios it has never explicitly seen before.
Unlike previous models that look at cells in isolation, STACK looks at "sets" of cells to understand their environment. Just as a word's meaning changes based on the rest of the sentence, a cell’s behavior changes based on the cells around it.
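One generic way to get that "cell in the context of its set" behavior is permutation-invariant self-attention over a set of cell embeddings. The sketch below only illustrates the idea; it is not STACK's architecture, and all layer sizes are made up:

```python
import torch.nn as nn

class CellSetEncoder(nn.Module):
    """Encode each cell in the context of the other cells in its set (illustrative only)."""
    def __init__(self, n_genes=2000, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(n_genes, d_model)                 # per-cell expression -> embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.context = nn.TransformerEncoder(layer, n_layers)    # attention across the set of cells
        self.head = nn.Linear(d_model, n_genes)                  # e.g. predict post-perturbation expression

    def forward(self, cells):                                    # cells: (batch, n_cells, n_genes)
        h = self.embed(cells)
        h = self.context(h)                                      # each cell attends to its neighbors
        return self.head(h)
```

No positional encoding is used, so the set of cells is treated as unordered context, much like the "rest of the sentence" analogy above.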
When compared to real laboratory experiments, the model's predictions were found to capture biologically meaningful and accurate effects.
Read more: https://arcinstitute.org/news/foundation-model-stack
What if we could simulate an interactive 3D world, from a single image, in the wild, in real time?
PointWorld: A world model designed to give robots a form of "physical imagination," helping them predict how the world will change when they touch it.
Instead of processing flat images like a standard video camera, the system views the world and the robot's own body as a cloud of 3D dots (points). It predicts the "flow" or path of every single dot over time, allowing it to understand complex interactions like geometry and contact.
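A toy sketch of the "predict the flow of every point" idea: given the current point cloud and a robot action, predict a displacement for each point. A real model would condition on the whole cloud and on contact geometry; this per-point MLP and its shapes are purely illustrative, not PointWorld's model:

```python
import torch
import torch.nn as nn

class PointFlowPredictor(nn.Module):
    """Predict a displacement (flow) vector for every 3D point, given an action (toy sketch)."""
    def __init__(self, action_dim=7, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),                       # per-point (dx, dy, dz)
        )

    def forward(self, points, action):                  # points: (N, 3), action: (action_dim,)
        a = action.unsqueeze(0).expand(points.shape[0], -1)
        flow = self.mlp(torch.cat([points, a], dim=-1))
        return points + flow                            # predicted point cloud after the interaction
```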
Read more: https://point-world.github.io/
Anthropic built “Cowork” in the last week and a half.
Claude Code wrote 100% of it.
Read more: Cowork: Claude Code for the rest of your work https://claude.com/blog/cowork-research-preview
Nvidia researchers developed a new method called TTT-E2E that mimics how humans learn. They use the analogy of a university lecture: years later, you likely won't remember the professor's exact words (perfect recall), but you retain the skills and intuition you learned (compressed knowledge).
Instead of just temporarily holding your conversation in a "short-term memory" buffer, this new method actually trains itself on your conversation while it is happening.
Usually, an AI stops learning once it is released to the public. This new model continues to learn and update its internal "brain" (weights) with every new sentence it reads.
By treating the current context as training data, the AI compresses that information into its permanent understanding. This allows it to "understand" a massive amount of information without needing to keep a perfect log of every single word.
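A rough sketch of the general test-time-training idea (gradient steps on the live context), assuming a Hugging Face-style causal LM that returns a loss when given labels; the chunk size and learning rate are made up, and this is not NVIDIA's TTT-E2E recipe:

```python
import torch

def test_time_train(model, tokenizer, context_text, chunk_tokens=512, lr=1e-5):
    """Take gradient steps on the conversation itself, compressing it into the weights (sketch)."""
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    ids = tokenizer(context_text, return_tensors="pt").input_ids
    for start in range(0, ids.shape[1], chunk_tokens):
        chunk = ids[:, start:start + chunk_tokens]
        if chunk.shape[1] < 2:
            continue
        loss = model(input_ids=chunk, labels=chunk).loss   # next-token loss on the live context
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    model.eval()
    return model
```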
Read more: https://developer.nvidia.com/blog/reimagining-llm-memory-using-context-as-training-data-unlocks-models-that-learn-at-test-time/
Driverless delivery vehicles in China are going viral for plowing through everything.
The Department of War: «Military AI is going to be a race for the foreseeable future, and therefore speed wins»
The Secretary of War directs the Department of War (DoW) to accelerate "America's Military AI Dominance" by transforming into an "AI-first" warfighting force.
Read more: https://media.defense.gov/2026/Jan/12/2003855671/-1/-1/0/ARTIFICIAL-INTELLIGENCE-STRATEGY-FOR-THE-DEPARTMENT-OF-WAR.PDF
---
I am old enough to remember when people worried about AI risk would reject any mention of Skynet as Hollywood and unrealistic. And nobody would build humanoid robots, because that would be stupid. Also, everyone would obviously lock the AI into a box without an internet connection; the risk was that it might escape by convincing humans to let it out.
Turns out Hollywood was right.
But surely, we have the common sense not to hand the AI the nuclear launch codes?
Narrator: They did, in fact, hand over the codes immediately.
Google announces that a novel theorem in algebraic geometry was proved with substantial help from an internal math-specialized version of Gemini.
The solution was not in the training data: "the model outputs [particularly from FullProof] do not appear to the authors to be that close to those or (to the best of our knowledge) any other pre-existing sources. So, absent some future discovery to the contrary, the model’s contribution appears to involve a genuine combination of synthesis, retrieval, generalization and innovation of these existing techniques".
Paper: https://arxiv.org/abs/2601.07222
Source of the screenshot: Adam Brown from Google DeepMind https://x.com/A_G_I_Joe/status/2011213878395617571
Links for 2026-01-14
AI
1. The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning https://arxiv.org/abs/2601.06002
2. MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory https://arxiv.org/abs/2601.03192
3. Dr. Zero: Self-Evolving Search Agents without Training Data https://arxiv.org/abs/2601.07055
4. A method to learn latent action world models using “in-the-wild” videos (real-world footage that is diverse, messy, and uncurated). https://arxiv.org/abs/2601.05230
5. 1X World Model | From Video to Action: A New Way Robots Learn https://www.1x.tech/discover/world-model-self-learning
6. ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands https://arxiv.org/abs/2512.24965
7. Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models https://github.com/deepseek-ai/Engram/blob/main/Engram_paper.pdf
8. How does scaling up neural networks change what they learn? On neural scaling and the quanta hypothesis https://ericjmichaud.com/quanta/
9. An FAQ on Reinforcement Learning Environments https://epochai.substack.com/p/an-faq-on-reinforcement-learning
10. Cowork: Claude Code for the rest of your work. Cowork lets you complete non-technical tasks much like how developers use Claude Code. https://www.lesswrong.com/posts/fm2N4cws8nbdmfGux/claude-coworks
11. On the Origins of Algorithmic Progress in AI https://mitfuturetech.substack.com/p/on-the-origins-of-algorithmic-progress
12. The Gentle Singularity; The Fast Takeoff https://www.prinzai.com/p/the-gentle-singularity-the-fast-takeoff
13. What do experts and superforecasters think about the future of AI research and development? https://forecastingresearch.substack.com/p/what-experts-and-superforecasters
AI for math
1. Lies, Damned Lies, and Proofs: Formal Methods are not Slopless https://www.lesswrong.com/posts/rhAPh3YzhPoBNpgHg/lies-damned-lies-and-proofs-formal-methods-are-not-slopless
2. From 2.8% to 100%: Automated Proof Verification with Aristotle https://igorrivin.github.io/research/polya-szego-aristotle/ [Aristotle’s waitlist is gone, and now anyone can sign up and immediately get access. https://aristotle.harmonic.fun/]
3. AxiomProver Solves All Problems at Putnam 2025: Proof Release & Commentary https://axiommath.ai/territory/from-seeing-why-to-checking-everything
4. Terence Tao on the emergence of AI-powered write–rewrite cycles for mathematical exposition. https://mathstodon.xyz/@tao/115855852706322322
5. Terence Tao: “I can honestly say I learned something from Aristotle; a minor thing to be sure, but still useful.” https://www.erdosproblems.com/forum/thread/679#post-3050
Miscellaneous
1. This simple design change could finally fix solid-state batteries https://www.sciencedaily.com/releases/2026/01/260108231331.htm
2. Pentagon bought device through undercover operation some investigators suspect is linked to Havana Syndrome https://edition.cnn.com/2026/01/13/politics/havana-syndrome-device-pentagon-hsi
3. UK to develop new deep strike ballistic missile for Ukraine https://www.gov.uk/government/news/uk-to-develop-new-deep-strike-ballistic-missile-for-ukraine
4. A new study suggests it may be possible to regenerate cartilage lost to aging or arthritis with an oral drug or local injection, rendering knee and hip replacement unnecessary. https://news.stanford.edu/stories/2025/11/joint-cartilage-aging-osteoarthritis-therapy-research
5. Scientists detect the lowest mass dark object currently measured - an exotic concentration of dark matter? https://www.mpg.de/25518363/1007-asph-astronomers-image-a-mysterious-dark-object-in-the-distant-universe-155031-x
6. To be even minimally competent in any intellectual field requires having a bedrock of core knowledge memorized. So you *don’t* have to think about it. For math, far more than times tables. https://intellectualtakeout.org/2015/09/why-its-still-important-to-memorize/
Cursor autonomously coded a web browser from scratch by running hundreds of concurrent coding agents for weeks.
They achieved this by using specialized agents they call planners, workers, and judges:
Planners: Continuously explore the codebase, create tasks, and spawn sub-planners for specific areas.
Workers: Focus purely on executing assigned tasks without worrying about broader coordination.
Judges: Determine when to continue or restart cycles to prevent tunnel vision.
Model Selection: GPT-5.2 excelled at long-running tasks and planning compared to Opus 4.5 (which took shortcuts) or GPT-5.1-codex.
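A highly simplified sketch of that planner/worker/judge division of labor; the agent objects and their methods are stand-ins, not Cursor's implementation:

```python
from collections import deque

def run_orchestration(planner, worker, judge, max_cycles=10):
    """Planner proposes tasks, workers execute them, a judge decides whether to go on or restart."""
    results = []
    for _ in range(max_cycles):
        tasks = deque(planner.plan())             # planner explores the codebase and emits tasks
        results = []
        while tasks:
            task = tasks.popleft()
            results.append(worker.execute(task))  # worker focuses on one task, no global coordination
        verdict = judge.review(results)           # judge guards against tunnel vision
        if verdict == "done":
            break
        if verdict == "restart":
            planner.reset()                       # discard the current plan and start a fresh cycle
    return results
```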
Read more: https://cursor.com/blog/scaling-agents#planners-and-workers
My gut feeling still can't quite grasp the possibility of transformative AI being just a few years away. But on a purely rational level, recent research has made me more convinced than ever that the world is going to change dramatically.
Over the next few years, we will see the emergence of adaptive, self-improving systems that can create their own tools. Training data will come from synthetic adversarial evolution among millions of agents, from proposer-solver-verifier pipelines, and from search and optimization cycles. Furthermore, these systems will learn from their interactions with experts, such as Terence Tao, by continually compressing useful structures into their weights while using inexpensive, explicit storage for facts and logs.
Grok seems to be finally catching up when it comes to math. An internal beta version of Grok 4.20 helped Paata Ivanisvili (Professor of Mathematics) with his research: https://x.com/PI010101/status/2011560477688463573
The significance of this and similar data points does not lie in the result itself. Rather, it is the fact that models are now frequently useful for people at the far-right tail of human intellectual ability. That is the noteworthy milestone here. And given that there is absolutely no reason to expect AI progress to stop here, this implies a significant probability that things could get disruptive relatively soon.