Moonshot AI released Kimi K2 Thinking: its open-source thinking agent model is here.
- SOTA on HLE (44.9%) and BrowseComp (60.2%)
- Executes 200–300 sequential tool calls without human intervention
- Excels in reasoning, agentic search, and coding
- 256K context window
Built as a thinking agent, K2 Thinking marks Moonshot's latest effort in test-time scaling — scaling both thinking tokens and tool-calling turns (sketched below).
Weights and code.
moonshotai.github.io
Kimi K2 Thinking
Kimi K2 Thinking, Moonshot's best open-source thinking model.
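A minimal sketch of what such a sequential tool-calling loop looks like in practice; `call_model` and `execute_tool` are hypothetical stand-ins, not Moonshot's actual API:

```python
# Hedged sketch of an interleaved thinking / tool-calling agent loop, as
# described in the post. call_model and execute_tool are hypothetical
# stand-ins, not Moonshot's actual API.

MAX_TURNS = 300  # K2 Thinking reportedly sustains 200-300 sequential tool calls

def call_model(messages):
    """Hypothetical: returns (thinking, tool_call or None, answer or None)."""
    raise NotImplementedError

def execute_tool(tool_call):
    """Hypothetical: runs the requested tool and returns an observation."""
    raise NotImplementedError

def run_agent(task):
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_TURNS):
        thinking, tool_call, answer = call_model(messages)
        messages.append({"role": "assistant", "content": thinking})
        if answer is not None:       # the model decided it is done
            return answer
        observation = execute_tool(tool_call)
        messages.append({"role": "tool", "content": observation})
    return None                      # turn budget exhausted
```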
Google to roll out Polymarket and Kalshi prediction markets data in search results.
The Block
Google Finance to roll out Polymarket and Kalshi prediction markets data in search results
Google said prediction markets data from leading platforms Polymarket and Kalshi will roll out over the coming weeks.
Sakana AI is building artificial life that can evolve: Petri Dish Neural Cellular Automata (PD-NCA) let multiple NCA agents learn and adapt during the simulation, not just after training.
Each cell updates its own parameters via gradient descent, turning morphogenesis into a living ecosystem of competing, cooperating, and ever-evolving entities—showing emergent cycles and persistent complexity growth.
GitHub
Petri Dish NCA
Petri Dish Neural Cellular Automata (PD-NCA) is a new ALife simulation substrate that replaces the fixed, non-adaptive morphogenesis of conventional NCA—where model parameters remain constant during development—with multi-agent open-ended growth, trained…
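A hedged sketch of the core idea above, each cell carrying its own parameters and taking gradient steps during the rollout rather than after it; the shapes and toy loss are assumptions, not the paper's exact setup:

```python
import torch

# Sketch of per-cell learning inside the simulation: every cell has a
# private weight matrix, updated by gradient descent at each rollout step.

H = W = 16      # grid size
STATE = 8       # per-cell state size

state = torch.randn(H, W, STATE)
# One private weight matrix per cell (H*W independent learners).
params = torch.randn(H, W, STATE, STATE, requires_grad=True)
opt = torch.optim.SGD([params], lr=1e-2)

target = torch.zeros(H, W, STATE)   # assumed morphogenesis target

for step in range(100):
    # Each cell updates its state with its own weights.
    new_state = torch.tanh(torch.einsum('hwij,hwj->hwi', params, state))
    loss = ((new_state - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()                       # learning happens *during* the simulation
    state = new_state.detach()       # roll the simulation forward
```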
DreamGym from Meta is a new framework that lets AI agents train via synthetic reasoning-based experiences instead of costly real rollouts.
It models environment dynamics, replays and adapts tasks, and even improves sim-to-real transfer.
Results: +30% gains on WebArena and PPO-level performance—using only synthetic interactions.
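A hedged sketch of the idea: an LLM-based experience model stands in for the real environment, so rollouts become cheap synthetic trajectories. `experience_model` and `policy` are hypothetical stand-ins for the paper's components:

```python
# Sketch of training on reasoning-based synthetic experience instead of
# costly real rollouts, per the DreamGym description above.

def experience_model(state, action):
    """Hypothetical LLM that reasons about environment dynamics and
    returns (next_state, reward, done)."""
    raise NotImplementedError

def synthetic_rollout(policy, initial_state, max_steps=50):
    state, trajectory = initial_state, []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = experience_model(state, action)
        trajectory.append((state, action, reward))
        if done:
            break
        state = next_state
    # Fed to a standard RL update (e.g. PPO) in place of real interactions.
    return trajectory
```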
Google introduced Nested Learning: a new ML paradigm for continual learning that views models as nested optimization problems to enhance long-context processing.
A proof-of-concept model, Hope, shows improved performance in language modeling.
research.google
Introducing Nested Learning: A new ML paradigm for continual learning
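A toy illustration of the nested-optimization framing: two parameter groups updated at different frequencies, a fast inner level and a slow outer level. This sketches the idea only and is not Google's Hope architecture:

```python
import torch

# Two nested optimization levels: "fast" updates every step, "slow"
# accumulates gradients and updates 10x less often.

fast = torch.nn.Linear(16, 16)
slow = torch.nn.Linear(16, 16)
opt_fast = torch.optim.SGD(fast.parameters(), lr=1e-2)
opt_slow = torch.optim.SGD(slow.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(32, 16)
    loss = (slow(fast(x)) - x).pow(2).mean()  # toy reconstruction objective
    opt_fast.zero_grad()
    loss.backward()
    opt_fast.step()                  # inner level: update every step
    if (step + 1) % 10 == 0:
        opt_slow.step()              # outer level: accumulated, slower updates
        opt_slow.zero_grad()
```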
Alibaba introduced ReasonMed: the largest medical reasoning dataset, advancing LLM performance in clinical QA.
Comprising 370k curated examples distilled from 1.75M reasoning paths, ReasonMed is built through a multi-agent EMD (easy–medium–difficult) pipeline with generation, verification, and an Error Refiner that corrects faulty reasoning steps.
Experiments show that combining detailed CoT reasoning with concise answer summaries yields the most robust fine-tuning outcomes.
- Models trained on ReasonMed redefine the state of the art:
- ReasonMed-7B outperforms all sub-10B models by +4.17% and even beats LLaMA3.1-70B on PubMedQA (+4.60%).
- ReasonMed-14B maintains strong scaling efficiency and competitive accuracy.
HF.
GitHub.
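A hedged sketch of how the generation, verification, and Error Refiner stages described above might fit together; all three agents are hypothetical stand-ins, not Alibaba's code:

```python
# Sketch of the multi-agent pipeline: generate a reasoning path, verify
# it, and route failures through an Error Refiner until it passes.

def generate_cot(question):         # generator agent
    raise NotImplementedError

def verify(question, cot):          # verifier agent: returns (ok, errors)
    raise NotImplementedError

def refine(question, cot, errors):  # Error Refiner: fixes faulty steps
    raise NotImplementedError

def build_example(question, max_rounds=3):
    cot = generate_cot(question)
    for _ in range(max_rounds):
        ok, errors = verify(question, cot)
        if ok:
            return cot              # keep: verified reasoning path
        cot = refine(question, cot, errors)
    return None                     # discard unverifiable paths
```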
Moonshot AI: Quantization is not a compromise — it's the next paradigm.
After K2-Thinking's release, many developers have been curious about its native INT4 quantization format.
A Moonshot team member (and Zhihu contributor) shared an insider's view on why this choice matters — and why quantization today isn't just about sacrificing precision for speed.
In the context of LLMs, quantization is no longer a trade-off.
With the evolution of param-scaling and test-time-scaling, native low-bit quantization will become a standard paradigm for large model training.
Why does low-bit quantization matter?
In modern LLM inference, there are two distinct optimization goals:
• High throughput (cost-oriented): maximize GPU utilization via large batch sizes.
• Low latency (user-oriented): minimize per-query response time.
For Kimi-K2's MoE structure (with 1/48 sparsity), decoding is memory-bound — the smaller the weights, the less memory is read per token and the faster each decoding step.
FP8 weights (≈1 TB) already hit the limit of what a single high-speed interconnect GPU node can handle.
Switching to W4A16 (4-bit weights, 16-bit activations) cuts latency sharply while maintaining quality — a perfect fit for low-latency inference.
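Back-of-envelope arithmetic for why this matters in a memory-bound regime, assuming roughly 1T total parameters (an assumption consistent with the ~1 TB FP8 figure above):

```python
# Decode speed in the memory-bound regime scales with bytes read per token.
params = 1.0e12                   # ~1T total parameters (assumption)
fp8_tb  = params * 1.0 / 1e12     # FP8: 1 byte per weight   -> ~1.0 TB
int4_tb = params * 0.5 / 1e12     # INT4: 0.5 byte per weight -> ~0.5 TB
# With 1/48 sparsity only a fraction of the experts is read per token,
# but that traffic still scales linearly with bits per weight:
print(fp8_tb, int4_tb, fp8_tb / int4_tb)   # 1.0 0.5 2.0 -> 2x less traffic
```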
Why QAT over PTQ?
Post-training quantization (PTQ) worked well for shorter generations, but failed in longer reasoning chains:
• Error accumulation during long decoding degraded precision.
• Dependence on calibration data caused "expert distortion" in sparse MoE layers.
Thus, K2-Thinking adopted QAT for minimal loss and more stable long-context reasoning.
How does it work?
K2-Thinking uses weight-only QAT with fake quantization plus a straight-through estimator (STE).
The pipeline was fully integrated in just days — from QAT training → INT4 inference → RL rollout — enabling near lossless results without extra tokens or retraining.
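A minimal sketch of weight-only fake quantization with an STE, using the 1×32 scale groups mentioned further down; illustrative only, not Moonshot's training code:

```python
import torch

# Fake INT4 quantization: forward pass sees rounded weights, backward
# pass flows straight through (STE), so training adapts to the INT4 grid.

def fake_quant_int4(w, group=32):
    shape = w.shape
    g = w.reshape(-1, group)                        # one scale per 32 weights
    scale = g.abs().amax(dim=1, keepdim=True) / 7   # symmetric INT4 range
    q = (g / scale).round().clamp(-8, 7) * scale    # quantize -> dequantize
    # STE: forward uses q, backward treats the rounding as identity.
    return (g + (q - g).detach()).reshape(shape)

w = torch.randn(128, 256, requires_grad=True)
x = torch.randn(32, 256)
y = x @ fake_quant_int4(w).t()    # training sees INT4-rounded weights
y.sum().backward()                # w.grad is well-defined thanks to the STE
```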
INT4's hidden advantage in RL
Few people mention this: native INT4 doesn't just speed up inference — it accelerates RL training itself.
Because RL rollouts often suffer from "long-tail" inefficiency, INT4's low-latency profile makes those stages much faster.
In practice, each RL iteration runs 10-20% faster end-to-end.
Moreover, quantized RL brings stability: smaller representational space reduces accumulation error, improving learning robustness.
Why INT4, not MXFP4?
Kimi chose INT4 over "fancier" MXFP4/NVFP4 to better support non-Blackwell GPUs, with strong existing kernel support (e.g., Marlin).
At a quantization scale granularity of 1×32 (one scale per 32 weights), INT4 matches FP4 formats in expressiveness while being more hardware-adaptable.
Meta introduced Omnilingual Automatic Speech Recognition (ASR), a suite of models providing ASR capabilities for over 1,600 languages, including 500 low-coverage languages never before served by any ASR system.
While most ASR systems focus on a limited set of languages that are well-represented on the internet, this release marks a major step toward building a truly universal transcription system.
They've released a full suite of models and a dataset:
1. Omnilingual ASR: A suite of ASR models ranging from 300M to 7B parameters, supporting 1600+ languages
2. Omnilingual w2v 2.0: a 7B-parameter multilingual speech representation model that can be leveraged for other downstream speech-related tasks
3. Omnilingual ASR Corpus: a unique dataset spanning 350 underserved languages, curated in collaboration with their global partners
Pleias released SYNTH, a fully synthetic generalist dataset for pretraining, and two new SOTA reasoning models trained exclusively on it.
Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range.
SYNTH is a radical departure from the classic pre-training recipe. At its core, it is an upsampling of Wikipedia's 50,000 "vital" articles.
SYNTH is a collection of several synthetic playgrounds: data is not generated through simple prompts but by integrating smaller fine-tuned models into workflows with seeding, constraints, and formal verifications/checks.
Synthetic playgrounds enabled a series of controlled experiments that led Pleias to favor an extreme-depth design: an 80-layer architecture for Baguettotron, with improvements across the board on memorization and logical reasoning.
Along with Baguettotron, Pleias released the smallest viable language model to date: Monad, a 56M transformer trained on the English part of SYNTH, with non-random performance on MMLU. Designing Monad was an engineering challenge that required a custom tiny tokenizer.
huggingface.co
PleIAs/SYNTH · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Google has released a new "Introduction to Agents" guide, which discusses a "self-evolving" agentic system (Level 4).
"At this level, an agentic system can identify gaps in its own capabilities and create new tools or even new agents to fill them."
"At this level, an agentic system can identify gaps in its own capabilities and create new tools or even new agents to fill them."
AELLA is an open-science initiative to make scientific research accessible via structured summaries created by LLMs.
Available now:
- Dataset of 100K summaries
- 2 fine-tuned LLMs
- 3D visualizer.
This project spans many disciplines:
- bespoke model-training pipelines
- high-throughput inference systems
- protocols to ensure compute integrity and more.
Models.
Visualizer.
inference.net
Project OSSAS: Custom LLMs to process 100 Million Research Papers
Project OSSAS is a large-scale open-science initiative to make the world’s scientific knowledge accessible through AI-generated summaries of research papers.
ByteDance launched Doubao-Seed-Code, a model specifically designed for programming tasks.
It supports native 256K long context and has claimed the top spot on the SWE-Bench Verified leaderboard.
Volcengine
Volcengine: your AI cloud
Volcengine is ByteDance's cloud and AI services platform. In the AI era, it focuses on the Doubao large models and AI-native cloud technology, providing enterprises with one-stop services from agent development to deployment to support AI transformation and innovation.
A new paper from Yann LeCun. LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics. GitHub.
This could be one of LeCun's last papers at Meta (lol), but it's a really interesting one.
Quick summary:
Yann LeCun's big idea is JEPA, a self-supervised learning method. However, the approach has various failure modes, so training strong JEPA models is brittle, unstable, and quite difficult; as a result, JEPA has seen little adoption in practice.
This paper tries to directly address this, making specific design decisions that improve training stability.
The authors identify the isotropic Gaussian as the optimal distribution for a JEPA model's embeddings and design Sketched Isotropic Gaussian Regularization (SIGReg) to constrain embeddings toward that ideal distribution. This forms the LeJEPA framework, which can be implemented in ~50 lines of code.
On empirical tests, the authors demonstrate stability of training across hyperparameters, architectures, and datasets.
A result particularly interesting to me, however, is that training a LeJEPA model from scratch directly on the downstream dataset outperforms finetuning a DINOv2/v3 model on it!
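A hedged sketch of the SIGReg idea: test many random 1D projections of the embeddings against N(0, 1) and penalize deviations. The moment-matching penalty below is a simplification of the paper's actual statistic:

```python
import torch

# Push embeddings toward an isotropic Gaussian by checking random 1D
# projections against N(0, 1): variance 1, third moment 0, fourth moment 3.

def sigreg(z, num_dirs=256):
    z = z - z.mean(dim=0)                     # center the batch
    d = torch.randn(z.shape[1], num_dirs, device=z.device)
    d = d / d.norm(dim=0, keepdim=True)       # random unit directions
    p = z @ d                                 # (batch, num_dirs) projections
    var  = p.var(dim=0)
    m3   = (p ** 3).mean(dim=0)
    m4   = (p ** 4).mean(dim=0)
    return ((var - 1) ** 2 + m3 ** 2 + (m4 - 3) ** 2).mean()

emb = torch.randn(512, 128, requires_grad=True)   # stand-in encoder output
loss = sigreg(emb)     # added to the JEPA prediction loss during training
loss.backward()
```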
Japan’s first yen stablecoin issuer says stablecoin issuers could replace the Bank of Japan as major bond buyers.
Cointelegraph
Japan’s JPYC Says Stablecoins May Become Key Bond Buyers
JPYC projects yen stablecoin issuers will invest heavily in JGBs, potentially shaping liquidity and Japan’s bond-buying landscape.
Anthropic is investing $50B to build data centers in TX and NY, with sites coming online throughout 2026.
Anthropic
Anthropic invests $50 billion in American AI infrastructure
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Last year, Google's AlphaProof and AlphaGeometry reached a key landmark in AI by achieving silver-medal-level performance at the International Mathematical Olympiad.
Today, Nature is publishing the methodology behind the AlphaProof agent.
Nature
Olympiad-level formal mathematical reasoning with reinforcement learning
Anthropic's applied AI team has a great write-up on improving Claude's frontend design via Skills.
Also with a Claude Code plugin that packages up the skill.
Claude
Improving frontend design through Skills | Claude
Best practices for building richer, more customized frontend design with Claude and Skills.
A new ByteDance + Yale + NYU + Tsinghua paper builds an LLM-based agent called AlphaResearch that searches for new algorithms instead of reusing known ones.
For each problem, AlphaResearch first writes a natural language idea for an algorithm and then turns that idea into code.
The big deal is that this setup lets an LLM push actual mathematical records using a simple loop of scoring ideas and executing code, and the same loop could also search for better algorithms in many other domains.
A reward model trained on peer review data scores each idea and filters out the weakest ones before coding.
An execution engine then runs the code, checks all constraints, and reports a numeric performance score.
The agent loops over this process, sampling old attempts, tweaking ideas and programs, and keeping any version that improves the score.
To measure progress, the authors build a benchmark of 8 open-ended algorithm problems with strong human baselines.
On this benchmark, AlphaResearch improves steadily and beats the best human constructions on 2 circle packing tasks, while still trailing people on the other 6.
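A hedged sketch of the propose, score, execute, keep loop described above; `propose_idea`, `reward_model`, and `run_and_score` are hypothetical stand-ins for the paper's components:

```python
import random

# Loop: propose an idea, filter with a learned reward model, execute the
# code, and keep whatever improves the score.

def propose_idea(problem, samples):   # LLM writes/mutates an idea + code
    raise NotImplementedError

def reward_model(idea):               # trained on peer-review data
    raise NotImplementedError

def run_and_score(code):              # execution engine: checks constraints
    raise NotImplementedError

def alpha_research(problem, iterations=1000, threshold=0.5):
    archive, best = [], None
    for _ in range(iterations):
        samples = random.sample(archive, min(4, len(archive)))
        idea, code = propose_idea(problem, samples)
        if reward_model(idea) < threshold:
            continue                  # drop weak ideas before paying for execution
        score = run_and_score(code)
        archive.append((idea, code, score))
        if best is None or score > best[2]:
            best = (idea, code, score)
    return best
```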
arXiv.org
AlphaResearch: Accelerating New Algorithm Discovery with Language Models
Large language models have made significant progress in complex but easy-to-verify problems, yet they still struggle with discovering the unknown. In this paper, we present AlphaResearch,...
The Czech National Bank has announced the establishment of a pilot digital asset portfolio totaling $1 million, comprising Bitcoin, a USD stablecoin, and a tokenized deposit.
Approved on October 30, the initiative plans to share insights within the next 2–3 years.
The central bank reportedly maintains this is the first instance of a central bank including Bitcoin on its balance sheet.
Coindesk
Bitcoin (BTC) Comes to Central Bank Balance Sheet as CNB Buys
The bank said it created a $1 million "test portfolio" of digital assets, mostly made up of bitcoin.
Google introduced SIMA 2: an agent that plays, reasons, and learns with you in virtual 3D worlds.
Powered by Gemini, it goes beyond following basic instructions to think, understand, and take actions in interactive environments – meaning you can talk to it through text, voice, or even images.
Google trained SIMA 2 to achieve high-level goals in a wide array of games – allowing it to perform complex reasoning and independently plan how to accomplish tasks.
It acts like a collaborative partner that can explain its intentions and answer questions about its behavior.
SIMA 2 is now far better at carrying out detailed instructions, even in worlds it's never seen before.
It can transfer a learned concept like “mining” in one game and apply it to “harvesting” in another – connecting the dots between similar tasks.
It even navigated unseen environments created in real time by the Genie 3 model.
SIMA 2 can teach itself new skills, learning through trial and error based on feedback from Gemini – getting better the more it plays, without additional human input.
SIMA 2 research offers a path towards applications in robotics and another step towards AGI in the real world.
Google DeepMind
SIMA 2: A Gemini-Powered AI Agent for 3D Virtual Worlds
Introducing SIMA 2, the next milestone in our research creating general and helpful AI agents. By integrating the advanced capabilities of our Gemini models, SIMA is evolving from an instruction-foll…
OpenAI developed a new way to train small AI models with internal mechanisms that are easier for humans to understand.
Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work.
In new research, the team trains “sparse” models—with fewer, simpler connections between neurons—to see whether their computations become easier to understand.
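A toy sketch of one way to train with sparse connections: keep only the largest weights by magnitude after each update so most entries stay exactly zero. This is a simplification for illustration, not OpenAI's exact method:

```python
import torch

# Train a layer while enforcing weight sparsity: after each optimizer
# step, zero all but the top 5% of weights by magnitude.

layer = torch.nn.Linear(64, 64)
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
DENSITY = 0.05                            # keep 5% of the weights

for step in range(1000):
    x = torch.randn(32, 64)
    loss = (layer(x) - x).pow(2).mean()   # toy objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        w = layer.weight
        k = int(DENSITY * w.numel())
        thresh = w.abs().flatten().kthvalue(w.numel() - k).values
        w.mul_((w.abs() > thresh).float())  # zero all but the largest weights
```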
OpenAI
Understanding neural networks through sparse circuits
We trained models to think in simpler, more traceable steps—so we can better understand how they work.