Happy new year folks! Wishing everyone a bright and inspiring new year 🎉
May 2026 be a year of bold ideas.
Let’s keep building, exploring, and pushing the boundaries together.
Happy New Year from @alwebbci 🚀
🆒4
So the first major paper of 2026, #DeepSeek mHC: Manifold-Constrained Hyper-Connections
This is really an engineering paper. It takes as its starting point ideas already laid out in the original Hyper-Connections (HC) paper from ByteDance, which is therefore a prerequisite for reading. So, initial notes on that first.
The DeepSeek paper starts almost in medias res and first underlines a major success of the original HC approach: the increase in mathematical/topological complexity did not result in computational overhead.
Overall, the actual flex of the paper is not so much proving that Hyper-Connections can work at scale.
It’s this: we have the internal capacity to re-engineer the complete training environment across every dimension (kernels, memory management, inter-node communication) around highly experimental research ideas.
arXiv.org
mHC: Manifold-Constrained Hyper-Connections
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and...
❤7🔥2👏2
This new open-source "brain" just became the world's best robot model. Spirit AI presents Spirit v1.5.
This new vision-language-action model translates what a robot sees into precise physical actions.
It now ranks #1 on the RoboChallenge Table30 benchmark, outperforming the previous leader, Pi0.5, in robotic reasoning and control.
Code.
Model.
GitHub
GitHub - Spirit-AI-Team/spirit-v1.5: Spirit-v1.5: A Robotic Foundation Model by Spirit AI
Spirit-v1.5: A Robotic Foundation Model by Spirit AI - Spirit-AI-Team/spirit-v1.5
⚡5❤2🔥2👏2
Apple and Google have entered into a multi-year collaboration under which the next generation of Apple Foundation Models will be based on Google's Gemini models and cloud technology.
These models will help power future Apple Intelligence features, including a more personalized Siri coming this year.
After careful evaluation, Apple determined that Google's AI technology provides the most capable foundation for Apple Foundation Models and is excited about the innovative new experiences it will unlock for Apple users. Apple Intelligence will continue to run on Apple devices and Private Cloud Compute, while maintaining Apple's industry-leading privacy standards.
🔥4❤3👏2😁2
Huge, new release from DeepSeek & PKU. Enter "Engram," a new conditional memory module.
It's like a super-fast, internal lookup table for knowledge, freeing up the model's compute for actual reasoning.
#Deepseek's new paper is a very nice read. The idea builds on previous work like Over-tokenized Transformer, Per-Layer Embeddings and N-grammer; they scale it up and get some pretty convincing results!
The goal is quite simple: free up some effective depth for complex modules like MoE and attention by creating a new layer specialized in efficient retrieval. And of course it's DeepSeek, so the system design works nicely with hardware at both inference and training; in particular, you can scale model size with n-grams.
Results: It beats iso-parameter MoE models across the board.
Big gains in general reasoning (BBH +5.0), knowledge (MMLU +3.4), code (HumanEval +3.0), math (MATH +2.4), and massively improves long-context retrieval.
If you still aren’t bullish on SSD demand, read this and get storage-pilled.
Paper.
Code.
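To make the "internal lookup table" idea concrete, here is a minimal Python sketch, assuming a hashed n-gram table. The class name `NGramMemory`, the parameters `num_slots`, `dim`, and `n`, and the SHA-256 hashing scheme are all illustrative assumptions, not the Engram implementation; see the paper and code for the actual design.

```python
import hashlib
import random


class NGramMemory:
    """Toy hashed n-gram lookup table (hypothetical, not DeepSeek's code)."""

    def __init__(self, num_slots=1024, dim=4, n=2, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.num_slots = num_slots
        # One small vector per slot; in a real model these would be learned.
        self.table = [[rng.uniform(-1, 1) for _ in range(dim)]
                      for _ in range(num_slots)]

    def _slot(self, ngram):
        # Deterministic hash of the token-id tuple into a table slot.
        digest = hashlib.sha256(repr(ngram).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_slots

    def lookup(self, token_ids):
        """Return one memory vector per position, keyed by the trailing n-gram."""
        out = []
        for i in range(len(token_ids)):
            ngram = tuple(token_ids[max(0, i - self.n + 1): i + 1])
            out.append(self.table[self._slot(ngram)])
        return out


memory = NGramMemory()
vectors = memory.lookup([5, 17, 42, 17, 42])
print(len(vectors), len(vectors[0]))
```

Positions 2 and 4 both end in the bigram (17, 42), so they fetch the same vector without any attention or MoE compute, which is the sense in which retrieval is moved off the expensive layers.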
GitHub
Engram/Engram_paper.pdf at main · deepseek-ai/Engram
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models - deepseek-ai/Engram
🔥7❤5🥰3
Anthropic introduced Cowork:
Claude Code for the rest of your work.
Cowork lets you complete non-technical tasks much like how developers use Claude Code.
In Cowork, you give Claude access to a folder on your computer. Claude can then read, edit, or create files in that folder.
Once you've set a task, Claude makes a plan and steadily completes it, looping you in along the way.
Claude will ask before taking any significant actions so you can course-correct as needed.
Claude can use your existing connectors, which link Claude to external information.
You can also pair Cowork with Claude in Chrome for tasks that need browser access.
Cowork is available as a research preview for Claude Max subscribers in the macOS app.
If you're on another plan, join the waitlist for future access here.
Claude
Cowork: Claude Code power for knowledge work | Claude by Anthropic
Give Claude access to your local files and let it complete tasks autonomously. Claude Cowork brings Claude Code's agentic capabilities to the desktop app for non-technical work.
❤🔥5❤5👍5
Tencent's WeChat AI presents WeDLM
It's a new diffusion decoding framework that uses standard, forward-looking attention. This lets it use the same high-speed caching systems as today's top LLMs, avoiding the slowdowns of other diffusion models.
The result? It matches the quality of top autoregressive models while delivering up to 3x faster inference on complex reasoning tasks and up to 10x faster on simpler text.
Paper
GitHub
Model.
wedlm.github.io
WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference
Project landing page for WeDLM
🔥6👏4❤3
Great paper on Agentic Memory.
LLM agents need both long-term and short-term memory to handle complex tasks.
However, the default approach today treats these as separate components, each with its own heuristics, controllers, and optimization strategies.
But memory isn't two independent systems. It's one cognitive process that decides what to store, retrieve, summarize, and forget.
This new research introduces AgeMem, a unified framework that integrates long-term and short-term memory management directly into the agent's policy through tool-based actions.
Instead of relying on trigger-based rules or auxiliary memory managers, the agent learns when and how to invoke memory operations: ADD, UPDATE, DELETE for long-term storage, and RETRIEVE, SUMMARY, FILTER for context management.
It uses a three-stage progressive RL strategy. First, the model learns long-term memory storage. Then it masters short-term context management. Finally, it coordinates both under full task settings.
To handle the fragmented experiences from memory operations, they design a step-wise GRPO (Group Relative Policy Optimization) that transforms cross-stage dependencies into learnable signals.
The results across five long-horizon benchmarks:
1. On Qwen2.5-7B, AgeMem achieves 41.96 average score compared to 37.14 for Mem0, a 13% improvement.
2. On Qwen3-4B, the gap widens: 54.31 vs 44.70. Adding long-term memory alone provides +10-14% gains.
3. Adding RL training adds another +6%.
4. The full unified system with both memory types achieves up to +21.7% improvement over no-memory baselines.
The unified memory management through learnable tool-based actions outperforms fragmented heuristic pipelines, enabling agents to adaptively decide what to remember and forget based on task demands.
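The six tool-based operations named above can be sketched as a single dispatch interface. This is a toy illustration only: the class `AgentMemory`, the keyword-overlap retrieval, and the truncation used for SUMMARY are stand-in assumptions, and in AgeMem the choice of which action to take is learned by the policy via RL rather than scripted by hand as it is here.

```python
class AgentMemory:
    """Toy memory with the six operations named above (hypothetical interfaces)."""

    def __init__(self):
        self.long_term = {}   # key -> stored fact
        self.context = []     # short-term working context (list of strings)

    # Long-term storage operations
    def add(self, key, text):
        self.long_term[key] = text

    def update(self, key, text):
        if key in self.long_term:
            self.long_term[key] = text

    def delete(self, key):
        self.long_term.pop(key, None)

    # Context-management operations
    def retrieve(self, query):
        # Stand-in retrieval: copy facts sharing a word with the query into context.
        words = set(query.lower().split())
        hits = [t for t in self.long_term.values()
                if words & set(t.lower().split())]
        self.context.extend(hits)
        return hits

    def summary(self, max_items=2):
        # Stand-in for summarization: keep only the most recent items.
        self.context = self.context[-max_items:]

    def filter(self, keyword):
        # Drop context entries that do not mention the keyword.
        self.context = [c for c in self.context if keyword.lower() in c.lower()]

    def act(self, op, *args):
        """Dispatch an action emitted by the agent's policy, e.g. ("ADD", key, text)."""
        handlers = {"ADD": self.add, "UPDATE": self.update, "DELETE": self.delete,
                    "RETRIEVE": self.retrieve, "SUMMARY": self.summary,
                    "FILTER": self.filter}
        return handlers[op](*args)


mem = AgentMemory()
mem.act("ADD", "u1", "user prefers metric units")
mem.act("ADD", "u2", "project deadline is Friday")
mem.act("UPDATE", "u2", "project deadline is Monday")
mem.act("RETRIEVE", "when is the deadline")
mem.act("FILTER", "deadline")
print(mem.context)
```

Exposing both memory types through one action space is what lets a single policy be trained on all of them, instead of tuning separate heuristics per component.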
arXiv.org
Agentic Memory: Learning Unified Long-Term and Short-Term Memory...
Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows, making effective memory management critical. Existing methods typically...
❤7👍5👏2
New paper from Google, proving a novel theorem in algebraic geometry with an internal math-specialized version of Gemini.
This was a collaboration between Google DeepMind (Professor Freddie Manners and Blueshift team) and Professors Jim Bryan, Balazs Elek, and Ravi Vakil.
Coauthor Professor Ravi Vakil, president of the American Mathematical Society, said that Gemini’s “proof was rigorous, correct, and elegant... the kind of insight I would have been proud to produce myself.”
arXiv.org
The motivic class of the space of genus $0$ maps to the flag variety
Let $\operatorname{Fl}_{n+1}$ be the variety of complete flags in $\mathbb{A}^{n+1}$ and let $Ω^{2}_β(\operatorname{Fl}_{n+1})$ be the space of based maps $f:\mathbb{P}^{1}\to...
👍2🔥2👏2
OMG! 1 billion cells. Illumina introduced the Billion Cell Atlas, creating the most comprehensive map of human disease biology — and unlocking unparalleled speed and scale in AI for drug discovery.
The Atlas will help researchers, including founding participants AstraZeneca, Merck, and Eli Lilly, study the effect of switching on and off all 20,000 genes in cells linked to diseases that have been historically difficult to decode.
Illumina
Illumina introduces Billion Cell Atlas to accelerate AI and drug discovery
❤1🔥1
Sakana AI introduced DroPE: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
Positional embeddings are just training wheels. They help convergence but hurt long-context generalization.
Sakana found that if you simply delete them after pretraining and recalibrate for < 1% of the original budget, you unlock massive context windows.
Paper
Code
arXiv.org
Extending the Context of Pretrained LLMs by Dropping Their...
So far, expensive finetuning beyond the pretraining sequence length has been a requirement for effectively extending the context of language models (LM). In this work, we break this key bottleneck...
❤2🔥2👏2
Agent Skills are now available in Google Antigravity
Skills are an open standard to extend what your agent can do. Whether it's project-specific workflows or global utilities, you can now package knowledge into reusable skills.
Google Antigravity
Google Antigravity - Build the new way
🔥5👏3👍2
How can we use new neuroscience insights to build adaptive AI agents and leverage the many foundation models (which are much like different brain areas)? Check out the paper.
🔥5❤2🥰2
Anthropic is rolling out MCP Tool Search for Claude Code.
As MCP has grown into a more popular protocol and agents have become more capable, MCP servers may now expose 50+ tools and take up a large amount of context.
Tool Search allows Claude Code to dynamically load tools into context when MCP tools would otherwise take up a lot of context.
How it works:
- Claude Code detects when your MCP tool descriptions would use more than 10% of context
- When triggered, tools are loaded via search instead of preloaded
Otherwise, MCP tools work exactly as before.
This resolves one of our most-requested features on GitHub: lazy loading for MCP servers.
Users were documenting setups with 7+ servers consuming 67k+ tokens.
If you're making an MCP server: things are mostly the same, but the "server instructions" field becomes more useful with tool search enabled. It helps Claude know when to search for your tools, similar to skills.
If you're making an MCP client: we highly suggest implementing the ToolSearchTool; you can find the docs here.
Anthropic implemented it with a custom search function to make it work for Claude Code.
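The trigger described above (tool descriptions exceeding 10% of context, then loading via search) can be sketched roughly as follows. Everything here is a hypothetical illustration, not Claude Code's implementation: the function names, the 4-characters-per-token estimate, and the keyword-overlap search are all assumptions.

```python
def estimate_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def should_defer(tools, context_window, threshold=0.10):
    """True when tool descriptions would exceed the threshold share of context."""
    total = sum(estimate_tokens(t["description"]) for t in tools)
    return total > threshold * context_window


def search_tools(tools, query, top_k=2):
    """Stand-in keyword search: rank tools by words shared with the query."""
    words = set(query.lower().split())
    scored = []
    for tool in tools:
        overlap = len(words & set(tool["description"].lower().split()))
        if overlap:
            scored.append((overlap, tool["name"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Deliberately long descriptions so the toy example crosses the threshold.
tools = [
    {"name": "create_issue", "description": "create a new issue in a repository " * 20},
    {"name": "query_db", "description": "run a sql query against the database " * 20},
    {"name": "send_email", "description": "send an email message to a recipient " * 20},
]

if should_defer(tools, context_window=4000):
    # Over budget: load only the tools that match the current need.
    loaded = search_tools(tools, "create issue repository")
else:
    # Under budget: preload everything, as before.
    loaded = [t["name"] for t in tools]
print(loaded)
```

With a much larger context window the same tool set stays under the threshold and is preloaded unchanged, which matches the "otherwise, MCP tools work exactly as before" behavior.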
GitHub
Feature Request: Lazy Loading for MCP Servers and Tools (95% context reduction possible) · Issue #7336 · anthropics/claude-code
Feature Request: Lazy Loading for MCP Servers and Tools Problem Statement Currently, Claude Code loads all configured MCP servers, tools, and agents at session startup, consuming significant contex...
🔥4🥰3❤2👍2
Anthropic published its 4th Economic Index report
This version introduces "economic primitives"—simple and foundational metrics on how AI is used: task complexity, education level, purpose (work, school, personal), AI autonomy, and success rates.
API data shows Claude is 50% successful at tasks of 3.5 hours, and highly reliable on longer tasks on Claude.ai.
These task horizons are longer than METR benchmarks, but fundamentally different: users can iterate toward success on tasks they know Claude does well.
Countries at different stages of economic development use Claude quite differently.
As GDP per capita increases, people use it more for work or personal use; as it decreases, they’re more likely to use AI for coursework.
Because Claude tends to better cover higher-skill tasks, if those get automated, workers may be left with more routine work—a “deskilling” effect.
However, this assumes that automation shrinks those aspects of the job; Anthropic can't be sure how jobs might evolve.
Anthropic
Anthropic Economic Index: New building blocks for understanding AI use
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
❤6🔥2👏2
ByteDance dropped a protein folding model better than Google's AlphaFold 3
SeedFold builds on top of AlphaFold3 and gets SOTA on FoldBench.
You can actually play with it and vibecode a 3D protein viewer.
The three main techniques they used are:
— 4xing the width of the Pairformer architecture of AF3
— More efficient linear triangular attention mechanism
— Distilling a 26.5M dataset from AF2 to increase training data
AlphaFold didn't directly participate in the latest CASP16 (2024), but most models that did well like Yang Lab (Shandong U), MULTICOM (UMissouri), Kiharalab (Purdue) and kozakovvajda (Stony Brook / Boston) are all based off AlphaFold3.
SeedFold has a decent chance of outperforming them and taking at least top 3 in CASP17 (2026) in various categories.
alphaXiv
SeedFold: Scaling Biomolecular Structure Prediction
View recent discussion. Abstract: Highly accurate biomolecular structure prediction is a key component of developing biomolecular foundation models, and one of the most critical aspects of building foundation models is identifying the recipes for scaling…
❤5👍2🔥2
This new research introduces UniversalRAG, a framework that retrieves and integrates knowledge from heterogeneous sources across diverse modalities and granularities.
Real-world queries vary widely in what knowledge they need. A universal RAG framework that dynamically routes to the right modality and granularity serves diverse information needs that no single-corpus approach can address.
Instead of forcing everything into one embedding space, UniversalRAG uses modality-aware routing. A router dynamically predicts which modality-specific corpus best matches the query, then performs targeted retrieval within it. This sidesteps the modality gap entirely by avoiding cross-modal comparisons.
Beyond modality, the framework also handles granularity. Complex analytical questions may need full documents or complete videos. Simple factoid questions are better served with paragraphs or short clips. UniversalRAG organizes each modality into multiple granularity levels: paragraphs and documents for text, clips and full videos for video, plus tables and images.
The router can be trained or training-free. The trained version uses inductive biases from existing benchmarks. The training-free version prompts frontier models like Gemini to predict the best modality-granularity pairs directly.
Validation across 10 benchmarks spanning text, images, tables, and videos shows UniversalRAG outperforms both unimodal RAG baselines and unified embedding approaches by large margins on average.
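The route-then-retrieve flow can be sketched in a few lines. This is only an illustration under loud assumptions: the keyword rules in `route` are a placeholder for the trained or prompted router described above, and the `CORPORA` contents and function names are hypothetical.

```python
# One corpus per (modality, granularity) pair; contents are placeholders.
CORPORA = {
    ("text", "paragraph"): ["Paris is the capital of France."],
    ("text", "document"): ["<full report on European capitals>"],
    ("video", "clip"): ["<30s clip: how to tie a bowline>"],
    ("video", "full"): ["<full lecture video on knots>"],
    ("table", "table"): ["<table: population by city>"],
    ("image", "image"): ["<photo: Eiffel Tower>"],
}


def route(query):
    """Placeholder router: keyword rules standing in for a trained or prompted model."""
    q = query.lower()
    if "show" in q or "demonstrate" in q:
        modality = "video"
    elif "how many" in q or "compare" in q:
        return ("table", "table")
    elif "look like" in q:
        return ("image", "image")
    else:
        modality = "text"
    # Coarser granularity for questions that ask for depth.
    detailed = "explain" in q or "in detail" in q
    if modality == "video":
        return ("video", "full" if detailed else "clip")
    return ("text", "document" if detailed else "paragraph")


def retrieve(query):
    # Search only the routed corpus, so no cross-modal similarity is computed.
    key = route(query)
    return key, CORPORA[key][0]


print(retrieve("What is the capital of France?"))
print(retrieve("Show me how to tie a bowline"))
```

Because retrieval only ever happens inside one modality-specific corpus, embeddings from different modalities are never compared against each other, which is how the modality gap is sidestepped.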
arXiv.org
UniversalRAG: Retrieval-Augmented Generation over Corpora of...
Retrieval-Augmented Generation (RAG) has shown substantial promise in improving factual accuracy by grounding model responses with external knowledge relevant to queries. However, most existing...
❤4🔥4👍2
Google announced a deep learning approach that demonstrates the viability of smartwatches for estimation of walking metrics like gait speed and step length.
Wrist-worn devices can be as accurate as smartphones for continuous health tracking.
Google Research
Unlocking health insights: Estimating advanced walking metrics with smartwatches
We verified that smartwatches serve as a highly reliable platform for estimating spatio-temporal gait metrics through a large-scale validation study.
❤5🔥2💯2
Google's Titans architecture brings adaptive long-term memory to language models
Titans introduces a deep neural network (MLP) as a long-term memory module, separate from the main model.
This memory:
1. Updates its weights when encountering "surprising" information — tokens that deviate significantly from what the memory already encodes
2. Ignores routine, predictable tokens to maintain speed
3. Uses momentum to capture related context and adaptive forgetting to manage capacity
The "surprise metric" mirrors how human memory works: we forget the routine but retain the unexpected.
Why does it matter?
Standard transformers scale quadratically with context length. Linear RNNs and state-space models (like Mamba) scale efficiently but compress context into fixed-size states, losing information.
Titans combines both approaches:
- Attention handles precise short-term context
- The neural memory module compresses and retrieves long-range information
Inference cost stays linear.
Results
On the BABILong benchmark (reasoning across extremely long documents), Titans outperforms GPT-4 despite having far fewer parameters. The architecture scales effectively beyond 2 million tokens while maintaining stable accuracy.
Google also introduced MIRAS — a theoretical framework showing that transformers, RNNs, and SSMs are all variants of associative memory systems.
This opens the door to exploring non-Euclidean optimization objectives beyond standard MSE.
Potential applications: full-document analysis, genomic sequences, long-session agents, continuous context without chunking.
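A minimal sketch of a surprise-driven memory update with momentum and forgetting is shown below. It is a simplification under stated assumptions: the memory here is a single linear map rather than the MLP described above, surprise is taken to be the squared prediction error, the update is S_t = momentum * S_{t-1} - lr * grad followed by M_t = (1 - forget) * M_{t-1} + S_t, and the hyperparameter values are arbitrary.

```python
def matvec(M, k):
    """Multiply matrix M (list of rows) by vector k."""
    return [sum(m_ij * k_j for m_ij, k_j in zip(row, k)) for row in M]


class LinearMemory:
    """Toy linear associative memory (the post describes an MLP; this is a simplification)."""

    def __init__(self, dim, lr=0.5, momentum=0.5, forget=0.01):
        self.M = [[0.0] * dim for _ in range(dim)]   # memory weights
        self.S = [[0.0] * dim for _ in range(dim)]   # momentum of past surprise
        self.lr, self.momentum, self.forget = lr, momentum, forget

    def read(self, key):
        return matvec(self.M, key)

    def write(self, key, value):
        """Update the memory on one (key, value) pair; return the surprise."""
        err = [p - v for p, v in zip(self.read(key), value)]
        surprise = sum(e * e for e in err)   # squared prediction error
        for i, e_i in enumerate(err):
            for j, k_j in enumerate(key):
                grad = 2.0 * e_i * k_j       # d/dM_ij of ||M k - v||^2
                # Momentum carries surprise forward; forgetting decays old weights.
                self.S[i][j] = self.momentum * self.S[i][j] - self.lr * grad
                self.M[i][j] = (1.0 - self.forget) * self.M[i][j] + self.S[i][j]
        return surprise


mem = LinearMemory(dim=2)
key, value = [1.0, 0.0], [0.0, 1.0]
first = mem.write(key, value)      # unseen association: large surprise
for _ in range(20):
    last = mem.write(key, value)   # repeated association: surprise shrinks
print(first, last)
```

In this sketch a repeated, predictable association produces a near-zero gradient and therefore almost no weight change, while a new one produces a large update.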
Google Research
Titans + MIRAS: Helping AI have long-term memory
We introduce the Titans architecture and the MIRAS framework, which allow AI models to work much faster and handle massive contexts by updating their core memory while it's actively running.
🔥4❤3👍2👨💻1
MIT dropped a technique that makes ChatGPT reason like a team of experts instead of one overconfident intern.
It’s called “Recursive Meta-Cognition” and it outperforms standard prompts by 110%.
The problem with how you prompt AI:
- You ask one question. AI gives one answer. If it’s wrong, you never know.
- It’s like asking a random person on the street for medical advice and just… trusting them.
- No second opinion. No fact-checking. No confidence level.
The secret sauce is the confidence scoring. Every reasoning path gets a score from 0.0 to 1.0. Paths below 0.4? Rejected. Paths above 0.8? Trusted.
The multi-perspective check catches errors before they reach you. Most AI answers fail at least one of these. This framework catches them.
Best part: it doesn’t overthink simple questions. The system matches complexity to the problem. No wasted cycles.
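The confidence thresholds quoted above can be illustrated with a small triage function. This is a sketch under assumptions: the post does not say how the scores are produced or what happens to paths between 0.4 and 0.8, so the scores below are hard-coded and the "uncertain" bucket is an invention of this example.

```python
def triage(paths, reject_below=0.4, trust_above=0.8):
    """Split (answer, confidence) pairs into trusted, uncertain, and rejected."""
    buckets = {"trusted": [], "uncertain": [], "rejected": []}
    for answer, confidence in paths:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be in [0.0, 1.0]")
        if confidence < reject_below:
            buckets["rejected"].append(answer)
        elif confidence > trust_above:
            buckets["trusted"].append(answer)
        else:
            # The post does not cover this middle band; holding these
            # for review is an assumption of this sketch.
            buckets["uncertain"].append(answer)
    return buckets


# Hard-coded example scores; a real system would have to compute them.
paths = [("answer A", 0.92), ("answer B", 0.55), ("answer C", 0.31)]
print(triage(paths))
```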
❤4🔥4👏4
Another Chinese model fully trained on domestic chips, released by China Telecom
TeleChat3-36B-Thinking:
- Native support for the Ascend + MindSpore ecosystem
- Inspired by DeepSeek’s architecture design, bringing training stability and efficiency gains.
🔥5👏4💯4