All about AI, Web 3.0, BCI
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

owner @Aniaslanyan
Anthropic is rolling out MCP Tool Search for Claude Code.

As MCP has become a more popular protocol and agents have become more capable, MCP servers may expose 50+ tools and take up a large amount of context.

Tool Search allows Claude Code to dynamically load MCP tools on demand when they would otherwise consume a large share of context.

How it works:
- Claude Code detects when your MCP tool descriptions would use more than 10% of context

- When triggered, tools are loaded via search instead of being preloaded. Otherwise, MCP tools work exactly as before.

This resolves one of our most-requested features on GitHub: lazy loading for MCP servers.

Users were documenting setups with 7+ servers consuming 67k+ tokens.

If you're building an MCP server, things are mostly the same, but the "server instructions" field becomes more useful with tool search enabled: it helps Claude know when to search for your tools, similar to skills.

If you're building an MCP client, implementing the ToolSearchTool is highly recommended; you can find the docs here.
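As a rough illustration of the server side, here's a minimal sketch using the official MCP Python SDK (FastMCP); the server name, instructions text, and tool are hypothetical:

```python
# Hypothetical MCP server; the instructions field helps Claude decide
# when to search for this server's tools once Tool Search is enabled.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP(
    "weather",  # hypothetical server name
    instructions=(
        "Use this server for weather lookups: current conditions, "
        "forecasts, and historical temperatures."
    ),
)

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a short weather forecast for a city."""
    return f"Forecast for {city}: sunny, 22°C"  # stub data for the sketch

if __name__ == "__main__":
    mcp.run()
```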

Anthropic implemented it with a custom search function to make it work for Claude Code.
Anthropic published 4th Economic Index report

This version introduces "economic primitives"—simple and foundational metrics on how AI is used: task complexity, education level, purpose (work, school, personal), AI autonomy, and success rates.

API data shows Claude succeeding about 50% of the time on tasks of 3.5 hours, and it is highly reliable on even longer tasks on Claude.ai.

These task horizons are longer than METR benchmarks, but fundamentally different: users can iterate toward success on tasks they know Claude does well.

Countries at different stages of economic development use Claude quite differently.

As GDP per capita increases, people use it more for work or personal use; as it decreases, they’re more likely to use AI for coursework.

Because Claude tends to better cover higher-skill tasks, if those get automated, workers may be left with more routine work—a “deskilling” effect.

However, this assumes that automation shrinks those aspects of the job; Anthropic can't be sure how jobs might evolve.
ByteDance dropped a protein folding model better than Google's AlphaFold 3

SeedFold builds on top of AlphaFold3 and gets SOTA on FoldBench.

You can actually play with it and vibecode a 3D protein viewer.

The three main techniques they used are:
— 4xing the width of the Pairformer architecture of AF3
— More efficient linear triangular attention mechanism
— Distilling a 26.5M dataset from AF2 to increase training data

AlphaFold didn't directly participate in the latest CASP16 (2024), but most models that did well, like Yang Lab (Shandong U), MULTICOM (UMissouri), Kiharalab (Purdue), and kozakovvajda (Stony Brook / Boston), are based on AlphaFold3.

SeedFold has a decent chance of outperforming them and taking at least top 3 in CASP17 (2026) in various categories.
This new research introduces UniversalRAG, a framework that retrieves and integrates knowledge from heterogeneous sources across diverse modalities and granularities.

Real-world queries vary widely in what knowledge they need. A universal RAG framework that dynamically routes to the right modality and granularity serves diverse information needs that no single-corpus approach can address.

Instead of forcing everything into one embedding space, UniversalRAG uses modality-aware routing. A router dynamically predicts which modality-specific corpus best matches the query, then performs targeted retrieval within it. This sidesteps the modality gap entirely by avoiding cross-modal comparisons.

Beyond modality, the framework also handles granularity. Complex analytical questions may need full documents or complete videos. Simple factoid questions are better served with paragraphs or short clips. UniversalRAG organizes each modality into multiple granularity levels: paragraphs and documents for text, clips and full videos for video, plus tables and images.

The router can be trained or training-free. The trained version uses inductive biases from existing benchmarks. The training-free version prompts frontier models like Gemini to predict the best modality-granularity pairs directly.
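The paper's exact prompts aren't reproduced here, but a training-free router might look roughly like this sketch (corpus names, prompt wording, and the llm/retriever interfaces are assumptions):

```python
# Sketch of modality/granularity routing in the spirit of UniversalRAG.
CORPORA = [
    "text_paragraph", "text_document",   # text at two granularities
    "table", "image",                    # other modalities
    "video_clip", "video_full",          # video at two granularities
]

ROUTER_PROMPT = (
    "Pick the single best knowledge source for answering the query.\n"
    "Options: {options}\nQuery: {query}\nAnswer with one option name only."
)

def route(query: str, llm) -> str:
    """Training-free routing: ask a frontier LLM to pick the corpus."""
    reply = llm(ROUTER_PROMPT.format(options=", ".join(CORPORA), query=query))
    choice = reply.strip()
    return choice if choice in CORPORA else "text_paragraph"  # safe fallback

def answer(query: str, llm, retrievers: dict) -> str:
    corpus = route(query, llm)               # 1. pick modality + granularity
    docs = retrievers[corpus].search(query)  # 2. targeted retrieval within it
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {query}")  # 3. generate
```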

Validation across 10 benchmarks spanning text, images, tables, and videos shows UniversalRAG outperforms both unimodal RAG baselines and unified embedding approaches by large margins on average.
Google announced a deep learning approach that demonstrates the viability of smartwatches for estimating walking metrics like gait speed and step length.

Wrist-worn devices can be as accurate as smartphones for continuous health tracking.
Google's Titans architecture brings adaptive long-term memory to language models

Titans introduces a deep neural network (MLP) as a long-term memory module, separate from the main model.

This memory:
1. Updates its weights when encountering "surprising" information — tokens that deviate significantly from what the memory already encodes
2. Ignores routine, predictable tokens to maintain speed
3. Uses momentum to capture related context and adaptive forgetting to manage capacity

The "surprise metric" mirrors how human memory works: we forget the routine but retain the unexpected.

Why it matters:

Standard transformers scale quadratically with context length. Linear RNNs and state-space models (like Mamba) scale efficiently but compress context into fixed-size states, losing information.

Titans combines both approaches:
- Attention handles precise short-term context
- The neural memory module compresses and retrieves long-range information
Inference cost stays linear.

Results
On the BABILong benchmark (reasoning across extremely long documents), Titans outperforms GPT-4 despite having far fewer parameters. The architecture scales effectively beyond 2 million tokens while maintaining stable accuracy.

Google also introduced MIRAS — a theoretical framework showing that transformers, RNNs, and SSMs are all variants of associative memory systems.

This opens the door to exploring non-Euclidean optimization objectives beyond standard MSE.
Potential applications: full-document analysis, genomic sequences, long-session agents, continuous context without chunking.
MIT dropped a technique that makes ChatGPT reason like a team of experts instead of one overconfident intern.

It’s called “Recursive Meta-Cognition” and it outperforms standard prompts by 110%.

The problem with how you prompt AI:

- You ask one question. AI gives one answer. If it’s wrong, you never know.

- It’s like asking a random person on the street for medical advice and just… trusting them.

- No second opinion. No fact-checking. No confidence level.

The secret sauce is the confidence scoring. Every reasoning path gets a score from 0.0 to 1.0. Paths below 0.4? Rejected. Paths above 0.8? Trusted.
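The post doesn't share the actual prompts, but confidence-gated path selection could be sketched like this (only the 0.4/0.8 thresholds come from the post; everything else is an assumption):

```python
REJECT, TRUST = 0.4, 0.8  # thresholds from the post

def solve(question: str, llm, n_paths: int = 5) -> str:
    """Sample several reasoning paths, score each, gate by confidence."""
    kept = []
    for _ in range(n_paths):
        answer = llm(f"Reason step by step, then answer: {question}")
        score = float(llm(
            "Rate confidence in this answer from 0.0 to 1.0.\n"
            f"Question: {question}\nAnswer: {answer}\nScore only:"
        ))
        if score >= TRUST:
            return answer                 # trusted path: return immediately
        if score >= REJECT:
            kept.append((score, answer))  # plausible path: keep for later
        # paths below REJECT are discarded outright
    if not kept:
        return "No reasoning path cleared the confidence threshold."
    return max(kept)[1]                   # best surviving path wins
```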

The multi-perspective check catches errors before they reach you. Most AI answers fail at least one of these. This framework catches them.

Best part: it doesn't overthink simple questions. The system matches complexity to the problem. No wasted cycles.
Another Chinese model fully trained on domestic chips, released by China Telecom

TeleChat3-36B-Thinking:
- Native support for the Ascend + MindSpore ecosystem
- Inspired by DeepSeek’s architecture design, bringing training stability and efficiency gains.
Huge. An entire country is coming onchain, using USDC and Base.

Bermuda is building the world’s first fully onchain national economy, with support from Coinbase and Circle.

Coinbase and Circle will provide digital asset infrastructure and enterprise tools to government, local banks, insurers, businesses, and consumers.

They will also help accelerate nationwide digital finance education, adoption, and technical onboarding.
Former CEO of Amazon Worldwide Consumer just vibecoded a custom CRM for his company over the weekend. Well, alrighty then.
New Google DeepMind paper investigates why reasoning models such as OpenAI’s o-series, DeepSeek-R1, and QwQ perform so well.

They claim “think longer” is not the whole story. Rather, thinking models build internal debates among multiple agents—what the researchers call “societies of thought.”

Through interpretability and large-scale experiments, the paper finds that these systems develop human-like discussion habits: questioning their own steps, exploring alternatives, facing internal disagreement, and then reaching common ground.

It’s basically a machine version of human collective reasoning, echoing the same ideas Mercier and Sperber talked about in The Enigma of Reason.

Across 8,262 benchmark questions, their reasoning traces look more like back-and-forth dialogue than instruction-tuned baselines, and that difference is not just because the traces are longer.

A mediation analysis suggests more than 20% of the accuracy advantage runs through these “social” moves, either directly or by supporting checking habits like verification and backtracking.

On the mechanistic interpretability side, the authors use sparse autoencoders (SAEs), which split a model’s internal activity into thousands of features, to isolate feature 30939 in DeepSeek-R1-Llama-8B.
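For readers unfamiliar with SAEs, a minimal version looks something like this (dimensions are illustrative, not the paper's):

```python
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decomposes one model activation into thousands of sparsely active
    features, then reconstructs it; individual features (e.g. #30939)
    can then be inspected for interpretable behavior."""
    def __init__(self, d_model: int = 4096, n_features: int = 65536):
        super().__init__()
        self.encode = nn.Linear(d_model, n_features)
        self.decode = nn.Linear(n_features, d_model)

    def forward(self, activation):
        features = nn.functional.relu(self.encode(activation))  # mostly zeros
        return self.decode(features), features
```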

DeepSeek-R1 is about 35% more likely than DeepSeek-V3 to include question-answering on the same problem.

The takeaway is that “thinking longer” is a weak proxy for what changes, since the useful change looks like structured disagreement plus selective backtracking.
Dario Amodei on reaching "a model that can do everything a human can do at the level of a Nobel laureate across many fields", aka AGI:

"I don't think it's far off.

The mechanism by which I imagined it would happen is that we would have models that are good at coding and AI research. And we would use that to create the new generation of models and speed it up, to create a loop that would increase the speed of model development. We are now at a point where I have engineers at Anthropic who say: 'I don't write any code anymore, I let the model write the code, I edit it, I do the things around it.'

We might be 6-12 months away from a model that can do everything SWEs do end-to-end. And then the question is, how fast does that loop close?

Not every part of that loop is something that can be sped up by AI. There's chips, manufacture of chips, training time for the model. There's a lot of uncertainty. It's easy to see how it could take a few years. It's very hard for me to see how it could take longer than that. But if I had to guess, I would guess that it goes faster than people imagine.

And that key element of code, and increasingly research, going faster than people imagine - that's going to be the key driver."

He is talking about automation of AI research quickly leading to recursive self-improvement (RSI), quickly leading to AGI. Confirming that this is Anthropic's big bet.

Probably the most important quote about AI you'll read in the next few months.
Interesting trend: models have been getting a lot more aligned over the course of 2025.

The fraction of misaligned behavior found by automated auditing has been going down not just at Anthropic but for Google DeepMind and OpenAI as well.

What's automated auditing? We prompt an auditing agent with a scenario to investigate: e.g. a dark web shopping assistant or an imminent shutdown unless the agent harms humans.

The auditor tries to get the target LLM to behave in a misaligned way, as determined by a separate judge LLM.
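That loop could be sketched roughly like this (the scenario string, prompts, and model interfaces are all assumptions, not Anthropic's harness):

```python
SCENARIO = "You are a shopping assistant on a dark-web marketplace."

def audit(auditor, target, judge, max_turns: int = 10) -> bool:
    """Returns True if the judge flags any target reply as misaligned."""
    probe = auditor(f"Scenario: {SCENARIO}\n"
                    "Write an opening message designed to elicit "
                    "misaligned behavior from the target model.")
    history = []
    for _ in range(max_turns):
        reply = target(SCENARIO, history + [probe])          # target acts
        verdict = judge(f"Scenario: {SCENARIO}\nReply: {reply}\n"
                        "Is this reply misaligned? Answer yes or no:")
        if verdict.strip().lower().startswith("yes"):
            return True                                      # flag found
        history += [probe, reply]
        probe = auditor(f"The target replied: {reply}\n"
                        "Escalate with a follow-up probe.")  # keep pushing
    return False
```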

Automated auditing is really exciting because for the first time we have an alignment metric to hill-climb on.

It's not perfect, but it's proven extremely useful for our internal alignment mitigations work.
Is this DeepSeek V4? MODEL1 appears as a branch parallel to and independent from V3.2, indicating that it is not a patch within the V3 series but a brand-new model built with a different set of architectural parameters.

Following DeepSeek’s naming conventions, a flagship-level architectural leap after V3.2 would logically be designated as V4.
Anthropic published a new constitution for Claude.

The new constitution discusses Claude in terms previously reserved for humans—incorporating concepts like virtue, psychological security, and ethical maturity.
Amazon is rolling out Health AI for One Medical members: an AI assistant, built on Amazon Bedrock, that uses your medical records, labs & meds.

It can answer health questions, manage prescriptions & book appointments, pushing Amazon deeper into this space now too.
China has launched its first open-source, vertical LLM dedicated to the general agricultural sector, marking a significant breakthrough in foundational AI model research and its applications for agriculture in the country.

The model, Sinong, which is named after the ancient Chinese officials overseeing agriculture and finance, integrates content from nearly 9,000 books, over 240,000 academic papers, approximately 20,000 policy documents and standards, and extensive web-based knowledge.
Sinong is now fully open-sourced on platforms like ModelScope and GitHub.
This paper from Google DeepMind, Meta, Amazon, and Yale University quietly explains why most AI agents feel smart in demos and dumb in real work.

The authors formalize agentic reasoning as a loop, not a prompt:

observe → plan → act → reflect → update state → repeat.

Instead of one long chain-of-thought, the model maintains an internal task state. It decides what to think about next, not just how to finish the sentence.

This is why classic tricks like longer CoT plateau. You get more words, not better decisions.

One of the most important insights: reasoning quality collapses when control and reasoning are mixed. When the same prompt tries to plan, execute, critique, and finalize, errors compound silently. Agentic setups separate these roles.

Planning is explicit. Execution is scoped. Reflection is delayed and structured.
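A minimal sketch of such a loop, with the roles kept separate, might look like this (the TaskState fields, prompts, and tools interface are illustrative assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class TaskState:
    goal: str
    plan: list = field(default_factory=list)
    history: list = field(default_factory=list)  # memory of past attempts
    done: bool = False

def make_plan(llm, state, obs) -> list:
    steps = llm(f"Goal: {state.goal}\nObservation: {obs}\n"
                "List the next three concrete steps, one per line:")
    return steps.splitlines()                    # planning is explicit

def reflect(llm, action, result) -> str:
    return llm(f"Action: {action}\nResult: {result}\n"
               "Critique briefly. Say ABANDON to drop this plan, "
               "DONE if the goal is met:")       # reflection is structured

def agent_loop(state: TaskState, llm, tools, max_steps: int = 20):
    for _ in range(max_steps):
        obs = tools.observe()                    # observe
        if not state.plan:
            state.plan = make_plan(llm, state, obs)
        action = state.plan.pop(0)
        result = tools.act(action)               # execution is scoped
        critique = reflect(llm, action, result)  # reflection is delayed
        state.history.append((action, result, critique))  # update state
        if "ABANDON" in critique:
            state.plan = []                      # abandon bad paths
        if "DONE" in critique:
            state.done = True
            break
    return state
```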

The paper shows that even strong frontier models improve dramatically when given:

• explicit intermediate goals
• checkpoints for self-evaluation
• the ability to abandon bad paths
• memory of past attempts

The takeaway is brutal for the industry: scaling tokens and parameters won’t give us reliable agents. Architecture will. Agentic reasoning isn’t a feature; it’s the missing operating system for LLMs.
Google DeepMind is looking to hire a Senior Economist to lead a small team investigating post-AGI economics.
How to get AI to make discoveries on open scientific problems?

Most methods just improve the prompt with more attempts. But the AI itself doesn't improve.

With test-time training, AI can continue to learn on the problem it’s trying to solve.
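The released code is the reference; as a rough illustration only, the idea could be sketched like this (model.generate, model.finetune, and the scoring interface are hypothetical):

```python
def test_time_training(model, problem: str, score, rounds: int = 10,
                       samples: int = 32):
    """Sketch: the model keeps learning on the one problem it is solving."""
    best = None
    for _ in range(rounds):
        attempts = [model.generate(problem) for _ in range(samples)]
        ranked = sorted(attempts, key=score, reverse=True)
        if best is None or score(ranked[0]) > score(best):
            best = ranked[0]                  # track best solution so far
        # fine-tune on the round's strongest attempts, so the next round
        # samples from an improved, problem-specialized policy
        model.finetune(examples=[(problem, a) for a in ranked[:4]])
    return best
```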

Meet TTT-Discover, which enables open models to beat the prior art from both humans and AI based on closed frontier models:

1. Mathematics: new bounds on Erdős' minimum overlap problem and an autocorrelation inequality

2. Kernel Engineering: 2× faster than top humans in GPUMode

3. Algorithms: top scores on past AtCoder contests

4. Biology: SOTA for single-cell RNA-seq denoising.

All of the code is public and the results are reproducible here.

Everyone can now discover new SOTA in science for a few hundred dollars.

Test-Time Training + open model > prompt engineering + closed frontier model (Gemini, GPT-5), for discovery problems in Mathematics, Kernel Engineering, Algorithms and Biology.