All about AI, Web 3.0, BCI
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI).

owner @Aniaslanyan
#DeepSeek introduced NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference

Core components of NSA:

1. Dynamic hierarchical sparse strategy
2. Coarse-grained token compression
3. Fine-grained token selection

With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs—without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.
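A minimal sketch of the compression-then-selection idea for a single query token (block size, mean-pooling, and the top-k rule are illustrative assumptions, not DeepSeek's implementation):

```python
import torch
import torch.nn.functional as F

def sparse_attention_sketch(q, k, v, block_size=64, top_blocks=4):
    """Illustrative coarse-to-fine sparse attention for one query vector.

    q: (d,)  k, v: (seq_len, d) with seq_len divisible by block_size.
    """
    d = q.shape[-1]
    k_blocks = k.view(-1, block_size, d)            # (n_blocks, block, d)
    v_blocks = v.view(-1, block_size, d)

    # Coarse-grained compression: one mean-pooled key per block.
    k_compressed = k_blocks.mean(dim=1)             # (n_blocks, d)

    # Score blocks against the query and keep only the most relevant ones.
    block_scores = k_compressed @ q                 # (n_blocks,)
    keep = block_scores.topk(min(top_blocks, block_scores.numel())).indices

    # Fine-grained selection: full attention inside the kept blocks only.
    k_sel = k_blocks[keep].reshape(-1, d)
    v_sel = v_blocks[keep].reshape(-1, d)
    attn = F.softmax(k_sel @ q / d**0.5, dim=-1)
    return attn @ v_sel                             # (d,)
```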
LangMem SDK: long-term memory for AI agents

Different types of memory enable different agent capabilities.

LangChain released an open-source library that implements some of these patterns so you can build agents that learn from interactions.

The LangMem SDK makes it easier to:

- Extract and manage semantic knowledge from conversations
- Auto-optimize prompt instructions and rules from interactions

Can be used with or without LangGraph.

Docs.
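As a rough illustration of the semantic-memory pattern the SDK implements, here is a self-contained sketch; the MemoryStore class and extract_facts() helper are hypothetical stand-ins, not the LangMem API (see the docs above for the real interfaces):

```python
# Hypothetical illustration of extracting and reusing semantic memories.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    facts: list[str] = field(default_factory=list)

    def add(self, new_facts: list[str]) -> None:
        # Deduplicate so repeated conversations don't bloat the store.
        self.facts.extend(f for f in new_facts if f not in self.facts)

    def as_context(self) -> str:
        # Inject remembered facts into the agent's system prompt.
        return "Known about the user:\n" + "\n".join(f"- {f}" for f in self.facts)

def extract_facts(conversation: list[dict]) -> list[str]:
    # In practice an LLM call extracts durable facts; simple keyword matching
    # stands in for it here so the sketch stays self-contained.
    return [m["content"] for m in conversation
            if m["role"] == "user" and "i prefer" in m["content"].lower()]

store = MemoryStore()
store.add(extract_facts([{"role": "user", "content": "I prefer concise answers."}]))
print(store.as_context())
```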
Microsoft presented Magma: A Foundation Model for Multimodal AI Agents

- SotA on UI navigation and robotic manipulation tasks
- Pretrained on a large dataset annotated with Set-of-Mark (SoM) for action grounding and Trace-of-Mark (ToM) for action planning.
Kimi introduced MoBA: Mixture of Block Attention for Long-Context LLMs

MoBA brings the routing idea behind Mixture of Experts (MoE) to sparse attention: each query attends only to the key/value blocks a gate selects for it.

MoBA achieves this efficiency without sacrificing performance, making long-context tasks substantially more scalable.

Key features of MoBA:
1. Trainable block sparse attention: Capable of continued training from any current full attention model
2. Parameter-less gating mechanism: Seamlessly switches between full & sparse attention
3. Production-proven at kimi.ai, with a 6.5x speedup at 1M-token input
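Similar in spirit to the NSA sketch above, here is a rough illustration of the parameter-less block gate, including the fall-back to full attention that lets an existing full-attention checkpoint be continued (block size and shapes are assumptions, not Kimi's implementation):

```python
import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block_size=128, top_k_blocks=None):
    """Block attention for one query token (illustrative only).

    top_k_blocks=None attends to every block (plain full attention), which is
    why a full-attention checkpoint can be continued with MoBA-style training.
    q: (d,)  k, v: (seq_len, d) with seq_len divisible by block_size.
    """
    d = q.shape[-1]
    k_blocks = k.view(-1, block_size, d)
    v_blocks = v.view(-1, block_size, d)
    n_blocks = k_blocks.shape[0]

    if top_k_blocks is None or top_k_blocks >= n_blocks:
        chosen = torch.arange(n_blocks)                   # full attention
    else:
        # Parameter-less gate: score blocks by mean-pooled keys, keep top-k.
        gate = k_blocks.mean(dim=1) @ q                   # (n_blocks,)
        chosen = gate.topk(top_k_blocks).indices          # sparse attention

    k_sel = k_blocks[chosen].reshape(-1, d)
    v_sel = v_blocks[chosen].reshape(-1, d)
    weights = F.softmax(k_sel @ q / d**0.5, dim=-1)
    return weights @ v_sel
```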
Baichuan-M1

- Open-sources a SotA medical LLM (Baichuan-M1-14B)
- Trained from scratch on 20T tokens with a dedicated focus on enhancing medical capabilities
Google's AI Co-Scientist: A New Era of Scientific Discovery

Google unveiled its latest innovation in artificial intelligence - the AI co-scientist. Built on Gemini 2.0, this sophisticated multi-agent system represents a significant leap forward in scientific research and discovery.

Unlike traditional AI coding assistants like GitHub Copilot, the AI co-scientist functions as a genuine research partner. This system can:

1. Generate Novel Scientific Hypotheses.

The system doesn't just analyze existing data - it proposes entirely new research directions. For instance, it can suggest innovative applications for existing drugs or identify previously unknown biological mechanisms.

2. Design Experimental Protocols. Going beyond theoretical proposals, the system can outline detailed experimental procedures to test its hypotheses, making it immediately practical for laboratory implementation.

3. Collaborate with Human Scientists.
The system engages in meaningful scientific dialogue, incorporating feedback and iteratively improving its proposals through natural language interaction with researchers.

The system has already demonstrated impressive results in several critical areas:

1. Cancer Treatment Research.
Successfully identified new applications for existing drugs in treating acute myeloid leukemia, with laboratory validation confirming the effectiveness of its proposals.

2. Liver Disease Treatment. Discovered novel epigenetic targets for liver fibrosis treatment, validated through tests on human liver organoids.

3. Antimicrobial Resistance. Independently proposed and correctly identified complex mechanisms of bacterial gene transfer, matching discoveries made through traditional laboratory research.

The AI co-scientist operates through a coalition of specialized agents (sketched after the list):
- Generation
- Reflection
- Ranking
- Evolution
- Proximity
- Meta-review
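A highly simplified sketch of how such a coalition could be wired into a generate-reflect-rank-evolve loop; the prompts and the call_llm() helper are hypothetical stand-ins, not Google's implementation:

```python
# Hypothetical skeleton of one generate -> reflect -> rank -> evolve round;
# call_llm() stands in for whatever model API drives each specialized agent.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model client here")

def co_scientist_round(goal: str, hypotheses: list[str]) -> list[str]:
    # Generation agent: propose a new hypothesis for the research goal.
    hypotheses = hypotheses + [call_llm(f"Propose a testable hypothesis for: {goal}")]

    # Reflection agent: critique every hypothesis.
    critiques = {h: call_llm(f"Critique this hypothesis: {h}") for h in hypotheses}

    # Ranking agent: pick the most promising candidate from the critiques.
    best = call_llm("Pick the single best hypothesis:\n" +
                    "\n".join(f"{h} | critique: {c}" for h, c in critiques.items()))

    # Evolution agent: refine the winner and feed it into the next round.
    improved = call_llm(f"Refine this hypothesis using its critique: {best}")
    return hypotheses + [improved]
```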


Google is launching a Trusted Tester Program, opening access to research organizations worldwide. This initiative aims to further validate and expand the system's capabilities across various scientific domains.
Breakthrough in Robot Design: Universal Controllers Transform How We Build Robots?

Northwestern University researchers have made a significant breakthrough in robotics design, introducing a method that could revolutionize how we create and evolve robots.

Their paper "Accelerated co-design of robots through morphological pretraining" presents a novel approach that solves a decades-old challenge in robotics.

Code is coming soon.

And here are more robots

Key Innovations:

1. Universal Controller

- Developed a single controller that can work with multiple robot body types
- Pre-trained on millions of different robot morphologies
- Uses gradient-based optimization through differentiable simulation
- Can immediately adapt to new robot designs without extensive retraining

2. Zero-Shot Evolution

- Allows rapid testing of new robot body designs
- Enables immediate evaluation of design changes
- Supports successful recombination of robot parts
- Dramatically speeds up the design process

3. Diversity Maintenance

- Identified and solved "diversity collapse" - a previously unknown problem in robot co-design
- Developed methods to maintain morphological diversity while improving performance
- Enabled successful crossover between different robot designs

Technical Details:
- Controllers are trained on over 10 million distinct robot morphologies
- Uses differentiable simulation for gradient-based optimization
- Supports complex 3D environments with varying terrains
- Enables robots to perform adaptive behaviors like phototaxis (movement toward light)
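A minimal sketch of the core idea, a single controller conditioned on morphology parameters and trained with gradients through a (here, toy) differentiable rollout; the network sizes and the dynamics below are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class UniversalController(nn.Module):
    """One set of weights serves every body: morphology is just another input."""
    def __init__(self, obs_dim=32, morph_dim=16, act_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + morph_dim, 128), nn.Tanh(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs, morphology):
        return self.net(torch.cat([obs, morphology], dim=-1))

def differentiable_rollout(controller, morphology, steps=10):
    # Placeholder dynamics: a real system would step a differentiable simulator.
    obs = torch.zeros(morphology.shape[0], 32)
    distance = torch.zeros(morphology.shape[0])
    for _ in range(steps):
        action = controller(obs, morphology)
        obs = obs + 0.1 * action.mean(dim=-1, keepdim=True)   # toy dynamics
        distance = distance + obs[:, 0]
    return distance.mean()

controller = UniversalController()
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
for _ in range(100):
    morphs = torch.randn(64, 16)                 # sample random morphologies
    loss = -differentiable_rollout(controller, morphs)  # maximize travelled distance
    opt.zero_grad()
    loss.backward()
    opt.step()
```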


Future Implications:

- Could dramatically accelerate robot design and development
- Opens new possibilities for self-reconfigurable robots
- Provides a framework for more complex multi-material robots
- May help bridge the simulation-to-reality gap in robotics
Arc Institute and Nvidia just released Evo 2, the largest AI model for biology. It is fully open source.

Evo 2 can predict which mutations in a gene are likely to be pathogenic, or even design entire eukaryotic genomes.

Preprint.
GitHub
Nvidia bionemo.
Evo designer.
Evo Mechanistic Interpretability Visualizer

Early model Evo.
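Variant-effect prediction with a genomic language model usually boils down to comparing sequence likelihoods; a conceptual sketch of that scoring idea, with a hypothetical score_log_likelihood() standing in for Evo 2's real interface:

```python
# Conceptual sketch of variant-effect scoring with a genomic LM; the
# score_log_likelihood() function is a hypothetical stand-in, not Evo 2's API.
def score_log_likelihood(sequence: str) -> float:
    raise NotImplementedError("replace with the model's sequence-scoring call")

def variant_effect(reference: str, position: int, alt_base: str) -> float:
    """Return log-likelihood(variant) - log-likelihood(reference).

    Strongly negative values suggest the mutation is disruptive,
    a crude proxy for pathogenicity.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    return score_log_likelihood(variant) - score_log_likelihood(reference)
```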
HuggingFace released the "Ultra-Scale Playbook"

A free, open-source book covering 5D parallelism, ZeRO, fast CUDA kernels, and how and why to overlap compute and communication. Every scaling bottleneck and tool is introduced with motivation, theory, and interactive plots from 4,000+ scaling experiments, plus NotebookLM podcasts to tag along with you.

- How was DeepSeek trained for only $5M?
- Why did Mistral train an MoE?
- Why is PyTorch's native Data Parallelism implementation so complex under the hood?
- What are all the parallelism techniques, and why were they invented?
- Should I use ZeRO-3 or Pipeline Parallelism when scaling, and what's the story behind both techniques?
- What is the Context Parallelism that Meta used to train Llama 3? Is it different from Sequence Parallelism?
- What is FP8, and how does it compare to BF16?

The largest factor in democratizing AI will always be teaching everyone how to build AI, and in particular how to create, train, and fine-tune high-performance models. In other words, the goal is to make the techniques behind all recent large language models accessible to everybody, and efficient training is possibly the most essential of them.
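As a taste of the simplest technique the book covers, here is a minimal data-parallel training step with PyTorch DDP (ZeRO and the other parallelisms build on this baseline); the single-process gloo setup and dummy data are just so the sketch runs anywhere:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal data-parallel step: each rank holds a full model copy and DDP
# all-reduces gradients. Launch with torchrun and more processes to shard
# batches across real GPUs; world_size=1 here keeps the sketch standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 32), torch.randint(0, 2, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()        # DDP synchronizes gradients across ranks here
opt.step()

dist.destroy_process_group()
```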
Sakana AI introduced the AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition

The AI CUDA Engineer can produce highly optimized CUDA kernels, reaching 10-100x speedup over common machine learning operations in PyTorch.

The system is also able to produce highly optimized CUDA kernels that are much faster than existing CUDA kernels commonly used in production.

Sakana AI believes that, fundamentally, AI systems can and should be as resource-efficient as the human brain, and that the best path to that efficiency is to use AI to make AI more efficient.

The team also released a dataset of over 17,000 verified CUDA kernels produced by the AI CUDA Engineer.

Kernel archive webpage
Microsoft unveiled Muse, an AI that can generate minutes of unique game sequences from a single second of gameplay frames

It's the first World and Human Action Model that predicts 3D environments and actions for playable games.

The scale of training is mind-blowing:

— Trained on 1B+ gameplay images
— Used 7+ YEARS of continuous gameplay data
— Learned from real Xbox multiplayer matches

From a single second of gameplay + controller inputs, Muse can create multiple unique, playable sequences that follow actual game physics, mechanics, and rules.

The version shown in research was trained on just a single game (Bleeding Edge).
Wow, #DeepSeek announced Day 0: Warming up for #OpenSourceWeek

Starting next week, they'll be open-sourcing 5 repos, sharing sincere progress with full transparency.

These humble building blocks in their online service have been documented, deployed and battle-tested in production.

Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.
Meta presented MLGym: A New Framework and Benchmark for Advancing AI Research Agents

- The first Gym environment for ML tasks
- 13 diverse and open-ended AI research tasks spanning multiple domains

GitHub
Paper
Google dropped SigLIP 2, its most powerful image-text encoder

SigLIP 2 is the new version of SigLIP, Google's best open-source multimodal encoder family, now on HF.

What's new?
> Improvements from a new masked loss, self-distillation, and dense features (better localization)
> Dynamic resolution with NaFlex (better OCR)

You can use it to:
> image-to-image search
> text-to-image search
> image-to-text search
> image classification with open-ended classes
> train vision-language models

SigLIP 2 comes in three sizes (base, large, giant), three patch sizes (14, 16, 32), and shape-optimized variants with NaFlex.
As usual, it is supported by transformers from the get-go.
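A minimal zero-shot classification sketch with transformers; the checkpoint id below is an assumption, so check the Hub for the exact SigLIP 2 model names:

```python
import torch
from transformers import AutoModel, AutoProcessor
from PIL import Image

# Checkpoint name is assumed - look up the exact SigLIP 2 ids on the HF Hub.
ckpt = "google/siglip2-base-patch16-224"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("photo.jpg")
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits_per_image        # (1, num_labels)

# SigLIP is trained with a sigmoid loss, so label scores are independent.
probs = torch.sigmoid(logits)
print({label: round(p.item(), 3) for label, p in zip(labels, probs[0])})
```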

Models.
Chinese researchers introduced BEAMDOJO

It's a new reinforcement learning framework that teaches robots how to walk on uneven surfaces like stepping stones and balancing beams.

Paper.
#DeepSeek introduced FlashMLA - efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.

- BF16 support
- Paged KV cache (block size 64).
- 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800
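FlashMLA itself is a CUDA kernel, but the paged-KV-cache layout it consumes is easy to picture; the sketch below is a generic illustration of paging with block size 64, not FlashMLA's actual interface:

```python
import torch

# Generic paged KV cache: a pool of fixed-size blocks plus a per-sequence
# "block table" mapping logical block indices to physical blocks in the pool.
BLOCK, HEAD_DIM, NUM_BLOCKS = 64, 128, 1024
kv_pool = torch.zeros(NUM_BLOCKS, BLOCK, HEAD_DIM)   # physical storage
block_table = {0: [17, 3]}                           # sequence 0 owns pool blocks 17 then 3

def write_token(seq_id: int, pos: int, kv: torch.Tensor) -> None:
    physical = block_table[seq_id][pos // BLOCK]     # which pool block
    kv_pool[physical, pos % BLOCK] = kv              # offset inside it

def read_sequence(seq_id: int, length: int) -> torch.Tensor:
    blocks = kv_pool[block_table[seq_id]]            # gather this sequence's blocks
    return blocks.reshape(-1, HEAD_DIM)[:length]     # contiguous logical view

write_token(0, 65, torch.randn(HEAD_DIM))            # lands in pool block 3, slot 1
print(read_sequence(0, 70).shape)                    # torch.Size([70, 128])
```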
Anthropic to release Claude 3.7 Sonnet on Feb 26

It's expected to have step-by-step thinking, never-before-seen coding capabilities, and web search.

The best coding model which powers Cursor and Windsurf is about to get a whole lot better.


Claude 3.7 Sonnet is Anthropic's most intelligent model to date and the first Claude model to offer extended thinking - the ability to solve complex problems with careful, step-by-step reasoning.

Anthropic is the first AI lab to introduce a single model where users can balance speed and quality by choosing between standard thinking for near-instant responses and extended thinking for advanced reasoning.

Claude 3.7 Sonnet is state-of-the-art for coding, and delivers advancements in computer use, agentic capabilities, complex reasoning, and content generation. With frontier performance and more control over speed, Claude 3.7 Sonnet is the ideal choice for powering AI agents, especially customer-facing agents, and complex AI workflows.

Supported use cases: RAG or search & retrieval over vast amounts of knowledge, product recommendations, forecasting, targeted marketing, code generation, quality control, parsing text from images, agentic computer use, content generation

Model attributes: Reasoning, Text generation, Code generation, Rich text formatting, Agentic computer use
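A sketch of toggling standard vs. extended thinking with the Anthropic Python SDK; the model id and token budgets are assumptions, so check Anthropic's docs for the exact values:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",   # assumed model id
    max_tokens=16000,
    # Omit `thinking` for near-instant standard responses; include it to let
    # the model spend up to budget_tokens reasoning before it answers.
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)

# Print only the final answer blocks, skipping the thinking blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```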
Two AI agents on a phone call realize they're both AI and switch to a superior audio protocol, ggwave

The demo is built on the ggwave data-over-sound library.

The project itself, "gibberlink" by developers Anton and Boris (GitHub: PennyroyalTea), recently won first place in a hackathon competition.

How It Works

When the AIs detect they're communicating with another AI rather than a human, they switch to the ggwave protocol, which encodes data directly in audio signals, allowing much faster and more efficient transmission through sound waves.
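A minimal data-over-sound round trip with the ggwave Python bindings; the parameter names follow the library's examples and may differ across versions, so treat this as a sketch:

```python
import ggwave

# Parameter names (protocolId, volume) are taken from ggwave's examples and
# may vary between versions of the bindings.
instance = ggwave.init()

# Encode a short message into an audio waveform (float32 PCM samples as bytes).
waveform = ggwave.encode("hello from an AI agent", protocolId=1, volume=20)

# In the gibberlink demo this waveform is played through one phone's speaker
# and captured by the other; here we feed it straight back to the decoder.
decoded = ggwave.decode(instance, waveform)
print(decoded)  # b'hello from an AI agent'

ggwave.free(instance)
```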

This technology opens up numerous possibilities:

1. Devices can share information through audio channels without requiring internet connectivity.

2. When AI assistants need to communicate with each other, they can do so at vastly improved speeds.

3. Encrypted data can be transmitted through audio in ways less susceptible to conventional interception methods.

4. Speakers, TVs, and other devices can communicate via sound without additional infrastructure.

5. Robots can coordinate activities through audio signals.

6. Communication remains possible in environments where radio is limited or restricted.

7. Systems can exchange supplementary information alongside regular conversation.