Sakana AI introduced the AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition
The AI CUDA Engineer can produce highly optimized CUDA kernels, achieving 10-100x speedups over common machine learning operations in PyTorch.
The system can also produce highly optimized CUDA kernels that are much faster than existing CUDA kernels commonly used in production.
Sakana AI believes that, fundamentally, AI systems can and should be as resource-efficient as the human brain, and that the best path to this efficiency is to use AI to make AI more efficient!
The team also released a dataset of over 17,000 verified CUDA kernels produced by The AI CUDA Engineer.
Kernel archive webpage
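The post doesn't include the pipeline itself; as a toy illustration of the agentic verify-then-optimize loop it describes — with plain Python functions standing in for LLM-generated CUDA kernels, and a reference function playing the role of the PyTorch op — a sketch might look like this:

```python
import math
import timeit

def ref_sum_of_squares(xs):
    # Reference op (stand-in for the PyTorch ground truth).
    return sum(x * x for x in xs)

def naive(xs):
    # Candidate 1: straightforward loop.
    total = 0.0
    for x in xs:
        total += x * x
    return total

def fsum_kernel(xs):
    # Candidate 2: numerically careful variant.
    return math.fsum(x * x for x in xs)

def buggy(xs):
    # Candidate 3: fast but wrong — must be rejected by verification.
    return sum(xs) ** 2

def verify(candidate, trials=20, tol=1e-6):
    # A candidate survives only if it matches the reference on many inputs.
    for i in range(trials):
        xs = [((j * 0.37 + i) % 5.0) - 2.5 for j in range(64)]
        if abs(candidate(xs) - ref_sum_of_squares(xs)) > tol:
            return False
    return True

def select_kernel(candidates, xs):
    # Discard unverified candidates, then keep the fastest verified one.
    verified = [c for c in candidates if verify(c)]
    return min(verified, key=lambda c: timeit.timeit(lambda: c(xs), number=200))

best = select_kernel([naive, fsum_kernel, buggy], [0.1] * 256)
```

The essential property the real system shares with this sketch: speed only counts after correctness is verified against the reference, so a fast-but-wrong candidate can never be selected.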
Microsoft unveiled Muse, an AI that can generate minutes of unique game sequences from a single second of gameplay frames
It's the first World and Human Action Model that predicts 3D environments and actions for playable games.
The scale of training is mind-blowing:
— Trained on 1B+ gameplay images
— Used 7+ YEARS of continuous gameplay data
— Learned from real Xbox multiplayer matches
From a single second of gameplay + controller inputs, Muse can create multiple unique, playable sequences that follow actual game physics, mechanics, and rules.
The version shown in research was trained on just a single game (Bleeding Edge).
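Muse itself is a large transformer over image and controller-action tokens; none of its internals are shown here. As a minimal sketch of the autoregressive rollout idea — predict the next frame from recent frames plus an action, then feed the prediction back in as context — with a trivial hand-written "world model":

```python
def toy_world_model(context_frames, action):
    # Stand-in for Muse: predict the next frame from recent frames + input.
    # Here a "frame" is just an (x, y) player position and actions are moves.
    x, y = context_frames[-1]
    dx, dy = {"up": (0, 1), "down": (0, -1),
              "left": (-1, 0), "right": (1, 0)}[action]
    return (x + dx, y + dy)

def rollout(context_frames, actions):
    # Autoregressive generation: each predicted frame becomes new context.
    frames = list(context_frames)
    for a in actions:
        frames.append(toy_world_model(frames, a))
    return frames[len(context_frames):]

seq = rollout([(0, 0)], ["up", "up", "right"])
# seq == [(0, 1), (0, 2), (1, 2)]
```

The "single second of gameplay" in the post corresponds to `context_frames`; everything after it is generated frame by frame.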
Xbox Wire
Empowering Creators and Players With Muse, a Generative AI Model for Gameplay - Xbox Wire
Introducing Muse, a new kind of generative AI model that lets you play and create. From developer ideation to one day supporting game preservation, Muse holds potential to unlock new possibilities.
Wow, #DeepSeek announced Day 0: Warming up for #OpenSourceWeek
Starting next week, they'll be open-sourcing 5 repos, sharing sincere progress with full transparency.
These humble building blocks in their online service have been documented, deployed and battle-tested in production.
Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.
Meta presented MLGym: A New Framework and Benchmark for Advancing AI Research Agents
- The first Gym environment for ML tasks
- 13 open-ended AI research tasks from diverse domains
GitHub
Paper
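MLGym's actual API isn't reproduced in the post; the Gym-style framing — an agent interacts with an ML task through reset/step and gets a reward for research progress — can be sketched with a made-up toy task, here tuning a single learning-rate hyperparameter:

```python
class ToyResearchEnv:
    """Gym-style interface sketch: agent tunes a learning rate to cut loss.

    This is NOT MLGym's real environment — just the reset/step contract.
    """
    def __init__(self):
        self.lr = 1.0

    def reset(self):
        self.lr = 1.0
        return {"lr": self.lr, "loss": self._loss()}

    def _loss(self):
        # Quadratic bowl with the optimum at lr = 0.1.
        return (self.lr - 0.1) ** 2

    def step(self, action):
        # Action: multiplicative adjustment to the hyperparameter.
        self.lr *= action
        loss = self._loss()
        reward = -loss           # better loss -> higher reward
        done = loss < 1e-4
        return {"lr": self.lr, "loss": loss}, reward, done, {}

env = ToyResearchEnv()
obs0 = env.reset()
obs, reward, done, info = env.step(0.1)   # lr: 1.0 -> 0.1 hits the optimum
```

The same loop shape (observe, act, receive reward) is what lets standard RL and LLM-agent tooling plug into research tasks.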
GitHub
GitHub - facebookresearch/MLGym: MLGym: A New Framework and Benchmark for Advancing AI Research Agents
MLGym: A New Framework and Benchmark for Advancing AI Research Agents - facebookresearch/MLGym
Google dropped SigLIP 2, its most powerful image-text encoder
SigLIP 2 is the new version of SigLIP, Google's best open-source multimodal encoder, now on HF.
What's new?
> Improvements from a new masked loss, self-distillation and dense features (better localization)
> Dynamic resolution with NaFlex (better OCR).
You can use it for:
> image-to-image search
> text-to-image search
> image-to-text search
> image classification with open-ended classes
> training vision-language models
SigLIP 2 comes in three sizes (base, large, giant), three patch sizes (14, 16, 32) and shape-optimized variants with NaFlex.
As usual, it's supported by transformers from the get-go.
Models.
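In practice you'd load these models through the transformers library; as a library-free sketch of how any of the retrieval modes above works once embeddings are in hand (the vectors below are made up, not real SigLIP 2 outputs):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Made-up 3-d embeddings standing in for real SigLIP 2 outputs.
image_embeds = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]   # pretend text embedding for "a photo of a cat"

def search(query_embed, catalog, top_k=1):
    # Text-to-image search: rank catalog images by similarity to the query.
    ranked = sorted(catalog,
                    key=lambda name: cosine(query_embed, catalog[name]),
                    reverse=True)
    return ranked[:top_k]

search(query, image_embeds)   # -> ["cat.jpg"]
```

Image-to-image and image-to-text search are the same ranking with different embeddings on each side, which is why one encoder covers all the listed use cases.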
huggingface.co
SigLIP 2: A better multilingual vision language encoder
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Chinese researchers introduced BeamDojo
It's a new reinforcement learning framework that teaches humanoid robots to walk across sparse footholds such as stepping stones and balance beams.
Paper.
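The paper's actual reward design is richer than this, but the core idea — reward foot placements that land on the sparse footholds, penalize steps into the gaps — can be sketched as:

```python
def foothold_reward(foot_positions, stones, radius=0.1):
    # Reward each foot that lands within `radius` of some stepping stone,
    # penalize feet that miss every stone (i.e. step into the gap).
    reward = 0.0
    for fx, fy in foot_positions:
        on_stone = any((fx - sx) ** 2 + (fy - sy) ** 2 <= radius ** 2
                       for sx, sy in stones)
        reward += 1.0 if on_stone else -1.0
    return reward

stones = [(0.0, 0.0), (0.3, 0.0)]
foothold_reward([(0.02, 0.0), (0.3, 0.05)], stones)   # both on stones -> 2.0
foothold_reward([(0.15, 0.0), (0.3, 0.0)], stones)    # one in the gap -> 0.0
```

A term like this, added to the usual velocity-tracking and stability rewards, is what makes "where you step" part of the learning signal.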
why618188.github.io
BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds
#DeepSeek introduced FlashMLA - efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
- BF16 support
- Paged KV cache (block size 64).
- 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800
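The paged KV cache (block size 64) follows the same bookkeeping as any paged-attention cache: a block table maps logical token positions to physical pages, allocated on demand. A single-sequence toy version in Python (not FlashMLA's actual CUDA implementation):

```python
BLOCK_SIZE = 64   # matches the "block size 64" above

class PagedKVCache:
    """Toy paged KV cache: logical positions map to (block, offset) slots."""
    def __init__(self):
        self.blocks = []        # each block holds up to BLOCK_SIZE entries
        self.block_table = []   # per-sequence list of block indices

    def append(self, kv):
        # Allocate a fresh page when the current one is full (or none exists).
        if (not self.block_table
                or len(self.blocks[self.block_table[-1]]) == BLOCK_SIZE):
            self.blocks.append([])
            self.block_table.append(len(self.blocks) - 1)
        self.blocks[self.block_table[-1]].append(kv)

    def get(self, pos):
        # Translate a logical position into a physical (block, offset) lookup.
        block_idx = self.block_table[pos // BLOCK_SIZE]
        return self.blocks[block_idx][pos % BLOCK_SIZE]

cache = PagedKVCache()
for t in range(130):            # 130 tokens -> 3 blocks (64 + 64 + 2)
    cache.append(("k%d" % t, "v%d" % t))
```

Paging is what makes variable-length sequences cheap: memory grows one block at a time instead of being pre-reserved for the maximum length.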
GitHub
GitHub - deepseek-ai/FlashMLA: FlashMLA: Efficient Multi-head Latent Attention Kernels
FlashMLA: Efficient Multi-head Latent Attention Kernels - deepseek-ai/FlashMLA
Stablecoin issuer Ethena raises $100M to build new chain and launch TradFi-focused token, with backing from Franklin Templeton.
Bloomberg.com
Ethena Crypto Project Raises $100 Million to Fund Finance Foray
Ethena, a crypto project whose dollar-pegged coin has ballooned to become one of the biggest of its kind since its launch a year ago, has raised $100 million to help fund the introduction of a similar token aimed at traditional financial institutions.
Anthropic to release Claude 3.7 Sonnet on Feb 26
It’s expected to have step-by-step thinking, never-before-seen coding capabilities, and web search.
The best coding model which powers Cursor and Windsurf is about to get a whole lot better.
Claude 3.7 Sonnet is Anthropic's most intelligent model to date and the first Claude model to offer extended thinking - the ability to solve complex problems with careful, step-by-step reasoning.
Anthropic is the first AI lab to introduce a single model where users can balance speed and quality by choosing between standard thinking for near-instant responses and extended thinking for advanced reasoning.
Claude 3.7 Sonnet is state-of-the-art for coding, and delivers advancements in computer use, agentic capabilities, complex reasoning, and content generation. With frontier performance and more control over speed, Claude 3.7 Sonnet is the ideal choice for powering AI agents, especially customer-facing agents, and complex AI workflows.
Supported use cases: RAG or search & retrieval over vast amounts of knowledge, product recommendations, forecasting, targeted marketing, code generation, quality control, parsing text from images, agentic computer use, and content generation
Model attributes: Reasoning, Text generation, Code generation, Rich text formatting, Agentic computer use
2 AI agents on a phone call realize they’re both AI and switch to a superior audio signal, ggwave
The project shown in the demo, GibberLink by Anton Pidkuiko and Boris Starkov (PennyroyalTea), recently won first place at a hackathon. It builds on ggwave, the data-over-sound library by Georgi Gerganov.
How it works:
When the AIs detect they're communicating with another AI rather than a human, they switch to ggwave, which encodes data directly as audio signals, allowing much faster and more efficient transmission through sound waves.
This technology opens up numerous possibilities:
1. Devices can share information through audio channels without requiring internet connectivity.
2. When AI assistants need to communicate with each other, they can do so at vastly improved speeds.
3. Encrypted data can be transmitted through audio in ways less susceptible to conventional interception methods.
4. Speakers, TVs, and other devices can communicate via sound without additional infrastructure.
5. Robots can coordinate activities through audio signals.
6. Communication remains possible in environments where radio is limited or restricted.
7. Systems can exchange supplementary information alongside regular conversation.
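ggwave's real protocol adds error correction and several speed/volume profiles; the underlying frequency-shift-keying idea — one audio tone per 4-bit symbol, detected by correlating each window against the candidate tones — can be sketched in pure Python (toy tone map, no error correction):

```python
import math

RATE, SYMBOL = 48000, 480                      # 48 kHz audio, 10 ms per symbol
TONES = [1000 + 500 * n for n in range(16)]    # one tone per 4-bit nibble

def encode(data):
    # Each byte becomes two tone bursts: high nibble first, then low nibble.
    samples = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            f = TONES[nibble]
            samples.extend(math.sin(2 * math.pi * f * i / RATE)
                           for i in range(SYMBOL))
    return samples

def tone_power(window, f):
    # Correlate the window against a reference tone at frequency f.
    re = sum(x * math.cos(2 * math.pi * f * i / RATE)
             for i, x in enumerate(window))
    im = sum(x * math.sin(2 * math.pi * f * i / RATE)
             for i, x in enumerate(window))
    return re * re + im * im

def decode(samples):
    # For each symbol window, pick the tone with the strongest response.
    nibbles = []
    for start in range(0, len(samples), SYMBOL):
        window = samples[start:start + SYMBOL]
        nibbles.append(max(range(16), key=lambda n: tone_power(window, TONES[n])))
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))

decode(encode(b"hi"))   # -> b"hi"
```

With 500 Hz tone spacing and 10 ms windows, every tone completes a whole number of cycles per window, so the tones are orthogonal and detection is unambiguous; real acoustic channels need the extra robustness ggwave provides.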
GitHub
GitHub - ggerganov/ggwave: Tiny data-over-sound library
Tiny data-over-sound library. Contribute to ggerganov/ggwave development by creating an account on GitHub.
Claude 3.7 Sonnet is live right now! https://xn--r1a.website/alwebbci/3041
#DeepSeek introduced DeepEP - the first open-source EP communication library for MoE model training and inference.
1. Efficient and optimized all-to-all communication
2. Both intranode and internode support with NVLink and RDMA
3. High-throughput kernels for training and inference prefilling
4. Low-latency kernels for inference decoding
5. Native FP8 dispatch support
6. Flexible GPU resource control for computation-communication overlapping.
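DeepEP implements this dispatch/combine exchange across GPUs over NVLink and RDMA; the dataflow itself can be shown in-process with a toy top-1 gate and two stand-in experts:

```python
def dispatch(tokens, gate):
    # All-to-all "dispatch": group each token by the expert its gate picked.
    per_expert = {}
    for idx, tok in enumerate(tokens):
        per_expert.setdefault(gate(tok), []).append((idx, tok))
    return per_expert

def combine(per_expert, experts, n_tokens):
    # Experts run on their local batch; "combine" scatters results back
    # into the original token order.
    out = [None] * n_tokens
    for eid, batch in per_expert.items():
        for idx, tok in batch:
            out[idx] = experts[eid](tok)
    return out

experts = {0: lambda t: t * 2, 1: lambda t: t + 100}
gate = lambda t: 0 if t < 10 else 1     # toy top-1 routing rule
tokens = [3, 42, 7, 99]
routed = dispatch(tokens, gate)
outputs = combine(routed, experts, len(tokens))   # [6, 142, 14, 199]
```

In a real MoE deployment each expert lives on a different GPU, so `dispatch` and `combine` become the network all-to-alls that DeepEP's kernels optimize.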
GitHub
GitHub - deepseek-ai/DeepEP: DeepEP: an efficient expert-parallel communication library
DeepEP: an efficient expert-parallel communication library - deepseek-ai/DeepEP
Anthropic raised $3.5 billion at a $61.5 billion valuation, up from the $2 billion it had initially expected.
Reuters
AI startup Anthropic close to $3.5 billion fundraise, sources say
AI startup Anthropic is seeking $3.5 billion in a funding round that would value it at $61.5 billion, two sources familiar with the matter told Reuters on Monday.
ROCKET: an AlphaFold augmentation that integrates crystallographic and cryoEM/ET data with room for more
AF-based methods encode rich structural priors but lack a general mechanism for integrating arbitrary data modalities.
ROCKET tackles this by optimizing latent representations to fit experimental data at inference time, without retraining.
Code available soon.
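The key trick — fitting experimental data by optimizing the latent inputs while the pretrained network stays frozen — can be sketched with a stand-in "decoder" and a finite-difference gradient (nothing here is ROCKET's actual model or loss):

```python
def frozen_decoder(z):
    # Stand-in for the pretrained network: maps a latent to a predicted
    # "structure" (here just two coordinates). Weights are fixed; only z moves.
    return [2.0 * z[0] + 1.0, -z[1]]

def data_loss(pred, observed):
    # Agreement with experimental data (stand-in for a crystallographic target).
    return sum((p - o) ** 2 for p, o in zip(pred, observed))

def fit_latent(z, observed, lr=0.05, steps=200, eps=1e-5):
    # Inference-time optimization: nudge the latent, never the network weights.
    z = list(z)
    for _ in range(steps):
        for j in range(len(z)):
            z_hi = list(z)
            z_hi[j] += eps
            g = (data_loss(frozen_decoder(z_hi), observed)
                 - data_loss(frozen_decoder(z), observed)) / eps
            z[j] -= lr * g
    return z

z_fit = fit_latent([0.0, 0.0], observed=[5.0, 3.0])
# frozen_decoder(z_fit) is now close to [5.0, 3.0]: the latents explain the data
```

Because the network is untouched, its structural priors are preserved; only the input representation is adapted to whatever data modality supplies the loss, which is why no retraining is needed.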
bioRxiv
AlphaFold as a Prior: Experimental Structure Determination Conditioned on a Pretrained Neural Network
Advances in machine learning have transformed structural biology, enabling swift and accurate prediction of protein structure from sequence. However, challenges persist in capturing sidechain packing, condition-dependent conformational dynamics, and biomolecular…
Google just launched a free version of Gemini Code Assist globally
It comes with:
1. 180K code completions per month
2. Support for all programming languages in the public domain
3. 128K token context window
Google
Get coding help from Gemini Code Assist — now for free
Announcing a free version of Gemini Code Assist, powered by Gemini 2.0, and Gemini Code Review in GitHub.
Nasdaq has submitted a 19b-4 filing for the Grayscale Polkadot ETF.
In 2021, Grayscale announced the establishment of several new cryptocurrency trusts, including the Grayscale Polkadot Trust.
AI models now handle voice and speech, yet building with them in Python is very frustrating.
FastRTC is here to solve that, with:
- Automatic Voice Detection
- Handling WebRTC & the backend for real-time apps
- Calling Phones
GitHub
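FastRTC ships a proper voice-activity-detection model; to show the flavor of what "automatic voice detection" means, here is a toy energy-threshold detector over fixed-size audio frames (not FastRTC's actual algorithm):

```python
import math

def frame_energy(frame):
    # Mean squared amplitude of one audio frame.
    return sum(s * s for s in frame) / len(frame)

def detect_voice(samples, frame_len=160, threshold=0.01):
    # Flag each 10 ms frame (at 16 kHz) whose energy clears the threshold.
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        flags.append(frame_energy(samples[start:start + frame_len]) > threshold)
    return flags

silence = [0.0] * 160
speech = [0.5 * math.sin(0.1 * i) for i in range(160)]
detect_voice(silence + speech + silence)   # -> [False, True, False]
```

Real VADs replace the energy threshold with a learned model to survive background noise, but the frame-by-frame speaking/silent decision is the same signal that lets a voice app know when the user has finished talking.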
huggingface.co
FastRTC: The Real-Time Communication Library for Python
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Arc Institute introduced the world's largest single-cell dataset
They've launched the Arc Virtual Cell Atlas, a growing resource for computation-ready single-cell measurements.
As the initial contributions, Vevo Therapeutics has open sourced Tahoe-100M, the world's largest single-cell dataset, mapping 60,000 drug-cell interactions, and announced scBaseCamp, the first RNA sequencing data repository curated using AI agents. Combined, the release includes data from over 300 million cells.
arcinstitute.org
Virtual Cell Atlas | Arc Institute
Arc Institute is an independent nonprofit research organization headquartered in Palo Alto, California.
#DeepSeek makes 2 major announcements
1. Starting today, DeepSeek is offering significant discounts on their API Platform during off-peak hours (16:30-00:30 UTC daily):
• DeepSeek-V3: 50% OFF
• DeepSeek-R1: Massive 75% OFF
This means you can access powerful AI models at a fraction of the cost during these hours. For example, DeepSeek-R1 output cost drops from $2.19 to just $0.55 per 1M tokens!
2. DeepSeek has also released DeepGEMM - an impressive FP8 GEMM library that supports both dense and MoE GEMMs, powering their V3/R1 models.
Key features:
- Up to 1350+ FP8 TFLOPS on Hopper GPUs
- Lightweight with no heavy dependencies
- Fully Just-In-Time compiled
- Core logic at just ~300 lines of code
- Outperforms expert-tuned kernels on most matrix sizes
- Supports dense layout and two MoE layouts
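DeepGEMM works in FP8 with fine-grained scaling on Hopper tensor cores; the basic quantize → low-precision GEMM → dequantize pattern it accelerates can be shown with a toy int8 version using a single scale per matrix (a deliberate simplification of the per-tile scaling real FP8 GEMMs use):

```python
def quantize(mat, bits=8):
    # One scale per matrix: map the max |value| to the top of the int range.
    amax = max(abs(x) for row in mat for x in row) or 1.0
    scale = amax / (2 ** (bits - 1) - 1)
    q = [[round(x / scale) for x in row] for row in mat]
    return q, scale

def gemm_dequant(a, sa, b, sb):
    # Integer matmul, then one dequantization by the product of both scales.
    n, k, m = len(a), len(a[0]), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) * sa * sb
             for j in range(m)] for i in range(n)]

A = [[1.0, -2.0], [0.5, 3.0]]
B = [[2.0, 0.0], [1.0, -1.0]]
qa, sa = quantize(A)
qb, sb = quantize(B)
approx = gemm_dequant(qa, sa, qb, sb)
# approx is close to the exact product [[0, 2], [4, -3]]
```

The accumulation happens at full precision even though the operands are 8-bit, which is the same reason FP8 GEMMs can be both fast and accurate enough for training and inference.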
This is huge: Oklahoma's Bitcoin reserve bill has cleared the state House committee and now heads to a full floor vote.
The bill allows the state to invest up to 10% of public funds in BTC or digital assets with a market value of more than $500 billion.