Google launched Gemini CLI, a powerful open-source AI agent built for the terminal.
- Built on Gemini 2.5 Pro
- 1 million token context window
- Free tier with 60 requests per minute and 1,000 per day
- Google Search grounding for real-time context
- Script and plugin support
- Non-interactive mode for automation (see the sketch below)
- Support for Model Context Protocol (MCP)
- Integrated with Gemini Code Assist in VS Code
- Fully open-source under Apache 2.0
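To illustrate the non-interactive mode, here is a minimal automation sketch. It assumes the `gemini` binary is on your PATH and accepts a `-p`/`--prompt` flag for one-shot runs; check `gemini --help` for the exact flags in your installed version.

```python
import subprocess

def ask_gemini(prompt: str) -> str:
    """Run Gemini CLI non-interactively and return its stdout.

    Assumes `gemini` is installed and supports a -p/--prompt flag
    for one-shot, non-interactive invocations.
    """
    result = subprocess.run(
        ["gemini", "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(ask_gemini("Summarize the open TODOs in this repository."))
```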
Source: Google - Gemini CLI: your open-source AI agent. Free and open source, Gemini CLI brings Gemini directly into developers' terminals, with unmatched access for individuals.
Ai2 introduced OMEGA, a new math benchmark that pushes LLMs beyond pattern matching to test genuine mathematical reasoning.
Paper
Code.
Source: allenai.org - OMEGA: Can LLMs reason outside the box in math? | Ai2. OMEGA evaluates large language models' ability to generalize in math through exploratory, compositional, and transformative reasoning.
Anthropic launches AI-powered apps through Claude artifacts
Over 500M artifacts created since launch, and now they're letting you embed Claude's intelligence directly into those creations. Pretty wild move.
The mechanic is clean: instead of building static outputs, you can now create apps that think and adapt in real-time.
Your flashcard generator becomes a flashcard app that responds to any topic users throw at it.
Cost structure makes sense too - when someone uses your AI-powered artifact, it burns their Claude credits, not yours.
No API keys, no billing complexity. Just build and share.
This isn't really "no-code" anymore.
It's more like "no-barriers-to-building-intelligent-things." The gap between having an idea and having a working AI app just collapsed.
Too early to tell how this plays out, but the direction feels significant. They're not trying to compete with app stores - they're making it trivial to turn conversations into tools that other people can actually use.
The internal name "Claude in Claude" captures it perfectly. It's recursion all the way down.
Worth watching what people build with this.
Source: claude.ai - Talk with Claude, an AI assistant from Anthropic.
ALST (Arctic Long Sequence Training) is a set of modular, open-source techniques that enable training on sequences of up to 15 million tokens on four H100 nodes, all using Hugging Face Transformers and DeepSpeed, with no custom modeling code required.
ALST makes long-sequence training fast, efficient, and accessible on GPU nodes or even single GPUs.
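This isn't the ALST API itself, but a conceptual sketch of one trick in this family: computing the language-model loss tile by tile so the full [seq_len, vocab] logits tensor, which dominates memory at million-token lengths, is never materialized. All names here are illustrative.

```python
import torch
import torch.nn.functional as F

def tiled_lm_loss(hidden, lm_head, labels, tile_len=8192):
    """Cross-entropy over a long sequence, one tile at a time.

    hidden:  [seq_len, d_model] final hidden states
    lm_head: nn.Linear(d_model, vocab_size)
    labels:  [seq_len] next-token targets (-100 = ignore)

    Only a [tile_len, vocab] logits slice exists at any moment.
    """
    total, count = hidden.new_zeros(()), 0
    for start in range(0, hidden.shape[0], tile_len):
        h = hidden[start:start + tile_len]
        y = labels[start:start + tile_len]
        logits = lm_head(h)                      # small slice, not full logits
        mask = y != -100
        if mask.any():
            total = total + F.cross_entropy(logits[mask], y[mask], reduction="sum")
            count += int(mask.sum())
    return total / max(count, 1)
```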
Meet ROSETTA, a framework that's revolutionizing how AI agents learn from human feedback
Unlike traditional AI systems that need extensive offline training or task-specific fine-tuning, ROSETTA lets AI agents adapt to your preferences in real-time using natural language.
Three-Step Pipeline:
1. Grounding - Transforms vague, colloquial language into concrete understanding
2. Planning - Creates multi-stage reward strategies
3. Code Generation - Converts plans into executable reward functions (see the sketch below)
Mind-Blowing Results:
- 87% average success rate
- 86% human satisfaction
- Tested across 116 diverse preferences
- Handles contradictory and evolving preferences seamlessly
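A minimal sketch of the three-step pipeline above, under stated assumptions: `llm` is a placeholder for any chat-model call, the prompts are invented for illustration, and executing generated reward code is only safe with trusted model output. None of these names come from the ROSETTA codebase.

```python
from typing import Callable

def llm(prompt: str) -> str:
    """Placeholder for a chat-model call (any hosted or local LLM)."""
    raise NotImplementedError

def build_reward(preference: str, env_description: str) -> Callable:
    # 1. Grounding: turn colloquial preference into concrete conditions.
    grounded = llm(
        f"Environment: {env_description}\n"
        f"User preference: {preference!r}\n"
        "Restate this preference as concrete, measurable conditions."
    )
    # 2. Planning: lay out a multi-stage reward strategy.
    plan = llm(
        f"Conditions: {grounded}\n"
        "Propose a multi-stage reward plan (shaping terms, bonuses, penalties)."
    )
    # 3. Code generation: emit an executable reward function.
    code = llm(
        f"Plan: {plan}\n"
        "Write a Python function `reward(state, action) -> float` implementing it. "
        "Return only code."
    )
    namespace: dict = {}
    exec(code, namespace)  # trusted-model output only
    return namespace["reward"]
```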
Source: sanjanasrivastava.github.io - ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference. ROSETTA extracts signal from complex, unconstrained human language preference and generates dense code-based reward for RL agents in a single shot.
Sora Neuroscience received FDA clearance for its Cirrus Resting State fMRI Brain Mapping Software.
This AI-based software uses resting-state functional magnetic resonance imaging (rs-fMRI) to map critical brain networks, aiding neurosurgeons in planning surgeries for conditions like brain tumors and epilepsy.
Key points about Cirrus:
- Functionality: It generates maps of eloquent cortex areas (e.g., those controlling speech, vision, and movement) in as little as 12 minutes by analyzing patterns of brain activity during rest, using blood-oxygen-level-dependent (BOLD) signals. This is faster and more reliable than traditional task-based fMRI, which requires patient compliance and can take up to an hour.
- Accessibility: Unlike task-based fMRI, which is challenging for patients like children, those with cognitive impairments, or non-English speakers, Cirrus works with patients who are sedated or unable to follow instructions, expanding its applicability.
- Integration: The software integrates with existing surgical navigation platforms and has a non-exclusive distribution agreement with Prism Clinical Imaging, Inc., enhancing its compatibility with tools like fMRI and diffusion tensor imaging (DTI) for comprehensive brain mapping.
- Technology: Built on decades of WashU research, Cirrus uses a multi-layer perceptron (MLP) algorithm for supervised classification of rs-fMRI data, offering high sensitivity compared to intraoperative cortical stimulation, the gold standard for mapping.
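The announcement includes no code, but as a purely illustrative sketch of the general technique (supervised MLP classification of resting-state connectivity features, not Sora's actual pipeline), one could assign each voxel a functional network from its correlation profile:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Illustrative only: each voxel is represented by its correlations with
# a set of seed time courses; labels are network assignments
# (e.g., motor, language, visual) from expert-annotated training scans.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((5000, 64))   # [voxels, seed correlations]
y_train = rng.integers(0, 7, 5000)          # 7 hypothetical networks

clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

X_new = rng.standard_normal((100, 64))      # voxels from a new patient
network_probs = clf.predict_proba(X_new)    # per-voxel network probabilities
```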
Source: EIN Presswire - Sora Neuroscience Announces FDA Clearance of Cirrus Resting State fMRI Brain Mapping Software. Sora's FDA clearance of Cirrus simplifies fMRI generation of eloquent cortex maps for surgical planning.
Big open AI day!
1. Black Forest Labs just released the best open-weights image editing model, comparable to GPT-4o.
HF.
Code.
2. Google released Gemma 3n. Its E4B version is the first model under 10B parameters to break a 1300 score on LMArena.
Source: fal.ai Blog - Announcing FLUX.1 Kontext [dev] Inference & Training: open weights, fast inference, and LoRA support. Following last month's successful launch of the FLUX.1 Kontext [pro] and [max] models, we're excited to announce the release of BFL's FLUX.1 Kontext [dev] with open weights. This new version delivers exceptional…
Google introduced Doppl, a new mobile app that lets you upload a photo or screenshot of an outfit and then creates a video of you wearing the clothes to help you find your style.
Available on iOS and Android in the US.
Source: Google - Try on looks and discover your style with Doppl. Doppl, a new Google Labs app, uses AI to create personalized outfit try-on images and videos.
How Transformers Learn to Navigate: Episodic Memory as a Computational Workspace
A new study reveals surprising insights into how transformers achieve rapid in-context learning, with implications that bridge AI and neuroscience.
Key findings:
1. Internal Map Formation
The researchers discovered that transformers don't just memorize solutions—they actually build internal representations of spatial environments. These models learn to:
- Create cognitive maps of gridworld and tree maze environments
- Align representations across different contexts with similar structures
- Use these maps for efficient navigation in novel scenarios.
2. Novel Decision-Making Strategy. Surprisingly, the models don't use traditional reinforcement learning approaches like value estimation or explicit path planning. Instead, they employ a geometric strategy:
- Align representations to Euclidean space using in-context experience
- Calculate angles from the current state to the goal in this space
- Select actions based on these angular computations
This approach is both elegant and computationally efficient.
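A toy numeric sketch of that angular strategy (illustrative; the paper's models do this implicitly in their learned representation space): embed states as 2D map coordinates, then pick the discrete move whose direction best aligns with the vector to the goal.

```python
import numpy as np

# Hypothetical 2D "cognitive map" coordinates recovered in-context.
ACTIONS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}

def pick_action(current: np.ndarray, goal: np.ndarray) -> str:
    """Choose the move whose direction is most aligned with the goal vector."""
    to_goal = goal - current
    best, best_cos = None, -np.inf
    for name, vec in ACTIONS.items():
        v = np.asarray(vec, dtype=float)
        cos = v @ to_goal / (np.linalg.norm(v) * np.linalg.norm(to_goal) + 1e-9)
        if cos > best_cos:
            best, best_cos = name, cos
    return best

print(pick_action(np.array([0.0, 0.0]), np.array([3.0, 1.0])))  # "east"
```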
3. Memory as Computational Workspace. Perhaps most intriguingly, the study reveals that episodic memory tokens serve as more than just storage—they become an active computational workspace where intermediate calculations are cached and processed.
This work challenges our understanding of in-context learning in several ways:
- Beyond simple pattern matching: transformers are developing sophisticated algorithmic strategies
- Neuroscience connections: the mechanisms mirror hippocampal-entorhinal cortex computations in biological brains
- Architectural insights: memory systems can serve dual roles as both storage and computation
The latest update from Neuralink: participants are using brainwaves to:
1. Play Call of Duty
2. Control robotic hands
3. Have a neural Mario Kart party
4. Regain one’s own natural voice
Huge drop from Baidu: Ernie 4.5
From 0.3B to 424B
This is a very impressive family of open models from Baidu, competitive with Qwen3 and the latest DeepSeek V3. Plus, they open-sourced the training code as well.
GitHub.
HF.
Source: GitHub - PaddlePaddle/Paddle: PArallel Distributed Deep LEarning, a machine learning framework from industrial practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning).
OpenAI acquired the team behind Crossing Minds, a startup focused on AI recommendations for e-commerce
They will now focus on agents and information retrieval, covering how OpenAI systems learn, reason, and retrieve knowledge at scale and in real time.
A new study from multiple institutions introduces CoreCognition, a benchmark that systematically evaluates whether multimodal large language models (MLLMs) possess "core knowledge" - fundamental cognitive abilities that humans develop in early childhood.
Key findings from testing 230 models:
Reversed developmental trajectory: MLLMs excel at complex tasks (formal reasoning, math) but fail at basic ones that 2-year-olds master, such as object permanence and spatial understanding. Performance on higher-level abilities doesn't correlate with mastery of foundational ones.
Scaling doesn't help: Increasing model size improves performance on complex tasks but shows minimal or negative impact on basic cognitive abilities. Some abilities, like perspective-taking, actually decline with scale.
Reasoning models show no advantage: Models with chain-of-thought reasoning (GPT-o1, QVQ-72B) perform no better on core knowledge tasks than standard models, suggesting the deficit is architectural, not procedural.
Shortcut learning vs. genuine understanding: Through "Concept Hacking" - manipulating images to invert correct answers - researchers found models rely on learned patterns rather than genuine conceptual understanding (a scoring sketch follows the stages list below).
The benchmark tests 12 core abilities across three developmental stages:
1. Sensorimotor (0-2 years): boundary detection, object permanence, continuity, spatiality
2. Concrete operations (7-11 years): conservation, intuitive physics, perspective-taking
3. Formal operations (11+ years): intentionality understanding, mechanical reasoning, tool use.
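A sketch of how a Concept Hacking-style consistency score could be computed (illustrative, not the authors' evaluation code): a model earns credit for a concept only when it answers both the original item and its manipulated counterpart correctly, so pattern matching on surface cues fails the manipulated half.

```python
def concept_score(model, pairs):
    """pairs: list of (original, hacked) dicts, each with 'question',
    'image', and 'answer' fields. A hacked item manipulates the image
    so the correct answer flips while surface cues stay the same."""
    consistent = 0
    for orig, hacked in pairs:
        ok_orig = model(orig["image"], orig["question"]) == orig["answer"]
        ok_hacked = model(hacked["image"], hacked["question"]) == hacked["answer"]
        consistent += ok_orig and ok_hacked  # credit only for the full pair
    return consistent / len(pairs)
```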
Source: williamium3000.github.io - Core Cognition: Core Knowledge Deficits in Multi-Modal Language Models.
Stablecoin issuer Circle has applied to the U.S. OCC to establish “First National Digital Currency Bank, N.A.”
If approved, the charter would allow Circle to self-custody USDC reserves and offer digital asset custody services to institutions, excluding deposit-taking and lending.
Sakana AI introduced Inference-Time Scaling and Collective Intelligence for Frontier AI
AB-MCTS is a new inference-time scaling algorithm that enables multiple frontier AI models to cooperate, with promising initial results on the ARC-AGI-2 benchmark.
The Multi-LLM AB-MCTS combination of o4-mini + Gemini-2.5-Pro + DeepSeek-R1-0528 achieves strong performance on ARC-AGI-2, outperforming the individual o4-mini, Gemini-2.5-Pro, and DeepSeek-R1-0528 models by a large margin.
Many ARC-AGI-2 examples that were unsolvable by any single LLM were solved by combining multiple LLMs. In some cases, an initially incorrect attempt by o4-mini is used by R1-0528 and Gemini-2.5-Pro as a hint to get to the correct solution.
ARC-AGI-2 code.
Implementation of AB-MCTS on GitHub.
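At each node, AB-MCTS decides whether to branch wider (sample a fresh candidate answer) or go deeper (refine an existing one). A toy sketch of that adaptive choice via Thompson sampling over Beta posteriors, an assumed simplification rather than Sakana's released implementation:

```python
import random

class Arm:
    """Beta-Bernoulli posterior over 'this move yields a better answer'."""
    def __init__(self):
        self.wins, self.losses = 1, 1  # Beta(1, 1) prior
    def sample(self):
        return random.betavariate(self.wins, self.losses)
    def update(self, improved: bool):
        if improved: self.wins += 1
        else: self.losses += 1

def ab_mcts_step(node, generate, refine, score):
    """One adaptive-branching step on a node dict with keys
    'children', 'wider_arm', 'deeper_arm'. generate/refine produce
    candidate answers (e.g., from different LLMs); score rates them."""
    wider, deeper = node["wider_arm"], node["deeper_arm"]
    best = max(node["children"], key=score, default=None)
    if best is None or wider.sample() >= deeper.sample():
        child, arm = generate(node), wider        # branch wider: new sample
    else:
        child, arm = refine(best), deeper         # go deeper: refine the best
    improved = best is None or score(child) > score(best)
    arm.update(improved)                          # learn which move pays off
    node["children"].append(child)
    return child
```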
Source: sakana.ai - Inference-Time Scaling and Collective Intelligence for Frontier AI.
A noninvasive brain-computer interface that enables humans to control a robotic hand at the level of individual fingers—just by thinking
This advance moves #robotic #BCI control from the arm level to the #finger level, using only scalp #EEG.
With the help of #AI and #deeplearning, researchers were able to extract extremely weak brain signals reflecting a user’s mental intention and use them for real-time, finger-level robotic control.
In this study, 21 human participants learned to control individual fingers of a robotic hand with ~80% accuracy for two distinct fingers on the same hand.
EEG-based BCI is safe, noninvasive, and economical, offering the potential for widespread use—not just for patients, but possibly the general public as well.
Despite challenges in reading brain signals through the scalp, AI-assisted signal decoding made this breakthrough possible.
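The study's decoder is a deep network trained on scalp EEG; as a generic illustration of the decoding shape rather than the authors' architecture, a classic CSP + LDA pipeline over finger-intention epochs might look like this (synthetic data stands in for real recordings):

```python
import numpy as np
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in: 200 epochs of 64-channel EEG, 2 s at 250 Hz,
# labeled by which finger the participant intends to move (0 or 1).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64, 500))
y = rng.integers(0, 2, 200)

clf = Pipeline([
    ("csp", CSP(n_components=6, log=True)),   # spatial filters for weak signals
    ("lda", LinearDiscriminantAnalysis()),
])
print(cross_val_score(clf, X, y, cv=5).mean())  # chance ~ 0.5 on random data
```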
Where should consumer AI founders build next?
From the Menlo Ventures consumer survey of 5k+ Americans, these were the activities with high participation but the lowest AI penetration today.
Small Language Models are the Future of Agentic AI
This position paper argues that small language models (SLMs), defined pragmatically as those runnable on consumer-grade hardware, are not only sufficient but superior for many agentic AI applications, especially when tasks are narrow, repetitive, or tool-oriented.
The authors propose that shifting from LLM-first to SLM-first architectures will yield major gains in efficiency, modularity, and sustainability.
SLMs are already capable of commonsense reasoning, instruction following, and code/tool interaction at levels comparable to 30–70B models, with orders of magnitude better throughput.
Examples include Phi-3, Hymba-1.5B, DeepSeek-R1-Distill, and RETRO-7.5B.
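A sketch of what "SLM-first" could mean operationally; the paper argues the principle, and this routing pattern is an assumed illustration, not code from the paper: default every agent step to a small local model and escalate to a large one only when validation fails.

```python
def run_step(task: str, slm, llm, validate) -> str:
    """SLM-first execution: try the small model, escalate on failure.

    slm/llm are text -> text callables; validate checks the output
    against a schema, tool-call syntax, unit tests, etc.
    """
    draft = slm(task)
    if validate(draft):
        return draft      # common case: cheap, fast, local
    return llm(task)      # rare case: escalate to the big model
```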
Source: arXiv.org - Small Language Models are the Future of Agentic AI. Large language models (LLMs) are often praised for exhibiting near-human performance on a wide range of tasks and valued for their ability to hold a general conversation. The rise of agentic AI...
Amazon announced DeepFleet, an AI model that routes warehouse robots 10% faster to trim costs and shorten delivery times.
Andy Jassy likened it to "an intelligent traffic management system" that coordinates robots’ movements to find optimal paths.
Hugging Face announced a new open-source challenge in collaboration with Proxima Fusion: unlocking fusion with AI.
The "Bringing Fusion Down to Earth: ML for Stellarator Optimization" project is an initiative by Hugging Face in collaboration with Proxima Fusion, a spin-out from the Max Planck Institute for Plasma Physics, aimed at accelerating fusion energy research through ML applied to stellarator design.
The initiative focuses on using ML to optimize stellarator designs, addressing the computational complexity of simulating and designing these devices. Key goals include:
- Accelerating Design Processes: Traditional stellarator design, like that of W7-X, required massive computational effort and iterative, hand-tuned processes. ML aims to streamline this by developing surrogate models that predict outcomes of complex simulations (e.g., VMEC++ simulations) and key plasma properties from input parameters. These models could replace expensive simulations, enabling faster design iterations and differentiable optimization loops (a toy surrogate is sketched after this list).
- Open Collaboration: The project opens fusion research to the broader ML community, encouraging global participation to tackle one of the hardest scientific challenges. It includes a live leaderboard where researchers can submit optimized stellarator designs and compare performance on standard metrics.
- Advancing Fusion Energy: By optimizing stellarators, the project aims to make fusion a viable, zero-carbon, fuel-abundant, and safe energy source, capable of transforming the global energy system without the drawbacks of fossil fuels, nuclear fission, or intermittent renewables.
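As a minimal illustration of the surrogate idea from the first goal above (hypothetical features and targets, not the challenge's actual schema): train a regressor to predict one plasma property directly from design parameters, then query it in place of the simulator.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Illustrative stand-in: map stellarator boundary coefficients to one
# plasma property that a full simulation run would otherwise compute.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 24))        # e.g., Fourier boundary coefficients
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2 + 0.01 * rng.standard_normal(2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
surrogate = GradientBoostingRegressor().fit(X_tr, y_tr)
print("R^2 on held-out designs:", surrogate.score(X_te, y_te))
# A trained surrogate evaluates in microseconds, replacing expensive
# simulator calls inside an optimization loop.
```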
The "Bringing Fusion Down to Earth: ML for Stellarator Optimization" project is an initiative by Hugging Face in collaboration with Proxima Fusion, a spin-out from the Max Planck Institute for Plasma Physics, aimed at accelerating fusion energy research through ML applied to stellarator design.
The initiative focuses on using ML to optimize stellarator designs, addressing the computational complexity of simulating and designing these devices. Key goals include:
- Accelerating Design Processes: Traditional stellarator design, like that of W7-X, required massive computational effort and iterative, hand-tuned processes. ML aims to streamline this by developing surrogate models that predict outcomes of complex simulations (e.g., VMEC++ simulations) and key plasma properties from input parameters. These models could replace expensive simulations, enabling faster design iterations and differentiable optimization loops.
- Open Collaboration: The project opens fusion research to the broader ML community, encouraging global participation to tackle one of the hardest scientific challenges. It includes a live leaderboard where researchers can submit optimized stellarator designs and compare performance on standard metrics.
- Advancing Fusion Energy: By optimizing stellarators, the project aims to make fusion a viable, zero-carbon, fuel-abundant, and safe energy source, capable of transforming the global energy system without the drawbacks of fossil fuels, nuclear fission, or intermittent renewables.
Source: huggingface.co - Bringing Fusion Down to Earth: ML for Stellarator Optimization. A blog post by Georgia Channing on Hugging Face.