Google introduced Gemini Robotics, billed as the most advanced VLA in the world
Gemini Robotics, featured in the post, builds on Gemini 2.0 and adds advanced vision-language-action (VLA) capabilities for physically controlling robots.
The technology enables robots to understand and react to the physical world, performing tasks like desk cleanup through voice commands, as part of a broader push toward embodied AI.
Gemini Robotics-ER, a related model, enhances spatial understanding, allowing robots to adapt to dynamic environments and interact seamlessly with humans.
Tech report.
Hugging Face (LeRobot) & Yaak released the world's largest open-source self-driving dataset
To search the data, Yaak is launching Nutron, a tool for natural language search of robotics data. Check out the video to see how it works.
Natural language search of multi-modal data.
Open sourcing L2D dataset - 5,000 hours of multi-modal self-driving data.
Try Nutron.
The first quantum supremacy for a useful application
D-Wave's quantum computer performed a complex simulation in minutes, at a level of accuracy that would take nearly a million years to match on the DOE supercomputer built with GPUs.
In addition, solving this problem on the classical supercomputer would require more than the world's annual electricity consumption.
Dwavequantum
Beyond Classical: D-Wave First to Demonstrate Quantum Supremacy on Useful, Real-World Problem
IOSCO Report: AI in Capital Markets - Uses, Risks, and Regulatory Responses
This report delves into the current and potential applications of AI within financial markets, outlines the associated risks and challenges, and examines how regulators and market participants are adapting to these changes.
The report cites regulatory approaches from Hong Kong, the EU, Canada, the US, Singapore, the Netherlands, the UK, Greece, Japan, Brazil, and Australia.
Key AI Applications in Financial Markets:
Decision-Making Support:
- Robo-advising (automated investment advice)
- Algorithmic trading
- Investment research and market sentiment analysis
Specific AI Use Cases:
Nasdaq:
- Developed the Dynamic M-ELO AI-driven trading order, optimizing order holding time for improved execution efficiency.
Broker-Dealers:
- Customer interaction via chatbots
- Algorithmic trading enhancements
- Fraud and anomaly detection
Asset Managers:
- Automated investment advice
- Investment research
- Portfolio construction and optimization
Cohere introduced Command A: a new AI model that can match or outperform GPT-4o and DeepSeek-V3 on business tasks, with significantly greater efficiency.
Command A is an open-weights 111B-parameter model with a 256k context window, focused on delivering great performance across agentic, multilingual, and coding use cases.
Runs on only 2 GPUs (vs. typically 32), offers 256k context length, supports 23 languages and delivers up to 156 tokens/sec.
API.
Cohere
Introducing Command A: Max performance, minimal compute | Cohere Blog
Cohere Command A is on par or better than GPT-4o and DeepSeek-V3 across agentic enterprise tasks, with significantly greater efficiency.
Transformers, but without normalization layers. New paper by Meta.
Baidu, the Google of China, dropped two models
1. ERNIE 4.5: beats GPT-4.5 at 1% of the price
2. Reasoning model X1: beats DeepSeek R1 at 50% of the price.
China continues to build intelligence too cheap to meter. The AI price war is on.
Xiaomi developed a SOTA audio reasoning model using DeepSeek's GRPO RL algorithm, reaching 64.5% accuracy on the MMAU benchmark in just one week.
The breakthrough involves applying GRPO to the Qwen2-Audio-7B model, trained on 38,000 samples from Tsinghua University's AVQA dataset, marking a significant advancement in multimodal audio understanding.
The MMAU benchmark, introduced in 2024, tests models on complex audio tasks across speech, sound, and music, with even top models like Gemini Pro 1.5 achieving only 52.97% accuracy, highlighting the challenge Xiaomi's model addresses.
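For readers new to GRPO, here is a minimal sketch of its core idea: several answers are sampled for the same question, and each answer's reward is scored relative to the rest of its own group instead of by a separate value model. This is not Xiaomi's code. The function name, the reward values, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled answer's reward against its own group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero; real implementations differ in these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 sampled answers to one multiple-choice audio
# question (1.0 = picked the correct option, 0.0 = wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers end up with a positive advantage and wrong ones with a negative advantage. If every answer in a group gets the same reward, all advantages are zero and that group contributes no learning signal.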
The total value of tokenized real-world assets has risen by about 20% over the last 30 days. Data by RWA.xyz.
Total Real-World Asset (RWA) value continues to climb: over $18B is now tokenized onchain, excluding stablecoins.
A breakthrough in brain signal analysis that combines PCA and ANFIS to hit 99.5% accuracy in cognitive pattern recognition.
It could be a game-changer for #neuroscience, #BCI tech and clinical applications.
Mistral announced Small 3.1: multimodal, multilingual, Apache 2.0
Lightweight: Runs on a single RTX 4090 or a Mac with 32GB RAM, perfect for on-device applications.
Fast-Response Conversations: Ideal for virtual assistants and other applications where quick, accurate responses are essential.
Low-Latency Function Calling: Capable of rapid function execution within automated or agentic workflows.
Specialized Fine-Tuning: Customizable for specific domains.
Advanced Reasoning Foundation: Inspires community innovation, with models like DeepHermes 24B by Nous Research built on Mistral Small 3.
huggingface.co
mistralai/Mistral-Small-3.1-24B-Base-2503 · Hugging Face
We're on a journey to advance and democratize artificial intelligence through open source and open science.
ByteDance Seed, Tsinghua, and HKU open-sourced a new RL algorithm for building reasoning models.
DAPO-Zero-32B, a fully open-source RL reasoning model, surpasses DeepSeek-R1-Zero-Qwen-32B, and scores 50 on AIME 2024 with 50% fewer steps.
It is trained with zero-shot RL from the Qwen-32b pre-trained model.
Everything is fully open-sourced (algorithm, code, dataset, verifier, and model).
Cool research on open-source by Harvard
$4.15B invested in open-source generates $8.8T of value for companies (aka $1 invested in open-source = $2,000 of value created).
Companies would need to spend 3.5 times more on software than they currently do if OSS did not exist.
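A quick check of the "$1 in, $2,000 out" ratio, using only the two dollar figures quoted in the post. The variable names are just for illustration.

```python
# Figures as quoted in the post.
invested_usd = 4.15e9    # $4.15B invested in open source
value_usd = 8.8e12       # $8.8T of value for companies

value_per_dollar = value_usd / invested_usd
print(round(value_per_dollar))  # 2120
```

So the quoted figures work out to roughly $2,100 of value per dollar invested, which the post rounds down to $2,000.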
Hugging Face and IBM introduced SmolDocling, an ultra-compact VLM for end-to-end multi-modal document conversion
SmolDocling is good for enterprise use cases:
- 256M parameters - cheap and easy to run locally
- Performs better than 20x larger models
- Fast inference using vLLM: average of 0.35 seconds per page on an A100 GPU.
- Apache 2.0 license
Demo.
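To put the 0.35 seconds per page in perspective, here is a back-of-the-envelope throughput estimate. It assumes a single A100 running at the quoted average with no batching or I/O overhead. The function names and the one-million-page workload are made up for illustration.

```python
SECONDS_PER_PAGE = 0.35  # average quoted in the post (vLLM, one A100)


def pages_per_hour(seconds_per_page):
    """Pages converted in one hour at a constant per-page latency."""
    return 3600 / seconds_per_page


def hours_for(pages, seconds_per_page):
    """Wall-clock hours needed to convert `pages` pages sequentially."""
    return pages * seconds_per_page / 3600


print(f"{pages_per_hour(SECONDS_PER_PAGE):.0f} pages/hour")
print(f"{hours_for(1_000_000, SECONDS_PER_PAGE):.1f} hours for 1M pages")
```

Under these assumptions, one GPU handles a little over 10,000 pages per hour, so a hypothetical one-million-page archive would take about four days.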
Biggest deal in Google/Alphabet history: Google is buying Wiz for $32B to beef up in cloud security
Wiz is an Israeli cloud security startup headquartered in New York City. The company was founded in January 2020.
This acquisition positions Google to better compete with AWS and Azure.
TechCrunch
Confirmed: Google buys Wiz for $32B to beef up in cloud security | TechCrunch
Google is making the biggest acquisition in its history. The company's parent company Alphabet is buying Wiz, the cloud security startup, for $32 billion
Anthropic is working on voice capabilities for Claude.
The company's chief product officer, Mike Krieger, told the Financial Times that Anthropic plans to launch experiences that allow users to talk to Anthropic's AI models.
NVIDIA, Google DeepMind and Disney Research are collaborating to build an R2-D2-style home droid.
Jensen giving the little guy voice and gesture commands live on stage.
The robot's name is Blue, and he is so cute.
Nvidia announced GR00T N1, the world's first open foundation model for humanoid robots
The power of a general robot brain, in the palm of your hand. With only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight:
- Real humanoid teleoperation data.
- Large-scale simulation data: we are open-sourcing 300K+ trajectories
- Neural trajectories: SOTA video generation models to "hallucinate" new synthetic data that features accurate physics in pixels. Using Jensen's words, "systematically infinite data"
- Latent actions: novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos.
GR00T N1 is a single end-to-end neural net, from photons to actions:
- Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions.
- Diffusion Transformer (System 1) that "renders" smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2.
Code.
Weights on HF.
Open Physical AI dataset release.
Blog.
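A toy sketch of how a slow planner and a fast controller can share one control loop, in the spirit of the System 2 / System 1 split described above. This is not NVIDIA's code: both functions are stand-ins, and the 10 Hz planning rate is a hypothetical value (the post only gives the 120 Hz action rate).

```python
ACTION_HZ = 120  # System 1 rate, from the post
PLAN_HZ = 10     # hypothetical System 2 rate; not stated in the post


def system2_plan(observation, instruction):
    """Stand-in for the vision-language model: returns a 'latent plan'."""
    return {"instruction": instruction, "planned_at": observation["tick"]}


def system1_act(plan, observation):
    """Stand-in for the diffusion transformer: returns one motor command."""
    return (plan["planned_at"], observation["tick"])


def run(seconds, instruction):
    ticks_per_plan = ACTION_HZ // PLAN_HZ
    plan, actions, replans = None, [], 0
    for tick in range(int(seconds * ACTION_HZ)):
        observation = {"tick": tick}
        # The slow planner refreshes the plan every few ticks...
        if tick % ticks_per_plan == 0:
            plan = system2_plan(observation, instruction)
            replans += 1
        # ...while the fast controller acts on every tick using the latest plan.
        actions.append(system1_act(plan, observation))
    return actions, replans


actions, replans = run(1, "pick up the cup")
print(len(actions), replans)  # 120 10
```

The point of the split is that the expensive reasoning step does not have to keep up with the motor rate: the controller keeps issuing commands against the most recent plan until a new one arrives.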
Nvidia
NVIDIA Isaac GR00T N1: An Open Foundation Model for Humanoid Robots | Research
At NVIDIA, we are developing AI solutions to enable general-purpose humanoid robots to understand the human world, follow language instructions, and perform diverse tasks. A robust Vision-Language-Action (VLA) model is crucial for such advanced capabilities.…
NVIDIA also introduced Newton, an open-source physics engine developed by NVIDIA and Google DeepMind and designed to accelerate robot learning and development.
Built on NVIDIA Warp, which enables robots to learn how to handle complex tasks with greater precision, Newton is compatible with learning frameworks such as MuJoCo Playground or NVIDIA Isaac Lab, an open-source, unified framework for robot learning.
Disney Research will be one of the first to use Newton to advance its robotic character platform.
GitHub.
NVIDIA Technical Blog
Announcing Newton, an Open-Source Physics Engine for Robotics Simulation
Physical AI models enable robots to autonomously perceive, interpret, reason, and interact with the real world. Accelerated computing and simulations are key to developing the next generation of…