Ex-DeepMind researchers Misha Laskin and Ioannis Antonoglou launched Reflection AI with $130M funding
The startup plans to build superintelligent AI, starting with autonomous coding systems, with equal emphasis on research and product.
Previously, their team helped build AI systems like AlphaGo and GPT-4.
Reflection AI
Building frontier open intelligence.
OpenAI has evolved their thinking about AGI development.
Rather than viewing AGI as a sudden leap, they now see it as a continuous series of increasingly capable systems.
Key Risks They're Addressing
1. Human Misuse: People using AI in ways that violate laws and democratic values.
2. Misaligned AI: AI systems acting in ways that don't align with human values and intentions.
3. Societal Disruption: Rapid AI-driven changes causing unpredictable effects on society and inequality.
Their Core Safety Principles
Embracing Uncertainty
They treat safety as a science, learning through real-world deployment rather than just theory. This includes rigorous measurement of risks and proactive mitigation strategies.
Defense in Depth
OpenAI applies multiple layers of safeguards, similar to other safety-critical fields. They teach models to understand safety values, follow instructions, and remain reliable even under uncertainty.
Methods that Scale
They develop safety approaches that become more effective as AI capabilities increase, even using current AI systems to help align more advanced ones.
Human Control
OpenAI places humans at the center of their alignment approach, creating transparent systems that people can meaningfully supervise. They incorporate public feedback into policy formation and work on interfaces that help humans guide AI effectively.
Community Effort
They recognize that ensuring safe AGI requires collaboration across industry, academia, government, and the public. OpenAI shares research, provides resources to the field, funds external research, and engages with policymakers.
While OpenAI has a clear vision for safety, they remain open to being wrong about how AI progress will unfold and welcome diverse perspectives on AI risk management.
OpenAI
How we think about safety and alignment
The mission of OpenAI is to ensure artificial general intelligence (AGI) benefits all of humanity. Safety, the practice of enabling AI's positive impacts by mitigating the negative ones, is thus core to our mission.
Former Google CEO Eric Schmidt is taking a controlling stake in Relativity Space and taking over as its CEO.
The company makes reusable rockets, like SpaceX, and has raised $2.4B to date.
It's a YC company from the W16 batch.
NY Times
Eric Schmidt Joins Relativity Space, a Rocket Start-Up, as C.E.O.
The former Google chief executive is taking a controlling interest in Relativity Space, which aims to build low-cost, reusable rockets to compete against Elon Musk's SpaceX and to reach Mars.
The OWASP Top 10 Agentic AI Threats and Mitigations report lays out a threat-model-based framework for securing autonomous AI systems powered by GenAI agents.
LeanAgent: the first lifelong learning agent for formal theorem proving in Lean.
LLMs have been integrated with interactive proof assistants like Lean, where formal verification guarantees that every accepted proof is correct.
So far, however, these LLMs cannot continuously generalize to new knowledge and struggle with advanced mathematics.
LeanAgent continuously learns and improves on ever-expanding mathematical knowledge without forgetting what it learned before. It has a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity.
LeanAgent successfully proves 155 theorems across 23 diverse Lean repositories where formal proofs were previously missing, many from advanced mathematics. It proves challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics.
LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks.
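For context, this is roughly what the task looks like in Lean 4: the theorem statement is fixed, and the agent must produce a proof term that the proof assistant verifies. A toy example of ours, not from the paper:

```lean
-- Toy Lean 4 theorem: the statement is given, and the proof term is what
-- an agent like LeanAgent must search for. Lean checks it mechanically.
theorem mul_comm_sum (a b c : Nat) : (a + b) * c = c * (a + b) :=
  Nat.mul_comm (a + b) c
```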
Code.
arXiv.org
LeanAgent: Lifelong Learning for Formal Theorem Proving
Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches...
OpenAI launches new tools to help businesses build AI agents
The new tools: file search (RAG), web search, and computer use (Operator), all packaged together in the Responses API.
Plus, they've upgraded Swarm into the (open-source) Agents SDK to make it easy to build, orchestrate, and monitor systems of agents.
Web search in the OpenAI API: get fast, up-to-date answers with links to relevant web sources. Powered by the same model used for ChatGPT search.
Now with the Responses API, OpenAI wants to sell access to the components that power AI agents, allowing developers to build their own Operator- and deep research-style agentic applications.
OpenAI hopes that developers can create some applications with its agent technology that feel more autonomous than whatβs available today.
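A minimal sketch of what this looks like from the Python SDK, assuming a version with Responses API support; the model and tool names below match the launch docs but may change:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One call with the built-in web search tool enabled; the model decides
# when to search and cites its sources in the answer.
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="What AI agent tooling did OpenAI announce this week?",
)
print(response.output_text)
```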
OpenAI
New tools for building agents
We're evolving our platform to help developers and enterprises build useful and reliable agents.
Google introduced Gemma 3 and ShieldGemma 2
Google DeepMind released the Gemma 3 family of open models.
Available in four sizes: 1B, 4B, 12B, and 27B.
Key Capabilities (a quick load-and-run sketch follows this list):
1. World's best single-accelerator model: outperforms Llama-405B, DeepSeek-V3 and o3-mini in preliminary human preference evaluations on LMArena's leaderboard
2. Global language support: out-of-the-box support for over 35 languages and pretrained support for over 140 languages
3. Multimodal reasoning: analyze images, text, and short videos
4. Expanded context: 128k-token context window
5. Function calling: supports function calling and structured output for AI-driven workflows
6. Quantized models: official quantized versions to reduce model size and computational requirements
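A hedged sketch of trying the smallest instruction-tuned variant with Hugging Face transformers; the checkpoint name is our assumption, so check the model hub:

```python
# Minimal sketch, not official usage: load the (assumed) 1B instruction-tuned
# checkpoint and run one chat turn through the text-generation pipeline.
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")
messages = [{"role": "user", "content": "Explain KV caching in one sentence."}]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```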
Alongside Gemma 3, Google is launching ShieldGemma 2, a powerful 4B image safety checker built on the Gemma 3 foundation that provides:
- A ready-made solution for image safety
- Safety labels across three categories: dangerous content, sexually explicit content, and violence
- Customization options for specific safety needs
Paper.
HuggingFace.
Google
Introducing Gemma 3: The most capable model you can run on a single GPU or TPU
Today, we're introducing Gemma 3, our most capable, portable and responsible open model yet.
Claude just bought the Solana team a 5th birthday gift on Amazon.
Paid in USDC on Solana, shipped to their NY office.
Agents can now buy anything on Amazon using the goat MCP adapter + Crossmint Checkout.
Crossmintβs Headless Checkout plugin gives Claude access to Amazonβs entire product catalog.
Hugging Face trained a new open model, Open-R1, that outperforms DeepSeek-R1 on International Olympiad in Informatics problems.
The 7B variant beats Claude 3.7 Sonnet. The dataset, training recipe and benchmark are all public.
Summary of insights from OpenAI AMA on X following the launch of Agent Tools and APIs
Responses API and Tools
1. Operator functionality (CUA model) is available starting today through the Responses API
2. Responses API is stateful by default, supports retrieving past responses, chaining them, and will soon reintroduce threads
3. Code Interpreter tool is planned as the next built-in tool in the Responses API
4. Web search can be used together with structured outputs by defining a JSON schema explicitly (see the sketch after this list)
5. Assistants API won't be deprecated until migration to Responses API is possible without data loss
6. "Assistants" and "agents" terms are interchangeable, both describing systems independently accomplishing user tasks
7. Curl documentation is provided in API references, with more examples coming soon
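A sketch of combining web search with structured outputs, under our reading of the Responses API; the schema and field layout here are our assumptions:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical schema for a structured answer backed by web search.
schema = {
    "type": "object",
    "properties": {
        "headline": {"type": "string"},
        "source_urls": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["headline", "source_urls"],
    "additionalProperties": False,
}

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="Summarize today's top AI story as JSON.",
    text={"format": {"type": "json_schema", "name": "news_item",
                     "schema": schema, "strict": True}},
)
print(response.output_text)  # a JSON string conforming to the schema
```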
Agents SDK and Compatibility
- Agents SDK supports external API calls through custom-defined function tools (see the sketch after this list)
- SDK compatible with external open-source models that expose a Chat Completions-compatible API endpoint (no Responses API compatibility required)
- JavaScript and TypeScript SDKs coming soon; Java SDK may be prioritized based on demand
- Tracing functionality covers external Chat Completions-compatible models as a "generation span"
- Agents SDK supports MCP connections through custom-defined function tools
- Asynchronous operations aren't natively supported yet; interim solution is immediately returning "success" followed by updating via user message later
- Agent tools can include preset variables either hardcoded or through a context object
- Privacy can be managed using guardrails and an input_filter to constrain context during agent handoffs
- Agents SDK workflows can combine external Chat Completions-compatible models and OpenAI models, including built-in tools like CUA
- Agentic "deep research" functionality can be built using Responses API or Agents SDK
File and Vector Store Features
- File search returns citation texts via the "annotations" parameter (see the sketch after this list)
- Vector stores already support custom chunking and hybrid search, with further improvements planned
- Images are not yet supported in vector stores, but entire PDFs can be directly uploaded into the Responses API for small-document use cases
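A sketch of reading those citations, assuming an existing vector store; the store ID is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "file_search", "vector_store_ids": ["vs_PLACEHOLDER"]}],
    input="What does the handbook say about travel expenses?",
)

# Citations ride along on the message content as annotations.
for item in response.output:
    if item.type == "message":
        for part in item.content:
            for annotation in getattr(part, "annotations", []):
                print(annotation)  # e.g. file citations backing the answer
```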
Computer Use Model (CUA) and Environment
- Docker environments for computer use must be managed by developers; recommended third-party cloud services are Browserbase and Scrapybara, with provided sample apps
- Predefined Ubuntu environments and company-specific setups can be created using OpenAI's CUA starter app
- Integrated VMs or fully managed cloud environments for CUA are not planned yet; developers are encouraged to use sample apps with third-party hosting providers
- CUA model primarily trained on web tasks but shows promising performance in desktop applications; still early stage
Realtime Usage Tracking
- OpenAI currently doesn't provide a built-in solution for tracking realtime usage via WebRTC ephemeral tokens; using a relay/proxy is recommended as an interim solution
OpenAI Models and Roadmap
- o1-pro will be available soon in Responses API
- o3 model development continues with API release planned; details forthcoming
Strategy and Positioning
OpenAI identifies as both a product and model company, noting ChatGPT's 400M weekly users help improve model quality, and acknowledges they won't build all needed AI products themselves
Unexpected Use Cases
Early Responses API tests revealed use cases such as art generation, live event summarization, apartment finding, and belief simulations
Google introduced Gemini Robotics, the most advanced vision-language-action (VLA) model in the world
Gemini Robotics builds on Gemini 2.0, adding vision-language-action capabilities for physically controlling robots.
The technology enables robots to understand and react to the physical world, performing tasks like desk cleanup through voice commands, as part of a broader push toward embodied AI.
Gemini Robotics-ER, a related model, enhances spatial understanding, allowing robots to adapt to dynamic environments and interact seamlessly with humans.
Tech report.
Hugging Face (LeRobot) & Yaak released the world's largest open-source self-driving dataset
To search the data, Yaak is launching Nutron, a tool for natural-language search of robotics data. Check out the video to see how it works.
Natural language search of multi-modal data.
Open-sourcing the L2D dataset: 5,000 hours of multi-modal self-driving data.
Try Nutron.
huggingface.co
LeRobot goes to driving school: World's largest open-source self-driving dataset
We're on a journey to advance and democratize artificial intelligence through open source and open science.
The first quantum supremacy claim for a useful application: D-Wave's quantum computer performed a complex simulation in minutes, with a level of accuracy that would take nearly a million years on the GPU-based DOE supercomputer.
In addition, solving this problem on the classical supercomputer would require more than the world's annual electricity consumption.
Dwavequantum
Beyond Classical: D-Wave First to Demonstrate Quantum Supremacy on Useful, Real-World Problem
IOSCO_AI_1741862094.pdf
IOSCO Report: AI in Capital Markets - Uses, Risks, and Regulatory Responses
This report delves into the current and potential applications of AI within financial markets, outlines the associated risks and challenges, and examines how regulators and market participants are adapting to these changes.
The report cites regulatory approaches from Hong Kong, the EU, Canada, the US, Singapore, the Netherlands, the UK, Greece, Japan, Brazil, and Australia.
Key AI Applications in Financial Markets:
Decision-Making Support:
- Robo-advising (automated investment advice)
- Algorithmic trading
- Investment research and market sentiment analysis
Specific AI Use Cases:
Nasdaq:
- Developed the Dynamic M-ELO AI-driven trading order, optimizing order holding time for improved execution efficiency.
Broker-Dealers:
- Customer interaction via chatbots
- Algorithmic trading enhancements
- Fraud and anomaly detection
Asset Managers:
- Automated investment advice
- Investment research
- Portfolio construction and optimization
Cohere introduced Command A: a new AI model that can match or outperform GPT-4o and DeepSeek-V3 on business tasks, with significantly greater efficiency.
Command A is an open-weights 111B parameter model with a 256k context window, focused on delivering great performance across agentic, multilingual, and coding use cases.
Runs on only 2 GPUs (vs. typically 32), offers 256k context length, supports 23 languages and delivers up to 156 tokens/sec.
API.
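A minimal sketch of calling it through Cohere's Python SDK; the model identifier below is our assumption, so check Cohere's docs:

```python
import cohere

co = cohere.ClientV2()  # reads the CO_API_KEY environment variable

# "command-a-03-2025" is an assumed model ID, not confirmed by this post.
response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user",
               "content": "Draft a polite follow-up email to a supplier."}],
)
print(response.message.content[0].text)
```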
Cohere
Introducing Command A: Max performance, minimal compute | Cohere Blog
Cohere Command A is on par or better than GPT-4o and DeepSeek-V3 across agentic enterprise tasks, with significantly greater efficiency.
Transformers, but without normalization layers: a new paper from Meta.
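The paper's replacement for LayerNorm is Dynamic Tanh (DyT): an element-wise tanh with a learnable input scale plus the usual affine parameters. A minimal PyTorch sketch of the idea as we read it:

```python
import torch
import torch.nn as nn

class DynamicTanh(nn.Module):
    """DyT: y = weight * tanh(alpha * x) + bias, a LayerNorm stand-in."""
    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))  # learnable scalar
        self.weight = nn.Parameter(torch.ones(dim))   # per-channel scale
        self.bias = nn.Parameter(torch.zeros(dim))    # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * torch.tanh(self.alpha * x) + self.bias

# Usage: swap nn.LayerNorm(d_model) for DynamicTanh(d_model) in a block.
layer = DynamicTanh(512)
print(layer(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```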
Baidu, the Google of China, dropped two models
1. ERNIE 4.5: beats GPT-4.5 at 1% of the price
2. Reasoning model X1: beats DeepSeek R1 at 50% of the price.
China continues to build intelligence too cheap to meter. The AI price war is on.
Xiaomi developed a SOTA audio reasoning model using DeepSeek's GRPO RL algorithm, achieving 64.5% accuracy on the MMAU benchmark in just one week.
The breakthrough involves applying GRPO to the Qwen2-Audio-7B model, trained on 38,000 samples from Tsinghua University's AVQA dataset, marking a significant advancement in multimodal audio understanding.
The MMAU benchmark, introduced in 2024, tests models on complex audio tasks across speech, sound, and music, with even top models like Gemini Pro 1.5 achieving only 52.97% accuracy, highlighting the challenge Xiaomi's model addresses.
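For reference, GRPO's core trick is to sample a group of answers per prompt and normalize each answer's reward against its own group, instead of training a separate value function. A toy sketch of that group-relative advantage, under our reading of the method:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages, one group of sampled answers per prompt.

    rewards: tensor of shape [num_prompts, group_size], one scalar reward
    per sampled answer (e.g. 1.0 for a correct audio QA answer, else 0.0).
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # used in place of a critic's value

# Toy example: 2 prompts, 4 sampled answers each.
r = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                  [0.2, 0.9, 0.4, 0.5]])
print(grpo_advantages(r))
```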
The total value of real-world tokenized assets has grown by about 20% over the last 30 days. Data by RWA.xyz.
Total Real-World Asset (RWA) value continues to climb: over $18B is now tokenized onchain, excluding stablecoins.