LeanAgent: the first lifelong learning agent for formal theorem proving in Lean.
LLMs have been integrated with interactive proof assistants like Lean for theorem proving, where the proof assistant formally verifies each proof and guarantees its correctness.
So far, these LLMs cannot continuously generalize to new knowledge and struggle with advanced mathematics.
LeanAgent continuously learns and improves on ever-expanding mathematical knowledge without forgetting what it learned before. It has a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity.
LeanAgent successfully proves 155 theorems across 23 diverse Lean repositories where formal proofs were previously missing, many from advanced mathematics. It proves challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics.
LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks.
Code.
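A toy sketch of the curriculum idea described above: order repositories by a difficulty proxy so training progresses from basic to advanced mathematics. The exponential proof-length proxy and the data layout here are illustrative assumptions, not LeanAgent's actual implementation.
```python
# Toy sketch of curriculum ordering by difficulty; the exponential proof-length
# proxy is illustrative only, not LeanAgent's actual metric.
import math
from dataclasses import dataclass

@dataclass
class Repo:
    name: str
    proof_lengths: list[int]  # tactic-step counts of existing proofs

def difficulty(repo: Repo) -> float:
    # Treat longer proofs as exponentially harder; average over the repository.
    return sum(math.exp(n) for n in repo.proof_lengths) / max(len(repo.proof_lengths), 1)

def curriculum(repos: list[Repo]) -> list[Repo]:
    return sorted(repos, key=difficulty)  # train on easier repositories first

repos = [Repo("algebraic-topology", [12, 20, 17]), Repo("mathlib4-basics", [3, 5, 4])]
print([r.name for r in curriculum(repos)])  # ['mathlib4-basics', 'algebraic-topology']
```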
arXiv.org
LeanAgent: Lifelong Learning for Formal Theorem Proving
Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches...
OpenAI launches new tools to help businesses build AI agents
New tools: RAG/file search, web search, and Operator-style computer use, all packaged together in the Responses API.
Plus, they've upgraded Swarm into the (open-source) Agents SDK to make it easy to build, orchestrate, and monitor systems of agents.
Web search in the OpenAI API: get fast, up-to-date answers with links to relevant web sources. Powered by the same model used for ChatGPT search.
Now with the Responses API, OpenAI wants to sell access to the components that power AI agents, allowing developers to build their own Operator- and deep research-style agentic applications.
OpenAI hopes that developers can create some applications with its agent technology that feel more autonomous than what's available today.
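A minimal sketch of the built-in web-search tool through the new Responses API using the official Python SDK. The tool type name web_search_preview and the output_text helper are assumed from the launch docs, so treat this as a sketch rather than canonical usage.
```python
# Minimal sketch of the built-in web-search tool in the Responses API
# (tool type name assumed from the launch docs).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="What did OpenAI ship for agent builders this week? Cite sources.",
)
print(resp.output_text)  # answer text with citations to web sources
```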
OpenAI
New tools for building agents
We're evolving our platform to help developers and enterprises build useful and reliable agents.
Google introduced Gemma 3 and ShieldGemma 2
Google DeepMind released the Gemma 3 family of open models, available in four sizes: 1B, 4B, 12B, and 27B (quick-start sketch below).
Key Capabilities:
1. World's best single-accelerator model: outperforms Llama-405B, DeepSeek-V3 and o3-mini in preliminary human preference evaluations on LMArena's leaderboard
2. Global language support: out-of-the-box support for over 35 languages and pretrained support for over 140 languages
3. Multimodal reasoning: analyze images, text, and short videos
4. Expanded context: 128k-token context window
5. Function calling: supports function calling and structured output for AI-driven workflows
6. Quantized models: official quantized versions to reduce model size and computational requirements
Alongside Gemma 3, Google is launching ShieldGemma 2, a powerful 4B image safety checker built on the Gemma 3 foundation that provides:
- A ready-made solution for image safety
- Safety labels across three categories: dangerous content, sexually explicit content, and violence
- Customization options for specific safety needs
Paper.
HuggingFace.
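A quick-start sketch for trying Gemma 3 locally via Hugging Face transformers. The checkpoint name google/gemma-3-1b-it is assumed from the release naming (the 1B variant is text-only), and a recent transformers release is required.
```python
# Quick-start sketch for Gemma 3 with Hugging Face transformers (needs a recent
# transformers release; model ID assumed from the release naming, 1B is text-only).
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-1b-it")
chat = [{"role": "user", "content": "Summarize Gemma 3's key capabilities in one sentence."}]
out = generator(chat, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # assistant reply appended to the chat
```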
Google
Introducing Gemma 3: The most capable model you can run on a single GPU or TPU
Today, we're introducing Gemma 3, our most capable, portable and responsible open model yet.
Claude just bought the Solana team a 5th bday gift on Amazon.
Paid in solana USDC, shipped to their NY office.
Agents can now buy anything on Amazon using the goat MCP adapter + Crossmint Checkout.
Crossmint's Headless Checkout plugin gives Claude access to Amazon's entire product catalog.
Hugging Face trained a new open model, Open-R1, that outperforms DeepSeek-R1 on the International Olympiad in Informatics.
The 7B variant beats Claude 3.7 Sonnet. The dataset, training recipe and benchmark are all public.
Summary of insights from OpenAI AMA on X following the launch of Agent Tools and APIs
Responses API and Tools
1. Operator functionality (CUA model) is available starting today through the Responses API
2. Responses API is stateful by default, supports retrieving past responses, chaining them, and will soon reintroduce threads (see the sketch after this list)
3. Code Interpreter tool is planned as the next built-in tool in the Responses API
4. Web search can be used together with structured outputs by defining a JSON schema explicitly
5. Assistants API won't be deprecated until migration to Responses API is possible without data loss
6. "Assistants" and "agents" terms are interchangeable, both describing systems independently accomplishing user tasks
7. Curl documentation is provided in API references, with more examples coming soon
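A sketch of points 2 and 4 above: chaining a second request onto a stored response via previous_response_id, and combining web search with an explicit JSON schema. Parameter names are assumed from the launch documentation as summarized here; verify against the current API reference.
```python
# Sketch of chaining stateful responses and pairing web search with a JSON schema
# (parameter names assumed from the launch docs; verify in the API reference).
from openai import OpenAI

client = OpenAI()

first = client.responses.create(model="gpt-4o", input="Find today's top AI-agent news.")

followup = client.responses.create(
    model="gpt-4o",
    previous_response_id=first.id,           # chain onto the stored response
    tools=[{"type": "web_search_preview"}],
    input="Return only the single most important item.",
    text={"format": {                        # explicit JSON schema for the output
        "type": "json_schema",
        "name": "news_item",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "url": {"type": "string"}},
            "required": ["title", "url"],
            "additionalProperties": False,
        },
    }},
)
print(followup.output_text)
```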
Agents SDK and Compatibility
- Agents SDK supports external API calls through custom-defined function tools (see the sketch after this list)
- SDK compatible with external open-source models that expose a Chat Completions-compatible API endpoint (no Responses API compatibility required)
- JavaScript and TypeScript SDKs coming soon; Java SDK may be prioritized based on demand
- Tracing functionality covers external Chat Completions-compatible models as a "generation span"
- Agents SDK supports MCP connections through custom-defined function tools
- Asynchronous operations aren't natively supported yet; interim solution is immediately returning "success" followed by updating via user message later
- Agent tools can include preset variables either hardcoded or through a context object
- Privacy can be managed using guardrails and an input_filter to constrain context during agent handoffs
- Agents SDK workflows can combine external Chat Completions-compatible models and OpenAI models, including built-in tools like CUA
- Agentic "deep research" functionality can be built using Responses API or Agents SDK
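A sketch of a custom function tool with the open-source Agents SDK (Python package openai-agents, imported as agents); names and signatures are assumed from the SDK docs, and the tool body is a stub for illustration.
```python
# Sketch of a custom function tool with the open-source Agents SDK
# (pip install openai-agents; names assumed from the SDK docs).
from agents import Agent, Runner, function_tool

@function_tool
def get_order_status(order_id: str) -> str:
    """Look up an order in an external system (stub for illustration)."""
    return f"Order {order_id}: shipped"

support_agent = Agent(
    name="Support agent",
    instructions="Answer order questions using the available tools.",
    tools=[get_order_status],
)

result = Runner.run_sync(support_agent, "Where is order 12345?")
print(result.final_output)
```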
File and Vector Store Features
- File search returns citation texts via the "annotations" parameter (see the sketch after this list)
- Vector stores already support custom chunking and hybrid search, with further improvements planned
- Images are not yet supported in vector stores, but entire PDFs can be directly uploaded into the Responses API for small-document use cases
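A sketch of file search over an existing vector store in the Responses API, reading citation texts back from the annotations on the output message. The vector store ID is a placeholder and the annotation field path is assumed from the docs.
```python
# Sketch of file search with citations in the Responses API; "vs_your_store_id" is a
# placeholder vector store ID, and the annotation field path is assumed from the docs.
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "file_search", "vector_store_ids": ["vs_your_store_id"]}],
    input="What does the handbook say about refunds?",
)
print(resp.output_text)
for item in resp.output:               # citation texts live on the message annotations
    if item.type == "message":
        for part in item.content:
            for annotation in getattr(part, "annotations", []) or []:
                print(annotation)
```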
Computer Use Model (CUA) and Environment
- Docker environments for computer use must be managed by developers; recommended third-party cloud services are Browserbase and Scrapybara, with provided sample apps
- Predefined Ubuntu environments and company-specific setups can be created using OpenAI's CUA starter app
- Integrated VMs or fully managed cloud environments for CUA are not planned yet; developers are encouraged to use sample apps with third-party hosting providers
- CUA model primarily trained on web tasks but shows promising performance in desktop applications; still early stage (request sketch below)
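A sketch of requesting a single computer-use step from the CUA model via the Responses API. The model name, tool type, and required fields here are assumptions based on the launch materials; the loop that executes the returned actions in a browser or VM is left to the developer (e.g. via Browserbase or Scrapybara).
```python
# Sketch of one computer-use request via the Responses API (names/fields assumed).
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser",
    }],
    input="Open example.com and read the page title.",
    truncation="auto",
)
# The output contains computer_call items (click/type/screenshot requests) that the
# developer executes in their own browser/VM and feeds back in a follow-up request.
for item in resp.output:
    print(item.type)
```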
Realtime Usage Tracking
- OpenAI currently doesn't provide a built-in solution for tracking realtime usage via WebRTC ephemeral tokens; using a relay/proxy is recommended as an interim solution
OpenAI Models and Roadmap
- o1-pro will be available soon in Responses API
- o3 model development continues with API release planned; details forthcoming
Strategy and Positioning
OpenAI identifies as both a product and model company, noting ChatGPT's 400M weekly users help improve model quality, and acknowledges they won't build all needed AI products themselves
Unexpected Use Cases
Early Responses API tests revealed use cases such as art generation, live event summarization, apartment finding, and belief simulations
Google introduced Gemini Robotics, the most advanced vision-language-action (VLA) model in the world
Gemini Robotics builds on Gemini 2.0, adding vision-language-action capabilities that let the model physically control robots.
The technology enables robots to understand and react to the physical world, performing tasks like desk cleanup through voice commands, as part of a broader push toward embodied AI.
Gemini Robotics-ER, a related model, enhances spatial understanding, allowing robots to adapt to dynamic environments and interact seamlessly with humans.
Tech report.
Hugging Face (LeRobot) & Yaak released the world's largest open-source self-driving dataset
To search the data, Yaak is launching Nutron, a tool for natural-language search of robotics data. Check out the video to see how it works.
Natural language search of multi-modal data.
Open sourcing L2D dataset - 5,000 hours of multi-modal self-driving data.
Try Nutron.
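A sketch of streaming the L2D data with the datasets library. The repository ID "yaak-ai/L2D" is a hypothetical placeholder; check the dataset card for the exact name.
```python
# Sketch of streaming L2D from the Hugging Face Hub; "yaak-ai/L2D" is a hypothetical
# placeholder for the real repository ID listed on the dataset card.
from datasets import load_dataset

l2d = load_dataset("yaak-ai/L2D", split="train", streaming=True)
for episode in l2d.take(1):
    print(sorted(episode.keys()))  # inspect the available multimodal fields
```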
huggingface.co
LeRobot goes to driving school: World's largest open-source self-driving dataset
The first quantum supremacy claim for a useful application: D-Wave's quantum computer performed a complex simulation in minutes, with a level of accuracy that would take nearly a million years on the DOE supercomputer built with GPUs.
In addition, solving this problem on the classical supercomputer would require more than the world's annual electricity consumption.
Dwavequantum
Beyond Classical: D-Wave First to Demonstrate Quantum Supremacy on Useful, Real-World Problem
IOSCO_AI_1741862094.pdf
1.5 MB
IOSCO Report: AI in Capital Markets - Uses, Risks, and Regulatory Responses
This report delves into the current and potential applications of AI within financial markets, outlines the associated risks and challenges, and examines how regulators and market participants are adapting to these changes.
The report cites regulatory approaches from Hong Kong, the EU, Canada, the US, Singapore, the Netherlands, the UK, Greece, Japan, Brazil, and Australia.
Key AI Applications in Financial Markets:
Decision-Making Support:
- Robo-advising (automated investment advice)
- Algorithmic trading
- Investment research and market sentiment analysis
Specific AI Use Cases:
Nasdaq:
- Developed the Dynamic M-ELO AI-driven trading order, optimizing order holding time for improved execution efficiency
Broker-Dealers:
- Customer interaction via chatbots
- Algorithmic trading enhancements
- Fraud and anomaly detection
Asset Managers:
- Automated investment advice
- Investment research
- Portfolio construction and optimization
Cohere introduced Command A: a new AI model that can match or outperform GPT-4o and DeepSeek-V3 on business tasks, with significantly greater efficiency.
Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases.
Runs on only 2 GPUs (vs. typically 32), offers 256k context length, supports 23 languages and delivers up to 156 tokens/sec.
API.
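A sketch of calling Command A through Cohere's Python SDK; the model ID command-a-03-2025 is assumed from the release, so verify it in the API docs.
```python
# Sketch of calling Command A via Cohere's Python SDK; the model ID is an assumption.
import os
import cohere

co = cohere.ClientV2(api_key=os.environ["CO_API_KEY"])
res = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Draft a one-paragraph summary of Q3 sales trends."}],
)
print(res.message.content[0].text)
```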
Cohere
Introducing Command A: Max performance, minimal compute | Cohere Blog
Cohere Command A is on par or better than GPT-4o and DeepSeek-V3 across agentic enterprise tasks, with significantly greater efficiency.
Transformers, but without normalization layers. New paper by Meta.
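Assuming this refers to the "Transformers without Normalization" paper, its replacement for LayerNorm is an elementwise Dynamic Tanh, y = gamma * tanh(alpha * x) + beta, with a learnable scalar alpha. A minimal PyTorch sketch (initialization details are assumptions, not the paper's exact recipe):
```python
# Minimal Dynamic Tanh (DyT) layer as a drop-in replacement for LayerNorm:
# y = gamma * tanh(alpha * x) + beta, with a learnable scalar alpha.
import torch
import torch.nn as nn

class DyT(nn.Module):
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))                # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))                # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

x = torch.randn(2, 16, 64)   # (batch, tokens, channels)
print(DyT(64)(x).shape)      # torch.Size([2, 16, 64])
```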
Baidu, the Google of China, dropped two models
1. ERNIE 4.5: beats GPT-4.5 at 1% of the price
2. Reasoning model X1: beats DeepSeek-R1 at 50% of the price.
China continues to build intelligence too cheap to meter. The AI price war is on.
Xiaomi developed a SOTA audio reasoning model leveraging DeepSeek's GRPO RL algorithm, achieving 64.5% accuracy on the MMAU benchmark in just one week.
The breakthrough involves applying GRPO to the Qwen2-Audio-7B model, trained on 38,000 samples from Tsinghua University's AVQA dataset, marking a significant advancement in multimodal audio understanding.
The MMAU benchmark, introduced in 2024, tests models on complex audio tasks across speech, sound, and music, with even top models like Gemini Pro 1.5 achieving only 52.97% accuracy, highlighting the challenge Xiaomi's model addresses.
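The core of GRPO is a group-relative advantage: sample several answers per prompt, score them with a verifiable reward, and normalize each reward against the group mean and standard deviation. A minimal sketch (illustrative; Xiaomi's exact training setup is not described here):
```python
# Group-relative advantage as used in GRPO-style training (illustrative).
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against identical rewards
    return [(r - mean) / std for r in rewards]

# e.g. four sampled answers to one audio question, scored 1.0 if correct else 0.0
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```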
The total value of real-world tokenized assets has grown by roughly 20% over the last 30 days. Data by RWA.xyz.
Total Real-World Asset (RWA) value continues to climb: over $18B is now tokenized onchain, excluding stablecoins.
A breakthrough in brain signal analysis that combines PCA and ANFIS to hit 99.5% accuracy in cognitive pattern recognition.
It could be a game-changer for #neuroscience, #BCI tech and clinical applications.
Mistral announced Small 3.1: multimodal, multilingual, Apache 2.0
Lightweight: Runs on a single RTX 4090 or a Mac with 32GB RAM, perfect for on-device applications.
Fast-Response Conversations: Ideal for virtual assistants and other applications where quick, accurate responses are essential.
Low-Latency Function Calling: Capable of rapid function execution within automated or agentic workflows (see the sketch after this list).
Specialized Fine-Tuning: Customizable for specific domains.
Advanced Reasoning Foundation: Inspires community innovation, with models like DeepHermes 24B by Nous Research built on Mistral Small 3.
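A sketch of low-latency function calling against the hosted model with Mistral's Python SDK; the alias mistral-small-latest, the response field names, and the weather tool are assumptions to verify against Mistral's documentation.
```python
# Sketch of function calling with Mistral's Python SDK; model alias and response
# field names are assumptions to check against Mistral's documentation.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a real API
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the requested tool call(s), if any
```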
huggingface.co
mistralai/Mistral-Small-3.1-24B-Base-2503 · Hugging Face
ByteDance Seed, Tsinghua, and UHK open-sourced a new RL algorithm for building reasoning models.
DAPO-Zero-32B, a fully open-source RL reasoning model, surpasses DeepSeek-R1-Zero-Qwen-32B, and scores 50 on AIME 2024 with 50% fewer steps.
It is trained with zero-shot RL from the Qwen-32b pre-trained model.
Everything is fully open-sourced (algorithm, code, dataset, verifier, and model).
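One of DAPO's changes to GRPO/PPO-style training is "clip-higher": the clipping range is decoupled into a lower and a higher bound so low-probability tokens can still be up-weighted, with the loss averaged at the token level. A hedged sketch of that surrogate (epsilon values are illustrative; see the paper for the full objective, dynamic sampling, and token-level loss):
```python
# Illustrative "clip-higher" surrogate: decoupled clip bounds (eps_low, eps_high)
# applied to policy ratios, averaged at the token level.
import torch

def clip_higher_surrogate(ratio: torch.Tensor, adv: torch.Tensor,
                          eps_low: float = 0.2, eps_high: float = 0.28) -> torch.Tensor:
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return torch.minimum(ratio * adv, clipped * adv).mean()

ratio = torch.tensor([0.5, 1.0, 1.5])  # new/old policy probability ratios per token
adv = torch.tensor([1.0, -1.0, 0.5])   # group-relative advantages per token
print(clip_higher_surrogate(ratio, adv))
```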
Cool research on open-source by Harvard
$4.15B invested in open-source generates $8.8T of value for companies (aka $1 invested in open-source = $2,000 of value created).
Companies would need to spend 3.5 times more on software than they currently do if OSS did not exist.