All about AI, Web 3.0, BCI
3.24K subscribers
727 photos
26 videos
161 files
3.1K links
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

owner @Aniaslanyan
LeanAgent: the first lifelong learning agent for formal theorem proving in Lean.

LLMs have been integrated with interactive proof assistants like Lean, where formal verification guarantees 100% accuracy of accepted proofs.

However, these LLMs so far cannot continuously generalize to new knowledge and struggle with advanced mathematics.

LeanAgent continuously learns and improves on ever-expanding mathematical knowledge without forgetting what it learned before. It has a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity.
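The curriculum strategy can be illustrated with a minimal sketch: order repositories by a difficulty score and train on the easiest first. The class names and the difficulty metric below are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of a difficulty-ordered curriculum, in the spirit of
# LeanAgent's strategy. The difficulty metric here (e.g. mean proof length)
# is an assumption for demonstration purposes.
from dataclasses import dataclass, field

@dataclass
class Repo:
    name: str
    difficulty: float            # crude proxy, e.g. mean proof length
    theorems: list = field(default_factory=list)

def curriculum_order(repos):
    """Train on easier repositories first, harder ones later."""
    return sorted(repos, key=lambda r: r.difficulty)

repos = [Repo("algebraic_topology", 9.1),
         Repo("basic_logic", 1.4),
         Repo("abstract_algebra", 6.7)]
print([r.name for r in curriculum_order(repos)])
# → ['basic_logic', 'abstract_algebra', 'algebraic_topology']
```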

LeanAgent successfully proves 155 theorems across 23 diverse Lean repositories where formal proofs were previously missing, many from advanced mathematics. It proves challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics.

LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. 

Code.
πŸ†’4❀3πŸ₯°1πŸ‘1
OpenAI launches new tools to help businesses build AI agents

The new tools: RAG/file search, web search, and Operator-style computer use, all packaged together in the Responses API.

Plus, they've upgraded Swarm into the (open-source) Agents SDK to make it easy to build, orchestrate, and monitor systems of agents.

Web search in the OpenAI API: get fast, up-to-date answers with links to relevant web sources. Powered by the same model used for ChatGPT search.

Now with the Responses API, OpenAI wants to sell access to the components that power AI agents, allowing developers to build their own Operator- and deep research-style agentic applications.

OpenAI hopes that developers can create some applications with its agent technology that feel more autonomous than what’s available today.
❀5πŸ‘4πŸ‘2
Google introduced Gemma 3 and ShieldGemma 2

Google DeepMind released the Gemma 3 family of open models.

Available in various sizes: 1B, 4B, 12B and 27B

Key Capabilities:

1. World's best single-accelerator model: outperforms Llama-405B, DeepSeek-V3 and o3-mini in preliminary human preference evaluations on LMArena's leaderboard

2. Global language support: out-of-the-box support for over 35 languages and pretrained support for over 140 languages

3. Multimodal reasoning: analyze images, text, and short videos

4. Expanded context: 128k-token context window

5. Function calling: supports function calling and structured output for AI-driven workflows

6. Quantized models: official quantized versions to reduce model size and computational requirements

Alongside Gemma 3, Google is launching ShieldGemma 2, a powerful 4B image safety checker built on the Gemma 3 foundation that provides:

- A ready-made solution for image safety
- Safety labels across three categories: dangerous content, sexually explicit content, and violence
- Customization options for specific safety needs

Paper.
HuggingFace.
πŸ”₯7❀2πŸ‘2
Claude just bought the Solana team a 5th bday gift on Amazon.

Paid in Solana USDC, shipped to their NY office.

Agents can now buy anything on Amazon using the goat MCP adapter + Crossmint Checkout.

Crossmint’s Headless Checkout plugin gives Claude access to Amazon’s entire product catalog.
Hugging Face trained a new open model, Open-R1, that outperforms DeepSeek R1 on the International Olympiad in Informatics.

The 7B variant beats Claude 3.7 Sonnet. The dataset, training recipe and benchmark are all public.
πŸ†’4πŸ‘2❀1πŸ”₯1
Summary of insights from OpenAI AMA on X following the launch of Agent Tools and APIs

Responses API and Tools

1. Operator functionality (CUA model) is available starting today through the Responses API

2. Responses API is stateful by default, supports retrieving past responses, chaining them, and will soon reintroduce threads

3. Code Interpreter tool is planned as the next built-in tool in the Responses API

4. Web search can be used together with structured outputs by defining a JSON schema explicitly

5. Assistants API won't be deprecated until migration to Responses API is possible without data loss

6. "Assistants" and "agents" terms are interchangeable, both describing systems independently accomplishing user tasks

7. Curl documentation is provided in API references, with more examples coming soon
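Item 4 above (web search plus structured outputs) can be sketched as a request payload. This builds the body as a plain dict rather than calling the API; the tool name `web_search_preview` and the `text.format` shape follow the Responses API as described at launch, but treat all field names as assumptions and check the current API reference.

```python
# Sketch of a Responses API request combining the built-in web search tool
# with an explicitly defined JSON schema for structured output.
import json

def build_request(query: str) -> dict:
    schema = {
        "type": "object",
        "properties": {
            "answer": {"type": "string"},
            "sources": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["answer", "sources"],
        "additionalProperties": False,
    }
    return {
        "model": "gpt-4o",
        "input": query,
        "tools": [{"type": "web_search_preview"}],  # built-in web search tool
        "text": {"format": {                        # explicit JSON schema
            "type": "json_schema",
            "name": "cited_answer",
            "schema": schema,
            "strict": True,
        }},
    }

print(json.dumps(build_request("What changed in the latest Lean 4 release?"), indent=2))
```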

Agents SDK and Compatibility

- Agents SDK supports external API calls through custom-defined function tools
- SDK compatible with external open-source models that expose a Chat Completions-compatible API endpoint (no Responses API compatibility required)
- JavaScript and TypeScript SDKs coming soon; Java SDK may be prioritized based on demand
- Tracing functionality covers external Chat Completions-compatible models as a "generation span"
- Agents SDK supports MCP connections through custom-defined function tools
- Asynchronous operations aren't natively supported yet; interim solution is immediately returning "success" followed by updating via user message later
- Agent tools can include preset variables either hardcoded or through a context object
- Privacy can be managed using guardrails and an input_filter to constrain context during agent handoffs
- Agents SDK workflows can combine external Chat Completions-compatible models and OpenAI models, including built-in tools like CUA
- Agentic "deep research" functionality can be built using Responses API or Agents SDK
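The interim pattern for asynchronous operations described above (acknowledge immediately, deliver the real result later as a user message) can be sketched with plain asyncio. The function names and conversation shape are illustrative, not Agents SDK APIs.

```python
# Sketch of the interim async pattern: the tool returns "success" right
# away, and a background task later appends the result as a user message.
import asyncio

async def slow_job(conversation):
    await asyncio.sleep(0.01)  # stand-in for a long-running task
    conversation.append({"role": "user", "content": "Job finished: report ready."})

async def start_job_tool(conversation):
    asyncio.create_task(slow_job(conversation))  # fire and forget
    return "success"                             # immediate acknowledgement

async def main():
    conversation = []
    ack = await start_job_tool(conversation)
    conversation.append({"role": "assistant", "content": ack})
    await asyncio.sleep(0.05)  # agent keeps running; result arrives later
    return conversation

print(asyncio.run(main()))
```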

File and Vector Store Features

- File search returns citation texts via the "annotations" parameter
- Vector stores already support custom chunking and hybrid search, with further improvements planned
- Images are not yet supported in vector stores, but entire PDFs can be directly uploaded into the Responses API for small-document use cases

Computer Use Model (CUA) and Environment

- Docker environments for computer use must be managed by developers; recommended third-party cloud services are Browserbase and Scrapybara, with provided sample apps
- Predefined Ubuntu environments and company-specific setups can be created using OpenAI's CUA starter app
- Integrated VMs or fully managed cloud environments for CUA are not planned yet; developers are encouraged to use sample apps with third-party hosting providers
- CUA model primarily trained on web tasks but shows promising performance in desktop applications; still early stage

Realtime Usage Tracking

- OpenAI currently doesn't provide a built-in solution for tracking realtime usage via WebRTC ephemeral tokens; using a relay/proxy is recommended as an interim solution

OpenAI Models and Roadmap

- o1-pro will be available soon in Responses API
- o3 model development continues with API release planned; details forthcoming

Strategy and Positioning

OpenAI identifies as both a product and model company, noting ChatGPT's 400M weekly users help improve model quality, and acknowledges they won't build all needed AI products themselves

Unexpected Use Cases

Early Responses API tests revealed use cases such as art generation, live event summarization, apartment finding, and belief simulations
πŸ‘3πŸ”₯2❀1
Google introduced Gemini Robotics, the most advanced vision-language-action (VLA) model in the world

Gemini Robotics builds on Gemini 2.0, introducing advanced vision-language-action capabilities to control robots physically.

The technology enables robots to understand and react to the physical world, performing tasks like desk cleanup through voice commands, as part of a broader push toward embodied AI.

Gemini Robotics-ER, a related model, enhances spatial understanding, allowing robots to adapt to dynamic environments and interact seamlessly with humans.

Tech report.
πŸ‘5❀3πŸ”₯2
Hugging Face (LeRobot) & Yaak released the world's largest open-source self-driving dataset

To search the data, Yaak is launching Nutron, a tool that brings natural language search to robotics data. Check out the video to see how it works.

Natural language search of multi-modal data.
Open-sourcing the L2D dataset: 5,000 hours of multi-modal self-driving data.

Try Nutron.
❀4πŸ”₯4πŸ‘1
The first quantum supremacy for a useful application: D-Wave's quantum computer performed a complex simulation in minutes, with a level of accuracy that would take nearly a million years on the DOE supercomputer built with GPUs.

In addition, solving this problem on the classical supercomputer would require more than the world's annual electricity consumption.
πŸ”₯5πŸ‘2❀1
IOSCO_AI_1741862094.pdf
1.5 MB
IOSCO Report: AI in Capital Markets - Uses, Risks, and Regulatory Responses

This report delves into the current and potential applications of AI within financial markets, outlines the associated risks and challenges, and examines how regulators and market participants are adapting to these changes.

The report cites regulatory approaches from Hong Kong, the EU, Canada, the US, Singapore, the Netherlands, the UK, Greece, Japan, Brazil, and Australia.

Key AI Applications in Financial Markets:

Decision-Making Support:
- Robo-advising (automated investment advice)
- Algorithmic trading
- Investment research and market sentiment analysis

Specific AI Use Cases:

Nasdaq:
- Developed the Dynamic M-ELO AI-driven trading order, optimizing order holding time for improved execution efficiency

Broker-Dealers:
- Customer interaction via chatbots
- Algorithmic trading enhancements
- Fraud and anomaly detection

Asset Managers:
- Automated investment advice
- Investment research
- Portfolio construction and optimization
πŸ”₯4πŸ‘3πŸ‘1
Cohere introduced Command A: a new AI model that can match or outperform GPT-4o and DeepSeek-V3 on business tasks, with significantly greater efficiency.

Command A is an open-weights 111B parameter model with a 256k context window, focused on delivering great performance across agentic, multilingual, and coding use cases.

Runs on only 2 GPUs (vs. typically 32), offers 256k context length, supports 23 languages and delivers up to 156 tokens/sec.

API.
Transformers, but without normalization layers. New paper by Meta.
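Assuming this refers to the "Transformers without Normalization" paper, its core idea is replacing LayerNorm with Dynamic Tanh (DyT): DyT(x) = gamma * tanh(alpha * x) + beta, with learnable scalar alpha and per-channel gamma, beta. A minimal NumPy sketch:

```python
# Sketch of Dynamic Tanh (DyT), the proposed LayerNorm replacement.
# With default parameters, outputs are bounded in (-1, 1): the tanh
# squashes extreme activations, which is the normalization-like effect.
import numpy as np

def dyt(x, alpha=0.5, gamma=None, beta=None):
    d = x.shape[-1]
    gamma = np.ones(d) if gamma is None else gamma
    beta = np.zeros(d) if beta is None else beta
    return gamma * np.tanh(alpha * x) + beta

x = np.random.randn(2, 4) * 10   # large activations get squashed
out = dyt(x)
print(out.min(), out.max())      # bounded in [-1, 1] with default params
```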
πŸ”₯7πŸ‘1πŸ‘1
Baidu, the Google of China, dropped two models

1. ERNIE 4.5: beats GPT-4.5 at 1% of the price

2. Reasoning model X1: beats DeepSeek R1 at 50% of the price.

China continues to build intelligence too cheap to meter. The AI price war is on.
πŸ”₯5πŸ‘2❀1
Microsoft has released this useful tool for performing R&D with LLM-based agents.
❀4πŸ”₯2πŸ‘1
Xiaomi's development of a SOTA audio reasoning model leverages DeepSeek's GRPO RL algorithm, achieving a 64.5% accuracy on the MMAU benchmark in just one week.

The breakthrough involves applying GRPO to the Qwen2-Audio-7B model, trained on 38,000 samples from Tsinghua University's AVQA dataset, marking a significant advancement in multimodal audio understanding.

The MMAU benchmark, introduced in 2024, tests models on complex audio tasks across speech, sound, and music, with even top models like Gemini Pro 1.5 achieving only 52.97% accuracy, highlighting the challenge Xiaomi's model addresses.
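The heart of GRPO, the RL algorithm mentioned above, is simple to sketch: sample a group of answers per prompt and give each answer an advantage equal to its reward normalized within the group, with no learned value function. A minimal illustration (not Xiaomi's or DeepSeek's actual code):

```python
# Sketch of GRPO's group-relative advantage computation: each sampled
# answer's advantage is its reward standardized within its own group.
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# e.g. 4 sampled answers to one audio question, rewarded 1 if correct
print(grpo_advantages([1, 0, 0, 1]))  # correct answers get positive advantage
```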
πŸ‘3πŸ”₯3πŸ‘2
The total value of real-world tokenized assets has grown by ≈20% over the last 30 days. Data by RWA.xyz.

Total Real-World Asset (RWA) value continues to climb: over $18B is now tokenized onchain, excluding stablecoins.
πŸ‘5❀3πŸ‘2
A breakthrough in brain signal analysis that combines PCA and ANFIS to hit 99.5% accuracy in cognitive pattern recognition.

It could be a game-changer for #neuroscience, #BCI tech and clinical applications.
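A minimal sketch of the PCA front-end in a pipeline like the one described: reduce multichannel brain-signal features before handing them to a fuzzy classifier (ANFIS). Purely illustrative; the study's exact preprocessing and classifier are not reproduced here.

```python
# PCA dimensionality reduction via SVD: project each trial's feature
# vector onto the top-k principal components before classification.
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                       # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # rows of Vt = directions

X = np.random.randn(100, 32)   # 100 trials x 32 channels (synthetic)
Z = pca_reduce(X, 5)
print(Z.shape)                  # (100, 5)
```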
Mistral announced Small 3.1: multimodal, multilingual, Apache 2.0

Lightweight: Runs on a single RTX 4090 or a Mac with 32GB RAM, perfect for on-device applications.

Fast-Response Conversations: Ideal for virtual assistants and other applications where quick, accurate responses are essential.

Low-Latency Function Calling: Capable of rapid function execution within automated or agentic workflows.

Specialized Fine-Tuning: Customizable for specific domains.

Advanced Reasoning Foundation: Inspires community innovation, with models like DeepHermes 24B by Nous Research built on Mistral Small 3.
πŸ”₯4πŸ‘2πŸ‘2
ByteDance Seed, Tsinghua, and UHK open-sourced a new RL algorithm for building reasoning models.

DAPO-Zero-32B, a fully open-source RL reasoning model, surpasses DeepSeek-R1-Zero-Qwen-32B, and scores 50 on AIME 2024 with 50% fewer steps.

It is trained with zero-shot RL from the Qwen-32b pre-trained model.

Everything is fully open-sourced (algorithm, code, dataset, verifier, and model).
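One DAPO component, dynamic sampling, is easy to illustrate: under group-normalized advantages, prompts whose sampled answers are all correct or all wrong yield zero gradient signal, so they are filtered out of the batch. A sketch (see the released code for the full algorithm):

```python
# Sketch of DAPO-style dynamic sampling: drop prompts whose reward
# group has zero variance, since they contribute no learning signal.
import numpy as np

def dynamic_sample_filter(batch):
    """Keep only (prompt, rewards) pairs with non-degenerate rewards."""
    return [(p, r) for p, r in batch if np.std(r) > 0]

batch = [("too_easy", [1, 1, 1, 1]),
         ("useful",   [1, 0, 1, 0]),
         ("too_hard", [0, 0, 0, 0])]
print([p for p, _ in dynamic_sample_filter(batch)])  # → ['useful']
```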
πŸ”₯7❀1πŸ‘1
Cool research on open-source by Harvard

$4.15B invested in open-source generates $8.8T of value for companies (aka $1 invested in open-source = $2,000 of value created).

Companies would need to spend 3.5 times more on software than they currently do if OSS did not exist.
πŸ”₯3❀2πŸ‘1