All about AI, Web 3.0, BCI
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI).

owner @Aniaslanyan
Salesforce introduced SFR-DeepResearch (SFR-DR): RL-trained autonomous agents that can reason, search, and code their way through deep research tasks.

SFR-DR agents are trained to operate independently, without pre-defined multi-agent workflows. They autonomously plan, reason, and propose and take actions as defined by their tools.

SFR-DR-20B achieves 28.7% on Humanity's Last Exam (text-only) using only web search, browsing, and Python interpreter, surpassing DeepResearch with OpenAI o3 and Kimi Researcher.

SFR-DR agents are also trained to manage their own memory by summarizing previous results when context runs low. This gives them an effectively unbounded context window and enables long-horizon tasks.
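A minimal sketch of this self-managed memory pattern. All names and the character-count threshold are illustrative, not from the paper:

```python
def run_with_self_summarization(task, step, summarize, context_limit=4000):
    """Toy agent loop: when the accumulated context exceeds the limit,
    the agent replaces it with its own summary and keeps going."""
    context = task
    for _ in range(100):                      # safety cap on steps
        result = step(context)                # one reason/act/observe turn
        if result.get("done"):
            return result["answer"]
        context += "\n" + result["observation"]
        if len(context) > context_limit:      # context window nearly full
            context = summarize(context)      # compress history in place
    return None
```

A real implementation would count tokens rather than characters and would summarize with the model itself, but the loop structure is the same.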
A new research paper from Thinking Machines (the ex-OpenAI team): Why LLMs Give Different Answers to the Same Question (And How to Fix It)

Ever notice that ChatGPT gives you slightly different responses when you ask the same question multiple times? Even at temperature 0, where the model should theoretically always pick the most likely token?

Most people assume this happens because of sampling randomness or GPU parallelization quirks. The conventional wisdom goes something like this: "GPUs do parallel calculations, floating-point math isn't associative, so results vary depending on which threads finish first."

This explanation isn't wrong, but it misses the real culprit. Horace He and the team at Thinking Machines dug deeper and found something more fundamental: batch invariance.

Here's what's actually happening:
when you send a request to an LLM API, your output depends not just on your input, but on how many other people are using the service at the same time.

The server batches requests together for efficiency, and the batch size affects the numerical computations.

Even though each individual operation might be deterministic, the same input can produce different outputs depending on whether it's processed alone or with 10, 100, or 1000 other requests.
Think of it this way: you ask a question, but the answer changes based on how crowded the "room" is when you ask it.
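The floating-point piece of this is easy to reproduce yourself. A minimal, self-contained demonstration (pure Python, not code from the paper): addition is not associative, so any change in reduction order, which batching can cause inside a kernel, can change the result.

```python
# Floating-point addition is not associative: regrouping the same three
# numbers changes the answer, because 1.0 is absorbed (rounded away)
# when it is added to the huge value first.
a, b, c = 1e100, -1e100, 1.0

grouped_left = (a + b) + c    # (1e100 - 1e100) + 1.0  ->  1.0
grouped_right = a + (b + c)   # 1e100 + (-1e100)       ->  0.0

print(grouped_left, grouped_right)  # 1.0 0.0
```

A batched matmul kernel may pick different reduction orders for different batch sizes, so the same input row can produce slightly different logits, which is exactly the batch-invariance failure described above.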

This work challenges a common attitude in ML: "our systems are already probabilistic, so what's a little more randomness?" The researchers argue this is defeatist. With careful engineering, we can understand and eliminate these sources of nondeterminism.

They've open-sourced their implementation on top of vLLM, making it possible for others to achieve truly deterministic LLM inference today.
ByteDance Seed presented AgentGym-RL

• First unified RL framework for multi-turn agent training (no SFT)
• Modular, extensible design across web, search, games, embodied & science tasks
• Agents rival/surpass commercial models on 27 tasks.

GitHub
Paper.
Chinese researchers introduced WebExplorer, which is a simple yet effective approach to train long-horizon web agents.

Instead of relying on rigid pre-defined graph structures, WebExplorer uses a model-based exploration strategy to synthesize high-quality agentic data.

The 8B model outperforms most 32B and even 72B models on BrowseComp and HLE.
Nvidia released La-Proteina fully open source

La-Proteina is a generative model that accurately co-designs fully atomistic protein structures (sequence + side chains + backbone) at scale, up to 800 residues, with state-of-the-art performance on atomistic motif scaffolding. Its code is now open source.

Paper.
Code.
Medra AI has automated experimentation down to the physical level with reasoning and robotics.

The Medra technology platform consists of two core components:

1. Physical AI: Their general-purpose robots use vision-language models (VLMs) to operate standard laboratory instruments flexibly and execute experimental protocols. Medra is the first company to deploy Physical AI in the laboratory, leveraging the same advanced models that power self-driving cars and humanoid robots.

2. Scientific AI: Their reasoning models analyze experimental results and integrate with partners' internal infrastructure—such as LIMS, electronic lab notebooks, and ML pipelines—to glean insights from disparate data sources.

These two systems operate in a closed loop: Physical AI executes experiments while Scientific AI analyzes the outcomes and iterates on the design. This cycle helps scientists rapidly converge on the optimal protocol.
ByteDance launched Seedream 4.0, an image generation tool that aims to compete with Google's “Nano Banana” AI image editor.
⚡️ Claude now has memory. Anthropic also introduced incognito chats for all users.

With project-scoped memory, each project maintains its own focused context.

Memory is fully optional with granular controls.

In settings, view the complete memory summary, edit what's stored, and guide Claude by telling it what to focus on or ignore.
Anthropic shared best practices for developers on writing effective tools for LLM agents.
Meet Gauss, the first autoformalization agent. It just completed Terry Tao and Alex Kontorovich's Strong Prime Number Theorem project in 3 weeks, an effort on which human experts had made only partial progress in 18+ months.

GitHub.
Early access.
Google presented Speculative Cascades, a new approach for improving LLM efficiency that combines the best features of cascades (where a small LLM precedes a larger LLM) and speculative decoding (where a drafter model's tokens are verified by a target model).
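A toy sketch of the combined idea. The function names are illustrative, and the real deferral rule in the paper is more nuanced than this simple top-k membership check:

```python
def speculative_cascade(prompt, draft_next, target_topk, max_tokens=16):
    """Toy sketch: a small model drafts each token; the large model's
    top-k list acts as the verifier. Accepted drafts are kept (cheap),
    rejected ones are replaced by the large model's top choice."""
    out = []
    while len(out) < max_tokens:
        draft = draft_next(prompt, out)       # cheap proposal
        topk = target_topk(prompt, out)       # verifier's candidates
        if not topk:                          # large model says: stop
            break
        if draft in topk:
            out.append(draft)                 # small model's token accepted
        else:
            out.append(topk[0])               # defer to the large model
    return out
```

This captures why the hybrid wins: like speculative decoding, most tokens come from the cheap drafter, but like a cascade, the system can hand hard tokens to the larger model instead of rejecting a whole draft.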
Google shared a new work:
Virtual Agent Economies

Researchers discussed a number of possible frameworks for establishing steerable agent markets.

The rapid adoption of AI agents points to a future where AI agents may be able to produce economic value independently of human labor.

Coupled with the development of new interoperability standards like the Agent2Agent (A2A) and Model Context Protocol (MCP), this signals the inevitable emergence of a new economic layer.

The arising virtual (sandbox) AI agent economy may offer us opportunities for insulation and safeguarding, as well as establishing potentially unprecedented coordination between agents, and orchestrating their interactions towards achieving major societal or community goals, or better aligning with user preferences.

Market-based mechanisms like auctions may also be employed for fair resource allocation.

Finally, the paper outlines the technical and governance infrastructure, such as verifiable credentials for establishing trust, required to safely and robustly scale agentic AI deployments. This is necessary to address systemic market risks and to avoid exacerbating inequalities.
OpenAI introduced OpenAI Grove: a program for early-stage founders.

Grove builds on OpenAI's work with OpenAI for Startups and Pioneers, and includes 5 weeks of hands-on workshops, office hours, events with the OpenAI team, and early access.
The UAE released K2-Think, an open-source AI reasoning model

32 billion parameters. That's it. And this thing matches GPT-4-level reasoning while being 20x smaller, matching or beating models many times its size.

It is built on Qwen2.5 32B and trained with long chain-of-thought examples, so it learns to show its reasoning step by step.

Then reinforcement learning is added, using tasks where answers can be checked automatically, like math or code, so the model improves by being rewarded for correct results.

At test time, two tricks are used. First, a helper model writes a short plan before solving, which gives structure. Second, the system generates 3 answers and another model picks the best, which improves accuracy and keeps responses shorter.
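The second trick is a standard best-of-n selection. A minimal sketch, where `generate` and `judge` are placeholder callables standing in for K2-Think's actual models:

```python
def best_of_n(question, generate, judge, n=3):
    """Sample n candidate answers, then let a scoring model pick the best.
    K2-Think reportedly uses n=3 with a separate picker model."""
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda ans: judge(question, ans))
```

The win is that verification is usually easier than generation: a judge model comparing three finished answers is cheaper and more reliable than forcing a single generation to be correct.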

Speed is handled with specialized hardware, the Cerebras Wafer Scale Engine, which delivers about 2,000 tokens per second. This makes even very long reasoning tasks run in seconds instead of minutes.
OpenAI released GPT-5-Codex — a version of GPT-5 further optimized for agentic coding in Codex.

Available in the Codex CLI, IDE Extension, web, mobile, and for code reviews in Github.
Anthropic co-founder Jack Clark says that within the next 16 months, AI will be smarter than a Nobel Prize winner and able to complete tasks that take days, weeks, or months.

In short, Jack Clark says, AI will be akin to a “call center of geniuses” or a “country of geniuses.”
ByteDance introduced EMPG, a framework that recalibrates the learning signal using the agent's own uncertainty.

Compared with GRPO and DAPO, it achieves promising gains on agent benchmarks like WebShop, ALFWorld, and Deep Search.
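A toy illustration of what "recalibrating the learning signal by uncertainty" can mean, one plausible reading only, not EMPG's actual formula (see the paper for that): attenuate policy-gradient updates on steps where the agent's token distribution had high entropy.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def modulated_advantage(advantage, step_probs):
    # Illustrative recalibration: scale the advantage by a confidence
    # factor in [0, 1]. A uniform (maximally uncertain) distribution
    # zeroes the update; a near-one-hot distribution keeps it intact.
    max_h = math.log(len(step_probs))            # entropy of uniform dist
    confidence = 1.0 - entropy(step_probs) / max_h
    return advantage * confidence
```

The point of such a scheme is to stop noisy, uncertain steps from dominating the gradient while confident steps carry the learning signal.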

Paper.
Google launched new protocol for agent-driven purchases

Google announced a new open protocol for purchases initiated by AI agents, automated software programs that can shop and make decisions on behalf of users. The AI payments protocol supports credit cards and stablecoins and was built with Coinbase, the Ethereum Foundation, and over 60 partners, per Fortune. GitHub.

Called the Agent Payments Protocol (AP2), the system is meant to be interoperable between AI platforms, payment systems and vendors, providing a traceable paper trail for each transaction.

In collaboration with cryptocurrency outfits Coinbase, MetaMask, and the Ethereum Foundation, Google also produced an extension that would integrate the cryptocurrency-oriented x402 protocol, allowing for AI-driven purchasing from crypto wallets.
A number of other tech companies are working on their own agentic purchasing systems — most notably Perplexity, which allows for a Buy With Pro service in its agentic browser. The payment provider Stripe also produces software tools for agentic purchasing on its platform, though they are not as comprehensive as AP2.
That's a lot of money for robots: Figure has exceeded $1B in funding at a $39B post-money valuation

The round was led by Parkway Venture Capital with significant investments from Brookfield Asset Management, NVIDIA, Macquarie Capital, Intel Capital, Align Ventures, Tamarack Global, LG Technology Ventures, Salesforce, T-Mobile Ventures, and Qualcomm Ventures.

The new funding will support Figure's momentum across three core areas:

1. Scaling humanoid robots into homes & commercial operations

2. Building next-generation GPU infrastructure to accelerate training & simulation

3. Launching advanced data collection efforts for Helix
Tongyi Lab dropped half a dozen new papers, most focused on Deep Research agents.

1. Tongyi DeepResearch: Open-source DeepResearch Agent

• First OSS web agent matching OpenAI’s DeepResearch
• SOTA on HLE (32.9), BrowseComp (43.4/46.7), xbench-DeepSearch (75)
• Full-stack pipeline: Agentic CPT → SFT → RL w/ synthetic data
• Native ReAct & new Heavy Mode (IterResearch) for long-horizon tasks

2. WebResearcher: Unbounded reasoning for long-horizon agents

• IterResearch: Iterative deep-research paradigm (avoids context suffocation & noise)
• WebFrontier: Tool-augmented data engine for complex research tasks
• Parallel agents + synthesis → scalable, evidence-grounded reasoning
• Beats proprietary systems: 36.7% on HLE, 51.7% on BrowseComp

3. AgentScaler: Towards General Agentic Intelligence

• Scales environments for diverse, realistic tool-calling
• Fully simulated envs = verifiable + scalable interactions
• SOTA on τ-bench, τ²-bench, ACEBench
• AgentScaler-30B matches 1T-parameter models with far fewer params

4. AgentFounder: Scaling Agents via Continual Pre-training

• First to propose Agentic CPT → builds agentic foundation models before fine-tuning
• Solves post-training bottlenecks (capabilities + alignment conflict)
• Data synthesis: First-order (planning/actions) + Higher-order (multi-step decision)
• Two-stage training (32K → 128K context)
• SOTA: 39.9% BrowseComp-en, 72.8% GAIA

5. WebWeaver: Structuring Web-Scale Evidence for Deep Research

• Dual-agent framework (Planner + Writer)
• Dynamic outlines: search → refine → search (human-like loop)
• Memory-grounded, section-by-section synthesis → avoids long-context failures
• SOTA across DeepResearch Bench, DeepConsult, DeepResearchGym
• Produces reliable, well-cited, structured reports

6. ReSum: Long-Horizon Web Agents Without Context Limits

• Problem: ReAct hits context limits in long searches (32k tokens)
• Solution: ReSum periodically compresses history → compact reasoning states
• ReSumTool-30B: specialized summarizer extracts key evidence & gaps
• ReSum-GRPO (RL): trains agents to adapt summaries into reasoning
• +4.5% over ReAct baseline, +8.2% with RL across web search benchmarks.
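The ReSum loop described above can be sketched as follows. Names and the budget check are illustrative (the real system counts tokens against the 32k limit and summarizes with ReSumTool-30B):

```python
def resum_loop(question, act, summarize_tool, token_budget=32_000, max_steps=50):
    """Toy sketch of ReSum: run a ReAct-style loop, and whenever the
    transcript nears the context limit, replace it with a compact
    summary (key evidence + remaining gaps) and continue from there."""
    history = [question]
    for _ in range(max_steps):
        step_record, answer = act(history)    # one thought/action/observation
        if answer is not None:
            return answer
        history.append(step_record)
        if sum(len(m) for m in history) > token_budget:
            # ReSum step: compress the transcript into a compact
            # reasoning state and restart the context from it
            history = [question, summarize_tool(history)]
    return None
```

Unlike plain truncation, the summary is a structured reasoning state, which is why the agent can keep searching far past the raw context limit.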
Anthropic shipped two updates for developers using Claude

1. Claude in Xcode 26: Claude Sonnet 4 is now available as a coding assistant directly in Apple's IDE. Developers can connect their Claude account to access natural language code interaction, documentation generation, and inline editing tools. The integration shares usage limits with other Claude platforms and works with Pro, Max, and premium Team/Enterprise plans.

2. Claude Code UX update. A small but useful interface improvement: keywords like "think" and "ultrathink" now get highlighted when they would trigger extended thinking mode. Use /t to disable the mode, preventing accidental activation when these words appear in regular prompts.