Salesforce introduced SFR-DeepResearch (SFR-DR): RL-trained autonomous agents that can reason, search, and code their way through deep research tasks.
SFR-DR agents are trained to operate independently, without pre-defined multi-agent workflows: they autonomously plan, reason, and take actions as defined by their tools.
SFR-DR-20B achieves 28.7% on Humanity's Last Exam (text-only) using only web search, browsing, and a Python interpreter, surpassing DeepResearch built on OpenAI o3 as well as Kimi Researcher.
SFR-DR agents are also trained to manage their own memory, summarizing previous results when context runs low. This yields a virtually unlimited effective context window and enables long-horizon tasks.
arXiv.org
SFR-DeepResearch: Towards Effective Reinforcement Learning for...
Equipping large language models (LLMs) with complex, interleaved reasoning and tool-use capabilities has become a key focus in agentic AI research, especially with recent advances in...
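The self-managed memory loop described above is easy to picture in code. Below is a minimal sketch of the pattern, not SFR-DR's actual implementation: call_llm and run_tool are hypothetical placeholders, and the token accounting is a crude character-count proxy.

```python
# Minimal sketch of an agent loop with self-managed memory: when the
# transcript nears the context limit, the agent replaces older tool results
# with its own summary. `call_llm` and `run_tool` are hypothetical
# placeholders, not SFR-DR's actual interfaces.

CONTEXT_LIMIT = 8000  # tokens; illustrative only

def token_count(messages):
    # Crude proxy: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages, call_llm):
    """Summarize everything except the task prompt and the latest exchange."""
    head, tail = messages[:1], messages[-2:]
    summary = call_llm(
        [{"role": "user",
          "content": "Summarize the key findings so far:\n"
                     + "\n".join(m["content"] for m in messages[1:-2])}]
    )
    return head + [{"role": "assistant", "content": f"[memory] {summary}"}] + tail

def agent_loop(task, call_llm, run_tool, max_steps=50):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        if token_count(messages) > CONTEXT_LIMIT:
            messages = compact(messages, call_llm)
        action = call_llm(messages)          # model proposes the next tool call
        if action.startswith("FINAL:"):
            return action.removeprefix("FINAL:").strip()
        observation = run_tool(action)       # search / browse / python
        messages += [{"role": "assistant", "content": action},
                     {"role": "user", "content": observation}]
    return None
```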
A new research paper from Thinking Machines (ex-OpenAI team): Why LLMs Give Different Answers to the Same Question (And How to Fix It)
Ever notice that ChatGPT gives you slightly different responses when you ask the same question multiple times? Even at temperature 0, where the model should theoretically always pick the most likely token?
Most people assume this happens because of sampling randomness or GPU parallelization quirks. The conventional wisdom goes something like this: "GPUs do parallel calculations, floating-point math isn't associative, so results vary depending on which threads finish first."
This explanation isn't wrong, but it misses the real culprit. Horace He and the team at Thinking Machines dug deeper and found something more fundamental: batch invariance.
Here's what's actually happening: when you send a request to an LLM API, your output depends not just on your input, but on how many other people are using the service at the same time.
The server batches requests together for efficiency, and the batch size affects the numerical computations.
Even though each individual operation might be deterministic, the same input can produce different outputs depending on whether it's processed alone or with 10, 100, or 1000 other requests.
Think of it this way: you ask a question, but the answer changes based on how crowded the "room" is when you ask it.
This work challenges a common attitude in ML: "our systems are already probabilistic, so what's a little more randomness?" The researchers argue this is defeatist. With careful engineering, we can understand and eliminate these sources of nondeterminism.
They've open-sourced their implementation on top of vLLM, making it possible for others to achieve truly deterministic LLM inference today.
Thinking Machines Lab
Defeating Nondeterminism in LLM Inference
Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language models.
For example, you might observe that asking ChatGPT the same question multiple times provides different results.…
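The floating-point ingredient behind this is easy to demonstrate yourself. The numpy sketch below (illustrative, not Thinking Machines' code) shows that changing only the order of a reduction changes the result, which is exactly what batch-size-dependent kernel tilings do:

```python
import numpy as np

# Floating-point addition is not associative, so the ORDER of a reduction
# changes the result in the low bits. Kernels that are not batch-invariant
# pick different reduction orders (different tilings / split sizes) at
# different batch sizes -- which is how the same request can yield different
# logits depending on how crowded the batch is.
print((0.1 + 1e20) - 1e20)   # 0.0
print(0.1 + (1e20 - 1e20))   # 0.1

rng = np.random.default_rng(0)
x = rng.standard_normal(2**16).astype(np.float32)

seq = np.float32(0.0)
for v in x:                  # one long sequential summation chain
    seq = seq + v

tiled = x.reshape(256, 256).sum(axis=1).sum()   # two-level (tiled) reduction

print(seq, tiled, seq == tiled)   # results typically differ in the low bits
```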
Chinese researchers introduced WebExplorer, a simple yet effective approach to training long-horizon web agents.
Instead of depending heavily on rigid pre-defined graph structures, WebExplorer uses a model-based exploration strategy to synthesize high-quality agentic data.
Its 8B model outperforms most 32B and even 72B models on BrowseComp and HLE.
arXiv.org
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
The paradigm of Large Language Models (LLMs) has increasingly shifted toward agentic applications, where web browsing capabilities are fundamental for retrieving information from diverse online...
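For a rough picture of what "model-based exploration" means here, the sketch below shows one plausible explore-then-evolve synthesis loop under stated assumptions: call_llm and browse are hypothetical placeholders, and the exact prompts and evolution criteria in WebExplorer differ.

```python
# Hedged sketch of an explore-then-evolve data-synthesis loop: a model
# browses outward from a seed page to gather linked evidence, drafts a
# question-answer pair, then iteratively rewrites the question to be harder
# to shortcut. `call_llm` and `browse` are placeholders, not WebExplorer's
# actual interfaces.

def synthesize_example(seed_url, call_llm, browse, hops=3, evolve_rounds=2):
    # Phase 1: model-based exploration, no pre-built knowledge graph.
    evidence, url = [], seed_url
    for _ in range(hops):
        page = browse(url)                       # returns {"text": ..., "links": ...}
        evidence.append(page["text"])
        url = call_llm(f"Pick the most promising link to follow next:\n{page['links']}")

    qa = call_llm("Write a question whose answer needs ALL of these snippets, "
                  "plus the short answer:\n" + "\n---\n".join(evidence))

    # Phase 2: iterative evolution toward longer-horizon difficulty.
    for _ in range(evolve_rounds):
        qa = call_llm("Rewrite this QA pair so the question gives fewer direct "
                      f"clues but keeps the same verifiable answer:\n{qa}")
    return qa
```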
Nvidia released La-Proteina fully open source.
La-Proteina is a generative model demonstrating accurate co-design of fully atomistic protein structures (sequence + side chains + backbone) at scale, up to 800 residues, with state-of-the-art atomistic motif-scaffolding performance. Its code has just been open-sourced.
Paper.
Code.
Nvidia
La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
La-Proteina is a novel partially-latent fully atomistic protein design model. Protein backbone structure is modeled explicitly, while sequence and atomistic details are captured via per-residue latent variables. La-Proteina achieves state-of-the-art performance…
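For readers unfamiliar with the base technique, here is a generic flow-matching training step in PyTorch. This is only the vanilla method that La-Proteina builds on; its partially latent variant, with per-residue latent variables, is not reproduced here.

```python
import torch

# Generic flow-matching training step -- the base technique named in the
# post, not La-Proteina's actual code. The network learns the velocity of a
# straight-line path from noise x0 to data x1.
def flow_matching_step(model, x1, optimizer):
    x0 = torch.randn_like(x1)                 # noise sample
    t = torch.rand(x1.shape[0], 1)            # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the interpolation path
    target_v = x1 - x0                        # constant velocity of that path
    pred_v = model(xt, t)                     # network predicts the velocity
    loss = ((pred_v - target_v) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```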
Medra AI has automated experimentation down to the physical level with reasoning and robotics.
The Medra technology platform consists of two core components:
1. Physical AI: Their general-purpose robots use vision-language models (VLMs) to operate standard laboratory instruments flexibly and execute experimental protocols. Medra is the first company to deploy Physical AI in the laboratory, leveraging the same advanced models that power self-driving cars and humanoid robots.
2. Scientific AI: Their reasoning models analyze experimental results and integrate with partners' internal infrastructure—such as LIMS, electronic lab notebooks, and ML pipelines—to glean insights from disparate data sources.
These two systems operate in a closed loop: Physical AI executes experiments while Scientific AI analyzes the outcomes and iterates on the design. This cycle helps scientists rapidly converge on the optimal protocol.
www.medra.ai
Physical AI in the Lab: Unlocking Data for Scientific Breakthroughs
Medra is building new AI technology to empower scientists in the lab.
ByteDance launched Seedream 4.0, an image generation tool that aims to compete with Google's “Nano Banana” AI image editor.
⚡️ Claude now has memory. Anthropic also introduced incognito chats for all users.
With project-scoped memory, each project maintains its own focused context.
Memory is fully optional with granular controls.
In settings, view the complete memory summary, edit what's stored, and guide Claude by telling it what to focus on or ignore.
Claude
Bringing memory to teams | Claude
Today, we’re introducing memory to the Claude app, where Claude remembers you and your team’s projects and preferences, eliminating the need to re-explain context and keeping complex work moving forward.
Anthropic shared best practices for developers on writing effective tools for LLM agents.
Anthropic
Writing effective tools for AI agents—using AI agents
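In the spirit of that guide, tool descriptions act as prompts, so clarity and constrained schemas matter. Below is an illustrative tool definition in the Messages API's documented tool-use format; the weather tool itself is a made-up example, not taken from Anthropic's post.

```python
# Illustrative tool definition in the Messages API tool-use format.
# The tool (weather lookup) is hypothetical; the point is the explicit
# description and constrained schema this style of guide recommends.
get_weather_tool = {
    "name": "get_weather",
    "description": (
        "Get the current weather for a city. Use this only when the user "
        "asks about present conditions; it cannot forecast. Returns the "
        "temperature in the requested unit plus a one-word condition."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string",
                     "description": "City name, e.g. 'Paris'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"],
                     "description": "Temperature unit; defaults to celsius"},
        },
        "required": ["city"],
    },
}
```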
Meet Gauss, the first autoformalization agent. It just completed Terry Tao and Alex Kontorovich's Strong Prime Number Theorem project in 3 weeks, an effort on which human experts had spent 18+ months making only partial progress.
GitHub.
Early access.
GitHub
GitHub - math-inc/strongpnt
Contribute to math-inc/strongpnt development by creating an account on GitHub.
Google presented Speculative Cascades, a new approach for improving LLM efficiency that combines the best features of both cascades (where a small LLM handles queries before a larger LLM) and speculative decoding (where a drafter model's tokens are verified by a target model).
research.google
Speculative cascades — A hybrid approach for smarter, faster LLM inference
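A toy sketch of the hybrid idea follows: the small model drafts several tokens, and a cascade-style deferral rule decides per token whether to keep the draft or fall back to the large model. small_dist and large_dist are placeholder callables, and the paper's actual deferral rules are more refined than this threshold test.

```python
import numpy as np

def speculative_cascade_step(prefix, small_dist, large_dist, k=4, tau=0.3):
    """One drafting round. small_dist/large_dist map a token list to a
    next-token probability vector (placeholders, not a real API)."""
    ctx = list(prefix)
    for _ in range(k):                         # cheap autoregressive drafting
        ctx.append(int(np.argmax(small_dist(ctx))))
    drafts = ctx[len(prefix):]

    accepted = list(prefix)
    for tok in drafts:
        # In practice the target model scores all draft positions in one
        # parallel pass; sequential calls keep this sketch simple.
        q = large_dist(accepted)
        if q[tok] >= tau * q.max():            # cascade-style deferral rule:
            accepted.append(tok)               # draft is "good enough", keep it
        else:
            accepted.append(int(np.argmax(q))) # otherwise defer to the target
            break                              # and discard the rest of the draft
    return accepted
```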
Google shared a new work:
Virtual Agent Economies
Researchers discussed a number of possible frameworks for establishing steerable agent markets.
The rapid adoption of AI agents points to a future where AI agents may be able to produce economic value independently of human labor.
Coupled with the development of new interoperability standards like the Agent2Agent (A2A) and Model Context Protocol (MCP), this signals the inevitable emergence of a new economic layer.
An emerging virtual (sandbox) AI agent economy may offer opportunities for insulation and safeguarding, for establishing potentially unprecedented coordination between agents, and for orchestrating their interactions toward major societal or community goals or toward better alignment with user preferences.
Market-based mechanisms like auctions may also be employed for fair resource allocation.
Finally, the researchers outline the technical and governance infrastructure, such as verifiable credentials for establishing trust, required to scale agentic AI deployments safely and robustly. These safeguards are necessary to address systemic market risks and prevent exacerbating inequalities.
arXiv.org
Virtual Agent Economies
The rapid adoption of autonomous AI agents is giving rise to a new economic layer where agents transact and coordinate at scales and speeds beyond direct human oversight. We propose the "sandbox...
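The auction mechanisms mentioned above can be made concrete with a standard example. The snippet below implements a sealed-bid second-price (Vickrey) auction, where truthful bidding is a dominant strategy; it is a generic illustration, not a mechanism proposed in the paper.

```python
# Sealed-bid second-price (Vickrey) auction: the highest bidder wins but
# pays the second-highest bid, which removes any incentive to shade bids --
# one standard tool for fair resource allocation among agents.

def second_price_auction(bids):
    """bids: dict of agent -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0   # second-highest bid
    return winner, price

print(second_price_auction({"agent_a": 5.0, "agent_b": 8.0, "agent_c": 6.5}))
# ('agent_b', 6.5): agent_b wins but pays agent_c's bid.
```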
OpenAI introduced OpenAI Grove: a program for early-stage founders.
Grove builds on OpenAI's work with OpenAI for Startups and Pioneers, and includes 5 weeks of hands-on workshops, office hours, events with the OpenAI team, and early access.
OpenAI
Apply to OpenAI Grove
A program for individuals early in their company building journey.
UAE released K2-Think, an open-source AI reasoning model.
32 billion parameters. That's it. And this thing matches GPT-4-level reasoning while being 20x smaller, matching or beating models many times its size.
It is built on Qwen2.5 32B and trained with long chain-of-thought examples, so it learns to show its reasoning step by step.
Then reinforcement learning is added, using tasks where answers can be checked automatically, like math or code, so the model improves by being rewarded for correct results.
At test time, two tricks are used. First, a helper model writes a short plan before solving, which gives structure. Second, the system generates 3 answers and another model picks the best, which improves accuracy and keeps responses shorter.
Speed is handled with specialized hardware, the Cerebras Wafer Scale Engine, which delivers about 2,000 tokens per second. This makes even very long reasoning tasks run in seconds instead of minutes.
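The two test-time tricks are simple to express as orchestration code. Below is a hedged sketch: planner, solver, and selector are placeholder callables, and the prompts are illustrative, not K2-Think's actual ones.

```python
# Sketch of the two test-time tricks described above: a planner drafts a
# short plan before solving, then three candidate solutions are generated
# and a selector model picks the best. All callables are placeholders, not
# K2-Think's actual interfaces.

def solve(question, planner, solver, selector, n=3):
    plan = planner(f"Write a short step-by-step plan (no solution):\n{question}")
    candidates = [
        solver(f"{question}\n\nPlan:\n{plan}\n\nSolve, showing your reasoning.")
        for _ in range(n)                      # best-of-n sampling
    ]
    listing = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    best = selector(
        f"Question:\n{question}\n\nCandidates:\n{listing}\n\n"
        "Reply with only the index of the most correct, well-reasoned answer."
    )
    return candidates[int(best.strip())]
```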
OpenAI released GPT-5-Codex — a version of GPT-5 further optimized for agentic coding in Codex.
Available in the Codex CLI, IDE extension, web, mobile, and for code reviews in GitHub.
OpenAI
Introducing upgrades to Codex
Codex just got faster, more reliable, and better at real-time collaboration and tackling tasks independently anywhere you develop—whether via the terminal, IDE, web, or even your phone.
ByteDance introduced EMPG, a framework that recalibrates the learning signal using the agent's own uncertainty.
Compared with GRPO and DAPO, it achieves promising gains on agent benchmarks like WebShop, ALFWorld, and Deep Search.
Paper.
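The post doesn't expand the acronym, but the described idea, recalibrating the learning signal by the agent's own uncertainty, can be sketched as entropy-based advantage reweighting. This is only the shape of the idea; EMPG's actual formulation differs.

```python
import numpy as np

# Toy sketch: rescale each step's learning signal by the policy's own
# certainty (derived from its entropy), so confident steps drive stronger
# updates and uncertain ones are attenuated. Illustrative only, not EMPG's
# exact recalibration.

def entropy(p):
    p = np.asarray(p)
    return -(p * np.log(p + 1e-12)).sum()

def modulated_advantages(advantages, step_dists, max_entropy):
    out = []
    for adv, dist in zip(advantages, step_dists):
        confidence = 1.0 - entropy(dist) / max_entropy   # 1 = certain, 0 = uniform
        out.append(adv * confidence)                      # recalibrated signal
    return out

# Example: two steps with equal raw advantage; the confident step dominates.
dists = [[0.9, 0.05, 0.05], [0.34, 0.33, 0.33]]
print(modulated_advantages([1.0, 1.0], dists, max_entropy=np.log(3)))
```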
Google launched new protocol for agent-driven purchases
Google announced a new open protocol for purchases initiated by AI agents: automated software programs that can shop and make decisions on behalf of users. The AI payments protocol supports credit cards and stablecoins and was built with Coinbase, the Ethereum Foundation, and over 60 other partners, per Fortune. GitHub.
Called the Agent Payments Protocol (AP2), the system is meant to be interoperable between AI platforms, payment systems and vendors, providing a traceable paper trail for each transaction.
In collaboration with the cryptocurrency outfits Coinbase and MetaMask and with the Ethereum Foundation, Google also produced an extension that integrates the cryptocurrency-oriented x402 protocol, allowing AI-driven purchasing from crypto wallets.
A number of other tech companies are working on their own agentic purchasing systems, most notably Perplexity, which offers a Buy with Pro service in its agentic browser. The payment provider Stripe also produces software tools for agentic purchasing on its platform, though they are not as comprehensive as AP2.
TechCrunch
Google launches new protocol for agent-driven purchases | TechCrunch
Called the Agent Payments Protocol (AP2), the system is meant to be interoperable between AI platforms, payment systems, and vendors.
That's a lot of money for robots: Figure has exceeded $1B in funding at a $39B post-money valuation
The round was led by Parkway Venture Capital with significant investments from Brookfield Asset Management, NVIDIA, Macquarie Capital, Intel Capital, Align Ventures, Tamarack Global, LG Technology Ventures, Salesforce, T-Mobile Ventures, and Qualcomm Ventures.
The new funding will support Figure's momentum across three core areas:
1. Scaling humanoid robots into homes & commercial operations
2. Building next-generation GPU infrastructure to accelerate training & simulation
3. Launching advanced data collection efforts for Helix
FigureAI
Figure Exceeds $1B in Series C Funding at $39B Post-Money Valuation
Tongyi Lab dropped half a dozen new papers, most focused on Deep Research agents.
1. Tongyi DeepResearch: Open-source DeepResearch Agent
• First OSS web agent matching OpenAI’s DeepResearch
• SOTA on HLE (32.9), BrowseComp (43.4/46.7), xbench-DeepSearch (75)
• Full-stack pipeline: Agentic CPT → SFT → RL w/ synthetic data
• Native ReAct & new Heavy Mode (IterResearch) for long-horizon tasks
2. WebResearcher: Unbounded reasoning for long-horizon agents
• IterResearch: Iterative deep-research paradigm (avoids context suffocation & noise)
• WebFrontier: Tool-augmented data engine for complex research tasks
• Parallel agents + synthesis → scalable, evidence-grounded reasoning
• Beats proprietary systems: 36.7% on HLE, 51.7% on BrowseComp
3. AgentScaler: Towards General Agentic Intelligence
• Scales environments for diverse, realistic tool-calling
• Fully simulated envs = verifiable + scalable interactions
• SOTA on τ-bench, τ²-bench, ACEBench
• AgentScaler-30B matches 1T-parameter models with far fewer params
4. AgentFounder: Scaling Agents via Continual Pre-training
• First to propose Agentic CPT → builds agentic foundation models before fine-tuning
• Solves post-training bottlenecks (capabilities + alignment conflict)
• Data synthesis: First-order (planning/actions) + Higher-order (multi-step decision)
• Two-stage training (32K → 128K context)
• SOTA: 39.9% BrowseComp-en, 72.8% GAIA
5. WebWeaver: Structuring Web-Scale Evidence for Deep Research
• Dual-agent framework (Planner + Writer)
• Dynamic outlines: search ↔ refine ↔ search (human-like loop)
• Memory-grounded, section-by-section synthesis → avoids long-context failures
• SOTA across DeepResearch Bench, DeepConsult, DeepResearchGym
• Produces reliable, well-cited, structured reports
6. ReSum: Long-Horizon Web Agents Without Context Limits
• Problem: ReAct hits context limits in long searches (32k tokens)
• Solution: ReSum periodically compresses history → compact reasoning states
• ReSumTool-30B: specialized summarizer extracts key evidence & gaps
• ReSum-GRPO (RL): trains agents to adapt summaries into reasoning
• +4.5% over ReAct baseline, +8.2% with RL across web search benchmarks (see the sketch after this list).
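As promised, a minimal sketch of the ReSum pattern from item 6: when the history nears its budget, the agent replaces it with a compact evidence-and-gaps summary and continues from that state. agent_step and summarize are placeholders (the latter standing in for a model like ReSumTool-30B), not the paper's code.

```python
# Hedged sketch of periodic history compression: the accumulated ReAct-style
# transcript is replaced by a compact summary of evidence found and
# information still missing, then reasoning resumes from that state.

def resum_loop(task, agent_step, summarize, budget_tokens=32_000, max_steps=100):
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        if sum(len(h) for h in history) // 4 > budget_tokens:  # rough token count
            state = summarize(
                "Compress into: (1) evidence found so far, "
                "(2) information still missing:\n" + "\n".join(history)
            )
            history = [f"TASK: {task}", f"STATE: {state}"]     # restart from summary
        thought, action, observation = agent_step(history)     # ReAct-style step
        if action == "ANSWER":
            return observation
        history += [thought, f"ACT: {action}", f"OBS: {observation}"]
    return None
```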
Anthropic shipped two updates for developers using Claude
1. Claude in Xcode 26: Claude Sonnet 4 is now available as a coding assistant directly in Apple's IDE. Developers can connect their Claude account to access natural language code interaction, documentation generation, and inline editing tools. The integration shares usage limits with other Claude platforms and works with Pro, Max, and premium Team/Enterprise plans.
2. Claude Code UX update: A small but useful interface improvement. Keywords like "think" and "ultrathink" now get highlighted when they would trigger extended thinking mode. Use /t to disable the mode, preventing accidental activation when these words appear in regular prompts.
Anthropic
Claude is now generally available in Xcode
Connect your Claude account to Xcode 26 for AI-powered coding assistance. Debug, refactor, and build Apple apps faster with Claude Sonnet 4 by Anthropic.