All about AI, Web 3.0, BCI
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

owner @Aniaslanyan
Tongyi Lab dropped half a dozen new papers, most focused on Deep Research agents.

1. Tongyi DeepResearch: Open-source DeepResearch Agent

• First OSS web agent matching OpenAI’s DeepResearch
• SOTA on HLE (32.9), BrowseComp (43.4/46.7), xbench-DeepSearch (75)
• Full-stack pipeline: Agentic CPT → SFT → RL w/ synthetic data
• Native ReAct & new Heavy Mode (IterResearch) for long-horizon tasks

2. WebResearcher: Unbounded reasoning for long-horizon agents

• IterResearch: Iterative deep-research paradigm (avoids context suffocation & noise)
• WebFrontier: Tool-augmented data engine for complex research tasks
• Parallel agents + synthesis → scalable, evidence-grounded reasoning
• Beats proprietary systems: 36.7% on HLE, 51.7% on BrowseComp

3. AgentScaler: Towards General Agentic Intelligence

• Scales environments for diverse, realistic tool-calling
• Fully simulated envs = verifiable + scalable interactions
• SOTA on τ-bench, τ²-bench, ACEBench
• AgentScaler-30B matches 1T-parameter models with far fewer params

4. AgentFounder: Scaling Agents via Continual Pre-training

• First to propose Agentic CPT → builds agentic foundation models before fine-tuning
• Solves post-training bottlenecks (capabilities + alignment conflict)
• Data synthesis: First-order (planning/actions) + Higher-order (multi-step decision)
• Two-stage training (32K → 128K context)
• SOTA: 39.9% BrowseComp-en, 72.8% GAIA

5. WebWeaver: Structuring Web-Scale Evidence for Deep Research

• Dual-agent framework (Planner + Writer)
• Dynamic outlines: search → refine → search (human-like loop)
• Memory-grounded, section-by-section synthesis → avoids long-context failures
• SOTA across DeepResearch Bench, DeepConsult, DeepResearchGym
• Produces reliable, well-cited, structured reports

6. ReSum: Long-Horizon Web Agents Without Context Limits

• Problem: ReAct hits context limits in long searches (32k tokens)
• Solution: ReSum periodically compresses history → compact reasoning states
• ReSumTool-30B: specialized summarizer extracts key evidence & gaps
• ReSum-GRPO (RL): trains agents to adapt summaries into reasoning
• +4.5% over ReAct baseline, +8.2% with RL across web search benchmarks.
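
The periodic-compression idea behind ReSum can be illustrated with a minimal sketch. It is not the paper's implementation: the word-count token estimate, the `resum_step` helper, and the toy summarizer are all assumptions made for illustration, and a real system would call a summarizer model such as ReSumTool-30B instead.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def resum_step(history: List[str], new_entry: str, budget: int,
               summarize: Callable[[List[str]], str]) -> List[str]:
    """Append an entry; if the history exceeds the budget, compress it."""
    history = history + [new_entry]
    if sum(count_tokens(h) for h in history) > budget:
        # Replace the whole history with one compact reasoning state.
        history = [summarize(history)]
    return history


def toy_summarize(history: List[str]) -> str:
    """Stand-in summarizer: keep the first three words of each entry."""
    return "SUMMARY: " + " | ".join(" ".join(h.split()[:3]) for h in history)


# Hypothetical agent loop: six observations against a 20-"token" budget.
history: List[str] = []
for step in range(6):
    entry = f"step {step} observation with several extra words"
    history = resum_step(history, entry, 20, toy_summarize)

print(history)
```

The agent keeps reasoning from the summary plus whatever arrived after it, so the working context stays bounded no matter how long the search runs.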
Anthropic shipped two updates for developers using Claude

1. Claude in Xcode 26: Claude Sonnet 4 is now available as a coding assistant directly in Apple's IDE. Developers can connect their Claude account to access natural language code interaction, documentation generation, and inline editing tools. The integration shares usage limits with other Claude platforms and works with Pro, Max, and premium Team/Enterprise plans.

2. Claude Code UX Update: A small but useful interface improvement. Keywords like "think" and "ultrathink" now get highlighted when they would trigger extended thinking mode. Use /t to disable the mode, preventing accidental activation when these words appear in regular prompts.
New a16z benchmark: Which AI-native Office tools actually work?

First, the market splits into two camps:

- Generalists (Assistants: Manus, Genspark; Browsers: Dia, Comet; Extensions: MaxAI, Monica) - flexible but less polished.

- Specialists (Email: Fyxer, Serif; Slides: Gamma, Chronicle; Notes: Mem, Granola) - focused and refined in a single workflow.

a16z benchmarked both across office tasks: summarization, communication, file understanding, research, planning, and execution in 5 use cases.

Key Takeaways from Testing:

1. Specialists still win in PPT, email, and notes - but generalists are catching up, boosted by rapid model progress (e.g. Manus).
2. The horizontal race is heating up - even labs (Anthropic, OpenAI) are entering to own the “work UI”.
3. Convergence is inevitable - verticals are expanding categories, and horizontals are doubling down on use cases.
Vitalik Buterin presented Ethereum’s roadmap at Japan Dev Conference today:

1. short-term goals focus on scaling and increasing L1 gas limits;

2. mid-term aims target cross-L2 interoperability and faster responsiveness;

3. long-term vision emphasizes a secure, simple, quantum-resistant, and formally verified minimalist Ethereum.
Perceptron AI introduced Isaac 0.1 that can understand and interact with the physical world

Isaac 0.1 is an open-source, open-weights model with 2B parameters. It matches or beats significantly larger models on core perception.

Founded by the team behind Meta's Chameleon multimodal models.

Isaac is tuned for where the physical world needs intelligence. Capability alone isn’t enough—you need capability at cost, power, and tail-latency constraints.

In-context learning for perception: show a few annotated examples (defects, safety flags, etc.) and the model adapts in-prompt. No YOLO-style fine-tuning or custom detector stacks.
Stanford introduced Paper2Agent, a system that turns static research papers into interactive AI agents you can chat with, giving access to the tools and data in the paper.

The key idea is to represent a paper and its codebase as a Model Context Protocol (MCP) server, which provides the context for an AI agent to create a paper-specific agent.

GitHub.
MIT physicists have discovered a new form of magnetism, termed p-wave magnetism.

This breakthrough paves the way for a new class of ultrafast, compact, energy-efficient, and nonvolatile magnetic memory devices.
SakanaAI presented Robust Agentic CUDA Kernel Optimization

• Fuses ops, boosts forward/backward passes, outperforms torch baselines

• Agentic LLM pipeline: PyTorch → CUDA → evolutionary runtime optimization

• Soft-verification: LLMs flag incorrect kernels (↑30% verification success)

• robust-kbench: new benchmark for real kernel performance + correctness.

Paper.
Chinese device assembler Luxshare has signed a deal with OpenAI to produce a consumer AI device, possibly the tiny robot.

OpenAI now appears to have multiple separate devices in the works, all planned to arrive next year.
DeepSeek introduced DeepSeek-V3.1-Terminus. The latest update builds on V3.1’s strengths while addressing key user feedback.

What’s improved?
1. Language consistency: fewer CN/EN mix-ups & no more random chars.
2. Agent upgrades: stronger Code Agent & Search Agent performance
OpenAI & NVIDIA announced partnership to deploy 10GW of NVIDIA Systems

OpenAI to build & deploy at least 10 gigawatts of AI datacenters with NVIDIA systems representing millions of GPUs for OpenAI’s next-gen AI infrastructure.

To support the partnership, NVIDIA intends to invest up to $100 billion in OpenAI progressively as each gigawatt is deployed.

The first gigawatt of NVIDIA systems will be deployed in the second half of 2026 on NVIDIA’s Vera Rubin platform.
Alibaba introduced Qwen3-Omni — the first natively end-to-end omni-modal AI unifying text, image, audio & video in one model — no modality trade-offs

1. SOTA on 22/36 audio & AV benchmarks
2. 119 text languages / 19 speech-input languages / 10 speech-output languages
3. 211ms latency | 30-min audio understanding
4. Fully customizable via system prompts
5. Built-in tool calling
6. Open-source Captioner model

What’s Open-Sourced?
- Qwen3-Omni-30B-A3B-Instruct
- Qwen3-Omni-30B-A3B-Thinking
- Qwen3-Omni-30B-A3B-Captioner

GitHub.
HuggingFace.
MS Models.
Demo.
Nvidia presented ReaSyn: Rethinking molecule synthesizability

• Treats synthesis like CoT with Chain-of-Reaction (CoR) steps
• Each reaction = reasoning step → richer supervision & step-by-step learning
• Adds RL finetuning + test-time scaling for better optimization
• SOTA in reconstruction, goal-directed design & hit expansion
• Broad coverage of synthesizable chemical space → practical for drug discovery
Kaggle is hosting the 5-Day AI Agents Intensive course with Google on November 10-14.

This no-cost course is designed to help you explore the foundations and practical applications of AI agents.
New on the Claude Developer Platform: tool helpers in beta for the Python and TypeScript SDKs.

Tool helpers simplify tool creation and execution with:
- Automatic input validation
- A tool runner for automated tool handling in conversations.
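
To make the two features concrete, here is a generic sketch of what input validation and a tool runner do. It is deliberately not the Claude SDK's API: the `tool` decorator, `validate`, and `run_tool_call` are hypothetical names written with the Python standard library only, and the call format is an assumption.

```python
import inspect
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a tool under its own name (hypothetical)."""
    TOOLS[fn.__name__] = fn
    return fn


def validate(fn: Callable[..., Any], args: Dict[str, Any]) -> None:
    """Check args against the signature and plain-class type hints."""
    sig = inspect.signature(fn)
    bound = sig.bind(**args)  # TypeError on missing or unexpected arguments
    for name, value in bound.arguments.items():
        expected = sig.parameters[name].annotation
        if expected is inspect.Parameter.empty or not isinstance(expected, type):
            continue  # unannotated or non-class hints are not checked here
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}")


def run_tool_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Look up, validate, and execute one tool call; never raise to the caller."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"ok": False, "error": f"unknown tool {call['name']}"}
    try:
        validate(fn, call["input"])
        return {"ok": True, "result": fn(**call["input"])}
    except TypeError as exc:
        return {"ok": False, "error": str(exc)}


@tool
def add(a: int, b: int) -> int:
    """Toy tool used only for the example."""
    return a + b


print(run_tool_call({"name": "add", "input": {"a": 2, "b": 3}}))
print(run_tool_call({"name": "add", "input": {"a": "2", "b": 3}}))
```

Returning errors as data instead of raising lets the conversation loop hand the failure back to the model so it can correct its own call.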
The changing architecture of the internet. Cloudflare handles ~20% of the internet's traffic for ~24 million websites (AI-sourced)

They just launched support for x402 in their Agents SDK, with built-in support for USDC

Machine-to-machine payments are arriving.
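
A toy, in-memory model of a pay-per-request handshake built around the HTTP 402 "Payment Required" status code shows the general shape of such a flow. This is only a sketch under stated assumptions: the field names, the price, and both functions are hypothetical, and nothing here performs real signing, settlement, or network I/O, nor does it reproduce the actual x402 wire format.

```python
from typing import Dict, Optional, Tuple

PRICE_USDC = "0.01"  # hypothetical price for the resource


def server(path: str,
           payment: Optional[Dict[str, str]] = None) -> Tuple[int, Dict[str, str]]:
    """Return (status, body). Without a matching payment, answer 402."""
    paid = (payment is not None
            and payment.get("asset") == "USDC"
            and payment.get("amount") == PRICE_USDC)
    if not paid:
        # The 402 body tells the client what it must pay to get the resource.
        return 402, {"asset": "USDC", "amount": PRICE_USDC, "resource": path}
    return 200, {"data": f"contents of {path}"}


def agent_fetch(path: str,
                wallet_balance: float) -> Tuple[int, Dict[str, str], float]:
    """Request a resource, paying automatically if the agent can afford it."""
    status, body = server(path)
    if status == 402 and float(body["amount"]) <= wallet_balance:
        payment = {"asset": body["asset"], "amount": body["amount"]}
        status, body = server(path, payment)  # retry with payment attached
        if status == 200:
            wallet_balance -= float(payment["amount"])
    return status, body, wallet_balance


status, body, balance = agent_fetch("/report", 1.0)
print(status, body, round(balance, 2))
```

The point of the pattern is that no human checkout step is involved: the agent reads the price from the 402 response and retries on its own.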
Impressive work. Meta introduced Soft Tokens, Hard Truths.

• First scalable RL method for continuous CoT

• Learns “soft” tokens (mixtures + noise) → richer reasoning paths

• Matches discrete CoTs at pass@1, beats them at pass

• Best setup: train w/ soft tokens, infer w/ hard tokens
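
One way to read "mixtures + noise" is as a probability-weighted blend of token embeddings with Gaussian noise added, in contrast to a hard token that picks a single embedding. The sketch below illustrates only that reading; the embedding table, logits, and noise scale are made up, and the paper's actual formulation may differ.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_token(logits: List[float], embeddings: List[List[float]],
               noise_std: float, rng: random.Random) -> List[float]:
    """Probability-weighted mixture of embeddings plus Gaussian noise."""
    probs = softmax(logits)
    dim = len(embeddings[0])
    mix = [sum(p * emb[d] for p, emb in zip(probs, embeddings))
           for d in range(dim)]
    return [x + rng.gauss(0.0, noise_std) for x in mix]


def hard_token(logits: List[float],
               embeddings: List[List[float]]) -> List[float]:
    """Discrete choice: the embedding of the highest-scoring token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return embeddings[best]


# Made-up 3-token vocabulary with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [2.0, 1.0, 0.0]

print(soft_token(logits, embeddings, 0.1, random.Random(0)))  # training-style input
print(hard_token(logits, embeddings))                         # inference-style input
```

In the post's "best setup", the soft variant would be used while training and the hard variant at inference time.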
Google introduced a deep research agent that models research writing as a diffusion process: Test-Time Diffusion Deep Researcher (TTD-DR)

Instead of static reasoning or bolted-on tools, the system drafts an initial noisy report and iteratively refines it through retrieval and self-evolution, mimicking how humans plan, search, and revise.

The approach depends on three stages:

1) generate a research plan
2) iteratively search with sub-agents that generate questions and synthesize answers (RAG-style)
3) compile findings into a final report.

Multiple answer variants are created, scored by LLM judges, revised with feedback, and merged, yielding higher quality intermediate results.

Draft reports are repeatedly revised using newly retrieved evidence, progressively improving accuracy and coherence until the final report is produced.

On benchmarks like DeepConsult, Humanity’s Last Exam, and GAIA, TTD-DR achieves win rates of up to 74.5% against OpenAI Deep Research in long-form tasks and shows consistent gains in multi-hop reasoning.

It's scalable, too!

Ablation studies show each component adds measurable improvements, while Pareto analyses reveal better quality-latency tradeoffs than other DR agents.
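
The plan, draft, retrieve, revise loop described in this post can be sketched in a few lines. This is a toy under stated assumptions, not Google's system: the `KNOWLEDGE` dictionary stands in for a search tool, every function name is hypothetical, and the scoring and merging of answer variants by LLM judges is left out.

```python
from typing import Dict, List

# Stand-in for a retrieval tool; the contents are entirely made up.
KNOWLEDGE: Dict[str, str] = {
    "what is x": "x is a toy fact",
    "why does x matter": "x matters for the toy example",
}

PLACEHOLDER = "[unverified]"


def plan(topic: str) -> List[str]:
    """Stage 1: turn the topic into research questions."""
    return [f"what is {topic}", f"why does {topic} matter"]


def retrieve(question: str) -> str:
    """Stage 2: look up evidence (empty string if nothing is found)."""
    return KNOWLEDGE.get(question, "")


def refine(draft: Dict[str, str], question: str, evidence: str) -> Dict[str, str]:
    """Revise one part of the draft using newly retrieved evidence."""
    revised = dict(draft)
    if evidence:
        revised[question] = evidence
    return revised


def research(topic: str, max_rounds: int = 3) -> str:
    questions = plan(topic)
    draft = {q: PLACEHOLDER for q in questions}  # noisy initial draft
    for _ in range(max_rounds):
        open_questions = [q for q in questions if draft[q] == PLACEHOLDER]
        if not open_questions:
            break  # nothing left to "denoise"
        for q in open_questions:
            draft = refine(draft, q, retrieve(q))
    # Stage 3: compile the findings into the final report.
    return "\n".join(f"{q}: {draft[q]}" for q in questions)


print(research("x"))
```

Each round replaces placeholder text with retrieved evidence, which loosely mirrors the "denoising" framing: the report starts rough and converges as evidence arrives.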
EY’s latest research explores stablecoin adoption and usage among corporates and financial institutions.

In late June 2025, EY surveyed 250 global corporates and 100 global financial institutions to assess adoption trends, interest levels, use cases, key drivers, barriers, and implementation plans.

Key themes:

1. Adoption: Only about 13% of organizations (financial institutions and corporates) in the survey currently use stablecoins; among non-users, 54% expect to adopt them in the next 6-12 months.

2. Use Cases: Primary use cases for stablecoins over the next five years include paying suppliers across borders (~77%), accepting business #payments from other countries (~49%), and domestic business payments (~37%).

3. Expected Benefits: Reduced cost (especially in cross-border payments), faster transaction speed, and improved liquidity are cited as primary drivers for organizations looking to adopt stablecoins.

4. Users’ feedback: Among users of stablecoins, 41% report cost savings of at least 10%, particularly when using #USD-denominated stablecoins for B2B cross-border transactions.

5. Regulatory Framework: The #GENIUSAct, passed in July 2025, provides important regulatory clarity for USD-denominated #stablecoins — including reserve requirements, issuer oversight, approval process, & guidance on tax and #custody.

6. Trends: Organizations expect stablecoins to handle 5-10% of cross-border payments by 2030, translating to an estimated US$2.1–4.2 trillion in transaction value.

7. Corporates: A majority of corporates would only consider stablecoins if a sufficient portion of their vendors accept them; only 8% currently accept stablecoin payments. Those corporates that use stablecoins convert stablecoins to fiat immediately after transactions, driven by concerns over regulatory clarity (79%) and a lack of confidence in liquidity (56%).

8. Readiness: Organisational readiness varies: 41% of corporates believe integrating stablecoins will require moderate effort, while 36% expect major systems changes. Infrastructure readiness & vendor / partner ecosystem are seen as critical enablers; many organizations plan to lean on FinTechs or third-party providers.

9. Financial institutions favoured hybrid adoption models: about 53% plan to build stablecoin capabilities both in-house & via third-party partnerships, while only 5% aim to do everything internally.

10. Integration: Corporates and FIs are placing priority on integrations: ~56% want embedded APIs within treasury platforms; ~70% prefer stablecoin tools that integrate with existing enterprise resource planning (ERP) systems.

11. Competition: Most institutions believe stablecoins will offer competitive advantages in their markets; ~87% of organizations conducting or planning ROI analyses believe stablecoins can provide an edge.
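
As a quick consistency check on item 6, the share range and the dollar range can be divided to see what total cross-border payment volume they imply for 2030. The total is not stated in the post; it is derived here purely from the two quoted ranges.

```python
# Figures quoted in item 6 (shares of cross-border payments, US$ trillions).
low_share, high_share = 0.05, 0.10
low_value, high_value = 2.1, 4.2

# Implied total volume if each value corresponds to its share.
implied_total_low = low_value / low_share
implied_total_high = high_value / high_share

print(round(implied_total_low, 1), round(implied_total_high, 1))
```

Both ends of the range point to the same base of roughly US$42 trillion, so the two ranges in item 6 are internally consistent.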