All about AI, Web 3.0, BCI
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

Owner: @Aniaslanyan
OpenRouter collaborated with a16z to publish the State of AI, an empirical report on how LLMs have been used on OpenRouter, based on more than 100 trillion tokens across hundreds of models and 3+ million users (excluding 3rd party) over the last year.

A lot of insights:

1. One finding: OpenRouter observes a Cinderella "glass slipper" effect for new models.

Early users of a new LLM either churn quickly or become part of a foundational cohort with much higher retention than other users. These early adopters can "lead" the rest of the market.

2. Open vs Closed Weights:

By late 2025, open-weight models (abbreviated OSS below) reached roughly one-third of usage, sustained beyond launch spikes, though the share plateaued in Q4.

3. Chinese models: grew from ~1% of usage to around 30% in some weeks. Release velocity plus quality keep the market lively.

If you want a single picture of the modern stack:
- Closed models = high-value workloads
- Open models = high-volume workloads

In practice, many teams use both.

OSS isn't "just for tinkering" - it is extremely popular in two areas:
• Roleplay / creative dialogue: >50% of OSS usage
• Programming assistance: ~15-20% of OSS usage

4. Now the big platform shift: agentic inference.

The report tracked it via:
- reasoning model adoption
- tool calling
- prompt/completion “shape” (sequence lengths).

5. Reasoning models went from "negligible" to more than 50% of tokens in 2025. A full paradigm shift.

6. Languages: English dominates with more than 80% of tokens, but the tail is real - Chinese, Russian, Spanish, etc.

7. Economics: price matters, but less than you think. On the cost-vs-usage map, the trendline is nearly flat: reducing cost by 10% only correlates with ~0.5-0.7% more usage.
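That near-flat trendline implies a tiny price elasticity of usage. A back-of-envelope check in plain Python, using only the figures quoted in the post:

```python
def implied_elasticity(price_change_pct, usage_change_pct):
    """Arc elasticity of usage with respect to price (sign dropped):
    percent change in usage per percent change in price."""
    return usage_change_pct / abs(price_change_pct)

# The post's figures: a 10% cost reduction correlates with ~0.5-0.7% more usage.
low = implied_elasticity(-10, 0.5)   # 0.05
high = implied_elasticity(-10, 0.7)  # 0.07
print(low, high)
```

An elasticity near 1 would mean usage scales with price cuts; 0.05-0.07 suggests capability, not cost, is what drives adoption.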
Meta published a new paper on the path to safer superintelligence: co-improvement.

Everyone is focused on self-improving AI, but:

1) we don't know how to do it yet, and
2) it might be misaligned with humans.

Co-improvement: instead, build AI that collaborates with humans to advance AI faster and to help fix the alignment problem together.
Nvidia introduced CUDA 13.1. It is the biggest expansion of CUDA since it launched in 2006.

It introduces CUDA Tile, a new way to program GPUs that makes powerful AI and accelerated computing easier for more developers to use.
Essential AI introduced its first open models: Rnj-1 base and instruct, 8B-parameter models.

Rnj-1 is the culmination of 10 months of hard work by a phenomenal team, dedicated to advancing American SOTA OSS AI.

Lots of wins with Rnj-1.

1. SWE-bench performance close to GPT-4o.
2. Tool use outperforming all comparable open-source models.
3. Mathematical reasoning (AIME'25) nearly on par with GPT OSS MoE 20B.
A transformer's attention could be 99% sparser without losing its smarts.

New research from MPI-IS, Oxford, and ETH Zürich shows it can.

A simple post-training method strips away redundant connections, revealing a cleaner, more interpretable circuit.

This suggests much of the attention computation we rely on is redundant.
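As an illustration of the kind of pruning involved (not the paper's exact post-training method, which is more involved than a one-shot top-k), here is a minimal sketch that keeps only the top 1% of attention weights per query row and renormalizes:

```python
import numpy as np

def sparsify_attention(attn, keep_frac=0.01):
    """Keep only the top keep_frac of attention weights in each query row,
    then renormalize so rows still sum to 1. Illustrative only."""
    n_keep = max(1, int(attn.shape[-1] * keep_frac))
    sparse = np.zeros_like(attn)
    for i, row in enumerate(attn):
        top = np.argsort(row)[-n_keep:]        # indices of the largest weights
        sparse[i, top] = row[top]
        sparse[i] /= sparse[i].sum()           # redistribute surviving mass
    return sparse

rng = np.random.default_rng(0)
attn = rng.random((4, 100))
attn /= attn.sum(axis=-1, keepdims=True)       # mock softmax rows
sparse = sparsify_attention(attn)
print((sparse > 0).mean())                     # fraction of surviving connections
```

The interesting empirical question the paper answers is that removing 99% of connections this aggressively can leave behavior (and a cleaner circuit) intact.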
the-state-of-enterprise-ai_2025-report.pdf
OpenAI released its first State of Enterprise AI report.

OpenAI's data from 1M+ business customers reveals explosive growth—and a widening divide between leaders and laggards.

ChatGPT Enterprise seats: 9x growth year-over-year, serving 7M workplace seats. Message volume: 8x increase. API reasoning token consumption per organization: 320x growth. Custom GPTs/Projects usage: 19x increase year-to-date, now processing 20% of all Enterprise messages.
Over 9,000 organizations processed 10+ billion tokens; nearly 200 exceeded 1 trillion. BBVA regularly uses 4,000+ GPTs—AI has become core infrastructure, not an experimental tool.

Measurable Impact:
Productivity:
75% of workers report improved speed or quality. Average time savings: 40-60 minutes per active day. Workers in data science, engineering, and communications save 60-80 minutes.
Business outcomes:
Intercom's Fin Voice: 53% of calls resolved end-to-end, 40% faster resolution when human agents are needed
Lowe's Mylow: 2x online conversion rate, +200 basis points customer satisfaction
Indeed: 20% more applications, 13% higher downstream success (interviews/hires)
BBVA: 9,000+ queries automated annually, equivalent of 3 FTEs redeployed
Task expansion: 75% of workers complete tasks they previously couldn't. Coding messages outside engineering/IT/research grew 36% in six months. AI is redistributing technical capabilities across organizations.
Industry & Geography
Median sector growth: 6x YoY. Technology leads at 11x, healthcare 8x, manufacturing 7x.
International surge: Australia (187%), Brazil (161%), Netherlands (153%), France (146%) lead business customer growth. International API customers: 70% growth in six months. Japan has the most corporate API customers outside the U.S.

The Widening Gap:
Frontier workers (95th percentile) send 6x more messages than the median. Among data analysts, frontier users leverage analysis tools 16x more. The gap is widest in coding (17x), writing (11x), and analysis (10x).
Frontier firms generate 2x more messages per seat and 7x more messages to GPTs than median enterprises.

The underutilization problem: Among monthly active users, 19% never used data analysis, 14% never used reasoning, 12% never used search. Users engaging with ~7 task types save 5x more time than those using ~4 types.

What Leaders Do Differently:

Enable deep system integration with secure data access
Standardize workflows through Custom GPTs and shared solutions
Secure executive sponsorship with clear mandates
Codify institutional knowledge into machine-readable formats
Combine centralized governance with distributed enablement
Critical barrier: ~25% of enterprises still haven't enabled data connectors—while leaders make this their first step.
Qwen introduced Soft Adaptive Policy Optimization (SAPO) - a smooth, stable, and highly effective RL method for training LLMs

SAPO replaces hard boundaries with a continuous, temperature‑controlled gate that:

• Smooth trust-region behavior → no abrupt gradient drop
• Sequence-level coherence → aligns sequence-level behavior
• Token-level adaptivity → preserves useful gradients and boosts sample efficiency
• Asymmetric temperatures → significantly improved stability, especially in MoE models

What does this mean in practice?
1. Longer stable RL runs
2. Higher Pass@1
3. Stronger performance on Qwen3‑VL across math, coding & multimodal tasks

SAPO offers a more scalable and reliable foundation for RL-tuning large language & multimodal models.
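The "continuous, temperature-controlled gate" can be pictured as a sigmoid replacing PPO's hard clip. The sketch below is a guess at the shape of such a gate; the functional form, temperatures, and bound are illustrative assumptions, not SAPO's published formula:

```python
import math

def soft_gate(log_ratio, advantage, tau_pos=0.05, tau_neg=0.02, bound=0.2):
    """Weight a token's policy-gradient contribution by how far its
    importance ratio has drifted from 1, instead of hard-clipping it.
    A sharper temperature (tau_neg) is used for negative advantages,
    echoing the asymmetric-temperatures idea."""
    tau = tau_pos if advantage >= 0 else tau_neg
    drift = abs(log_ratio)                    # 0 when new and old policies agree
    return 1.0 / (1.0 + math.exp((drift - bound) / tau))  # smooth, not a cliff

print(soft_gate(0.0, 1.0))   # inside the trust region: gate near 1
print(soft_gate(0.5, 1.0))   # far outside: gate decays smoothly toward 0
```

Because the gate is smooth in the log ratio, gradients shrink gradually near the boundary rather than dropping to zero, which is the "no abrupt gradient drop" property.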

Paper.
This research introduces VisPlay, a self-evolving framework where a single vision-language model splits into a "Questioner" and a "Reasoner" to generate its own training data.

It autonomously improves reasoning and reduces hallucinations across major benchmarks, pointing toward scalable, self-improving AI.

GitHub.
SOTA open-source vibe coding from your home: Mistral introduced the Devstral 2 coding model family

Two sizes, both open source.

Also, meet Mistral Vibe, a native CLI, enabling end-to-end automation.

Mistral Vibe CLI is an open-source command-line coding assistant powered by Devstral.

It explores, modifies, and executes changes across your codebase using natural language. Also under Apache 2.0.
Install via: uv tool install mistral-vibe
Meta released Ax 1.0: an open-source platform for adaptive experimentation at scale.

Ax uses ML to automate complex, resource-intensive experiments, enabling efficient optimization for AI, infrastructure, and hardware.
Anthropic shipped three new updates for Claude Agent SDK to make it easier to build custom agents:

- Support for 1M context windows
- Sandboxing
- V2 of the TypeScript interface

GitHub.
Google released the FACTS Benchmark Suite

It’s the industry’s first comprehensive test evaluating LLM factuality across four dimensions: internal model knowledge, web search, grounding, and multimodal inputs.
Travis Beals, a Google executive working on the orbital data-center effort, said it would take 10,000 satellites to recreate the compute capacity of a gigawatt data center, assuming 100-kilowatt satellites.
NVIDIA presents Alpamayo-R1

It's a vision-language-action model that uses "Chain of Causation" reasoning to plan.

It cuts off-road events by 35% and improves decision-making in complex scenarios, showing a promising path to more capable autonomy.
Google released the Gemini Deep Research agent for developers.

It can create a plan, spot gaps, and autonomously navigate the web to produce detailed reports.

Built on Gemini 3 Pro, it was trained using multi-step reinforcement learning to increase accuracy and reduce hallucinations.

It handles massive context – analyzing your uploaded docs alongside the web – and provides citations so you can verify every claim.

Deep Research is the first agent released on the new Interactions API – offering a single endpoint for agentic workflows.
OpenAI shipped a new model. GPT-5.2 showcases OpenAI's incredible post-training stack in action: significant gains in knowledge work (think building a financial model), long-context capability, and coding.

GPT-5.2 likely involved additional mid-training to refresh the cutoff date, plus significant amounts of RL.

One catch: OpenAI raised pricing 40%. Is it worth it?

SWE-Bench Pro results offer an interesting perspective. GPT-5.2 is able to reach higher scores at comparable cost to 5.1 Codex Max, while also continuing to push the capability ceiling.

This price hike will directly increase OpenAI's margins.

We saw a similar dynamic with Claude models, whereby Opus 4.5 was able to achieve comparable scores to Sonnet 4.5 at much lower cost.

This is due to models becoming increasingly token-efficient, requiring less thinking to get more done.
Apple briefly posted then quickly pulled an arXiv paper, but the v1 snapshot is wild.

The team reveals RLAX, a scalable RL framework on TPUs.


It's built with a parameter server design where a master trainer pushes weights and massive inference fleets pull them to generate rollouts.

With new curation and alignment tricks and preemption-friendly engineering, RLAX boosts QwQ-32B pass@8 by 12.8 percent in only 12h48m on 1,024 v5p TPUs.
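The parameter-server design described here (a master trainer pushes weight versions; inference fleets pull the latest to generate rollouts) can be sketched in a few lines. This is a toy with threads standing in for TPU hosts; all names are illustrative:

```python
import threading
import queue

class ParamServer:
    """Versioned weight store: the trainer pushes, workers pull."""
    def __init__(self):
        self._lock = threading.Lock()
        self._weights, self._version = {"w": 0.0}, 0

    def push(self, weights):
        with self._lock:                      # trainer publishes a new version
            self._weights = dict(weights)
            self._version += 1

    def pull(self):
        with self._lock:                      # workers grab the newest snapshot
            return dict(self._weights), self._version

server = ParamServer()
rollouts = queue.Queue()

def inference_worker():
    weights, version = server.pull()          # pull latest weights...
    rollouts.put(("rollout", version))        # ...and roll out with them

server.push({"w": 1.0})                       # one training step published
t = threading.Thread(target=inference_worker)
t.start(); t.join()
print(rollouts.get())                         # ('rollout', 1)
```

Decoupling the trainer from rollout generation is what makes a fleet preemption-friendly: a restarted worker just pulls the current version and resumes.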
First comprehensive framework for how AI agents actually improve through adaptation.

Researchers from many universities surveyed the rapidly expanding landscape of agentic AI adaptation.

What they found: a fragmented field with no unified understanding of how agents learn to use tools, when to adapt the agent versus the tool, and which strategies work for which scenarios.

These are all important for building production-ready AI agents.


Adaptation in agentic AI follows four distinct paradigms that most practitioners conflate or ignore entirely.

The framework organizes all adaptation strategies into two dimensions.

- Agent Adaptation (A1, A2): modifying the agent's parameters, representations, or policies.
- Tool Adaptation (T1, T2): optimizing external components like retrievers, planners, and memory modules while keeping the agent frozen.
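The two dimensions above can be written down as a tiny taxonomy. What separates A1 from A2 (and T1 from T2) is not spelled out in this summary, so the second level is left abstract here; the agent-vs-tool split is from the survey:

```python
from enum import Enum

class Axis(Enum):
    AGENT = "modify the agent's parameters, representations, or policies"
    TOOL = "optimize external components while the agent stays frozen"

# The survey's four paradigms mapped onto its first dimension.
PARADIGMS = {"A1": Axis.AGENT, "A2": Axis.AGENT,
             "T1": Axis.TOOL, "T2": Axis.TOOL}

def requires_agent_update(paradigm: str) -> bool:
    """True when adopting this paradigm means retraining or updating the
    agent itself; False when only retrievers, planners, or memory modules
    are tuned and the agent stays frozen."""
    return PARADIGMS[paradigm] is Axis.AGENT

print([p for p in PARADIGMS if requires_agent_update(p)])  # ['A1', 'A2']
```

The practical value of the split: T-paradigms can often be deployed without touching model weights at all, which matters for teams that cannot fine-tune.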
Diffusion LLMs are the new frontier? InclusionAI has released LLaDA 2.0, the first diffusion model to scale to 100B params, matching frontier LLMs while achieving 2x faster inference.

LLaDA is 2.3x faster on average, with unique high-TPF (tokens per forward pass) advantages in coding via parallel decoding.

The Challenge: AR models had a 3-year head start.

GitHub.
NVIDIA launched the open Nemotron 3 model family, starting with Nano (30B-3A), which pushes the frontier of accuracy and inference efficiency with a novel hybrid SSM Mixture of Experts architecture.

Super and Ultra are coming in the next few months.

Nemotron 3 Super (~4X bigger than Nano) and Ultra (~16X bigger than Nano) are pretrained using NVFP4, a new "Latent Mixture of Experts" architecture that allows 4X more experts for the same inference cost, and Multi-Token Prediction.
a16z released 17 crypto predictions for 2026. Most are obvious. A few are not.

The ones worth paying attention to:

1. Privacy becomes the strongest moat

Bridging tokens is easy. Bridging secrets is hard. Users on private chains are less likely to leave.
Winner-take-most dynamics emerge.

2. Know Your Agent (KYA)
Non-human identities outnumber human employees 96-to-1 in financial services.
The agent economy's bottleneck is identity.

3. AI agents are taxing the open web
They extract value from ad-supported sites while bypassing revenue streams.

The web needs real-time, usage-based compensation or content creation collapses.