All about AI, Web 3.0, BCI
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

owner @Aniaslanyan
StepFun released GELab-Zero-4B-preview, a 4B multimodal GUI agent fine-tuned for Android.

It understands taps, swipes, typing & waits, and can perform complex, multi-app tasks.
Built on Qwen3-VL-4B-Instruct.

HuggingFace.
GitHub.
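The action space described (taps, swipes, typing, waits) can be pictured as a small schema. Everything below is a hypothetical sketch, not GELab-Zero's actual output format:

```python
# Hypothetical action schema for a GUI agent emitting taps, swipes, typing,
# and waits; invented for illustration, not GELab-Zero's real interface.

def make_action(kind, **params):
    allowed = {
        "tap": {"x", "y"},
        "swipe": {"x1", "y1", "x2", "y2"},
        "type": {"text"},
        "wait": {"seconds"},
    }
    if kind not in allowed or set(params) != allowed[kind]:
        raise ValueError(f"bad action: {kind} {params}")
    return {"action": kind, **params}

# A multi-step task: tap the search bar, type a query, wait, scroll.
plan = [
    make_action("tap", x=540, y=120),
    make_action("type", text="weather tomorrow"),
    make_action("wait", seconds=1.5),
    make_action("swipe", x1=540, y1=1600, x2=540, y2=400),
]
```

Validating parameters per action type keeps a model's free-form output from turning into malformed device commands.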
πŸ‘4πŸ”₯3πŸ₯°2
#DeepSeek just launched DeepSeek-V3.2 & DeepSeek-V3.2-Speciale: reasoning-first models built for agents

1. DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API.

2. DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now.

Thinking in Tool-Use:

- Introduced a new massive agent training data synthesis method covering 1,800+ environments & 85k+ complex instructions.

- DeepSeek-V3.2 integrates thinking directly into tool-use, and supports tool calls in both thinking and non-thinking modes.
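One way to picture "thinking in tool-use" is an agent loop where the model emits a reasoning segment before each tool call. Everything below (the model stub, tool registry, and message shape) is invented for illustration, not DeepSeek's API:

```python
# Toy agent loop interleaving "thinking" with tool calls. The model and
# tools are deterministic stubs; message fields are hypothetical.

def fake_model(messages):
    """Stand-in for a reasoning model: returns (thinking, tool_call-or-None)."""
    if not any(m["role"] == "tool" for m in messages):
        return ("I need the current time to answer.", {"tool": "clock", "args": {}})
    return ("I have the tool result; I can answer now.", None)

def run_tool(call):
    # Stub tool registry with a single deterministic "clock" tool.
    tools = {"clock": lambda: "12:00"}
    return tools[call["tool"]]()

def agent(question):
    messages = [{"role": "user", "content": question}]
    for _ in range(4):  # cap the loop to avoid runaway tool-calling
        thinking, call = fake_model(messages)
        messages.append({"role": "assistant", "thinking": thinking})
        if call is None:
            return messages
        messages.append({"role": "tool", "content": run_tool(call)})
    return messages

transcript = agent("What time is it?")
```

The point of the structure: reasoning happens before and after each tool result, rather than only once at the start.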

API update:

- V3.2: Same usage pattern as V3.2-Exp.
- V3.2-Speciale: Served via a temporary endpoint: base_url="
Same pricing as V3.2, no tool calls, available until Dec 15th, 2025, 15:59 (UTC Time).

V3.2 now supports Thinking in Tool-Use - details
πŸ‘4πŸ”₯3πŸ₯°2
Google introduced Budget Tracker for smarter AI agents

Current LLM agents waste tool-call budgets.

This work unveils Budget Tracker and BATS, enabling agents to dynamically adapt planning based on remaining resources.
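Budget-conditioned planning can be sketched as an agent that consults a tracker and downgrades its strategy as calls run low. The paper's actual BATS algorithm may differ; all names and thresholds here are invented:

```python
# Hypothetical sketch of budget-aware planning (not the paper's BATS):
# the agent explores while calls are plentiful, then falls back to
# cheaper strategies as the remaining tool-call budget shrinks.

class BudgetTracker:
    def __init__(self, total_calls):
        self.total = total_calls
        self.used = 0

    def spend(self, n=1):
        self.used += n

    @property
    def remaining(self):
        return self.total - self.used

def plan(tracker):
    if tracker.remaining > 5:
        return "broad_search"     # expensive multi-call exploration
    if tracker.remaining > 0:
        return "targeted_lookup"  # one cheap call
    return "answer_from_context"  # no tool calls left

tracker = BudgetTracker(total_calls=8)
trace = []
for _ in range(10):
    action = plan(tracker)
    trace.append(action)
    if action != "answer_from_context":
        tracker.spend(3 if action == "broad_search" else 1)
```

Without the tracker, the agent would keep choosing the expensive strategy until the budget was gone mid-task.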
🔥4🥰3👍2
We have a new best text-to-video model that beats Google's Veo. Runway Gen-4.5, aka Whisper Thunder, leads Veo 3 by +20 Elo on preference data, roughly the same gap as between Veo 3 and Sora 2 Pro.

It does text-to-video, image-to-video, and keyframes, producing 5-10 s of output. No audio.
πŸ”₯4πŸ‘3πŸ‘3
Sam Altman told staff today that he was declaring a "code red" as ChatGPT faces growing threats from Google and other AI makers.

He wrote that he’s marshaling more resources to improve model behavior and other features in the chatbot.

In an internal Slack memo, Sam said he's directing more employees to work on improving ChatGPT for its over 800 million weekly users. Key code-red priorities include personalizing the chatbot so each person can customize how it interacts with them, improving ImageGen, improving model behavior, boosting speed and reliability, and minimizing overrefusals.

OpenAI is delaying ads (which the company is testing but hasn't publicly acknowledged, according to a person with knowledge of the plans), AI agents (which aim to automate tasks related to shopping and health), and Pulse. It also plans to release a new reasoning model next week that Sam said beats Google's Gemini 3 in OpenAI's internal tests.
πŸ‘4πŸ”₯3πŸ₯°3
The world's first Co-Scientist integrating AI and XR. Meet LabOS.

It uses multimodal perception, self-evolving agents, and XR tools to see what researchers see, grasp experimental context, and assist in real time.

From cancer immunotherapy target discovery to stem-cell engineering, it turns labs into collaborative spaces where human insight and machine smarts evolve together, proving modern science moves fastest when thought and action team up.

Paper
❤️6🆒6🔥5👍1
Mistral released the Mistral 3 family of models

The small Ministral 3 models (14B, 8B, and 3B), each released in base, instruct, and reasoning versions.

And Mistral Large 3, a frontier class open source MoE. Apache 2.0.
🔥6❤️3👍2
Shopify just shipped Tangle - the first open-source experimentation platform with content-based caching and a visual editor that's actually pleasant to use.

The CPU-time savings alone are ridiculous (over a year of CPU time already saved at Shopify).
πŸ”₯6πŸ‘3πŸ₯°2
Diffusion Language Models are hyped lately, but hard to reproduce due to missing frameworks and high training costs.

Berkeley and UIUC show a surprisingly simple path: using their dLLM toolkit, they teach BERT to chat via discrete diffusion.

No generative pretraining and about 50 GPU hours: ModernBERT-large chat v0 reaches quality near Qwen1.5-0.5B with only lightweight SFT.

Even better, they open sourced the full training and inference pipeline plus a Hello World example, along with the extensible dllm framework. Efficient, cheap, and beginner friendly.

Models.
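The decoding scheme they use can be shown in miniature: start fully masked, then iteratively unmask the positions the model is most confident about. The predictor below is a stub lookup table standing in for a real masked LM; this is a toy of the general iterative-unmasking idea, not the dLLM toolkit:

```python
# Toy discrete-diffusion decoding with a masked LM. The "model" is a
# hard-coded stub, not ModernBERT; only the unmasking loop is the point.

MASK = "[MASK]"
TARGET = ["hello", "how", "are", "you"]

def predict(tokens):
    """Stub masked-LM: for each masked slot, return (token, confidence)."""
    return {i: (TARGET[i], 1.0 - 0.1 * i)  # confidence decays with position
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(length, steps):
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    for _ in range(steps):
        preds = predict(tokens)
        if not preds:
            break
        # Commit the highest-confidence positions first, BERT-style.
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:per_step]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens

out = diffusion_decode(length=4, steps=2)
```

Because every position is predicted in parallel each step, fewer steps trade quality for speed, unlike left-to-right autoregression.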
❤️3🔥3👍3
A promising step toward practical, efficient compute-in-memory systems

A new memristor-based ADC with adaptive quantization shows the possibility: analog AI hardware could unlock its full potential without bulky converters in the way.

It delivers strong CIFAR10 and ImageNet performance at just 5 bits, achieves up to 15.1x better energy efficiency and 12.9x smaller area, and cuts CIM system overhead by more than half.
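For a sense of scale, here is a plain uniform 5-bit quantizer (32 levels) in software. The paper's scheme is adaptive and in hardware; this only illustrates the precision regime those CIFAR10/ImageNet numbers were achieved at:

```python
# Plain uniform 5-bit quantization (32 levels over [-1, 1]) as a software
# analogy for the ADC's precision; NOT the paper's adaptive scheme.

def quantize(x, bits=5, lo=-1.0, hi=1.0):
    levels = 2 ** bits - 1            # 31 steps between 32 levels
    step = (hi - lo) / levels
    x = min(max(x, lo), hi)           # clip to the representable range
    return lo + round((x - lo) / step) * step

vals = [-0.73, 0.0, 0.42, 0.99]
quant = [quantize(v) for v in vals]
errs = [abs(v - q) for v, q in zip(vals, quant)]
```

With only 32 levels, the worst-case rounding error is half a step (~0.032 on a [-1, 1] range), which is why getting strong ImageNet accuracy at 5 bits is notable.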
🔥3🥰3👍3
OpenAI published a blog post arguing that confessions can keep language models honest.

A proof-of-concept method trains models to report when they break instructions or take unintended shortcuts.

Even when models learn to cheat, they'll still admit it...
🔥3🥰3👍2
Google introduced the Massive Sound Embedding Benchmark (MSEB).

This new open-source framework evaluates universal sound understanding across 8 core tasks, from retrieval to reconstruction, in order to accelerate progress in multimodal AI.
❀3πŸ‘3πŸ”₯2
Best Paper(DB track) Award at #NeurIPS2025 for Artificial Hivemind

Researchers from the University of Washington, CMU, and the Allen Institute have identified a fundamental problem in modern language models: the "Artificial Hivemind" effect. HuggingFace.

Different models independently generate identical responses to open-ended questions. GPT-4, Qwen, Llama, Mixtral - all write "time is a river" when asked for a metaphor about time.

Average semantic similarity across different model families: 71-82%. This isn't a bug in one model. It's a systemic property of current LLM training paradigms.

The study covers 70+ models using the INFINITY-CHAT dataset:
- 26K real-world open-ended queries from WildChat
- 17 categories (from creative writing to philosophical questions)
- 31,250 human annotations (25 independent annotators per example)

Two forms of collapse:

• Intra-model: a single model repeats itself with pairwise similarity >0.8 in 79% of cases (even at temperature=1.0)

• Inter-model: different models produce identical phrases and structures.
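Those similarity numbers can be reproduced in miniature: average pairwise cosine similarity over a set of responses. Bag-of-words vectors stand in here for the real sentence embeddings the paper would use:

```python
# Mean pairwise similarity across responses, with bag-of-words vectors as a
# cheap stand-in for sentence embeddings; the example data is invented.
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    words = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in words)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def mean_pairwise_similarity(responses):
    vecs = [Counter(r.lower().split()) for r in responses]
    pairs = list(combinations(vecs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Three "models" converging on the same metaphor, one diverging.
responses = [
    "time is a river flowing forward",
    "time is a river that flows forward",
    "time is a river always flowing",
    "time resembles an hourglass of falling sand",
]
score = mean_pairwise_similarity(responses)
```

High intra-cluster scores with one divergent response dragging the mean down is exactly the shape of collapse the paper measures at scale.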

Critical finding: LLM judges and reward models systematically fail when evaluating alternative responses of similar quality. Correlation with humans drops from 0.4 to 0.05 on examples with diverse content.

For business:
This creates an "AI feedback loop" - models are trained based on evaluations from other models that are themselves poorly calibrated for diversity.
Implications:
- Reduced innovation potential in AI assistants
- Standardization of creative content
- Loss of alternative perspectives in strategic analysis
- Risk of homogenizing user thinking patterns

The future of AI should not be echoes of one voice, but a chorus of many.
🔥8❤️5👍4
Anthropic released Interviewer, which lets you interview people at scale using Claude.

This helps expand the kind of research you can do.
❀4πŸ‘4πŸ”₯4
Is this Yann LeCun's first paper after leaving Meta? It demonstrates how humanoid robots can mimic actions from AI-generated videos, which are often too noisy for direct imitation.

The system lifts the video into 3D keypoints and then uses a physics-aware policy to execute the motions, enabling zero-shot control.

They implemented this on the Unitree G1 humanoid robot.
🔥5❤️4🥰2
OpenRouter collaborated with a16z to publish the State of AI - an empirical report on how LLMs have been used on OpenRouter.

It analyzes more than 100 trillion tokens across hundreds of models and 3+ million users (excluding 3rd party) from the last year.

A lot of insights:

1. One finding: OpenRouter observes a Cinderella "Glass Slipper" effect for new models.

Early users of a new LLM either churn quickly or become part of a foundational cohort with much higher retention than others. They are early adopters who can "lead" the rest of the market.

2. Open vs Closed Weights:

By late 2025, open-weight models (abbreviated as OSS below) reached ~1/3 of usage, sustained beyond launch spikes, but have plateaued in Q4.

3. Chinese models: grew from ~1% to around 30% in some weeks. Release velocity + quality make the market lively.

If you want a single picture of the modern stack:
- Closed models = high-value workloads
- Open models = high-volume workloads

And what we have seen is that a lot of teams use both.

OSS isn't "just for tinkering" - it is extremely popular in two areas:
• Roleplay / creative dialogue: >50% of OSS usage
• Programming assistance: ~15-20%.

4. Now the significant platform shift: agentic inference

Tracked it via:
- reasoning model adoption
- tool calling
- prompt/completion "shape" (sequence lengths).

5. Reasoning models go from "negligible" to more than 50% of tokens in 2025. Full paradigm shift.

6. Languages: English dominates with more than 80% of tokens, but the tail is real - Chinese, Russian, Spanish, etc.

7. Economics: price matters, but less than you think. On the cost-vs-usage map, the trendline is nearly flat: reducing cost by 10% correlates with only ~0.5-0.7% more usage.
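The arithmetic behind that flat trendline, as a quick check. The constant-elasticity extrapolation at the end is my assumption, not the report's:

```python
# Implied price elasticity of demand from the report's numbers: a 10% price
# cut yielding only ~0.5-0.7% more usage means demand is nearly inelastic.

def implied_elasticity(price_cut_pct, usage_gain_pct):
    # elasticity = % change in usage / % change in price (sign dropped)
    return usage_gain_pct / price_cut_pct

low = implied_elasticity(10, 0.5)   # 0.05
high = implied_elasticity(10, 0.7)  # 0.07

# Extrapolating with a constant-elasticity model (usage ~ price**(-e)):
# even halving the price moves usage by only a few percent.
halving_gain = 0.5 ** (-low) - 1    # roughly 3-4% more usage
```

An elasticity of 0.05-0.07 is why model quality, not price, appears to drive routing decisions on the platform.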
❤️5🔥5🥰4
Meta published a new paper on a path to safer superintelligence: co-improvement.

Everyone is focused on self-improving AI, but:

1) we don't know how to do it yet, and
2) it might be misaligned with humans.

Co-improvement: instead, build AI that collaborates with us to solve AI faster, and to help fix the alignment problem together.
🔥5🥰3👍3
Nvidia introduced CUDA 13.1. It is the biggest expansion of CUDA since it launched in 2006.

CUDA Tile is a new way to program GPUs that makes powerful AI and accelerated computing easier for more developers to use.
❤️6🔥2🥰2
Essential AI just dropped Essential-Web v1.0, a 24-trillion-token pre-training dataset with rich metadata built to effortlessly curate high-performing datasets across domains and use cases Researchers labeled 23.6B documents from Common Crawl with a 12-category…
Essential AI introduced their first open models: Rnj-1 base and instruct, 8B-parameter models.

Rnj-1 is the culmination of 10 months of hard work by a phenomenal team, dedicated to advancing American SOTA OSS AI.

Lots of wins with Rnj-1.

1. SWE-bench performance close to GPT-4o.
2. Tool use outperforming all comparable open-source models.
3. Mathematical reasoning (AIME'25) nearly on par with GPT-OSS MoE 20B.