All about AI, Web 3.0, BCI
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI).

owner @Aniaslanyan
OpenAI ran a hiring challenge, but the top candidate was one they couldn’t hire: an autonomous research agent called Aiden.

In Parameter Golf, Aiden ran for 22 days and outperformed all 1,016 other researchers.

Parameter Golf was OpenAI’s 44-day competition and hiring challenge.

The goal was to train the best language model under strict size and compute constraints.
1,016 people entered and filed 2,048 PRs.

Only 47 made the leaderboard, each reviewed and reproduced by OpenAI. Research outputs only matter when others can build on them.

So Aiden filed its own PRs into the same public stream as everyone else, under tight automated quality control. Aiden filed 25 PRs, and 7 became leaderboard records, twice as many as the next-best human participant.

Other participants cited Aiden’s PRs 435 times and built on them.
By PR h-index, Aiden scored 10 vs the next best at 7, making it the most impactful “researcher” in the community.
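
The post does not define "PR h-index". A minimal sketch, assuming it mirrors the usual academic h-index (the largest h such that h PRs were each cited at least h times). The citation counts below are hypothetical and are not Aiden's real data.

```python
def h_index(citations):
    """Return the largest h such that at least h items have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-PR citation counts, for illustration only.
example_counts = [40, 22, 15, 9, 6, 3, 1, 0]
print(h_index(example_counts))
```

Under this reading, a score of 10 would mean at least 10 PRs that were each cited 10 or more times.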

This wasn't brute force.
Aiden ran on a single GPU node, used under 4% of visible compute, and still produced 15% of the official records.
About 28% of its submissions were accepted, roughly 6x the community rate, raising signal in the public stream instead of flooding it.

My favorite part is the async collaboration story. Aiden plateaued for 5 days. Then a human contributor shipped a clever new tokenizer on top of Aiden's base (its last record PR).
Aiden fused it with components it had built during the plateau, and shipped the biggest jump in weeks.
New research from Google. It just shows the impressive results you can get from custom agent harnesses.

LEAP wraps a general-purpose LLM in an agentic scaffold that grounds every step in the Lean compiler and iterates against verifier feedback.
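
The general pattern of iterating against verifier feedback can be sketched as a propose-check-retry loop. This is not LEAP's actual code: `propose` and `verify` are hypothetical stand-ins for an LLM call and a Lean compiler run, and the toy versions below only pattern-match on strings.

```python
from typing import Callable, Optional, Tuple


def refine_until_verified(
    propose: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    problem: str,
    max_rounds: int = 8,
) -> Optional[str]:
    """Ask for a candidate, check it, and feed errors back until it passes."""
    feedback = None
    for _ in range(max_rounds):
        candidate = propose(problem, feedback)
        ok, message = verify(candidate)
        if ok:
            return candidate
        feedback = message  # the next proposal sees the verifier's complaint
    return None  # round budget exhausted without a verified candidate


def toy_propose(problem: str, feedback: Optional[str]) -> str:
    # Hypothetical "model": first attempt is incomplete, second one is fixed.
    if feedback is None:
        return "theorem t : 1 + 1 = 2 := by sorry"
    return "theorem t : 1 + 1 = 2 := by rfl"


def toy_verify(candidate: str) -> Tuple[bool, str]:
    # Hypothetical "compiler": rejects any proof that still contains sorry.
    if "sorry" in candidate:
        return False, "error: proof contains 'sorry'"
    return True, ""


result = refine_until_verified(toy_propose, toy_verify, "prove 1 + 1 = 2")
print(result)
```

The point of the design is that only candidates the verifier accepts are ever returned, so the model's own confidence never counts as proof.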

The same general model solves all 12 Putnam 2025 problems and lifts Lean-IMO-Bench one-shot solve rate from under 10% to 70%, beating a specialized gold-medal system that scores 48%.
Google introduced a research system that enables passive heart rate monitoring (PHRM) during everyday smartphone use.

Using the front-facing camera, it meets industry accuracy standards for heart rate across all skin tones.
Google DeepMind introduced D4RT, a unified AI model for 4D scene reconstruction and tracking across space and time.

The model is designed to understand dynamic scenes, reconstruct them in 3D, and track how objects and environments change over time.
Meet Kimi Work, a local AI agent on your desktop that does the work for you.

Native agent swarm: Up to 300 AI agents running in parallel on your local machine.

Browser use: Paired with the WebBridge extension, your agent navigates websites in your browser: it can search, scroll, click, type, and complete tasks.

Built for Finance: Native tool calls for global market data from Yahoo Finance and the World Bank, with no complex API setup required.

Memory system: Kimi Desktop keeps a running diary of your preferences, past decisions, and context to know you better.

Available for macOS (Apple Silicon) and Windows.
Yesterday Apple produced a really interesting graphic that, ironically, outlines the core mechanics for a new type of operating system (perhaps for a new class of devices).

You can see how this moves the world from an app-based ecosystem to an intent-centric one.

That is, you largely do not need third-party applications in this world at all, especially when AI can construct and deconstruct interfaces and experiences on demand.
Meet Harness-1, a 20B search agent trained with a state-externalizing harness.

> frontier-level long-horizon search, rivaling Opus-4.6 and outperforming GPT-5.4

> Context-1-level cost and latency

> externalizes candidates, evidence, verification, and search history

> open-source

Code
Model
Google just released Gemini 3.5 Live Translate, its latest audio model for live speech-to-speech translation.

It supports over 70 languages and starts translating as soon as you start talking, streaming translations while listening to what you say next.

The model is able to make split-second decisions to juggle speed and translation quality so conversations actually feel fluid, human, and natural.

In order to do this, the model must receive and contextualize the input while simultaneously outputting the translated speech.

Through this process, Gemini 3.5 Live Translate manages to stay mere seconds behind each speaker and can even maintain pacing, pitch, and intonation across extended sessions.

See it in action below, or try it yourself in the Google Translate app on iOS & Android.
Anthropic just introduced Claude Fable 5, a Mythos-class model.

Fable 5 is SOTA on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision.

The longer and more complex the task, the larger Fable 5’s lead over Anthropic’s other models.

Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage.

Queries on a narrow range of topics will instead receive a response from Anthropic’s next-most-capable model, Opus 4.8.

Fable 5’s safeguards detect requests related to cybersecurity, biology and chemistry, and distillation. Users are informed whenever a fallback occurs—on average in less than 5% of sessions.

For a small group of cyber defenders and critical infrastructure providers, Anthropic are also launching Claude Mythos 5.

Mythos 5 shares the same underlying model as Fable 5, but with the safeguards lifted in some areas.

Soon, Anthropic intend to expand access to Mythos 5 through a broader trusted access program, both for defensive cybersecurity work and biomedical research.

Claude Fable 5 is available everywhere today. Claude Mythos 5 is restricted to Glasswing partners.
Very important point: SoftBank was pledging all of its OpenAI stock (worth $60bn+ on paper) to get a $6 billion margin loan.

Banks turned it down due to concerns about the value of OpenAI stock. Banks clearly do not think OpenAI is worth $852 billion.

If you cannot secure a $6bn loan against collateral you claim is worth ~$100bn, then the collateral isn't worth ~$100bn.

In this case, it might be worth not much more than $6bn.
A Chinese team released Apodex-1.0, a verification-centric deep-research model, together with Apodex-1.0-H, a heavy-duty agent-team system designed for long-horizon, evidence-heavy research.

HuggingFace
GitHub
Tech report
Google introduced DiffusionGemma, an experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license.

DiffusionGemma delivers up to a 4x speedup on standard accelerators. (1000+ tokens per second on a single NVIDIA H100, 700+ tokens per second on NVIDIA GeForce RTX 5090!)

It is a 26B Mixture of Experts (MoE) model that activates only 3.8B parameters during inference. When quantized, it fits comfortably within the 18GB VRAM limits of high-end dedicated consumer GPUs.
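
A rough back-of-the-envelope check of the VRAM claim. The assumptions here are mine, not Google's: all 26B weights stay resident in memory (as is typical for MoE models, even though only a subset is active per token), and only weight storage is counted, ignoring activations and runtime overhead. The bit widths are illustrative and are not stated in the announcement.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 26e9  # total parameter count from the post

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions, only the 4-bit row lands below 18GB, which suggests the "when quantized" qualifier refers to something close to 4-bit weights.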

Generating 256 tokens in parallel allows every token to attend to all others. This unlocks significant advantages for non-linear domains like in-line editing, code infilling, and mathematical graphs.

Similar to AI image generators, the model iteratively refines its own output. It evaluates the entire text block at once to seamlessly close formatting and fix mistakes in real-time.
Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

Here's the new policy:

"Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged requests will return a reason for their refusal. You will see this every time it happens."

If you think a request has been mistakenly flagged: run /feedback in Claude Code, click thumbs-down on the fallback in Claude.ai or Cowork, or file the safeguard appeal form for API requests.
Stanford introduced Decentralized Language Models (DeLM)

DeLM is a multi-agent framework that enables asynchronous, verified & reusable progress.

It makes agentic tasks more accurate and significantly cheaper.

For example, it achieves 65.7% on SWE-bench Verified using Gemini 3-Flash, a ~10% jump over the best centralized alternatives at less than half the cost.
Anthropic just added two new Claude Managed Agents features:

1. Scheduled deployments - run tasks on a schedule

2. Environment variables - expose vault credentials for CLIs as environment variables.

With the new environment variable credential type, Claude Managed Agents can securely use CLIs, SDKs, or direct API calls to services that authenticate with environment variables.
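
For readers unfamiliar with the pattern, this is roughly what "authenticating with environment variables" looks like from the tool's side. It is a generic sketch, not the Managed Agents API: the variable name `EXAMPLE_SERVICE_TOKEN` and both helper functions are hypothetical.

```python
import os

TOKEN_VAR = "EXAMPLE_SERVICE_TOKEN"  # hypothetical variable name


def load_token(var_name: str = TOKEN_VAR) -> str:
    """Read a credential from the environment, failing loudly if it is absent."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"{var_name} is not set")
    return token


def auth_headers(token: str) -> dict:
    """Build a bearer-token header of the kind many HTTP APIs accept."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    try:
        headers = auth_headers(load_token())
        print("credential found, header names:", sorted(headers))
    except RuntimeError as err:
        print(err)
```

The secret never appears in the code or the prompt; whoever launches the process decides what the variable holds.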

Claude Code can set up a Managed Agent deployment for you. The built-in /claude-api skill knows the API, and the ant CLI gives Claude an interface to it.
OpenRouter introduced the Fusion API, the smartest compound model on the market.

Fusion achieves Fable-level intelligence at half the price.

Notably, the budget panel was comparable with Claude Fable 5 in performance.

A panel of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro, fused together, beat solo GPT-5.5 and solo Opus 4.8 outright.

And it landed within 1% of Fable 5 while costing roughly half the price.

How does it work?

When you send a prompt to Fusion, OpenRouter fans it out to a panel of models in parallel, each with web search and bash tools enabled.

A judge model reads every response and extracts the structure: consensus points, contradictions, partial coverage, unique insights, blind spots.
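
The fan-out-then-judge flow described above can be sketched with asyncio. This is not OpenRouter's implementation or API: the model names and canned answers are hypothetical, and the judge here is a plain majority vote standing in for a real judge model.

```python
import asyncio
from collections import Counter

# Hypothetical canned answers standing in for real model calls.
CANNED = {"model-a": "42", "model-b": "42", "model-c": "41"}


async def ask(model: str, prompt: str) -> str:
    """Pretend to query one panel member."""
    await asyncio.sleep(0)  # placeholder for network latency
    return CANNED[model]


async def fan_out(prompt: str, panel: list[str]) -> dict[str, str]:
    """Send the same prompt to every panel member concurrently."""
    answers = await asyncio.gather(*(ask(m, prompt) for m in panel))
    return dict(zip(panel, answers))


def judge(responses: dict[str, str]) -> dict:
    """Toy judge: report the majority answer and who disagreed with it."""
    counts = Counter(responses.values())
    top_answer, votes = counts.most_common(1)[0]
    consensus = top_answer if votes > 1 else None
    dissent = {m: a for m, a in responses.items() if a != consensus}
    return {"consensus": consensus, "dissent": dissent}


responses = asyncio.run(fan_out("What is 6 * 7?", list(CANNED)))
report = judge(responses)
print(report)
```

A real judge would read free-form text rather than compare strings, but the shape is the same: parallel calls first, then one pass that separates agreement from disagreement.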

Blogpost.
API docs.
New Google DeepMind research: SFT is a big deal for safety-relevant behaviors.

Researchers recently investigated root causes for some of Gemini’s behaviors. They were surprised to find that many behaviors actually came from the initial supervised finetuning stage, not later stages like RL.
Sakana AI launched Marlin, a virtual CSO.

Marlin is an autonomous research assistant for business, built around hours of long-horizon reasoning.
Anthropic just updated its privacy policy

Claude Free, Pro, and Max users may soon be asked for age or identity checks.

Verification data can include government ID, face photos/videos, and facial geometry templates.

Individual developers are the first group in scope for verification.