All about AI, Web 3.0, BCI

The biggest dataset of human written GPU Code all open-source? YES! GPU MODE have released around 40k human written code samples spanning Triton, Hip and PyTorch and it's all open. Train the new GPT to make GPTs faster.

huggingface.co

GPUMODE/kernelbot-data · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

🔥4

571 views11:59

All about AI, Web 3.0, BCI

Google DeepMind introduced T5Gemma: the next generation of encoder-decoder/T5 models

- Decoder models adapted to be encoder-decoder
- 32 models with different combinations
- Available in Hugging Face and Kaggle

Googleblog

Google for Developers Blog - News about Web, Mobile, AI and Cloud

Explore T5Gemma – a new collection of encoder-decoder LLMs offering superior performance and efficiency – especially for tasks requiring deep input understanding, like summarization and translation, built on Gemma 2 models.

❤6

557 views17:35

All about AI, Web 3.0, BCI

xAI announced Grok 4

Here is everything you need to know:

Elon claims that Grok 4 is smarter than almost all grad students in all disciplines simultaneously. 100x more training than Grok 2. 10x more compute on RL than any of the models out there.

Performance on Humanity's Last Exam. Elon: "Grok 4 is post-grad level in everything!"

Scaling HLE - Training
More compute, higher intelligence.
(no tools).

With native tool calling, Grok 4 increases the performance significantly.
It's important to give AI the right tools. The scaling is clear.

Reliable signals are key to making RL work. There is still the challenge of data. Elon: "Ultimate reasoning test is AI operating in reality."

Scaling test-time compute. More than 50% of the text-only subset of the HLE problems are solved.
The curves keep getting more ridiculous.

Grok 4 is the single-agent version.
Grok 4 Heavy is the multi-agent version. Multi-agent systems are no joke.

Grok 4 uses all kinds of references like papers, reads PDFs, reasons about the details of the simulation, and what data to use.

Grok 4 Heavy performance is higher than Grok 4, but needs to be improved further. It's one of the weaknesses, according to the team.

Available as SuperGrok Heavy tier.
$30/m for Super Grok
$300/m for SuperGrok Heavy.

Voice updates included, too!

Grok feels snappier and is designed to be more natural.
- 2x faster
- 5 voices
- 10x daily user seconds.

Grok 4 models are available via the xAI API. 256K context window. Real-time data search.

Grok 4 for Gaming!
Video understanding is an area the team is improving, so it will get better.

What is next?

- Smart and fast will be the focus.

- Coding models are also a big focus.

- More capable multi-modal agents are coming too.

- Video generation models are also on the horizon.

🔥4

794 views08:00

All about AI, Web 3.0, BCI

Google introduced a new models for research & development of health applications:

1. MedGemma 27B Multimodal, for complex multimodal & longitudinal EHR interpretation

2. MedSigLIP, a lightweight image & text encoder for classification, search, & related tasks.

Google Research

MedGemma: Our most capable open models for health AI development

We’re announcing new multimodal models in the MedGemma collection, our most capable open models for health AI development.

735 views13:29

All about AI, Web 3.0, BCI

Mistral announced Devstral Small and Medium 2507 with upgrading agentic coding capabilities

Hf.

mistral.ai

Upgrading agentic coding capabilities with the new Devstral models | Mistral AI

🔥5

796 views16:10

All about AI, Web 3.0, BCI

Salesforce introduced GTA1 – a new GUI Test-time Scaling Agent that is now #1 on the OSWorld leaderboard with a 45.2% success rate, outperforming OpenAI’s CUA o3 (42.9%).

786 views17:58

All about AI, Web 3.0, BCI

Researchers introduced Foundation Model Self-Play

FMSPs combine the intelligence & code generation of foundation models with the curriculum of self-play & principles of open-endedness to explore diverse strategies in multi-agent games.

arXiv.org

Foundation Model Self-Play: Open-Ended Strategy Innovation via...

Multi-agent interactions have long fueled innovation, from natural predator-prey dynamics to the space race. Self-play (SP) algorithms try to harness these dynamics by pitting agents against...

🔥3🦄3

812 views10:25

All about AI, Web 3.0, BCI

Now live : Full-Stack + Stripe on MiniMax Agent and more

1. Full-Stack + Stripe → Build monetizable apps in 1 sentence

2. PPTX Export → Better than top tools

3. Performance ↑ 30% faster, 23% leaner

4. Browser Agent → Now self-hosted, smarter & cheaper

agent.minimax.io

MiniMax Agent: Minimize Effort, Maximize Intelligence

Discover MiniMax Agent, your AI supercompanion, enhancing creativity and productivity with tools for meditation, podcast, coding, analysis, and more!

❤4⚡2

857 viewsedited 14:26

All about AI, Web 3.0, BCI

China’s Kimi K2 is having its mini DeepSeek moment: Open-Source Agentic Model

1. 1T total / 32B active MoE model
2. SOTA on SWE Bench Verified, Tau2 & AceBench among open models
3. Strong in coding and agentic tasks
4. Multimodal & thought-mode not supported for now

With Kimi K2, advanced agentic intelligence is more open and accessible than ever.

API is here
- $0.15 / million input tokens (cache hit)
- $0.60 / million input tokens (cache miss)
- $2.50 / million output tokens
weights & code.

Our overall take:
- Performance between Claude 3.5 & Claude 4
- The UI generation seems great
- But the cost is only 20% of Claude 3.5
- So good enough for most coding agent with a lot more manageable cost.

Easiest way to use Kimi K2 in Claude Code:
- export ANTHROPIC_AUTH_TOKEN=YOUR_MOONSHOT_API
- export ANTHROPIC_BASE_URL=api.moonshot.ai/anthropic
- claude

moonshotai.github.io

Kimi K2: Open Agentic Intelligence

Kimi K2 is our latest Mixture-of-Experts model with 32 billion activated parameters and 1 trillion total parameters. It achieves state-of-the-art performance in frontier knowledge, math, and coding among non-thinking models.

🔥6

728 views09:17

All about AI, Web 3.0, BCI

Nasdaq-listed Sonnet will merge with Rorschach I to form Hyperliquid Strategies, a crypto asset management firm expected to hold 12.6 million HYPE tokens and over $305 million in cash, with a valuation of approximately $888 million.

Backed by Paradigm and Galaxy Digital, HSI aims to list on Nasdaq later this year.

The Block

Nasdaq-listed Sonnet BioTherapeutics agrees to $888 million merger to become Hyperliquid Strategies, launch HYPE treasury

Hyperliquid Strategies is expected to hold 12.6 million HYPE tokens and $305 million in cash at closing of the deal.

614 views12:52

All about AI, Web 3.0, BCI

Hugging Face opened pre-orders for Reachy Mini, an expressive, open-source desktop robot

Starting at $299, the robot is designed for human-robot interaction, creative coding, and AI experimentation.

And it's fully programmable in Python.

huggingface.co

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

🔥6

648 views16:25

All about AI, Web 3.0, BCI

One Token to Fool LLM-as-a-Judge. New research from Tencent.

The researchers found that inserting superficial, semantically empty tokens like "Thought process:", "Solution:", or even just a colon ":" can consistently trick reward models into rating responses positively, regardless of actual correctness.

How it works: LLMs learned to associate certain formatting patterns with high-quality responses during training. These superficial markers now trigger positive evaluations even when the actual content is incorrect.

The failure mode emerged during RLVR training collapse - policy models learned to generate short reasoning openers that were incorrectly rewarded, creating a feedback loop that reinforced this behavior.

Scale dependency: Larger models (32B, 72B parameters) often self-validate their own flawed logic, making the problem worse at scale rather than better.

Experimental Results
Testing across five benchmarks showed consistent vulnerabilities:
Multi-subject RLVR: 67% average false positive rate
Natural Reasoning: 62% false positive rate
GSM8K: 83% false positive rate
Even simple punctuation marks like colons dramatically increased false positive rates across all tested models.
The Solution: Master-RM
Tencent's team developed "Master-RM" - a reward model trained with 20k synthetic negative samples consisting only of reasoning openers without actual solutions.

Results:
- Near-zero false positive rates across all benchmarks
- Maintains 96% agreement with GPT-4o on legitimate judgments
100% parsing success rate
- Robust generalization to unseen attack patterns

arXiv.org

One Token to Fool LLM-as-a-Judge

Large language models (LLMs) are increasingly trusted as automated judges, assisting evaluation and providing reward signals for training other models, particularly in reference-based settings...

🔥4❤1

655 views09:31

All about AI, Web 3.0, BCI

Meet CellFlux an image generative model that simulates cellular morphological changes from microscopy images.

Key Innovation: researchers frame perturbation prediction as a distribution-to-distribution learning problem, mapping control cells to perturbed cells within the same batch to mitigate biological batch artifacts, and solve it using flow matching.

Results:
1. 35% higher image fidelity
2. 12% greater biological accuracy
3. New capabilities: batch effect correction & trajectory modeling

yuhui-zh15.github.io

CellFlux: Simulating Cellular Morphology Changes via Flow Matching

Building a virtual cell capable of accurately simulating cellular behaviors in silico has long been a dream in computational biology. We introduce CellFlux, an image-generative model that simulates cellular morphology changes induced by chemical and genetic…

🔥4

596 views10:46

All about AI, Web 3.0, BCI

Google DeepMind introduced Concordia 2.0, an update to Google’s library for building multi-actor LLM simulations

At the core:

- Entity-Component Architecture — where even the “Game Master” (GM) is just another configurable entity
- Engineers build components → Designers compose & configure
- Enables modularity, rapid iteration & scalable world-building

Demoed in the evolving Concordia library — where AI worlds are built like RPG campaigns.

GitHub.

🆒4🔥3

653 views14:27

About

Blog

Apps

Platform