Just links

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs
https://arxiv.org/abs/2601.11061
via @buckwheat_thoughts

arXiv.org

Reinforcement Learning with Verifiable Rewards (RLVR) is highly effective for enhancing LLM reasoning, yet recent evidence shows models like Qwen 2.5 achieve significant gains even with spurious...

1.65K views, 12:53

https://www.anthropic.com/engineering/building-c-compiler

@seeallochnaya

Anthropic

Building a C compiler with a team of parallel Claudes

1.72K views, 19:17

https://balatrobench.com/

Balatrobench

Leaderboard benchmarking LLMs playing Balatro: rounds, tool-call reliability, cost, and speed.

1.81K views, 21:40

Weak Diffusion Priors Can Still Achieve Strong Inverse-Problem Performance https://arxiv.org/abs/2601.22443

arXiv.org

Can a diffusion model trained on bedrooms recover human faces? Diffusion models are widely used as priors for inverse problems, but standard approaches usually assume a high-fidelity model trained...

1.58K views, 08:50

Are AI Capabilities Increasing Exponentially? A Competing Hypothesis https://arxiv.org/abs/2602.04836

arXiv.org

Rapidly increasing AI capabilities have substantial real-world consequences, ranging from AI safety concerns to labor market consequences. The Model Evaluation & Threat Research (METR) report...

1.58K views, 11:42

BabyVision: Visual Reasoning Beyond Language https://unipat.ai/blog/BabyVision

UniPat AI

State-of-the-art MLLMs achieve PhD-level language reasoning but struggle with visual tasks that 3-year-olds solve effortlessly. We introduce BabyVision, a benchmark revealing the infancy of AI vision.

7.5K views, 14:03

Forwarded from Hacker News

The Waymo World Model: A New Frontier for Autonomous Driving Simulation (🔥 Score: 157+ in 1 hour)

Link: https://readhacker.news/s/6Ma63
Comments: https://readhacker.news/c/6Ma63

Waymo

We are excited to introduce the Waymo World Model, a frontier generative model that sets a new bar for large-scale, hyper-realistic autonomous driving simulation.

1.67K views, 17:54

Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation https://arxiv.org/abs/2602.03595

arXiv.org

Referring Video Object Segmentation (RVOS) aims to segment objects in videos based on textual queries. Current methods mainly rely on large-scale supervised fine-tuning (SFT) of Multi-modal Large...

1.82K views, 08:35

Learning to Repair Lean Proofs from Compiler Feedback https://arxiv.org/abs/2602.02990

arXiv.org

As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correct...

2K views, 08:48