Researchers introduced MedBrowseComp, a challenging deep research benchmark for LLM agents in medicine
MedBrowseComp is the first benchmark that tests the ability of agents to retrieve & synthesize multi-hop medical facts from oncology knowledge bases.
moreirap12.github.io
MedBrowseComp
MedBrowseComp project page
Claude 4 is here, and it's Anthropic's vision for the future of agents
More details about Claude 4:
—Both models are hybrid models
—Opus 4 is great at understanding codebases and “the right choice” for agentic workflows
—Sonnet 4 excels at everyday tasks, and is your “daily go to”.
Coding agents are a huge theme here at the event and clearly a major focus for what’s coming next.
-Claude 4 has significantly greater agentic capabilities
-A new Code execution tool
-Claude Code coming to VSCode and Jetbrains
-Can now run Claude Code in GitHub.
Some more details on Claude 4 Opus:
—Matches or beats the best models in the world
—SOTA for coding, agentic tool use, and writing
—Memory capabilities across sessions
—Extended thinking mode for complex problem-solving
—200K context window with 32K output tokens.
Claude Code:
—Now generally available
—Integrates with VSCode and Jetbrains IDEs
—You can now see changes live inline in your editor
—A new Claude Code SDK for more flexibility.
If you want to read more about Sonnet & Opus 4, including a bunch of alignment and reward hacking findings, check out the model card.
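As a rough illustration of how the extended thinking mode above is exposed to developers, here is a minimal sketch using the Anthropic Python SDK; the model ID and token budgets are assumptions, so check Anthropic's docs for the current values.

```python
# Minimal sketch: Claude 4 extended thinking via the Anthropic Python SDK.
# The model ID below is an assumption; the `thinking` parameter follows the
# SDK's documented shape, with max_tokens larger than the thinking budget.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-20250514",          # assumed Opus 4 model ID
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},  # extended thinking mode
    messages=[{"role": "user",
               "content": "Plan a refactor of a 200-file codebase, step by step."}],
)

# The response interleaves "thinking" and "text" blocks; print only the final text.
print("".join(b.text for b in message.content if b.type == "text"))
```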
Anthropic
Introducing Claude 4
Discover Claude 4's breakthrough AI capabilities. Experience more reliable, interpretable assistance for complex tasks across work and learning.
ByteDance introduced MMaDA: Multimodal Large Diffusion Language Models
MMaDA is a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation.
It surpasses LLaMA-3-7B and Qwen2-7B, SDXL and Janus, and Show-o and SEED-X.
3 key innovations:
1. a unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, eliminating the need for modality-specific components.
2. mixed long chain-of-thought (CoT) fine-tuning strategy that curates a unified CoT format across modalities.
3. UniGRPO, a unified policy-gradient-based RL algorithm specifically tailored for diffusion foundation models.
GitHub.
arXiv.org
MMaDA: Multimodal Large Diffusion Language Models
We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and...
Humans can now see near-infrared light! Very cool tech development in biophotonics: engineered contact lenses convert invisible NIR signals into visible colors, enabling wearable, power-free NIR vision.
This has the potential to shift our perceptual boundaries: it shows the brain can integrate novel spectral inputs when they are mapped onto familiar visual codes, reframing light-based information processing and sensory integration.
AI models are finding zero-day vulnerabilities. A new era for cybersecurity.
Sean Heelan's Blog
How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation
In this post I’ll show you how I found a zeroday vulnerability in the Linux kernel using OpenAI’s o3 model. I found the vulnerability with nothing more complicated than the o3 API…
The World Economic Forum has released a report on Asset Tokenization in Financial Markets.
Highlights
1. Tokenization offers a new model of digital asset ownership that enhances transparency, efficiency and accessibility.
2. This report analyses asset class use cases in issuance, securities financing and asset management, identifying factors that enable successful tokenization implementation.
3. Key differentiators include a shared system of record, flexible custody, programmability, fractional ownership and composability across asset types. These features can democratize access to financial markets and modernize infrastructure.
4. While the benefits are demonstrated, adoption is slowed by challenges such as legacy infrastructure, regulatory fragmentation, limited interoperability and liquidity issues.
5. Effective deployment requires phased approaches and strategic coordination among financial institutions, regulators and technology providers. Factors affecting design decisions – such as ledger type, settlement mechanisms and market operating hours – must also be carefully considered.
6. Ultimately, tokenization holds promise for a more inclusive and efficient financial system, provided stakeholders align on standards, safeguards and scalable solutions.
7. Tokenization is expected to reshape financial markets by increasing transparency, efficiency, speed, and inclusivity—paving the way for more resilient and accessible financial systems.
Singapore's Sharpa unveiled SharpaWave, a lifelike robotic hand
—Features 22 DOF, balancing dexterity and strength
—Each fingertip has 1,000+ tactile sensing pixels and 5 mN pressure sensitivity
—AI models adapt the hand's grip and modulate force
HouseBots
Sharpa Unveils SharpaWave: The World’s Most Tactile Dexterous Robot Hand — HouseBots
Singapore-based robotics startup Sharpa is redefining what robotic manipulation means with the debut of its latest innovation: SharpaWave , a 22-degree-of-freedom (DOF) dexterous hand that brings human-like precision and speed to the world of robotics.
Researchers introduced SPORT, a multimodal agent that explores tool usage without human annotation.
It leverages step-wise DPO to further enhance tool-use capabilities following SFT.
SPORT achieves improvements on the GTA and GAIA benchmarks.
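For readers unfamiliar with DPO, here is a minimal PyTorch sketch of the standard DPO objective that step-wise variants build on; applying it to per-step preference pairs over tool calls is the general idea, though SPORT's exact preference construction may differ.

```python
# Standard DPO loss (Rafailov et al.); step-wise variants apply it to
# preference pairs collected at the level of individual tool-call steps.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Inputs are summed log-probs of the chosen / rejected step under the
    current policy and under the frozen reference (SFT) model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probs for one preference pair:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss)
```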
Google introduced Lyria RealTime, a new experimental music generation model that allows anyone to interactively create, control, and perform music in real time.
Available via the Gemini API and you can try the demo app on Google AI Studio.
Amazon added AI-generated audio discussions about certain products, based on customer reviews and web searches.
About Amazon
Amazon's new generative AI-powered audio feature synthesizes product summaries and reviews to make shopping easier
The new AI shopping experts help save time by compiling research and providing product highlights for customers from product pages, reviews, and insights.
Anthropic is just now rolling out voice mode in beta on mobile.
Try starting a voice conversation and asking Claude to summarize your calendar or search your docs. Voice mode in beta is available in English and coming to all plans in the next few weeks.
Game-Changer for AI: Meet the Low-Latency-Llama Megakernel
Buckle up, because a new breakthrough in AI optimization just dropped, and it's got even Andrej Karpathy buzzing.
The Low-Latency-Llama Megakernel is a new approach to running models like Llama-1B faster and smarter on GPUs.
What’s the Big Deal?
Instead of splitting a neural network’s forward pass into multiple CUDA kernels (with pesky synchronization delays), this megakernel runs everything in a single kernel. Think of it as swapping a clunky assembly line for a sleek, all-in-one super-machine!
Why It’s Awesome:
1. No Kernel Boundaries, No Delays. By eliminating kernel switches, the GPU works non-stop, slashing latency and boosting efficiency.
2. Memory Magic. Threads are split into “loaders” and “workers.” While loaders fetch future weights, workers crunch current data, using 16KiB memory pages to hide latency.
3. Fine-Grained Sync. Without kernel boundaries, custom synchronization was needed. This not only solves the issue but unlocks tricks like early attention head launches.
4. Open Source. The code is fully open, so you can stop “torturing” your models with slow kernel launches (as the devs humorously put it) and optimize your own pipelines!
Why It Matters:
- Speed Boost. Faster inference means real-time AI applications (think chatbots or recommendation systems) with lower latency.
- Cost Savings. Optimized GPU usage reduces hardware demands, perfect for startups or budget-conscious teams.
- Flexibility. Open-source code lets developers tweak it for custom models or use cases.
Karpathy’s Take:
Andrej calls it “so so so cool,” praising the megakernel for enabling “optimal orchestration of compute and memory.” He argues that traditional sequential kernel approaches can’t match this efficiency.
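To make the latency argument concrete, here is a toy back-of-the-envelope calculation; every number below is an assumption for illustration, not a measurement from the Hazy Research post.

```python
# Illustrative estimate of why per-kernel launch overhead dominates for a
# small model like Llama-1B at batch size 1. All numbers are assumptions.

KERNEL_LAUNCH_OVERHEAD_US = 5.0   # assumed launch/sync cost per kernel
KERNELS_PER_LAYER = 7             # assumed: norms, QKV, attention, out-proj, MLP...
NUM_LAYERS = 16                   # Llama-1B-class depth (assumption)
COMPUTE_PER_FORWARD_US = 600.0    # assumed pure GPU compute per token

launch_overhead_us = KERNEL_LAUNCH_OVERHEAD_US * KERNELS_PER_LAYER * NUM_LAYERS
many_kernels_us = COMPUTE_PER_FORWARD_US + launch_overhead_us
megakernel_us = COMPUTE_PER_FORWARD_US  # one launch; boundary overhead disappears

print(f"launch overhead per token: {launch_overhead_us:.0f} us")
print(f"many small kernels: {many_kernels_us:.0f} us/token")
print(f"single megakernel:  {megakernel_us:.0f} us/token "
      f"({many_kernels_us / megakernel_us:.2f}x faster in this toy model)")
```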
hazyresearch.stanford.edu
Look Ma, No Bubbles! Designing a Low-Latency Megakernel for Llama-1B
Telegram + Grok = this summer https://xn--r1a.website/durov/422
Telegram
Pavel Durov
🔥 This summer, Telegram users will gain access to the best AI technology on the market. Elon Musk and I have agreed to a 1-year partnership to bring xAI’s chatbot Grok to our billion+ users and integrate it across all Telegram apps 🤝
💪 This also strengthens…
Apple and Duke University introduced Interleaved Reasoning
Researchers train LLMs to alternate between thinking & answering.
Reducing Time-to-First-Token (TTFT) by over 80% AND improving Pass@1 accuracy by up to 19.3%!
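To see where the TTFT reduction comes from, here is a toy illustration; the tag format and token counts below are assumptions for illustration, not details from the paper.

```python
# Toy comparison of think-then-answer vs. interleaved reasoning.
# The trace format and token counts are assumptions, not from the paper.

think_then_answer = "<think> 900 tokens of reasoning </think> <answer> final </answer>"
interleaved = ("<think> 150 tokens </think> <answer> first sub-answer </answer> "
               "<think> 150 tokens </think> <answer> next sub-answer </answer> ...")

# TTFT for the visible answer is roughly proportional to how much hidden
# reasoning must be generated before the first answer token appears.
ttft_monolithic = 900   # all reasoning precedes the first answer token
ttft_interleaved = 150  # only the first reasoning chunk does

print(f"TTFT reduction: {1 - ttft_interleaved / ttft_monolithic:.0%}")  # ~83%
```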
Market map for browser agents
New companies launch in the space every week, for both consumer and enterprise use cases. ManusAI is one of the most popular generalist consumer agents, and Athena Intelligence is already being used by companies like Anheuser-Busch.
Computer/browser use has become one of the most important frontiers for model capabilities, with OpenAI, Anthropic, and Google DeepMind dedicating teams to Operator, Claude Computer Use, and Project Mariner.
Open-source frameworks like Browser Use and Stagehand have become some of the most popular repos on GitHub, with tens of thousands of stars.
AI-first browsers are poised to disrupt the massive web browser market, with highly anticipated releases like Comet from Perplexity on the way. It's yet to be seen how Google integrates Project Mariner and other AI tools within Chrome.
New paper from Google DeepMind: "Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning"
Researchers study why, how, and when LLMs should self-reflect and explore at test time—questions that conventional Markovian RL cannot fully answer.
HuggingFace
GitHub
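For context, here is the textbook Bayes-adaptive RL objective this line of work builds on, written in generic form (the paper's own formulation and notation may differ): the policy conditions on the full interaction history rather than just the current state, and maximizes expected return under a posterior over possible environments.

```latex
J(\pi) \;=\; \mathbb{E}_{\mathcal{M}\,\sim\,p(\mathcal{M}\mid h_0)}\;
\mathbb{E}_{\tau\,\sim\,\pi,\,\mathcal{M}}\!\left[\sum_{t\ge 0}\gamma^{t} r_t\right],
\qquad a_t \sim \pi(\cdot\mid h_t),\quad h_t=(s_0,a_0,r_0,\dots,s_t)
```

Under this view, self-reflection can be read as exploration that sharpens the posterior over environments as the history grows, something a purely Markovian, state-conditioned policy cannot express.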
An open-source humanoid for under $3k. Meet HopeJr, a full humanoid robot lowering the barrier to entry
It can walk and manipulate many objects, is open source, and costs under $3,000.
Designed by Rob Knight and HuggingFace.
The full bill of materials and links to source the parts will be available on this GitHub repo.
HopeJr has 66 actuated degrees of freedom.
GitHub
GitHub - TheRobotStudio/HOPEJr: HOPEJr_open-source_DIY_Humanoid_Robot_with_dexterous_hands
HOPEJr_open-source_DIY_Humanoid_Robot_with_dexterous_hands - TheRobotStudio/HOPEJr
#DeepSeek-R1-0528 is here
- Improved benchmark performance
- Enhanced front-end capabilities
- Reduced hallucinations
- Supports JSON output & function calling.
Weights
Deepseek
Reasoning Model (deepseek-reasoner) | DeepSeek API Docs
deepseek-reasoner is a reasoning model developed by DeepSeek. Before delivering the final answer, the model first generates a Chain of Thought (CoT) to enhance the accuracy of its responses. Our API provides users with access to the CoT content generated…
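A minimal sketch of what the new JSON output support could look like through DeepSeek's OpenAI-compatible API; the endpoint, model name, and response_format parameter are assumptions based on DeepSeek's public docs, so verify against the current API reference.

```python
# Minimal sketch: calling DeepSeek-R1-0528 (deepseek-reasoner) through the
# OpenAI-compatible endpoint with structured JSON output. Endpoint, model
# name, and parameters are assumptions; check DeepSeek's current docs.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user",
               "content": 'Return the capital of France as JSON: {"capital": ...}'}],
    response_format={"type": "json_object"},  # new: structured JSON output
)
print(resp.choices[0].message.content)
```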
SEO is slowly losing its dominance. Welcome to GEO.
The future of search, marketing, and performance in the LLM era. As search changes, a new paradigm is emerging in marketing, one driven not by page rank, but by language models. Enter Generative Engine Optimization (GEO).
In the age of ChatGPT, Perplexity, and Claude, GEO is positioned to become the new playbook for brand visibility. GEO is rewriting the rules of search — unlocking an $80B+ opportunity.
It's not about gaming the algorithm — it's about being cited by it.
The brands that win in GEO won't just appear in AI responses. They'll shape them.