Meta introduced NaturalThoughts
Data curation for general reasoning capabilities is still relatively underexplored.
Researchers systematically compare metrics for selecting high-quality, diverse reasoning traces in terms of data efficiency in the distillation setting.
They find that diversity in reasoning strategies matters more than topic diversity, and that challenging questions are more sample-efficient for distilling reasoning capabilities.
They find that the Less-Is-More approach is not sufficient for general reasoning tasks, while scaling up data quantity brings consistent gains.
They find that NaturalThoughts outperforms state-of-the-art reasoning datasets such as OpenThoughts3, LIMO, and s1K on general STEM domains.
They also find that distillation based on reasoning difficulty can improve the Pareto frontier of the student model's inference efficiency.
Training with a mix of full reasoning traces and condensed answers enables efficient hybrid reasoning in the student model, which adaptively switches between long chain-of-thought thinking and directly answering (a minimal sketch of this data mix follows below).
arXiv.org
NaturalThoughts: Selecting and Distilling Reasoning Traces for...
Recent work has shown that distilling reasoning traces from a larger teacher model via supervised finetuning outperforms reinforcement learning with the smaller student model alone (Guo et al....
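The hybrid-reasoning recipe above is easy to picture in code. Below is a minimal sketch of the data-mixing idea, not the paper's actual pipeline: the field names, the <think> tags, and the mix ratio are all illustrative assumptions.

```python
import random

def build_hybrid_sft(examples, mix_ratio=0.5, seed=0):
    """Pair each question with either the teacher's full reasoning trace
    or only its condensed answer, so the student can learn both modes."""
    rng = random.Random(seed)
    rows = []
    for ex in examples:
        if rng.random() < mix_ratio:
            # Full-trace target: long chain-of-thought, then the answer.
            target = f"<think>{ex['reasoning_trace']}</think>\n{ex['answer']}"
        else:
            # Condensed target: the answer alone, teaching direct answering.
            target = ex["answer"]
        rows.append({"prompt": ex["question"], "completion": target})
    return rows

# Tiny illustrative record; real traces would come from the teacher model.
rows = build_hybrid_sft([{"question": "2+2?", "reasoning_trace": "2 plus 2 is 4.", "answer": "4"}])
print(rows[0]["completion"])
```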
Meta introduced research on embodied AI agents that can perceive, learn, act, and interact in the virtual and physical worlds.
HeyGen launched a new Video Agent that handles content production end-to-end
Using just a doc, some footage, or even a sentence, it can find a story, write the script, select shots/generate new footage, and edit everything for final release.
HeyGen
AI Video Agent | Create and Automate Videos with AI | HeyGen
Meet HeyGen’s AI Video Agent. Instantly generate scripts, voiceovers, avatars, and translations to transform any idea into a compelling video. No credit card required.
Genspark just launched AI Docs, completing their suite with AI Slides and Sheets.
It's similar to the Gemini integration in Google Docs but with a much better UX, where the AI acts more like a creative partner than just a generative tool: you get to iterate together on the output instead of prompting once and editing the result. It also has Markdown support.
The Hong Kong Stablecoin Ordinance will officially take effect on August 1 this year
The Hong Kong Monetary Authority will open license applications. Only a single-digit number of licenses is expected to be issued, yet more than 40 companies are currently preparing to apply.
The applicants are primarily China's largest financial institutions and Internet companies.
OpenAI published "Working with 400,000 teachers to shape the future of AI in schools"
OpenAI is joining the American Federation of Teachers as the founding partner to launch the National Academy for AI Instruction, a five-year initiative to equip 400,000 K-12 educators. OpenAI is contributing $10 million over five years ($8 million in direct funding and $2 million in in-kind resources), alongside the United Federation of Teachers, Microsoft, and Anthropic, which are also supporting the initiative.
OpenAI
Working with 400,000 teachers to shape the future of AI in schools
OpenAI joins the American Federation of Teachers to launch the National Academy for AI Instruction.
New Mistral Cookbook: Finetuning Pixtral on a satellite imagery dataset 🛰️
- How to call Mistral's batch inference API
- How to pass images (encoded in base64) in your API calls to Mistral's VLM (here Pixtral-12B)
- How to fine-tune Pixtral-12B on an image classification problem in order to improve its accuracy.
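The first two bullets are simple to sketch. Here is a hedged example of passing a base64-encoded image to Pixtral, assuming the mistralai v1 Python SDK; the model id, file name, and prompt are illustrative, and the notebook itself is the authoritative recipe.

```python
import base64
from mistralai import Mistral

client = Mistral(api_key="YOUR_API_KEY")

# Encode a local satellite tile as base64 (file name is illustrative).
with open("satellite_tile.png", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

# Single vision call to Pixtral-12B; the image travels as a data URI.
resp = client.chat.complete(
    model="pixtral-12b-2409",  # assumed Pixtral checkpoint name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which land-cover class best fits this tile?"},
            {"type": "image_url", "image_url": f"data:image/png;base64,{b64_image}"},
        ],
    }],
)
print(resp.choices[0].message.content)
```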
GitHub
cookbook/mistral/fine_tune/pixtral_finetune_on_satellite_data.ipynb at main · mistralai/cookbook
HuggingFace released SmolLM3: a strong, smol reasoner
> SoTA 3B model
> dual mode reasoning (think/no_think)
> long context, up to 128k
> multilingual: en, fr, es, de, it, pt
> fully open source (data, code, recipes)
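A hedged sketch of the dual-mode usage with transformers follows; the checkpoint id and the /no_think system flag match the release notes above but should be checked against the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed checkpoint id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "/no_think"},  # "/think" enables long CoT
    {"role": "user", "content": "What is the capital of France?"},
]
# Build the prompt with the model's chat template and generate a reply.
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```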
The biggest dataset of human-written GPU code, all open-source? YES! GPU MODE has released around 40k human-written code samples spanning Triton, HIP, and PyTorch, and it's all open. Train the new GPT to make GPTs faster.
huggingface.co
GPUMODE/kernelbot-data · Datasets at Hugging Face
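For a quick look at the data, a minimal loading sketch with the datasets library; the splits and columns are not stated above, so the code just inspects whatever schema the dataset ships with.

```python
from datasets import load_dataset

# Load the open kernel dataset released by GPU MODE.
ds = load_dataset("GPUMODE/kernelbot-data")
print(ds)  # shows the available splits and columns

# Peek at the first record of the first split (schema inspected, not assumed).
first_split = next(iter(ds.values()))
print({k: str(v)[:80] for k, v in first_split[0].items()})
```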
Google DeepMind introduced T5Gemma: the next generation of encoder-decoder/T5 models
- Decoder-only models adapted into encoder-decoder models
- 32 models with different encoder/decoder combinations
- Available on Hugging Face and Kaggle
Google Developers Blog
Explore T5Gemma – a new collection of encoder-decoder LLMs offering superior performance and efficiency – especially for tasks requiring deep input understanding, like summarization and translation, built on Gemma 2 models.
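Since T5Gemma models are encoder-decoder, they should load through the generic seq2seq Auto classes in transformers; a hedged sketch, with the checkpoint id an assumption to verify against the release.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5gemma-2b-2b-ul2"  # assumed checkpoint id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encoder-decoder generation: the encoder reads the input, the decoder writes.
inputs = tok("Summarize: The quick brown fox jumps over the lazy dog.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```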
xAI announced Grok 4
Here is everything you need to know:
Elon claims that Grok 4 is smarter than almost all grad students in all disciplines simultaneously. 100x more training compute than Grok 2. 10x more RL compute than any other model out there.
Performance on Humanity's Last Exam. Elon: "Grok 4 is post-grad level in everything!"
Scaling HLE with training compute: more compute, higher intelligence (scores without tools).
With native tool calling, Grok 4 increases the performance significantly.
It's important to give AI the right tools. The scaling is clear.
Reliable signals are key to making RL work. There is still the challenge of data. Elon: "Ultimate reasoning test is AI operating in reality."
Scaling test-time compute. More than 50% of the text-only subset of the HLE problems are solved.
The curves keep getting more ridiculous.
Grok 4 is the single-agent version.
Grok 4 Heavy is the multi-agent version. Multi-agent systems are no joke.
Grok 4 uses all kinds of references such as papers, reads PDFs, and reasons about the details of the simulation and what data to use.
Grok 4 Heavy performance is higher than Grok 4, but needs to be improved further. It's one of the weaknesses, according to the team.
Available as a SuperGrok Heavy tier.
$30/month for SuperGrok.
$300/month for SuperGrok Heavy.
Voice updates included, too!
Grok feels snappier and is designed to be more natural.
- 2x faster
- 5 voices
- 10x more daily user-seconds.
Grok 4 models are available via the xAI API, with a 256K context window and real-time data search (see the call sketch at the end of this item).
Grok 4 for Gaming!
Video understanding is an area the team is improving, so it will get better.
What is next?
- Smart and fast models will be the focus.
- Coding models are also a big focus.
- More capable multi-modal agents are coming too.
- Video generation models are also on the horizon.
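For the API note above, a minimal call sketch, assuming the xAI endpoint is OpenAI-SDK compatible; the base URL follows xAI's documented pattern and the model id is an assumption.

```python
from openai import OpenAI

# xAI exposes an OpenAI-compatible endpoint (base_url per xAI docs).
client = OpenAI(api_key="XAI_API_KEY", base_url="https://api.x.ai/v1")

resp = client.chat.completions.create(
    model="grok-4",  # assumed model id; check xAI's model list
    messages=[{"role": "user", "content": "In one sentence, what is new in Grok 4?"}],
)
print(resp.choices[0].message.content)
```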
Google introduced new models for research & development of health applications:
1. MedGemma 27B Multimodal, for complex multimodal & longitudinal EHR interpretation
2. MedSigLIP, a lightweight image & text encoder for classification, search, & related tasks.
research.google
MedGemma: Our most capable open models for health AI development
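Since MedSigLIP is a SigLIP-style image & text encoder, zero-shot classification should follow the standard SigLIP interface in transformers; a hedged sketch, with the checkpoint id, input image, and labels all illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "google/medsiglip-448"  # assumed checkpoint id
model = AutoModel.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chest_xray.png")  # illustrative input image
labels = ["normal chest X-ray", "chest X-ray with pleural effusion"]

inputs = processor(text=labels, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# SigLIP scores image-text pairs with a sigmoid, not a softmax over labels.
probs = torch.sigmoid(out.logits_per_image)
print(dict(zip(labels, probs[0].tolist())))
```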
Salesforce introduced GTA1 – a new GUI Test-time Scaling Agent that is now #1 on the OSWorld leaderboard with a 45.2% success rate, outperforming OpenAI’s CUA o3 (42.9%).