Can AI agents benefit from personality? (ETH Zurich, BASF SE, Cledar, IDEAS Research Institute)
Researchers presented MBTI-in-Thoughts, a framework that conditions LLMs with psychologically grounded archetypes (e.g., MBTI types) via prompt engineering.
Findings:
- Emotional priming boosts narrative generation
- Analytical priming improves stability in game-theoretic tasks
- Multi-agent setups show better cooperation after self-reflection
- Personality persistence verified via 16Personalities test
- Generalizes beyond MBTI → Big Five, HEXACO, Enneagram
GitHub.
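The conditioning idea above can be sketched as a prompt-builder. The archetype descriptions and prompt wording here are illustrative assumptions, not the exact prompts from the MBTI-in-Thoughts paper:

```python
# Minimal sketch of prompt-based personality conditioning.
# ARCHETYPES and the prompt template are invented for illustration.

ARCHETYPES = {
    "INTJ": "You are analytical, strategic, and value logical consistency.",
    "ENFP": "You are enthusiastic, imaginative, and emotionally expressive.",
}

def build_conditioned_prompt(mbti_type: str, task: str) -> str:
    """Prepend a personality-priming block to the task prompt."""
    persona = ARCHETYPES[mbti_type]
    return (
        f"Adopt the following personality for this task.\n"
        f"Personality ({mbti_type}): {persona}\n\n"
        f"Task: {task}"
    )

prompt = build_conditioned_prompt("ENFP", "Write a short story about a lighthouse.")
print(prompt)
```

The same template swaps in Big Five or HEXACO trait descriptions unchanged, which is why the approach generalizes beyond MBTI.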
arXiv.org
Psychologically Enhanced AI Agents
We introduce MBTI-in-Thoughts, a framework for enhancing the effectiveness of Large Language Model (LLM) agents through psychologically grounded personality conditioning. Drawing on the...
🔥3👏3❤2
Visa introduced Visa Intelligent Commerce, an initiative to empower agents to shop and buy.
Visa’s MCP gateway connects AI agents directly to Visa Intelligent Commerce APIs and allows them to discover, authenticate and invoke integrated services like Tokenization, Authentication and Personalization to build intelligent, payment-enabled experiences.
For developers, the MCP Server provides a faster path from idea to a secure, working agent:
1. No need to hand-code every API call
2. Prototypes in hours, not weeks
3. Lets agents dynamically apply Visa APIs in new contexts
The result → AI agents that browse, buy, and transact on your behalf.
Also piloting the Visa Acceptance Agent Toolkit, built on MCP. It lets developers and business users trigger Visa Acceptance actions with plain-language prompts—no code required.
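The discover → authenticate → invoke flow can be sketched in miniature. The service names come from the announcement; every function name, registry shape, and payload below is a hypothetical stand-in, not Visa's actual API:

```python
# Hypothetical sketch of an MCP-gateway-style flow: an agent discovers a
# matching service, then invokes it. All identifiers here are invented.

REGISTRY = {
    "tokenization": {"description": "create a network token for a card"},
    "authentication": {"description": "step-up authentication for a payment"},
    "personalization": {"description": "shopper preference signals"},
}

def discover(capability: str) -> list[str]:
    """An agent asks the gateway which services match a capability."""
    return [name for name, meta in REGISTRY.items()
            if capability in meta["description"]]

def invoke(service: str, payload: dict) -> dict:
    """Stand-in for a real, authenticated MCP tool call."""
    assert service in REGISTRY, f"unknown service: {service}"
    return {"service": service, "status": "ok", "echo": payload}

matches = discover("token")
result = invoke(matches[0], {"pan_ref": "demo"})
print(result["status"])
```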
🔥5❤3👏2
University students: Get a FREE year of Gemini Pro and more. Sign up by Nov 3rd in Germany, Egypt, Saudi Arabia, the UK, and Mexico.
+ Unlimited image uploads
+ Nano Banana for images
+ Veo 3 for videos
+ Personalized exam prep
+ Save hours with Deep Research
+ Talk it out with Gemini Live
🔥3🆒3❤2🥰1
Physical Intelligence introduced Real-Time Action Chunking, a method that lets VLAs execute actions while "thinking". Instead of waiting for inference to finish, the robot keeps acting on the current action chunk while the next one is computed, completing the given task more quickly.
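The overlap between acting and inference can be illustrated with a toy loop. The timings and the fake "inference" below are illustrative assumptions, not Physical Intelligence's implementation:

```python
# Toy real-time action chunking: execute the current chunk of actions
# while the next chunk is inferred in a background thread.
import threading
import time

def infer_next_chunk(chunk_id: int) -> list[str]:
    time.sleep(0.05)  # stand-in for slow VLA inference
    return [f"action-{chunk_id}-{i}" for i in range(3)]

executed = []
chunk = infer_next_chunk(0)
for chunk_id in range(1, 3):
    # Kick off inference for the NEXT chunk before executing this one.
    result = {}
    t = threading.Thread(
        target=lambda cid=chunk_id: result.update(next=infer_next_chunk(cid)))
    t.start()
    for action in chunk:       # execute current chunk while inference runs
        executed.append(action)
        time.sleep(0.02)
    t.join()                   # next chunk is (usually) ready by now
    chunk = result["next"]
executed.extend(chunk)
print(len(executed))  # 9 actions across 3 chunks
```

The key point is that execution time hides inference latency, so the robot never idles between chunks.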
Physical Intelligence added π0.5 to the openpi repo: pi05-base, pi05-droid, and pi05-libero, along with PyTorch training code.
This should be a straight model upgrade over pi0 in all aspects. See, e.g., pi05_droid leading previous models in open RoboArena evals.
GitHub.
www.pi.website
A VLA with Open-World Generalization
Our latest generalist policy, π0.5, extends π0 and enables open-world generalization. Our new model can control a mobile manipulator to clean up an entirely new kitchen or bedroom.
🔥5❤4👏3👍1
Baidu launched ERNIE X1.1
In benchmark evaluations, it surpasses DeepSeek R1-0528 and performs on par with GPT-5 and Gemini 2.5 Pro.
Built on the foundation of ERNIE 4.5, the model is enhanced with extensive mid-training and post-training, including end-to-end reinforcement learning.
Available on ERNIE Bot, the Wenxiaoyan app, and the MaaS platform Qianfan (via API).
🔥4❤3👏2
Alibaba dropped an open-source Python framework to build multi-agent applications.
Build AI agents visually with MCP tools, memory, RAG, reasoning, and tracing.
GitHub
GitHub - agentscope-ai/agentscope: AgentScope: Agent-Oriented Programming for Building LLM Applications
AgentScope: Agent-Oriented Programming for Building LLM Applications - agentscope-ai/agentscope
🔥4❤2👏2🆒2
Claude can now create and edit files.
Turn conversations into Excel spreadsheets, documents, PowerPoint slide decks, and PDFs directly.
Claude has access to a private computer environment where it can write code and run programs.
Claude
Claude can now create and edit files | Claude
Describe what you need and get back ready-to-use spreadsheets, documents, presentations, and PDFs instead of just text responses. Update: Now generally available for paid plans with network and egress controls (October 21, 2025).
❤4🔥4🥰3
Chinese researchers have unveiled SpikingBrain-1.0, a new AI system that mimics brain neurons for highly efficient training with minimal data.
Trained entirely on domestic GPUs, it matches Transformer-based models while using only ~2% of the data.
Its strength with ultra-long sequences makes it well suited for fields like law, medicine, physics, and genomics.
The team has open-sourced the model, released a public demo, and published a large bilingual technical report.
GitHub.
Models.
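The "brain neuron" mechanism can be illustrated with a minimal leaky integrate-and-fire (LIF) sketch: a neuron integrates input, leaks, and emits a binary spike when it crosses a threshold. This shows the general principle only; SpikingBrain's actual spike encoding is described in its technical report.

```python
# Minimal leaky integrate-and-fire (LIF) spike encoding sketch.
# Threshold and leak values are arbitrary illustrative choices.

def lif_encode(inputs, threshold=1.0, leak=0.9):
    """Turn a real-valued sequence into a binary spike train."""
    v, spikes = 0.0, []
    for x in inputs:
        v = leak * v + x          # leaky integration of input current
        if v >= threshold:
            spikes.append(1)
            v = 0.0               # reset membrane potential after a spike
        else:
            spikes.append(0)
    return spikes

print(lif_encode([0.3, 0.3, 0.3, 0.9, 0.1, 0.0]))  # [0, 0, 0, 1, 0, 0]
```

Because downstream computation only happens on spikes (mostly zeros), this style of encoding is what makes event-driven hardware so efficient.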
GitHub
GitHub - BICLab/SpikingBrain-7B: Spiking Brain-inspired Large Models, integrating hybrid efficient attention, MoE modules and spike…
Spiking Brain-inspired Large Models, integrating hybrid efficient attention, MoE modules and spike encoding into its architecture - BICLab/SpikingBrain-7B
👍3🔥3🥰2👏2🥱1
Wow! Microsoft will use Anthropic models to power some features of Office 365 Copilot
Why? Microsoft product leaders genuinely say they are better than OpenAI models for certain tasks.
The Information
Microsoft to Buy AI From Anthropic in Partial Shift From OpenAI
Microsoft is taking its biggest step to lessen reliance on OpenAI’s artificial intelligence by embracing the startup’s bitter rival Anthropic to power its most important software business. Microsoft will pay to use Anthropic’s technology for some AI features…
🔥3❤2👏2🤓1
Coinbase introduced the x402 Bazaar: The open, machine-readable discovery layer for x402.
An ecosystem where specialized AI services, data feeds, and APIs can thrive: a search engine for agents.
API providers: The x402 Bazaar means distribution.
List your x402 endpoint – its schema, price, a clear description – and suddenly, AI agents and developers building on x402 can find your service. This is how you tap into the coming agentic economy. Permissionless and open.
Developers and AI Agents: Your agent doesn't need pre-baked integrations for every service. It can query the Bazaar, find a service matching its requirements, and call it using x402. No keys, no pre-funding dozens of accounts.
Agents can become dynamic, autonomous entities.
Services priced, discovered, and consumed autonomously by machines. This can unlock long-tail API development and specialized AI services at a scale we haven't seen before.
Agent A needs data → finds Agent B's API → pays → gets data.
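That discover → pay → call loop can be sketched end to end. The endpoint names, prices, and the "pay" step below are all invented for illustration; the real protocol settles payment over HTTP 402 responses:

```python
# Toy sketch of the x402 flow: query the Bazaar index for a matching
# service, pay per call, then invoke it. All entries are hypothetical.

BAZAAR = [
    {"endpoint": "https://example.com/weather", "price_usd": 0.01,
     "description": "current weather data"},
    {"endpoint": "https://example.com/translate", "price_usd": 0.02,
     "description": "text translation"},
]

def find_service(need: str) -> dict:
    """Agent queries the index for a service matching its requirement."""
    return next(s for s in BAZAAR if need in s["description"])

def pay_and_call(service: dict, budget_usd: float) -> str:
    """Pay per call, then invoke; refuse if the price exceeds the budget."""
    if service["price_usd"] > budget_usd:
        raise ValueError("over budget")
    return f"data from {service['endpoint']}"

svc = find_service("weather")
print(pay_and_call(svc, budget_usd=0.05))
```

No API keys and no pre-funded accounts appear anywhere in the loop; price and schema travel with the listing itself.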
Coinbase
Introducing x402 Bazaar: An index for self-improving AI agents
TL;DR: x402 Bazaar is the first discovery layer for agentic commerce. It gives agents a single place to find, interact with, and pay for new services - unlocking dynamic, self-improving agents that can evolve as the ecosystem grows.
🆒5🔥3👏2🥰1
Salesforce introduced SFR-DeepResearch (SFR-DR): RL-trained autonomous agents that can reason, search, and code their way through deep research tasks.
SFR-DR agents are trained to operate independently, without pre-defined multi-agent workflows. They autonomously plan, reason, and take actions using the tools available to them.
SFR-DR-20B achieves 28.7% on Humanity's Last Exam (text-only) using only web search, browsing, and Python interpreter, surpassing DeepResearch with OpenAI o3 and Kimi Researcher.
SFR-DR agents are also trained to manage their own memory by summarizing previous results when context becomes limited. This yields a virtually unlimited effective context window, enabling long-horizon tasks.
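The self-managed memory idea can be sketched as a budget-triggered fold. The summarizer below is a trivial stand-in (a real agent would call the LLM itself to summarize), and the budget numbers are arbitrary:

```python
# Sketch of agent memory management: when the running context exceeds a
# budget, fold older entries into a summary and keep the recent ones.

def summarize(entries: list[str]) -> str:
    # Stand-in: a real agent would ask the LLM for a compressed summary.
    return f"[summary of {len(entries)} earlier steps]"

def add_to_context(context: list[str], entry: str, budget: int = 5) -> list[str]:
    """Append an entry; fold old entries into a summary if over budget."""
    context = context + [entry]
    if len(context) > budget:
        keep = context[-2:]                        # most recent steps stay verbatim
        context = [summarize(context[:-2])] + keep
    return context

ctx: list[str] = []
for step in range(8):
    ctx = add_to_context(ctx, f"tool result {step}")
print(ctx)
```

The context never grows past the budget, so the agent can keep taking steps indefinitely at the cost of lossy compression of its early history.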
arXiv.org
SFR-DeepResearch: Towards Effective Reinforcement Learning for...
Equipping large language models (LLMs) with complex, interleaved reasoning and tool-use capabilities has become a key focus in agentic AI research, especially with recent advances in...
❤5🔥3🥰2
A new research paper from Thinking Machines (ex-OpenAI team): Why LLMs Give Different Answers to the Same Question (And How to Fix It)
Ever notice that ChatGPT gives you slightly different responses when you ask the same question multiple times? Even at temperature 0, where the model should theoretically always pick the most likely token?
Most people assume this happens because of sampling randomness or GPU parallelization quirks. The conventional wisdom goes something like this: "GPUs do parallel calculations, floating-point math isn't associative, so results vary depending on which threads finish first."
This explanation isn't wrong, but it misses the real culprit. Horace He and the team at Thinking Machines dug deeper and found something more fundamental: batch invariance.
Here's what's actually happening: when you send a request to an LLM API, your output depends not just on your input, but on how many other people are using the service at the same time.
The server batches requests together for efficiency, and the batch size affects the numerical computations.
Even though each individual operation might be deterministic, the same input can produce different outputs depending on whether it's processed alone or with 10, 100, or 1000 other requests.
Think of it this way: you ask a question, but the answer changes based on how crowded the "room" is when you ask it.
This work challenges a common attitude in ML: "our systems are already probabilistic, so what's a little more randomness?" The researchers argue this is defeatist. With careful engineering, we can understand and eliminate these sources of nondeterminism.
They've open-sourced their implementation on top of vLLM, making it possible for others to achieve truly deterministic LLM inference today.
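The underlying numerical fact is easy to demonstrate: floating-point addition is not associative, so the same numbers summed in a different order (for example, because a different batch size changes the reduction tree inside a kernel) can give different results.

```python
# Floating-point addition is not associative: regrouping the same three
# numbers changes the result.

a, b, c = 0.1, 1e20, -1e20

left  = (a + b) + c   # 0.1 is absorbed into 1e20, then cancelled -> 0.0
right = a + (b + c)   # 1e20 cancels first, so 0.1 survives -> 0.1

print(left, right)    # 0.0 0.1
```

Batch-invariant kernels fix the reduction order regardless of how many requests share the batch, which is exactly what makes the output independent of server load.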
Thinking Machines Lab
Defeating Nondeterminism in LLM Inference
Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language models.
For example, you might observe that asking ChatGPT the same question multiple times provides different results.…
❤4🔥4🥰2
Chinese researchers introduced WebExplorer, a simple yet effective approach to training long-horizon web agents.
Instead of depending on rigid, pre-defined graph structures, WebExplorer uses a model-based exploration strategy to synthesize high-quality agentic data.
The 8B model outperforms most 32B and even 72B models on BrowseComp and HLE.
arXiv.org
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
The paradigm of Large Language Models (LLMs) has increasingly shifted toward agentic applications, where web browsing capabilities are fundamental for retrieving information from diverse online...
🔥3❤2👍2
Nvidia released La-Proteina fully open source.
La-Proteina is a generative model for accurate co-design of fully atomistic protein structures (sequence + side chains + backbone) at scale, up to 800 residues, with state-of-the-art atomistic motif scaffolding performance. Its code has just been open-sourced.
Paper.
Code.
Nvidia
La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
La-Proteina is a novel partially-latent fully atomistic protein design model. Protein backbone structure is modeled explicitly, while sequence and atomistic details are captured via per-residue latent variables. La-Proteina achieves state-of-the-art performance…
🔥3👍2🥰2
Medra AI has automated experimentation down to the physical level with reasoning and robotics.
The Medra technology platform consists of two core components:
1. Physical AI: Their general-purpose robots use vision-language models (VLMs) to operate standard laboratory instruments flexibly and execute experimental protocols. Medra is the first company to deploy Physical AI in the laboratory, leveraging the same advanced models that power self-driving cars and humanoid robots.
2. Scientific AI: Their reasoning models analyze experimental results and integrate with partners' internal infrastructure—such as LIMS, electronic lab notebooks, and ML pipelines—to glean insights from disparate data sources.
These two systems operate in a closed loop: Physical AI executes experiments while Scientific AI analyzes the outcomes and iterates on the design. This cycle helps scientists rapidly converge on the optimal protocol.
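The closed loop described above has a simple shape: propose a protocol, execute it, score the outcome, iterate. The objective function and the single temperature parameter below are invented for illustration; Medra's actual optimization is not public.

```python
# Generic design-execute-analyze loop. run_experiment stands in for a
# robot-executed assay; optimize_protocol stands in for the analysis AI.

def run_experiment(temperature_c: float) -> float:
    """Hypothetical assay whose yield peaks at 37 C."""
    return 100.0 - (temperature_c - 37.0) ** 2

def optimize_protocol(candidates: list[float]) -> tuple[float, float]:
    """Pick the best-scoring protocol from the candidates tried so far."""
    best_t, best_y = None, float("-inf")
    for t in candidates:              # "Physical AI" executes each candidate
        y = run_experiment(t)         # "Scientific AI" analyzes the outcome
        if y > best_y:
            best_t, best_y = t, y
    return best_t, best_y

best_temp, best_yield = optimize_protocol([25.0, 30.0, 37.0, 42.0])
print(best_temp, best_yield)
```

In practice the analysis step would propose new candidates rather than scan a fixed list, but the execute/analyze alternation is the same.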
www.medra.ai
Physical AI in the Lab: Unlocking Data for Scientific Breakthroughs
Medra is building new AI technology to empower scientists in the lab.
❤4🥰2👏2👎1🔥1
ByteDance launched Seedream 4.0, an image generation tool that aims to compete with Google's “Nano Banana” AI image editor.
⚡️ Claude now has memory. Anthropic also introduced incognito chats for all users.
With project-scoped memory, each project maintains its own focused context.
Memory is fully optional with granular controls.
In settings, view the complete memory summary, edit what's stored, and guide Claude by telling it what to focus on or ignore.
Claude
Bringing memory to teams | Claude
Today, we’re introducing memory to the Claude app, where Claude remembers you and your team’s projects and preferences, eliminating the need to re-explain context and keeping complex work moving forward.
🔥4❤3👏3
Anthropic shared tips for developers on writing effective tools for LLM agents.
Anthropic
Writing effective tools for AI agents—using AI agents
🔥6❤2🥰2
Meet Gauss, the first autoformalization agent. It completed Terry Tao and Alex Kontorovich's Strong Prime Number Theorem project in 3 weeks; human experts had managed only partial progress over 18+ months.
GitHub.
Early access.
GitHub
GitHub - math-inc/strongpnt
Contribute to math-inc/strongpnt development by creating an account on GitHub.
❤6🔥2👏2