Can AI Read Minds? LLM Failures in Common Sense and Cognition
Theory of Mind, Physical Common Sense, Working Memory — testing where text-only LLMs fail in common sense and cognition.
Humans know that dropped objects fall. We know that if someone leaves a room and the furniture gets rearranged, they will look where they left things, not where things actually are. We know that when a fact gets updated, we should remember the new version.
All of this comes from living in a physical body and navigating the world. LLMs learn from text alone. They have read "objects fall due to gravity" thousands of times, but they have never dropped anything.
This is Part 3 of the LLM Reasoning Failures series, covering three tests in common sense and cognition:
- Theory of Mind (ToM): Can models track what others believe?
- Physical Common Sense: Can models handle counter-intuitive physics?
- Working Memory: Can models track fact updates without reverting?
We tested 7 models: GPT-4o, GPT-4o-mini, o3-mini, Claude Sonnet 4.5, Claude Haiku 4.5, Gemini 2.5 Flash, and Gemini 2.5 Flash-Lite.
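The Theory of Mind test is rooted in the classic Sally-Anne false-belief task. As a minimal sketch of how such a probe might be scored (the prompt wording and the `score_false_belief` helper are illustrative, not the post's actual harness):

```python
# Hypothetical Sally-Anne false-belief probe and scorer.
# A correct model predicts the agent's (outdated) belief, not reality.

SALLY_ANNE_PROMPT = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is away, Anne moves the marble to the box. "
    "When Sally returns, where will she first look for her marble? "
    "Answer with one word: basket or box."
)

def score_false_belief(model_answer: str) -> bool:
    """Pass if the model answers with Sally's false belief ('basket')."""
    return model_answer.strip().lower().rstrip(".") == "basket"

# "box" tracks the true world state but fails first-order theory of mind.
print(score_false_belief("Basket"))  # True: passes the probe
print(score_false_belief("box"))     # False: belief-tracking failure
```

Higher-order variants nest the beliefs ("Where does Anne think Sally will look?"), which is where the 3rd-order tests below come in.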
Theory of Mind: From Sally-Anne to 3rd-Order Beliefs
What Is Theory of Mind?