Can AI Read Minds? LLM Failures in Common Sense and Cognition
Theory of Mind, Physical Common Sense, Working Memory — testing where text-only LLMs fail in common sense and cognition.

Humans know that dropped objects fall. We know that if someone leaves a room and their belongings get moved, they will look where they left them, not where the items actually are. We know that when a fact gets updated, we should remember the new version.
All of this comes from living in a physical body and navigating the world. LLMs learn from text alone. They have read "objects fall due to gravity" thousands of times, but they have never dropped anything.
This is Part 3 of the LLM Reasoning Failures series, covering three tests in common sense and cognition:
- Theory of Mind (ToM): Can models track what others believe?
- Physical Common Sense: Can models handle counter-intuitive physics?
- Working Memory: Can models track fact updates without reverting?
We tested 7 models: GPT-4o, GPT-4o-mini, o3-mini, Claude Sonnet 4.5, Claude Haiku 4.5, Gemini 2.5 Flash, and Gemini 2.5 Flash-Lite.
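To make the setup concrete, here is a minimal sketch of a first-order false-belief probe in the Sally-Anne style. It assumes the OpenAI Python SDK; the prompt wording, model names, and keyword grading are illustrative, not the exact harness behind the results in this series.

```python
# Minimal sketch of a first-order false-belief (Sally-Anne) probe.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
# The prompt and the keyword check are illustrative, not the exact harness.
from openai import OpenAI

client = OpenAI()

SALLY_ANNE = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is gone, Anne moves the marble to the box. "
    "Sally comes back. Where will Sally look for her marble first? "
    "Answer with one word: basket or box."
)

def probe(model: str) -> bool:
    """Return True if the model predicts Sally's (false) belief correctly."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": SALLY_ANNE}],
        temperature=0,
    )
    answer = resp.choices[0].message.content.strip().lower()
    # Correct answer tracks Sally's belief (basket), not reality (box).
    return "basket" in answer

if __name__ == "__main__":
    for model in ["gpt-4o", "gpt-4o-mini"]:
        print(model, "->", "pass" if probe(model) else "fail")
```

Higher-order variants extend the same template by nesting beliefs ("Where does Anne think Sally will look?"), which is where models start to diverge.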
Theory of Mind: From Sally-Anne to 3rd-Order Beliefs
What Is Theory of Mind?