Can AI Read Minds? LLM Failures in Common Sense and Cognition
Theory of Mind, Physical Common Sense, Working Memory — testing where text-only LLMs fail in common sense and cognition.

Humans know that dropped objects fall. We know that if someone leaves a room and their belongings are moved, they will look where they left them, not where the things actually are. We know that when a fact gets updated, we should remember the new version.
All of this comes from living in a physical body and navigating the world. LLMs learn from text alone. They have read "objects fall due to gravity" thousands of times, but they have never dropped anything.
This is Part 3 of the LLM Reasoning Failures series, covering three tests in common sense and cognition:
- Theory of Mind (ToM): Can models track what others believe?
- Physical Common Sense: Can models handle counter-intuitive physics?
- Working Memory: Can models track fact updates without reverting?
We tested 7 models: GPT-4o, GPT-4o-mini, o3-mini, Claude Sonnet 4.5, Claude Haiku 4.5, Gemini 2.5 Flash, and Gemini 2.5 Flash-Lite.
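As a flavor of how such a test can be scored, here is a minimal sketch of a Sally-Anne-style false-belief check. The scenario text and keyword-matching rule are illustrative assumptions, not the exact prompts or grader used in these tests:

```python
# Hypothetical sketch of scoring a Sally-Anne false-belief response.
# The scenario wording and keyword rule are illustrative only.

SCENARIO = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is away, Anne moves the marble to the box. "
    "Sally comes back. Where will Sally look for her marble?"
)

def score_false_belief(model_answer: str) -> bool:
    """Pass iff the answer tracks Sally's (false) belief, not reality."""
    answer = model_answer.lower()
    mentions_basket = "basket" in answer  # belief-consistent location
    mentions_box = "box" in answer        # actual location
    return mentions_basket and not mentions_box

# A model that tracks Sally's belief passes:
print(score_false_belief("Sally will look in the basket."))  # True
# A model that answers with the true location fails:
print(score_false_belief("She will look in the box."))       # False
```

A real harness would send `SCENARIO` to each model's API and aggregate pass rates across many scenario variants; the scoring idea stays the same.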
Theory of Mind: From Sally-Anne to 3rd-Order Beliefs
What Is Theory of Mind?