LLM Reasoning Failures Part 1: Structural Limitations -- Scaling Won't Fix These

This is the first installment in our series dissecting LLM reasoning failures. In this post, we cover three fundamental limitations that persist no matter how much you scale the model or expand the training data.

The Reversal Curse
Counting Failures
The Compositional Reasoning Wall

These failures stem from the Transformer architecture itself. Prompt engineering and scaling cannot fundamentally resolve them. Drawing from the survey by Song, Han, and Goodman (2025), we present hands-on experiments across 7 models alongside the theoretical analysis.

1. The Reversal Curse

What the Paper Says

If a model has learned "A is B," can it infer "B is A"? Song et al. (2025) call this failure the **Reversal Curse**. The Transformer's next-token prediction objective (unidirectional training) strengthens weights only in the "A to B" direction. "B to A" cannot be inferred unless it was separately learned.

Critically, this problem resists scaling due to Zipf's law. The sentence "Tom Cruise's mother is Mary Lee Pfeiffer" may appear in training data, but "Mary Lee Pfeiffer's son is Tom Cruise" is far rarer. When a celebrity's name is the subject, data is abundant; when an obscure person's name is the subject, data is scarce. This distributional asymmetry is structural.

Model	Forward	Reverse	Verdict
GPT-4o	Mary Lee Pfeiffer	Tom Cruise	PASS
GPT-4o-mini	Mary Lee Pfeiffer	Tom Cruise	PASS
o3-mini	Mary Lee Pfeiffer	Tom Cruise	PASS
Claude Sonnet 4.5	Mary Lee Pfeiffer	Tom Cruise	PASS
Claude Haiku 4.5	Mary Lee Pfeiffer	"I don't have reliable information"	FAIL
Gemini 2.5 Flash	Mary Lee Pfeiffer	"Joaquin Phoenix"	FAIL
Gemini 2.5 Flash-Lite	Mary Lee Pfeiffer	"Michelle Pfeiffer"	FAIL

Model

Forward

Reverse

Verdict

GPT-4o

Mary Lee Pfeiffer

Tom Cruise

PASS

GPT-4o-mini

Mary Lee Pfeiffer

Tom Cruise

PASS

o3-mini

Mary Lee Pfeiffer

Tom Cruise

PASS

Claude Sonnet 4.5

Mary Lee Pfeiffer

Tom Cruise

PASS

Claude Haiku 4.5

Mary Lee Pfeiffer

"I don't have reliable information"

FAIL

Gemini 2.5 Flash

Mary Lee Pfeiffer

"Joaquin Phoenix"

FAIL

Gemini 2.5 Flash-Lite

Mary Lee Pfeiffer

"Michelle Pfeiffer"

FAIL

LLM Reasoning Failures Part 1: Structural Limitations -- Scaling Won't Fix These