LLM Reasoning Failures Part 1: Structural Limitations -- Scaling Won't Fix These

Reversal Curse, Counting, Compositional Reasoning — fundamental Transformer failures tested across 7 models.

This is the first installment in our series dissecting LLM reasoning failures. In this post, we cover three fundamental limitations that persist no matter how much you scale the model or expand the training data.

  • The Reversal Curse
  • Counting Failures
  • The Compositional Reasoning Wall

These failures stem from the Transformer architecture itself. Prompt engineering and scaling cannot fundamentally resolve them. Drawing from the survey by Song, Han, and Goodman (2025), we present hands-on experiments across 7 models alongside the theoretical analysis.

1. The Reversal Curse

What the Paper Says

If a model has learned "A is B," can it infer "B is A"? Song et al. (2025) call this failure the **Reversal Curse**. The Transformer's next-token prediction objective (unidirectional training) strengthens weights only in the "A to B" direction. "B to A" cannot be inferred unless it was separately learned.

Critically, this problem resists scaling because of Zipf's law in the training distribution. The sentence "Tom Cruise's mother is Mary Lee Pfeiffer" may appear in training data, but the reversed form, "Mary Lee Pfeiffer's son is Tom Cruise," is far rarer: text overwhelmingly puts the famous name in subject position, while sentences with the obscure name as subject are scarce. This distributional asymmetry is structural, so adding more data reproduces it rather than correcting it.
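The forward-only update can be caricatured with a toy frequency-table "model" (an illustration of the mechanism, not the paper's actual experiment): next-token prediction only ever strengthens the "left context to continuation" direction, so a fact stored as "A is B" offers no path from B back to A.

```python
from collections import defaultdict

# Toy sketch: model next-token prediction as a forward-only
# conditional frequency table. All names here are illustrative.
counts = defaultdict(lambda: defaultdict(int))

def train(tokens):
    # Each update strengthens only the "prev -> next" direction.
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def predict(prev):
    # There is no backward lookup: a string never seen as a left
    # context has no continuation, however often it appeared as a target.
    if prev not in counts:
        return None
    following = counts[prev]
    return max(following, key=following.get)

# The fact is stated in one direction only.
train(["Tom Cruise's mother is", "Mary Lee Pfeiffer"])

print(predict("Tom Cruise's mother is"))     # -> "Mary Lee Pfeiffer"
print(predict("Mary Lee Pfeiffer's son is")) # -> None (never learned)
```

A real Transformer stores this in distributed weights rather than a lookup table, but the asymmetry is the same: gradient updates from "A is B" do not create the "B is A" association.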
