TransformerLens in Practice: Reading Model Circuits with Activation Patching
Using TransformerLens to directly manipulate model activations, we trace which layers and heads causally produce the answer. A hands-on guide to activation patching.

In the previous post, we treated Lens as a window into the model's intermediate thoughts.
But "reading" alone cannot answer the most important question:
Does the model actually *use* this information?
Just because a hidden state at some layer contains "Paris" does not mean that layer causally contributes to the final answer. Information can be present but unused. A layer might hold the right answer in its representation, yet the model might arrive at its output through entirely different pathways.
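The gap between "present" and "used" can be made concrete with a deliberately tiny, hypothetical toy model written in plain Python. It is not TransformerLens code. In TransformerLens itself, activations would be cached and spliced through hooks, for example with `run_with_cache` and `run_with_hooks` on a `HookedTransformer`; the sketch below only mimics that pattern. The names `toy_forward`, `patching_effect`, `unused_site` and `used_site` are illustrative assumptions. Both sites encode the input, so a "reading" tool would find the answer in either of them. Activation patching runs a corrupted input, swaps in the clean activation at a single site, and checks how much of the clean output comes back. Only the site the output actually depends on restores it.

```python
from typing import Dict, Optional


def toy_forward(x: float,
                patch: Optional[Dict[str, float]] = None,
                cache: Optional[Dict[str, float]] = None) -> float:
    """Hypothetical two-site toy model (illustration only, not TransformerLens).

    Both sites encode the input x, but only 'used_site' feeds the output.
    `patch` overrides a site's activation; `cache` records the activations.
    """
    patch = patch or {}
    acts: Dict[str, float] = {}
    # Site A: linearly encodes x (a probe could read x back out of it),
    # but nothing downstream ever reads this value.
    acts["unused_site"] = patch.get("unused_site", 2.0 * x)
    # Site B: also encodes x, and the output is computed from it alone.
    acts["used_site"] = patch.get("used_site", 3.0 * x)
    if cache is not None:
        cache.update(acts)
    return acts["used_site"] + 1.0  # stand-in for the answer logit


def patching_effect(clean_x: float, corrupt_x: float, site: str) -> float:
    """Fraction of the clean-vs-corrupted output gap restored by patching `site`."""
    clean_cache: Dict[str, float] = {}
    clean_out = toy_forward(clean_x, cache=clean_cache)
    corrupt_out = toy_forward(corrupt_x)
    # Run the corrupted input, but splice in the clean activation at one site.
    patched_out = toy_forward(corrupt_x, patch={site: clean_cache[site]})
    return (patched_out - corrupt_out) / (clean_out - corrupt_out)


if __name__ == "__main__":
    cache: Dict[str, float] = {}
    toy_forward(1.0, cache=cache)
    # "Reading": both sites contain the input (recoverable by dividing out the weight).
    print(cache["unused_site"] / 2.0, cache["used_site"] / 3.0)  # 1.0 1.0
    # "Patching": only the site the output actually reads restores the answer.
    print(patching_effect(1.0, -1.0, "unused_site"))  # 0.0
    print(patching_effect(1.0, -1.0, "used_site"))    # 1.0
```

The score divides by the clean-minus-corrupted gap, which is one common way to normalize a patching result. A value of 0 means the patch changed nothing, and a value of 1 means it fully restored the clean output. Here `unused_site` scores 0 even though the answer can be read out of it, which is exactly the "present but unused" case.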