Karpathy's microgpt.py Dissected: Understanding GPT's Essence in 150 Lines
A line-by-line dissection of microgpt.py -- a pure Python GPT implementation with zero dependencies. Training, inference, and autograd in 150 lines.

Andrej Karpathy has released new code, and this time it is even more extreme than nanoGPT: a 150-line script that trains and runs inference on a GPT, in pure Python with no external libraries.
No PyTorch. No NumPy. Just three imports: os, math, random.
The comment at the top of the code says it all:
"This file is the complete algorithm. Everything else is just efficiency."
In this post, we dissect microgpt.py line by line. Follow along with the code, and you will see that the algorithm behind GPT is a surprisingly simple composition of mathematical operations.
Overall Structure
microgpt.py breaks down into six parts:
| Part | Lines | Role |
|---|---|---|
| Data & Tokenizer | ~10 | Load name dataset, character-level tokenization |
| Value Class (Autograd) | ~35 | Scalar automatic differentiation engine |
| Parameter Initialization | ~15 | Weight matrix creation (4,192 parameters) |
| Model Architecture | ~40 | Embedding + Attention + MLP + RMSNorm |
| Training Loop | ~20 | Cross-entropy loss + Adam optimizer |
| Inference | ~15 | Name generation via temperature sampling |
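The Value class in the table is the heart of the script: a scalar automatic differentiation engine. The sketch below is not Karpathy's exact code, but a minimal illustration of the same idea, as an illustration: each arithmetic operation records how to push gradients back to its inputs, and backward() replays those closures in reverse topological order.

```python
import math

class Value:
    """Minimal scalar autograd node (a sketch in the spirit of
    microgpt.py's Value class, not the original code)."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._children = children

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule
        # from the output node back to the leaves.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# z = x*y + x, so dz/dx = y + 1 and dz/dy = x
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

Every tensor operation in a full framework reduces to compositions of scalar rules like these; the rest really is just efficiency.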
Total parameters: 4,192. Compared to GPT-2 Small's 124M, that is roughly 30,000x smaller. But the algorithm is identical.
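The inference row in the table refers to temperature sampling. As a generic sketch (again not the article's exact code): divide the logits by a temperature, softmax them, and draw from the resulting distribution; low temperatures sharpen toward the argmax, high temperatures flatten toward uniform.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample a token index from raw logits via temperature-scaled
    softmax (illustrative sketch, not microgpt.py's exact code)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling with one uniform draw.
    r = rng.random()
    cdf = 0.0
    for i, p in enumerate(probs):
        cdf += p
        if r < cdf:
            return i
    return len(probs) - 1

random.seed(0)
print(sample_with_temperature([2.0, 1.0, 0.1], temperature=0.5))
```

In a character-level name generator, each sampled index maps back to a character, and sampling repeats until an end-of-sequence token appears.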