Diffusion LLM Part 2: Discrete Diffusion -- How to Add Noise to Text
D3PM, Transition Matrices, Absorbing States, MDLM -- how to bring diffusion from continuous space to discrete tokens.

In Part 1, we explored the principles of Diffusion operating in continuous space. Adding Gaussian noise to image pixels is natural, but text tokens are discrete. What would it even mean to add noise of 0.3 to the token "hello"?
In this post, we cover how to bring Diffusion into discrete space. Starting from D3PM's Transition Matrix and arriving at MDLM's Masked Diffusion -- the direct ancestors of LLaDA.
D3PM: Diffusion in Discrete Space
Austin et al. (2021) raise a fundamental question in D3PM (Discrete Denoising Diffusion Probabilistic Models): how do you define a forward process for discrete data where you can't add Gaussian noise?
The answer: use a Transition Matrix.
In continuous Diffusion, Gaussian noise plays a central role. In discrete Diffusion, a transition matrix Q_t takes its place. At each step t, the probability of token x_{t-1} changing to x_t is defined by a matrix:
q(x_t | x_{t-1}) = Cat(x_t; p = x_{t-1} * Q_t)
Here, Cat is the Categorical distribution, x_{t-1} is a one-hot row vector, and Q_t is a K x K matrix (where K is the vocabulary size). Q_t[i][j] is the probability of token i transitioning to token j, so each row of Q_t sums to 1, and the product x_{t-1} * Q_t simply selects the row of Q_t corresponding to the current token.
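To make this concrete, here is a minimal sketch of one forward step. It assumes the D3PM-uniform choice of Q_t (stay put with probability 1 - beta_t, otherwise jump to a uniformly random token); the function names and the value of beta_t are illustrative, not from the paper's code.

```python
import numpy as np

def uniform_transition_matrix(K: int, beta_t: float) -> np.ndarray:
    """D3PM-uniform Q_t: keep the token with prob 1 - beta_t,
    otherwise resample it uniformly. Every row sums to 1."""
    return (1.0 - beta_t) * np.eye(K) + (beta_t / K) * np.ones((K, K))

def forward_step(x_prev: int, Q_t: np.ndarray, rng: np.random.Generator) -> int:
    """Sample x_t ~ Cat(p = onehot(x_prev) @ Q_t).
    The one-hot product just picks out row x_prev of Q_t."""
    p = Q_t[x_prev]
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
K = 8                                   # toy vocabulary size
Q = uniform_transition_matrix(K, beta_t=0.2)
assert np.allclose(Q.sum(axis=1), 1.0)  # rows are valid distributions

x0 = 3
x1 = forward_step(x0, Q, rng)           # one noising step: x0 -> x1
```

With beta_t = 0.2 the token survives a single step with probability 1 - beta_t + beta_t/K, so corruption accumulates gradually over many steps, mirroring the variance schedule in continuous diffusion.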
Correspondence with continuous Diffusion: