Flow Matching vs DDPM: Why ODE Beats SDE in Diffusion Models

DDPM needs 1000 steps, Flow Matching needs 10. The mathematics of straight-line generation.
TL;DR
- DDPM: Removes noise gradually via a stochastic process. Random perturbations at each step create curved paths
- Flow Matching: Moves directly toward the data via a deterministic process. Straight paths enable fast generation
- Key Difference: DDPM predicts the "noise"; Flow Matching predicts a "velocity field"
1. Problem Setup: The Path from Noise to Data
The goal of generative models is simple:
$$\text{Noise } z \sim \mathcal{N}(0, I) \quad \longrightarrow \quad \text{Data } x \sim p_{\text{data}}$$
How do we achieve this transformation? Two paradigms emerge.
DDPM's Approach: "Slowly and Stochastically"
DDPM defines a Markov chain of gradual noising steps; its marginal at step $t$ has the closed form:
$$x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$
where $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ and $\alpha_t = 1 - \beta_t$.
As time progresses ($t \to T$), information about data $x_0$ vanishes, leaving only pure noise $\epsilon$.
Flow Matching's Approach: "Straight and Deterministic"
Flow Matching uses linear interpolation:
$$x_t = (1 - t) x_0 + t \epsilon, \quad t \in [0, 1]$$
At $t=0$, we have $x_0$ (data). At $t=1$, we have $\epsilon$ (noise). Everything in between is a straight line.
2. Different Training Objectives
DDPM: Noise Prediction
DDPM trains a neural network $\epsilon_\theta$ to predict the added noise:
$$\mathcal{L}_{\text{DDPM}} = \mathbb{E}_{x_0, \epsilon, t} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|^2 \right]$$
Why predict noise? To recover $x_{t-1}$ in the reverse process:
$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \epsilon_\theta(x_t, t) \right) + \sigma_t z$$
where $z \sim \mathcal{N}(0, I)$ is fresh randomness added at each step. This is what curves the path.
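For concreteness, here is a minimal sketch of the resulting ancestral sampling loop (not from the original post): it assumes precomputed schedule tensors, whose names `alphas`, `alpha_bars`, `betas` are illustrative, and uses the common choice $\sigma_t^2 = \beta_t$:
```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, alphas, alpha_bars, betas):
    T = len(betas)
    x = torch.randn(shape)                        # x_T ~ N(0, I)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t)      # same t for the whole batch
        eps_pred = model(x, t_batch)              # predict the noise that was added
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps_pred) \
               / torch.sqrt(alphas[t])
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * z       # sigma_t = sqrt(beta_t), one common choice
    return x
```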
Flow Matching: Velocity Prediction
Flow Matching trains a neural network $v_\theta$ to predict the velocity field:
$$\mathcal{L}_{\text{FM}} = \mathbb{E}_{x_0, \epsilon, t} \left[ \| v_t(x_t \mid x_0, \epsilon) - v_\theta(x_t, t) \|^2 \right]$$
The target velocity field is the time derivative of the conditional path:
$$v_t(x_t | x_0, \epsilon) = \frac{d}{dt} x_t = \frac{d}{dt} \left[ (1-t)x_0 + t\epsilon \right] = \epsilon - x_0$$
This velocity is constant! Regardless of time $t$, we always move in the $\epsilon - x_0$ direction at constant speed.
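A quick numerical sanity check of this constancy (illustration only; the helper `x_t` is hypothetical):
```python
import torch

# The conditional velocity of x_t = (1 - t) * x0 + t * eps should be eps - x0 at every t.
x0 = torch.randn(4, 8, dtype=torch.float64)
eps = torch.randn(4, 8, dtype=torch.float64)

def x_t(t):
    return (1 - t) * x0 + t * eps

h = 1e-6
for t in (0.1, 0.5, 0.9):
    fd = (x_t(t + h) - x_t(t)) / h               # finite-difference d/dt x_t
    assert torch.allclose(fd, eps - x0, atol=1e-6)
print("velocity is eps - x0 at every t")
```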
3. Sampling: SDE vs ODE
DDPM Sampling: SDE-Based
DDPM's reverse process follows a Stochastic Differential Equation (SDE):
$$dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t) d\bar{w}$$
where:
- $f(x, t)$: drift coefficient
- $g(t)$: diffusion coefficient (noise magnitude)
- $d\bar{w}$: reverse-time Brownian motion
The Problem: The $g(t) d\bar{w}$ term adds randomness at every step. The path meanders like Brownian motion, requiring many small steps to reach the target.
Flow Matching Sampling: ODE-Based
Flow Matching follows an Ordinary Differential Equation (ODE):
$$\frac{dx}{dt} = v_\theta(x, t)$$
No stochastic term. We move deterministically along the learned velocity field.
Sampling:
```python
# Euler method: integrate dx/dt = v_theta from t = 1 (noise) down to t = 0 (data)
x = torch.randn(batch_size, dim)               # start from pure noise
dt = 1.0 / num_steps
for t in torch.linspace(1, dt, num_steps):     # t = 1, ..., dt, so the last step lands on 0
    v = model(x, t)                            # predict velocity at (x, t)
    x = x - v * dt                             # one straight-line Euler step
```
4. Why is Flow Matching Faster?
Mathematical Intuition
Consider the expected path length of DDPM's reverse trajectory. A discretized Brownian path taking $T$ steps over a unit time interval moves $\mathcal{O}(1/\sqrt{T})$ per step, so its total length scales as:
$$\mathbb{E}\left[ \text{Path Length} \right] = \mathcal{O}(\sqrt{T})$$
where $T$ is the number of steps: more steps mean a longer path, even though the endpoints are fixed.
For Flow Matching's straight-line path:
$$\text{Path Length} = \| \epsilon - x_0 \| = \mathcal{O}(1)$$
Independent of step count. Shortest possible distance.
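A small Monte-Carlo illustration of the two scalings (the numbers are illustrative, not benchmarks):
```python
import torch

torch.manual_seed(0)
for T in (10, 100, 1000):
    steps = torch.randn(T, 2) / T**0.5           # Brownian increments over unit time
    walk_len = steps.norm(dim=1).sum().item()    # total path length, grows like sqrt(T)
    straight = steps.sum(dim=0).norm().item()    # endpoint-to-endpoint distance, O(1)
    print(f"T={T:5d}  walk length={walk_len:6.2f}  straight line={straight:5.2f}")
```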
Empirical Evidence
The original DDPM paper samples with $T = 1000$ steps; flow-matching models report comparable quality with on the order of 10 ODE steps.
5. Rectified Flow: Evolution of Flow Matching
Rectified Flow advances Flow Matching further.
Core Idea: Reflow
The learned flow may not be perfectly straight. Reflow "straightens" it:
- Generate $(z, x_0)$ pairs using the learned model
- Train a new straight-line path on these pairs
- Repeat to progressively straighten the trajectory
$$\mathcal{L}_{\text{reflow}} = \mathbb{E}_{(z, x_0) \sim \pi_k} \left[ \| (z - x_0) - v_\theta(x_t, t) \|^2 \right]$$
where $x_t = (1-t)x_0 + tz$, matching the sign convention $v = \epsilon - x_0$ used above.
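A minimal sketch of one reflow round, assuming the Euler sampler and `model(x, t)` signature from Section 3 (`generate_pairs` and `reflow_loss` are hypothetical names):
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_pairs(model, n, dim, num_steps=100):
    """Sample coupled (noise, data) pairs from the current model."""
    z = torch.randn(n, dim)
    x = z.clone()
    dt = 1.0 / num_steps
    for t in torch.linspace(1, dt, num_steps):   # same Euler sampler as in Section 3
        x = x - model(x, t.expand(n)) * dt
    return z, x

def reflow_loss(model, z, x0_hat):
    """Retrain on the straight path between each coupled pair."""
    t = torch.rand(len(z), 1)
    x_t = (1 - t) * x0_hat + t * z               # straight interpolation of the pair
    target = z - x0_hat                          # constant velocity, as in v = eps - x0
    return F.mse_loss(model(x_t, t.squeeze(1)), target)
```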
Combined with Distillation
For 1-step generation, apply distillation:
$$\mathcal{L}_{\text{distill}} = \mathbb{E}_{z} \left[ \| x_0^{\text{teacher}} - G_\theta(z) \|^2 \right]$$
where $G_\theta(z)$ generates data in a single forward pass and $x_0^{\text{teacher}}$ is the multi-step teacher's output for the same noise $z$.
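A corresponding loss sketch, where `teacher_sample_fn` is a hypothetical stand-in for the multi-step teacher sampler:
```python
import torch.nn.functional as F

def distill_loss(student, teacher_sample_fn, z):
    x0_teacher = teacher_sample_fn(z)            # multi-step teacher output for the same z
    return F.mse_loss(student(z), x0_teacher)    # student matches it in one forward pass
```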
6. Implementation Comparison
DDPM Forward Process
```python
def ddpm_forward(x0, t, noise_schedule):
    """
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * epsilon
    """
    alpha_bar = noise_schedule.alpha_bar[t]
    alpha_bar = alpha_bar.view(-1, 1, 1, 1)      # reshape for broadcasting over images
    epsilon = torch.randn_like(x0)
    x_t = torch.sqrt(alpha_bar) * x0 + torch.sqrt(1 - alpha_bar) * epsilon
    return x_t, epsilon
```
Flow Matching Forward Process
```python
def flow_matching_forward(x0, t):
    """
    x_t = (1 - t) * x0 + t * epsilon
    """
    epsilon = torch.randn_like(x0)
    t = t.view(-1, 1, 1, 1)                      # reshape t for broadcasting over images
    x_t = (1 - t) * x0 + t * epsilon
    velocity = epsilon - x0                      # target velocity
    return x_t, velocity
```
Training Loop Comparison
```python
# DDPM
for x0 in dataloader:
    t = torch.randint(0, T, (batch_size,))
    x_t, epsilon = ddpm_forward(x0, t, noise_schedule)
    epsilon_pred = model(x_t, t)
    loss = F.mse_loss(epsilon_pred, epsilon)

# Flow Matching
for x0 in dataloader:
    t = torch.rand(batch_size)                   # uniform [0, 1]
    x_t, velocity = flow_matching_forward(x0, t)
    velocity_pred = model(x_t, t)
    loss = F.mse_loss(velocity_pred, velocity)
```
7. When to Use What?
Choose DDPM/DDIM When:
- Leveraging existing pretrained models (Stable Diffusion, etc.)
- High diversity is critical
- Stochastic sampling is required
Choose Flow Matching When:
- Fast inference is the priority
- Training from scratch
- Simple, intuitive implementation is valued
Choose Rectified Flow When:
- 1-step or few-step generation is the goal
- Real-time applications
- Mobile/edge device deployment
8. Mathematical Connection: Score and Velocity
DDPM's score function and Flow Matching's velocity are closely related.
The score function is the gradient of log probability:
$$s_\theta(x, t) = \nabla_x \log p_t(x)$$
Relationship between score and noise prediction in DDPM:
$$s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1 - \bar{\alpha}_t}}$$
Relationship between velocity and score via probability flow ODE:
$$v_\theta(x, t) = f(x, t) - \frac{1}{2} g(t)^2 s_\theta(x, t)$$
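As a concrete check, plugging in the VP (DDPM) SDE coefficients $f(x, t) = -\tfrac{1}{2}\beta(t)\,x$ and $g(t)^2 = \beta(t)$ (standard for the DDPM forward process, though not stated above) together with the score-noise relation gives:
$$v_\theta(x, t) = -\frac{1}{2}\beta(t)\left( x - \frac{\epsilon_\theta(x, t)}{\sqrt{1 - \bar{\alpha}_t}} \right)$$
so the probability flow velocity is a simple affine function of the noise prediction.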
Therefore, a well-trained DDPM can be sampled deterministically through this probability flow ODE, i.e., reinterpreted as a velocity model. This is one reason Stable Diffusion 3 transitioned to Rectified Flow.
Conclusion
Flow Matching emerged from asking "Why take the long way?" The simple insight that the shortest path between two points is a straight line has dramatically improved generation efficiency.
References
- Ho, J., et al. "Denoising Diffusion Probabilistic Models" (NeurIPS 2020)
- Lipman, Y., et al. "Flow Matching for Generative Modeling" (ICLR 2023)
- Liu, X., et al. "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow" (ICLR 2023)
- Song, Y., et al. "Score-Based Generative Modeling through Stochastic Differential Equations" (ICLR 2021)