Flow Matching vs DDPM: Why ODE Beats SDE in Diffusion Models

DDPM needs 1000 steps, Flow Matching needs 10. The mathematics of straight-line generation.
TL;DR
- DDPM: Removes noise gradually via a stochastic process. Random perturbations at each step create curved paths
- Flow Matching: Moves directly toward the data via a deterministic process. Straight paths enable fast generation
- Key Difference: DDPM predicts the "noise"; Flow Matching predicts a "velocity field"
1. Problem Setup: The Path from Noise to Data
The goal of generative models is simple:
$$\text{Noise } z \sim \mathcal{N}(0, I) \quad \longrightarrow \quad \text{Data } x \sim p_{\text{data}}$$
How do we achieve this transformation? Two paradigms emerge.
DDPM's Approach: "Slowly and Stochastically"
DDPM defines a Markov chain of gradual noising steps; its marginal at step $t$ has the closed form:
$$x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$
where $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ and $\alpha_t = 1 - \beta_t$.
As time progresses ($t \to T$), information about data $x_0$ vanishes, leaving only pure noise $\epsilon$.
Flow Matching's Approach: "Straight and Deterministic"
Flow Matching uses linear interpolation:
$$x_t = (1 - t) x_0 + t \epsilon, \quad t \in [0, 1]$$
At $t=0$, we have $x_0$ (data). At $t=1$, we have $\epsilon$ (noise). Everything in between is a straight line.
2. Different Training Objectives
DDPM: Noise Prediction
DDPM trains a neural network $\epsilon_\theta$ to predict the added noise:
$$\mathcal{L}_{\text{DDPM}} = \mathbb{E}_{x_0, \epsilon, t} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|^2 \right]$$
Why predict noise? To recover $x_{t-1}$ in the reverse process:
$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \epsilon_\theta(x_t, t) \right) + \sigma_t z$$
where $z \sim \mathcal{N}(0, I)$ is fresh randomness added at each step. This is what curves the path.
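For concreteness, here is a minimal sketch of the resulting ancestral sampling loop (not from the original post): it assumes precomputed schedule tensors, whose names `alphas`, `alpha_bars`, `betas` are illustrative, and uses the common choice $\sigma_t^2 = \beta_t$:
```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, alphas, alpha_bars, betas):
    T = len(betas)
    x = torch.randn(shape)                        # x_T ~ N(0, I)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t)      # same t for the whole batch
        eps_pred = model(x, t_batch)              # predict the noise that was added
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps_pred) \
               / torch.sqrt(alphas[t])
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * z       # sigma_t = sqrt(beta_t), one common choice
    return x
```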
Flow Matching: Velocity Prediction
Flow Matching trains a neural network $v_\theta$ to predict the velocity field:
$$\mathcal{L}_{\text{FM}} = \mathbb{E}_{x_0, \epsilon, t} \left[ \| v_t(x_t \mid x_0, \epsilon) - v_\theta(x_t, t) \|^2 \right]$$
The target velocity field is the time derivative of the conditional path:
$$v_t(x_t | x_0, \epsilon) = \frac{d}{dt} x_t = \frac{d}{dt} \left[ (1-t)x_0 + t\epsilon \right] = \epsilon - x_0$$
This velocity is constant! Regardless of time $t$, we always move in the $\epsilon - x_0$ direction at constant speed.
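A quick numerical sanity check of this constancy (illustration only; the helper `x_t` is hypothetical):
```python
import torch

# The conditional velocity of x_t = (1 - t) * x0 + t * eps should be eps - x0 at every t.
x0 = torch.randn(4, 8, dtype=torch.float64)
eps = torch.randn(4, 8, dtype=torch.float64)

def x_t(t):
    return (1 - t) * x0 + t * eps

h = 1e-6
for t in (0.1, 0.5, 0.9):
    fd = (x_t(t + h) - x_t(t)) / h               # finite-difference d/dt x_t
    assert torch.allclose(fd, eps - x0, atol=1e-6)
print("velocity is eps - x0 at every t")
```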
3. Sampling: SDE vs ODE
DDPM Sampling: SDE-Based
DDPM's reverse process follows a Stochastic Differential Equation (SDE):
$$dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t) d\bar{w}$$
where:
- $f(x, t)$: drift coefficient
- $g(t)$: diffusion coefficient (noise magnitude)
- $d\bar{w}$: reverse-time Brownian motion
The Problem: The $g(t) d\bar{w}$ term adds randomness at every step. The path meanders like Brownian motion, requiring many small steps to reach the target.
Flow Matching Sampling: ODE-Based
Flow Matching follows an Ordinary Differential Equation (ODE):
$$\frac{dx}{dt} = v_\theta(x, t)$$
No stochastic term. We move deterministically along the learned velocity field.
Sampling:
```python
# Euler method: integrate dx/dt = v_theta from t = 1 (noise) down to t = 0 (data)
x = torch.randn(batch_size, dim)               # start from pure noise
dt = 1.0 / num_steps
for t in torch.linspace(1, dt, num_steps):     # t = 1, ..., dt, so the last step lands on 0
    v = model(x, t)                            # predict velocity at (x, t)
    x = x - v * dt                             # one straight-line Euler step
```
4. Why is Flow Matching Faster?
Mathematical Intuition
Consider the expected path length of DDPM's reverse trajectory. A discretized Brownian path taking $T$ steps over a unit time interval moves $\mathcal{O}(1/\sqrt{T})$ per step, so its total length scales as:
$$\mathbb{E}\left[ \text{Path Length} \right] = \mathcal{O}(\sqrt{T})$$
where $T$ is the number of steps: more steps mean a longer path, even though the endpoints are fixed.
For Flow Matching's straight-line path:
$$\text{Path Length} = \| \epsilon - x_0 \| = \mathcal{O}(1)$$
Independent of step count. Shortest possible distance.
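A small Monte-Carlo illustration of the two scalings (the numbers are illustrative, not benchmarks):
```python
import torch

torch.manual_seed(0)
for T in (10, 100, 1000):
    steps = torch.randn(T, 2) / T**0.5           # Brownian increments over unit time
    walk_len = steps.norm(dim=1).sum().item()    # total path length, grows like sqrt(T)
    straight = steps.sum(dim=0).norm().item()    # endpoint-to-endpoint distance, O(1)
    print(f"T={T:5d}  walk length={walk_len:6.2f}  straight line={straight:5.2f}")
```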
Empirical Evidence
The original DDPM paper samples with $T = 1000$ steps; flow-matching models report comparable quality with on the order of 10 ODE steps.
5. Rectified Flow: Evolution of Flow Matching
Rectified Flow advances Flow Matching further.
Core Idea: Reflow
The learned flow may not be perfectly straight. Reflow "straightens" it:
- Generate $(z, x_0)$ pairs using the learned model
- Train a new straight-line path on these pairs
- Repeat to progressively straighten the trajectory
$$\mathcal{L}_{\text{reflow}} = \mathbb{E}_{(z, x_0) \sim \pi_k} \left[ \| (z - x_0) - v_\theta(x_t, t) \|^2 \right]$$
where $x_t = (1-t)x_0 + tz$, matching the sign convention $v = \epsilon - x_0$ used above.
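A minimal sketch of one reflow round, assuming the Euler sampler and `model(x, t)` signature from Section 3 (`generate_pairs` and `reflow_loss` are hypothetical names):
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_pairs(model, n, dim, num_steps=100):
    """Sample coupled (noise, data) pairs from the current model."""
    z = torch.randn(n, dim)
    x = z.clone()
    dt = 1.0 / num_steps
    for t in torch.linspace(1, dt, num_steps):   # same Euler sampler as in Section 3
        x = x - model(x, t.expand(n)) * dt
    return z, x

def reflow_loss(model, z, x0_hat):
    """Retrain on the straight path between each coupled pair."""
    t = torch.rand(len(z), 1)
    x_t = (1 - t) * x0_hat + t * z               # straight interpolation of the pair
    target = z - x0_hat                          # constant velocity, as in v = eps - x0
    return F.mse_loss(model(x_t, t.squeeze(1)), target)
```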
Combined with Distillation
For 1-step generation, apply distillation:
$$\mathcal{L}_{\text{distill}} = \mathbb{E}_{z} \left[ \| x_0^{\text{teacher}} - G_\theta(z) \|^2 \right]$$
where $G_\theta(z)$ generates data in a single forward pass and $x_0^{\text{teacher}}$ is the multi-step teacher's output for the same noise $z$.
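A corresponding loss sketch, where `teacher_sample_fn` is a hypothetical stand-in for the multi-step teacher sampler:
```python
import torch.nn.functional as F

def distill_loss(student, teacher_sample_fn, z):
    x0_teacher = teacher_sample_fn(z)            # multi-step teacher output for the same z
    return F.mse_loss(student(z), x0_teacher)    # student matches it in one forward pass
```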
6. Implementation Comparison
DDPM Forward Process
```python
def ddpm_forward(x0, t, noise_schedule):
    """
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * epsilon
    """
    alpha_bar = noise_schedule.alpha_bar[t]
    alpha_bar = alpha_bar.view(-1, 1, 1, 1)      # reshape for broadcasting over images
    epsilon = torch.randn_like(x0)
    x_t = torch.sqrt(alpha_bar) * x0 + torch.sqrt(1 - alpha_bar) * epsilon
    return x_t, epsilon
```
Flow Matching Forward Process
```python
def flow_matching_forward(x0, t):
    """
    x_t = (1 - t) * x0 + t * epsilon
    """
    epsilon = torch.randn_like(x0)
    t = t.view(-1, 1, 1, 1)                      # reshape t for broadcasting over images
    x_t = (1 - t) * x0 + t * epsilon
    velocity = epsilon - x0                      # target velocity
    return x_t, velocity
```
Training Loop Comparison
```python
# DDPM
for x0 in dataloader:
    t = torch.randint(0, T, (batch_size,))
    x_t, epsilon = ddpm_forward(x0, t, noise_schedule)
    epsilon_pred = model(x_t, t)
    loss = F.mse_loss(epsilon_pred, epsilon)

# Flow Matching
for x0 in dataloader:
    t = torch.rand(batch_size)                   # uniform [0, 1]
    x_t, velocity = flow_matching_forward(x0, t)
    velocity_pred = model(x_t, t)
    loss = F.mse_loss(velocity_pred, velocity)
```
7. When to Use What?
Choose DDPM/DDIM When:
- Leveraging existing pretrained models (Stable Diffusion, etc.)
- High diversity is critical
- Stochastic sampling is required
Choose Flow Matching When:
- Fast inference is the priority
- Training from scratch
- Simple, intuitive implementation is valued
Choose Rectified Flow When:
- 1-step or few-step generation is the goal
- Real-time applications
- Mobile/edge device deployment
8. Mathematical Connection: Score and Velocity
DDPM's score function and Flow Matching's velocity are closely related.
The score function is the gradient of log probability:
$$s_\theta(x, t) = \nabla_x \log p_t(x)$$
Relationship between score and noise prediction in DDPM:
$$s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1 - \bar{\alpha}_t}}$$
Relationship between velocity and score via probability flow ODE:
$$v_\theta(x, t) = f(x, t) - \frac{1}{2} g(t)^2 s_\theta(x, t)$$
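As a concrete check, plugging in the VP (DDPM) SDE coefficients $f(x, t) = -\tfrac{1}{2}\beta(t)\,x$ and $g(t)^2 = \beta(t)$ (standard for the DDPM forward process, though not stated above) together with the score-noise relation gives:
$$v_\theta(x, t) = -\frac{1}{2}\beta(t)\left( x - \frac{\epsilon_\theta(x, t)}{\sqrt{1 - \bar{\alpha}_t}} \right)$$
so the probability flow velocity is a simple affine function of the noise prediction.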
Therefore, a well-trained DDPM can be sampled deterministically through this probability flow ODE, i.e., reinterpreted as a velocity model. This is one reason Stable Diffusion 3 transitioned to Rectified Flow.
Conclusion
Flow Matching emerged from asking "Why take the long way?" The simple insight that the shortest path between two points is a straight line has dramatically improved generation efficiency.
References
- Ho, J., et al. "Denoising Diffusion Probabilistic Models" (NeurIPS 2020)
- Lipman, Y., et al. "Flow Matching for Generative Modeling" (ICLR 2023)
- Liu, X., et al. "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow" (ICLR 2023)
- Song, Y., et al. "Score-Based Generative Modeling through Stochastic Differential Equations" (ICLR 2021)