Rectified Flow: Straightening Paths Toward 1-Step Generation

Flow Matching still too slow? Reflow straightens trajectories to enable 1-step generation.
TL;DR
- Rectified Flow: Iteratively "straightens" Flow Matching trajectories
- Reflow: Generate (noise, data) pairs with learned model, then retrain on straight-line paths
- Key Benefit: More reflow iterations → straighter paths → eventually 1-step generation
- Real Applications: Stable Diffusion 3 and FLUX are built on Rectified Flow
1. Why Flow Matching Alone Isn't Enough
Flow Matching generates samples in far fewer steps (typically 10-50) than DDPM's hundreds or thousands, but limitations remain.
Flow Matching's Limitation
The target velocity field in Flow Matching is:
$$v_t(x_t | x_0, z) = z - x_0$$
The conditional velocity is constant along each straight path, but in practice the network learns the marginal velocity field:
$$v_t(x_t) = \mathbb{E}_{x_0, z | x_t}[z - x_0]$$
The problem: different $(x_0, z)$ pairs can pass through the same $x_t$. When these trajectories cross, the learned velocity becomes their average, resulting in curved paths.
The Crossing Problem
Given two data points $x_0^{(1)}, x_0^{(2)}$ and two noise samples $z^{(1)}, z^{(2)}$:
$$x_t^{(1)} = (1-t)x_0^{(1)} + tz^{(1)}$$
$$x_t^{(2)} = (1-t)x_0^{(2)} + tz^{(2)}$$
If $x_t^{(1)} = x_t^{(2)}$ at some $t$, the network predicts the average of both directions. This increases transport cost and requires more sampling steps.
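To make the crossing concrete, here is a tiny 1-D sketch (with hand-picked points) where two straight interpolation paths meet at $t = 0.5$ and the conditional velocities cancel out on average:

```python
import torch

# Two hand-picked 1-D (data, noise) pairs whose linear paths cross at t = 0.5.
x0 = torch.tensor([0.0, 2.0])  # data points
z = torch.tensor([2.0, 0.0])   # noise samples

t = 0.5
x_t = (1 - t) * x0 + t * z
print(x_t)       # tensor([1., 1.]) -- both paths pass through the same point

# The conditional velocities at that point are opposite...
v = z - x0
print(v)         # tensor([ 2., -2.])
# ...so the marginal velocity the network learns there is their average: 0.
# The ODE solution must curve around such points, which forces more steps.
print(v.mean())  # tensor(0.)
```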
2. The Core Idea of Rectified Flow
Rectified Flow is simple but powerful:
**"Generate (z, x₀) pairs using the learned flow, then retrain on straight-line paths between them—this straightens the trajectories"**
The Reflow Procedure
1. Initial Flow Matching: Train a base model $v_{\theta_0}$ on random $(x_0, z)$ pairs
2. Generate Coupling: Use the trained model to generate data $\hat{x}_0$ from noise $z$, so that the resulting $(z, \hat{x}_0)$ pairs are actually connected by the flow
3. Reflow Training: Train a new model $v_{\theta_1}$ on straight-line paths between $(z, \hat{x}_0)$
4. Iterate: Repeat steps 2-3 for increasingly straight paths
Mathematical Formulation
Let $\pi_k$ denote the coupling after $k$ reflows:
$$\mathcal{L}_{\text{reflow}}^{(k)} = \mathbb{E}_{(x_0, z) \sim \pi_k, t} \left[ \| (z - x_0) - v_{\theta}(x_t, t) \|^2 \right]$$
where $x_t = (1-t)x_0 + tz$, $\pi_0$ is the initial random (independent) coupling, and $\pi_k$ for $k \geq 1$ is the coupling generated by the $k$-th model. Section 5 implements this loss.
3. Why Does Reflow Straighten Paths?
Intuitive Understanding
Initially, we use random couplings $(x_0, z)$, so the straight-line paths between pairs can cross each other.
But following the learned flow $\phi_1$ from noise to data:
- A trajectory starting from $z$ arrives at a specific $\hat{x}_0$
- This $(z, \hat{x}_0)$ pair is already connected by a trajectory of the flow
- Trajectories of a well-posed ODE cannot cross at the same time $t$, so straight-line paths between these pairs cross far less
Transport Cost Reduction
The key to reflow is reducing transport cost:
$$\text{Cost}(\pi) = \mathbb{E}_{(x_0, z) \sim \pi} \left[ \| z - x_0 \|^2 \right]$$
With successive reflows:
$$\text{Cost}(\pi_0) \geq \text{Cost}(\pi_1) \geq \text{Cost}(\pi_2) \geq \cdots$$
As paths straighten, transport cost decreases.
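As a quick empirical check, one can compare the cost of a random coupling against a reflowed one. A minimal sketch, reusing `rf`, `sample_data`, and `generate_coupling` from the implementation in Section 5 below:

```python
import torch

def transport_cost(x0, z):
    """Monte Carlo estimate of E[||z - x0||^2] over paired samples."""
    return ((z - x0) ** 2).sum(dim=-1).mean()

x0 = sample_data(data)              # assumed data sampler (see Section 5)
z_random = torch.randn_like(x0)     # independent (random) coupling
print(transport_cost(x0, z_random))

z = torch.randn_like(x0)
z, x0_hat = rf.generate_coupling(z, n_steps=50)  # coupling from the learned flow
print(transport_cost(x0_hat, z))    # expected: no larger than the random cost
```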
Theoretical Guarantees
Key properties proven in the paper:
- Determinism: Reflowed couplings are deterministic: given $z$, the paired $\hat{x}_0$ is uniquely fixed by the ODE
- Straightness: Each reflow makes paths straighter; infinitely many reflows yield perfectly straight paths
- 1-Step Possibility: With perfectly straight paths, 1-step Euler sampling is exact (verified in the sketch below)
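The third property is easy to verify on a toy example: if the velocity is constant along each trajectory, Euler integration gives the same answer no matter how many steps are used. A minimal sketch:

```python
import torch

def euler(z, v_fn, n_steps):
    """Euler integration of dx/dt = v from t=1 (noise) down to t=0 (data)."""
    x, dt = z.clone(), 1.0 / n_steps
    for i in range(n_steps):
        x = x - v_fn(x, 1.0 - i * dt) * dt
    return x

# A perfectly straight flow: the velocity depends on neither x nor t.
v_fn = lambda x, t: torch.tensor([1.5, -0.5])

z = torch.randn(2)
# True (up to float rounding): 1 Euler step already lands on the exact endpoint.
print(torch.allclose(euler(z, v_fn, 1), euler(z, v_fn, 100), atol=1e-5))
```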
4. 1-Step Distillation
While reflow straightens paths, practical 1-step generation requires distillation.
Progressive Distillation
Gradually reduce the step count (a sketch of one halving round follows the loss below):
- Teacher model: N steps
- Student model: mimic the teacher's output with N/2 steps
- Repeat until reaching 1 step
$$\mathcal{L}_{\text{distill}} = \mathbb{E}_{z} \left[ \| \phi_{\text{teacher}}(z) - G_{\theta}(z) \|^2 \right]$$
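A minimal sketch of one halving round, reusing `RectifiedFlow`, `batch_size`, and `dim` from Section 5 below; this shows only the core idea, not the full recipe of Salimans & Ho:

```python
import torch
import torch.nn.functional as F

def distill_halve(teacher_rf, student_model, n_teacher_steps, n_epochs=500):
    """One progressive-distillation round: the student matches the teacher's
    N-step output using N/2 Euler steps."""
    optimizer = torch.optim.Adam(student_model.parameters(), lr=1e-4)
    n_student_steps = n_teacher_steps // 2
    for epoch in range(n_epochs):
        z = torch.randn(batch_size, dim)
        with torch.no_grad():  # teacher target; no gradients through the teacher
            target = teacher_rf.sample(z, n_steps=n_teacher_steps)
        # Differentiable student rollout (sample() in Section 5 is wrapped in
        # no_grad, so the Euler loop is written out here).
        x, dt = z, 1.0 / n_student_steps
        for i in range(n_student_steps):
            t = torch.full((x.shape[0],), 1.0 - i * dt)
            x = x - student_model(x, t) * dt
        loss = F.mse_loss(x, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student_model
```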
Direct Distillation
Rectified Flow's advantage: paths are already nearly straight, enabling direct 1-step distillation:
$$\mathcal{L}_{\text{1-step}} = \mathbb{E}_{z} \left[ \| x_0 - (z - v_{\theta}(z, 1)) \|^2 \right]$$
where $(z, x_0)$ is a pair from the reflowed coupling and $v_{\theta}(z, 1)$ is the velocity prediction at $t=1$, i.e., at pure noise.
5. Implementation
Reflow Training
```python
import torch
import torch.nn.functional as F


class RectifiedFlow:
    def __init__(self, model):
        self.model = model

    def loss(self, x0, z):
        """Reflow loss with a fixed (x0, z) coupling; assumes inputs of shape (batch, dim)."""
        t = torch.rand(x0.shape[0], device=x0.device)
        # Linear interpolation between data (t=0) and noise (t=1)
        x_t = (1 - t[:, None]) * x0 + t[:, None] * z
        # Target velocity (constant along the straight line)
        v_target = z - x0
        # Predicted velocity
        v_pred = self.model(x_t, t)
        return F.mse_loss(v_pred, v_target)

    @torch.no_grad()
    def sample(self, z, n_steps=1):
        """Euler integration from t=1 (noise) down to t=0 (data)."""
        x = z
        dt = 1.0 / n_steps
        for i in range(n_steps):
            t = 1.0 - i * dt
            t_batch = torch.full((x.shape[0],), t, device=x.device)
            v = self.model(x, t_batch)
            x = x - v * dt
        return x

    @torch.no_grad()
    def generate_coupling(self, z, n_steps=50):
        """Generate (z, x0) coupling pairs by running the learned flow."""
        x0 = self.sample(z, n_steps=n_steps)
        return z, x0
```

Reflow Training Loop
```python
def train_reflow(data, n_reflows=3, n_epochs=500):
    """Train with multiple reflow iterations.

    Assumes helpers create_model() and sample_data(), and globals dim and
    batch_size, defined elsewhere.
    """
    # Stage 0: standard Flow Matching on a random (independent) coupling
    model = create_model()
    rf = RectifiedFlow(model)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(n_epochs):
        x0 = sample_data(data)
        z = torch.randn_like(x0)
        loss = rf.loss(x0, z)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Reflow iterations
    for k in range(n_reflows):
        print(f"Reflow {k + 1}")
        # Generate the coupling from the current model
        z_all = torch.randn(len(data), dim)
        z_all, x0_all = rf.generate_coupling(z_all, n_steps=50)
        # Train a fresh model on this fixed coupling
        new_model = create_model()
        new_rf = RectifiedFlow(new_model)
        optimizer = torch.optim.Adam(new_model.parameters(), lr=1e-4)
        for epoch in range(n_epochs):
            idx = torch.randperm(len(x0_all))[:batch_size]
            loss = new_rf.loss(x0_all[idx], z_all[idx])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        rf = new_rf
    return rf
```

1-Step Distillation
```python
def distill_to_one_step(teacher_rf, student_model, data, n_epochs=1000):
    """Distill to a 1-step generator."""
    optimizer = torch.optim.Adam(student_model.parameters(), lr=1e-4)
    for epoch in range(n_epochs):
        z = torch.randn(batch_size, dim)
        # Teacher generates the target with a few Euler steps
        with torch.no_grad():
            x0_teacher = teacher_rf.sample(z, n_steps=10)
        # Student predicts in 1 step: x0 = z - v(z, t=1)
        v_pred = student_model(z, torch.ones(batch_size))
        x0_student = z - v_pred
        loss = F.mse_loss(x0_student, x0_teacher)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student_model
```

6. Stable Diffusion 3 and FLUX
SD3's Rectified Flow Adoption
Stable Diffusion 3 adopted Rectified Flow:
- MMDiT Architecture: Multimodal DiT processing text and image together
- Rectified Flow: Straight-line paths instead of DDPM
- Result: Fewer steps needed for the same quality
FLUX Improvements
FLUX (by Black Forest Labs) takes the SD3 recipe further:
- Guidance Distillation: CFG internalized into the model
- Fewer Steps: 4-8 steps for high quality
- FLUX.1-schnell: Distilled version capable of 1-4 step generation
Why Rectified Flow?
Reasons for transitioning from DDPM-based Stable Diffusion:
- Straight noise-to-data paths deliver comparable quality in fewer sampling steps
- A simple velocity-regression objective instead of DDPM's noise-prediction parameterization
- Nearly straight paths make aggressive step-count distillation practical (see FLUX.1-schnell above)
7. Reflow Iterations and Quality
How Many Reflows Are Needed?
Empirically:
- 1-Reflow: Significant straightening, good quality at 10 steps
- 2-Reflow: More straightening, 5 steps possible
- 3-Reflow: Nearly straight, 1-2 steps possible
However, more reflows mean:
- Increased training time
- Time spent generating couplings
- Potentially slower convergence
Practical Choice
In most cases, 1-2 reflows + distillation is most efficient.
8. Limitations and Considerations
Coupling Quality Dependency
Reflow depends on the previous model's generation quality:
- Poor initial model → poor coupling → poor reflow results
- Solution: Train initial Flow Matching sufficiently
Mode Collapse Risk
Too many reflows can:
- Concentrate couplings on specific modes
- Reduce diversity
- Solution: Choose appropriate reflow count, add regularization
Computational Cost
Each reflow stage requires:
- Coupling generation for entire dataset
- Training a new model
- Total cost ≈ (1 + n_reflows) × base training cost, plus the coupling-generation passes
Conclusion
Rectified Flow realizes the intuitive idea that "straighter paths are faster." The success of Stable Diffusion 3 and FLUX demonstrates this approach's practicality.
References
- Liu, X., et al. "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow" (ICLR 2023)
- Esser, P., et al. "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis" (Stable Diffusion 3, 2024)
- Lipman, Y., et al. "Flow Matching for Generative Modeling" (ICLR 2023)
- Salimans, T. & Ho, J. "Progressive Distillation for Fast Sampling of Diffusion Models" (ICLR 2022)