Rectified Flow: Straightening Paths Toward 1-Step Generation

Flow Matching still too slow? Reflow straightens trajectories to enable 1-step generation.
TL;DR
- Rectified Flow: Iteratively "straightens" Flow Matching trajectories
- Reflow: Generate (noise, data) pairs with learned model, then retrain on straight-line paths
- Key Benefit: More reflow iterations → straighter paths → eventually 1-step generation
- Real Applications: Stable Diffusion 3 and FLUX are built on Rectified Flow
1. Why Flow Matching Alone Isn't Enough
Flow Matching generates samples in far fewer steps (typically 10-50) than DDPM's hundreds or thousands, but limitations remain.
Flow Matching's Limitation
The target velocity field in Flow Matching is:
$$v_t(x_t | x_0, z) = z - x_0$$
The conditional velocity is constant along each straight path, but in practice the network learns the marginal velocity field:
$$v_t(x_t) = \mathbb{E}_{x_0, z | x_t}[z - x_0]$$
The problem: different $(x_0, z)$ pairs can pass through the same $x_t$. When these trajectories cross, the learned velocity becomes their average, resulting in curved paths.
The Crossing Problem
Given two data points $x_0^{(1)}, x_0^{(2)}$ and two noise samples $z^{(1)}, z^{(2)}$:
$$x_t^{(1)} = (1-t)x_0^{(1)} + tz^{(1)}$$
$$x_t^{(2)} = (1-t)x_0^{(2)} + tz^{(2)}$$
If $x_t^{(1)} = x_t^{(2)}$ at some $t$, the network predicts the average of both directions. This increases transport cost and requires more sampling steps.
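To make the crossing concrete, here is a tiny 1-D sketch (with hand-picked points) where two straight interpolation paths meet at $t = 0.5$ and the conditional velocities cancel out on average:

```python
import torch

# Two hand-picked 1-D (data, noise) pairs whose linear paths cross at t = 0.5.
x0 = torch.tensor([0.0, 2.0])  # data points
z = torch.tensor([2.0, 0.0])   # noise samples

t = 0.5
x_t = (1 - t) * x0 + t * z
print(x_t)       # tensor([1., 1.]) -- both paths pass through the same point

# The conditional velocities at that point are opposite...
v = z - x0
print(v)         # tensor([ 2., -2.])
# ...so the marginal velocity the network learns there is their average: 0.
# The ODE solution must curve around such points, which forces more steps.
print(v.mean())  # tensor(0.)
```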
2. The Core Idea of Rectified Flow
Rectified Flow is simple but powerful:
**"Generate (z, x₀) pairs using the learned flow, then retrain on straight-line paths between them—this straightens the trajectories"**
The Reflow Procedure
1. Initial Flow Matching: Train a base model $v_{\theta_0}$ on random $(x_0, z)$ pairs
2. Generate Coupling: Use the trained model to generate data $\hat{x}_0$ from noise $z$, so that the resulting $(z, \hat{x}_0)$ pairs are actually connected by the flow
3. Reflow Training: Train a new model $v_{\theta_1}$ on straight-line paths between $(z, \hat{x}_0)$
4. Iterate: Repeat steps 2-3 for increasingly straight paths
Mathematical Formulation
Let $\pi_k$ denote the coupling after $k$ reflows:
$$\mathcal{L}_{\text{reflow}}^{(k)} = \mathbb{E}_{(x_0, z) \sim \pi_k, t} \left[ \| (z - x_0) - v_{\theta}(x_t, t) \|^2 \right]$$
where $x_t = (1-t)x_0 + tz$, $\pi_0$ is the initial random (independent) coupling, and $\pi_k$ for $k \geq 1$ is the coupling generated by the $k$-th model. Section 5 implements this loss.
3. Why Does Reflow Straighten Paths?
Intuitive Understanding
Initially, we use random couplings $(x_0, z)$, so the straight-line paths between pairs can cross each other.
But following the learned flow $\phi_1$ from noise to data:
- A trajectory starting from $z$ arrives at a specific $\hat{x}_0$
- This $(z, \hat{x}_0)$ pair is already connected by a trajectory of the flow
- Trajectories of a well-posed ODE cannot cross at the same time $t$, so straight-line paths between these pairs cross far less
Transport Cost Reduction
The key to reflow is reducing transport cost:
$$\text{Cost}(\pi) = \mathbb{E}_{(x_0, z) \sim \pi} \left[ \| z - x_0 \|^2 \right]$$
With successive reflows:
$$\text{Cost}(\pi_0) \geq \text{Cost}(\pi_1) \geq \text{Cost}(\pi_2) \geq \cdots$$
As paths straighten, transport cost decreases.
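As a quick empirical check, one can compare the cost of a random coupling against a reflowed one. A minimal sketch, reusing `rf`, `sample_data`, and `generate_coupling` from the implementation in Section 5 below:

```python
import torch

def transport_cost(x0, z):
    """Monte Carlo estimate of E[||z - x0||^2] over paired samples."""
    return ((z - x0) ** 2).sum(dim=-1).mean()

x0 = sample_data(data)              # assumed data sampler (see Section 5)
z_random = torch.randn_like(x0)     # independent (random) coupling
print(transport_cost(x0, z_random))

z = torch.randn_like(x0)
z, x0_hat = rf.generate_coupling(z, n_steps=50)  # coupling from the learned flow
print(transport_cost(x0_hat, z))    # expected: no larger than the random cost
```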
Theoretical Guarantees
Key properties proven in the paper:
- Determinism: Reflowed couplings are deterministic: given $z$, the paired $\hat{x}_0$ is uniquely fixed by the ODE
- Straightness: Each reflow makes paths straighter; infinitely many reflows yield perfectly straight paths
- 1-Step Possibility: With perfectly straight paths, 1-step Euler sampling is exact (verified in the sketch below)
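The third property is easy to verify on a toy example: if the velocity is constant along each trajectory, Euler integration gives the same answer no matter how many steps are used. A minimal sketch:

```python
import torch

def euler(z, v_fn, n_steps):
    """Euler integration of dx/dt = v from t=1 (noise) down to t=0 (data)."""
    x, dt = z.clone(), 1.0 / n_steps
    for i in range(n_steps):
        x = x - v_fn(x, 1.0 - i * dt) * dt
    return x

# A perfectly straight flow: the velocity depends on neither x nor t.
v_fn = lambda x, t: torch.tensor([1.5, -0.5])

z = torch.randn(2)
# True (up to float rounding): 1 Euler step already lands on the exact endpoint.
print(torch.allclose(euler(z, v_fn, 1), euler(z, v_fn, 100), atol=1e-5))
```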
4. 1-Step Distillation
While reflow straightens paths, practical 1-step generation requires distillation.
Progressive Distillation
Gradually reduce the step count (a sketch of one halving round follows the loss below):
- Teacher model: N steps
- Student model: mimic the teacher's output with N/2 steps
- Repeat until reaching 1 step
$$\mathcal{L}_{\text{distill}} = \mathbb{E}_{z} \left[ \| \phi_{\text{teacher}}(z) - G_{\theta}(z) \|^2 \right]$$
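A minimal sketch of one halving round, reusing `RectifiedFlow`, `batch_size`, and `dim` from Section 5 below; this shows only the core idea, not the full recipe of Salimans & Ho:

```python
import torch
import torch.nn.functional as F

def distill_halve(teacher_rf, student_model, n_teacher_steps, n_epochs=500):
    """One progressive-distillation round: the student matches the teacher's
    N-step output using N/2 Euler steps."""
    optimizer = torch.optim.Adam(student_model.parameters(), lr=1e-4)
    n_student_steps = n_teacher_steps // 2
    for epoch in range(n_epochs):
        z = torch.randn(batch_size, dim)
        with torch.no_grad():  # teacher target; no gradients through the teacher
            target = teacher_rf.sample(z, n_steps=n_teacher_steps)
        # Differentiable student rollout (sample() in Section 5 is wrapped in
        # no_grad, so the Euler loop is written out here).
        x, dt = z, 1.0 / n_student_steps
        for i in range(n_student_steps):
            t = torch.full((x.shape[0],), 1.0 - i * dt)
            x = x - student_model(x, t) * dt
        loss = F.mse_loss(x, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student_model
```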
Direct Distillation
Rectified Flow's advantage: paths are already nearly straight, enabling direct 1-step distillation:
$$\mathcal{L}_{\text{1-step}} = \mathbb{E}_{z} \left[ \| x_0 - (z - v_{\theta}(z, 1)) \|^2 \right]$$
where $(z, x_0)$ is a pair from the reflowed coupling and $v_{\theta}(z, 1)$ is the velocity prediction at $t=1$, i.e., at pure noise.
5. Implementation
Reflow Training
```python
import torch
import torch.nn.functional as F


class RectifiedFlow:
    def __init__(self, model):
        self.model = model

    def loss(self, x0, z):
        """Reflow loss with a fixed (x0, z) coupling; assumes inputs of shape (batch, dim)."""
        t = torch.rand(x0.shape[0], device=x0.device)
        # Linear interpolation between data (t=0) and noise (t=1)
        x_t = (1 - t[:, None]) * x0 + t[:, None] * z
        # Target velocity (constant along the straight line)
        v_target = z - x0
        # Predicted velocity
        v_pred = self.model(x_t, t)
        return F.mse_loss(v_pred, v_target)

    @torch.no_grad()
    def sample(self, z, n_steps=1):
        """Euler integration from t=1 (noise) down to t=0 (data)."""
        x = z
        dt = 1.0 / n_steps
        for i in range(n_steps):
            t = 1.0 - i * dt
            t_batch = torch.full((x.shape[0],), t, device=x.device)
            v = self.model(x, t_batch)
            x = x - v * dt
        return x

    @torch.no_grad()
    def generate_coupling(self, z, n_steps=50):
        """Generate (z, x0) coupling pairs by running the learned flow."""
        x0 = self.sample(z, n_steps=n_steps)
        return z, x0
```

Reflow Training Loop
```python
def train_reflow(data, n_reflows=3, n_epochs=500):
    """Train with multiple reflow iterations.

    Assumes helpers create_model() and sample_data(), and globals dim and
    batch_size, defined elsewhere.
    """
    # Stage 0: standard Flow Matching on a random (independent) coupling
    model = create_model()
    rf = RectifiedFlow(model)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(n_epochs):
        x0 = sample_data(data)
        z = torch.randn_like(x0)
        loss = rf.loss(x0, z)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Reflow iterations
    for k in range(n_reflows):
        print(f"Reflow {k + 1}")
        # Generate the coupling from the current model
        z_all = torch.randn(len(data), dim)
        z_all, x0_all = rf.generate_coupling(z_all, n_steps=50)
        # Train a fresh model on this fixed coupling
        new_model = create_model()
        new_rf = RectifiedFlow(new_model)
        optimizer = torch.optim.Adam(new_model.parameters(), lr=1e-4)
        for epoch in range(n_epochs):
            idx = torch.randperm(len(x0_all))[:batch_size]
            loss = new_rf.loss(x0_all[idx], z_all[idx])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        rf = new_rf
    return rf
```

1-Step Distillation
```python
def distill_to_one_step(teacher_rf, student_model, data, n_epochs=1000):
    """Distill to a 1-step generator."""
    optimizer = torch.optim.Adam(student_model.parameters(), lr=1e-4)
    for epoch in range(n_epochs):
        z = torch.randn(batch_size, dim)
        # Teacher generates the target with a few Euler steps
        with torch.no_grad():
            x0_teacher = teacher_rf.sample(z, n_steps=10)
        # Student predicts in 1 step: x0 = z - v(z, t=1)
        v_pred = student_model(z, torch.ones(batch_size))
        x0_student = z - v_pred
        loss = F.mse_loss(x0_student, x0_teacher)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student_model
```

6. Stable Diffusion 3 and FLUX
SD3's Rectified Flow Adoption
Stable Diffusion 3 adopted Rectified Flow:
- MMDiT Architecture: Multimodal DiT processing text and image together
- Rectified Flow: Straight-line paths instead of DDPM
- Result: Fewer steps needed for the same quality
FLUX Improvements
FLUX (by Black Forest Labs) takes the SD3 recipe further:
- Guidance Distillation: CFG internalized into the model
- Fewer Steps: 4-8 steps for high quality
- FLUX.1-schnell: Distilled version capable of 1-4 step generation
Why Rectified Flow?
Reasons for transitioning from DDPM-based Stable Diffusion:
- Straight noise-to-data paths deliver comparable quality in fewer sampling steps
- A simple velocity-regression objective instead of DDPM's noise-prediction parameterization
- Nearly straight paths make aggressive step-count distillation practical (see FLUX.1-schnell above)
7. Reflow Iterations and Quality
How Many Reflows Are Needed?
Empirically:
- 1-Reflow: Significant straightening, good quality at 10 steps
- 2-Reflow: More straightening, 5 steps possible
- 3-Reflow: Nearly straight, 1-2 steps possible
However, more reflows mean:
- Increased training time
- Time spent generating couplings
- Potentially slower convergence
Practical Choice
In most cases, 1-2 reflows + distillation is most efficient.
8. Limitations and Considerations
Coupling Quality Dependency
Reflow depends on the previous model's generation quality:
- Poor initial model → poor coupling → poor reflow results
- Solution: Train initial Flow Matching sufficiently
Mode Collapse Risk
Too many reflows can:
- Concentrate couplings on specific modes
- Reduce diversity
- Solution: Choose appropriate reflow count, add regularization
Computational Cost
Each reflow stage requires:
- Coupling generation for entire dataset
- Training a new model
- Total cost ≈ (1 + n_reflows) × base training cost, plus the coupling-generation passes
Conclusion
Rectified Flow realizes the intuitive idea that "straighter paths are faster." The success of Stable Diffusion 3 and FLUX demonstrates this approach's practicality.
References
- Liu, X., et al. "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow" (ICLR 2023)
- Esser, P., et al. "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis" (Stable Diffusion 3, 2024)
- Lipman, Y., et al. "Flow Matching for Generative Modeling" (ICLR 2023)
- Salimans, T. & Ho, J. "Progressive Distillation for Fast Sampling of Diffusion Models" (ICLR 2022)