Diffusion LLM Part 1: Diffusion Fundamentals -- From DDPM to Score Matching
Forward/Reverse Process, ELBO, Simplified Loss, Score Function -- the mathematical principles of diffusion models explained intuitively.

To understand Diffusion-based language models, you first need to understand Diffusion models themselves. In this post, we cover the core principles of Diffusion that have been proven in image generation. There is some math involved, but I have included intuitive explanations alongside the formulas, so you can follow the flow even if the equations feel unfamiliar.
This is the first installment of the Diffusion LLM series. See the Hub post for a series overview.
The Core Idea Behind Diffusion
The idea behind Diffusion models is surprisingly simple.
- Gradually add noise to clean data until it becomes pure random noise (Forward Process)
- Train a neural network to learn the reverse -- how to recover clean data from noise (Reverse Process)
Think of dropping a single drop of ink into water. It gradually spreads out until the color is uniform. The forward process is this diffusion. The reverse process is recovering the original shape of the ink drop from the uniformly colored water. Physically this is impossible, but the key insight of Diffusion models is that a neural network can learn this "time reversal."
Forward Process: Adding Noise
The forward process starts from the original data x_0 and progressively adds Gaussian noise over T steps. Each step applies the transition kernel q(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) x_{t-1}, beta_t I), where beta_t is a small, pre-defined variance schedule. After enough steps, x_T is indistinguishable from pure Gaussian noise.
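Because each step is Gaussian, the per-step transitions compose into a closed form that samples x_t directly from x_0: q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), where alpha_bar_t is the cumulative product of alpha_t = 1 - beta_t. A minimal numpy sketch (the linear beta schedule values follow the DDPM paper's defaults; function and variable names are illustrative):

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Sample x_t directly from x_0 via the closed form
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]  # cumulative product up to step t
    eps = rng.standard_normal(x0.shape)  # fresh Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # linear schedule, T = 1000
x0 = rng.standard_normal((8,))          # toy "clean data"
xT = forward_noise(x0, 999, betas, rng) # at t = T-1, nearly pure noise
```

Note that at t = 999, alpha_bar_t is on the order of 1e-5, so the x_0 term is almost entirely drowned out -- exactly the "ink fully dissolved in water" state.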