CFG-free Distillation: Fast Generation Without Guidance
Eliminating the 2x computational cost of Classifier-Free Guidance. Achieving CFG quality with a single forward pass.
TL;DR
- Problem: CFG requires two forward passes per denoising step (conditional + unconditional), doubling inference cost
- Solution: distill the guidance effect into a single model, so one forward pass matches CFG quality
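The idea can be sketched with a toy example. Below, hypothetical linear maps stand in for the conditional and unconditional networks (the real models are diffusion U-Nets or transformers, and training would use an MSE loss over noised samples); the "teacher" runs the standard two-pass CFG combination, and a single "student" map is trained to reproduce the guided output in one pass. All names and values here are illustrative assumptions, not part of any specific implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the conditional and unconditional noise predictors.
W_cond = rng.normal(size=(4, 4))
W_uncond = rng.normal(size=(4, 4))

def teacher_cfg(x, w=3.0):
    """Classifier-free guidance: TWO forward passes, combined with scale w."""
    eps_cond = x @ W_cond      # conditional prediction
    eps_uncond = x @ W_uncond  # unconditional prediction
    return eps_uncond + w * (eps_cond - eps_uncond)

# Student: a single map trained to match the guided teacher output directly.
W_student = rng.normal(size=(4, 4))

lr = 0.01
for step in range(2000):
    x = rng.normal(size=(8, 4))
    target = teacher_cfg(x)                # teacher: 2 forward passes
    pred = x @ W_student                   # student: 1 forward pass
    grad = x.T @ (pred - target) / len(x)  # gradient of the MSE distillation loss
    W_student -= lr * grad

# After distillation, the student reproduces the guided output in one pass.
x_test = rng.normal(size=(8, 4))
err = np.abs(x_test @ W_student - teacher_cfg(x_test)).max()
```

Note that the guidance scale `w` is baked into the student at distillation time; supporting variable guidance strength requires conditioning the student on `w` as an extra input.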