Fine-tuning Gemma 4 MoE — Customizing Arena #6 with 3.8B Active Parameters
Apply QLoRA to Gemma 4 26B MoE. Expert layer LoRA strategies, Dense vs MoE comparison, MoE-specific training tips, and Ollama deployment. LoRA Series Part 4.
Series: Part 1: LoRA Theory | Part 2: QLoRA + Custom Data | Part 3: Eval + Deploy | Part 4 (this post)
Parts 1-3 covered LoRA fundamentals through evaluation and deployment using Qwen 2.5 7B. Part 4 levels up: we apply QLoRA to a Gemma 4 Mixture-of-Experts (MoE) model.
Why Gemma 4? Three reasons:
- MoE architecture: 26B total parameters, but only 3.8B active per token. Inference cost is that of a 4B-class dense model, yet it ranks #6 on the Arena leaderboard
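The "26B total, 3.8B active" split is the key MoE trade-off: every expert's weights must sit in memory, but each token is routed through only the top-k experts. A minimal sketch of that accounting, using hypothetical numbers (the shared/expert split and top-k below are illustrative assumptions, not the published Gemma 4 config):

```python
def moe_params(shared: float, n_experts: int, expert: float, top_k: int):
    """Parameter accounting for a MoE model.

    total  counts every expert (what you must store and fine-tune),
    active counts only the top_k experts routed per token (what you
    pay for at inference time).
    """
    total = shared + n_experts * expert
    active = shared + top_k * expert
    return total, active

# Hypothetical split: ~2.3B shared params (attention, embeddings)
# plus 32 experts of ~0.74B each, routing top-2 experts per token.
total, active = moe_params(shared=2.3e9, n_experts=32, expert=0.74e9, top_k=2)
print(f"total ≈ {total / 1e9:.1f}B, active ≈ {active / 1e9:.2f}B")
# → total ≈ 26.0B, active ≈ 3.78B
```

This is also why QLoRA fits MoE models well: quantization shrinks the full 26B footprint that must be resident, while LoRA adapters only need to track the layers you target.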