Models & Algorithms • March 24, 2026 • KR

Qwen 3.5 Local Installation & Setup Guide — From Ollama to vLLM

A step-by-step guide to running Qwen 3.5 locally, from a 5-minute Ollama setup to a production vLLM server, plus how to choose the optimal model size for your GPU.


In the previous post, we compared Qwen 3.5 and DeepSeek V3.2. Now let's get Qwen 3.5 running locally on your machine, step by step.

This guide covers the whole path, from a 5-minute Ollama setup to a production-grade vLLM API server, plus how to choose the optimal model size for your GPU.

1. Which Size Should You Pick?

Qwen 3.5 comes in 8 sizes. Matching the right model to your GPU is step one.

Model	Type	Memory (Q4_K_M)	Recommended Hardware	Performance Level
0.8B	Dense	~500MB	CPU / Any device	Simple text tasks
2B	Dense	~1.5GB	Any GPU	Light chatbot
4B	Dense	~2.5GB	GTX 1660+	GPT-3.5 level
9B	Dense	~5GB	RTX 3060 (8GB+)	Practical minimum
27B	Dense	~17GB	RTX 4090 (24GB)	Approaching GPT-4
35B-A3B	MoE	~20GB	RTX 4090 (24GB)	Best value
122B-A10B	MoE	GPU + 256GB RAM	GPU + CPU offload	Sonnet 4.5 level
397B-A17B	MoE	~214GB	Server-grade	Flagship
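As a rough cross-check of the memory column, quantized weight size is about parameter count × bits per weight ÷ 8. The sketch below assumes an average of about 4.85 bits per weight for Q4_K_M (an assumption; the real figure varies by model because some tensors stay at higher precision) and ignores the KV cache and runtime buffers, so treat the result as a lower bound. For MoE models it uses the total parameter count, because every expert has to sit in memory (GPU, or system RAM when offloading) even though only the active subset, e.g. 3B for 35B-A3B, is computed per token.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough size of quantized weights in GB (1 GB = 1e9 bytes).

    bits_per_weight=4.85 is an ASSUMED average for Q4_K_M, not an exact value.
    KV cache, activations and runtime buffers are NOT included.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


# For MoE models pass the TOTAL parameter count (35 for 35B-A3B), since all
# experts must be resident even though only ~3B parameters are active per token.
for name, params in [("9B", 9), ("27B", 27), ("35B-A3B", 35)]:
    print(f"{name:>8}: ~{estimate_weight_gb(params):.1f} GB of weights")
```

Under these assumptions the 9B, 27B and 35B-A3B estimates come out near 5.5, 16.4 and 21.2 GB, in the same range as the table; actual quantized files can differ by a few GB.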

Recommendations:

No GPU → 4B (runs on CPU)
8GB GPU → 9B Q4
24GB GPU → 35B-A3B Q4_K_M (sweet spot)
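The recommendations can also be written as a small lookup, for example inside a setup script. This is a minimal sketch: `recommend_qwen35` is a hypothetical helper, the treatment of cards between 8GB and 24GB is an assumption (the list only names those two points), and the fallback to 4B for GPUs under 8GB is inferred from the table's GTX 1660+ row rather than stated in the list.

```python
from typing import Optional


def recommend_qwen35(vram_gb: Optional[float]) -> str:
    """Map available GPU memory (GB) to the suggested Qwen 3.5 size.

    None or 0 means "no GPU". Hypothetical helper; thresholds between the
    listed points (8 GB, 24 GB) are assumptions.
    """
    if not vram_gb:
        return "4B"              # no GPU: run the 4B model on CPU
    if vram_gb >= 24:
        return "35B-A3B Q4_K_M"  # 24 GB card: the sweet spot
    if vram_gb >= 8:
        return "9B Q4"           # assumption: 8-23 GB cards stay on 9B
    return "4B"                  # assumption: GPUs under 8 GB fall back to 4B


for vram in (None, 6, 8, 16, 24):
    print(vram, "->", recommend_qwen35(vram))
```

Keeping the thresholds in one function makes it easy to adjust them if you prefer, say, the dense 27B model on a 24GB card instead of the MoE sweet spot.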
