We Benchmarked MiniCPM-o 4.5 in Korean. Here's What Actually Happens.
We benchmarked MiniCPM-o 4.5's Korean performance side by side with English. Image descriptions, OCR, document extraction — what works, what breaks, and why the root cause is architecture, not prompts.

MiniCPM-o 4.5 is an omni model optimized for English and Chinese. How well does it handle Korean?
We tested with the same images, same questions — one in Korean, one in English, side by side. Image description, OCR, document extraction, and fine-tuning, all tested hands-on.
The short answer: Korean works. But there are fascinating failure modes, and the root cause isn't what you'd expect.
Test Setup
| Item | Spec |
|---|---|
| Model | MiniCPM-o 4.5 (BF16, 17.6GB VRAM) |
| Framework | transformers 4.51.0, PyTorch 2.x |
| Method | Same image + semantically identical Korean/English prompts |
| Decoding | sampling=True, temperature=0.7, repetition_penalty=1.2 |
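The setup above boils down to one harness: pair each image with semantically identical Korean/English questions, run both through the model with the same decoding parameters, and compare the answers. A minimal sketch, assuming the `model.chat(...)` interface shown in published MiniCPM examples (the exact keyword names, including `system_prompt`, are assumptions, not a verified API):

```python
# Hypothetical side-by-side harness for the setup described above.
# model.chat(...) and its keyword names follow published MiniCPM
# examples; treat the exact signature as an assumption.

# Decoding parameters from the table above.
GEN_KWARGS = dict(sampling=True, temperature=0.7, repetition_penalty=1.2)

def build_runs(image, questions):
    """Pair one image with semantically identical ko/en questions,
    producing one msgs payload per language."""
    return {
        lang: [{"role": "user", "content": [image, questions[lang]]}]
        for lang in ("ko", "en")
    }

def compare(model, tokenizer, system_prompts, image, questions):
    """Run the same image through both language prompts and return
    the two answers keyed by language, for side-by-side comparison."""
    return {
        lang: model.chat(
            msgs=msgs,
            tokenizer=tokenizer,
            system_prompt=system_prompts[lang],  # assumed kwarg name
            **GEN_KWARGS,
        )
        for lang, msgs in build_runs(image, questions).items()
    }
```

Keeping `GEN_KWARGS` and the image fixed across both runs is the point of the design: the only variable left is the language of the prompt.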
System prompts were set per language:
```python
system_prompts = {
    # Korean prompt translates to: "You are a Korean-language specialist
    # assistant. Answer only in Korean. Do not mix in other languages
    # such as Chinese, English, or Russian."
    "ko": "당신은 한국어 전문 어시스턴트입니다. 반드시 한국어로만 답변하세요. 중국어, 영어, 러시아어 등 다른 언어를 섞지 마세요.",
    "en": "You are a helpful assistant. Respond only in English.",
}
```
What Works Well
Image Description: Eiffel Tower