
MiniMax M2.5: Opus-Level Performance for $1 per Hour
On February 12, 2026, Shanghai-based AI startup MiniMax released M2.5. It scores 80.2% on SWE-bench Verified, 76.3% on BrowseComp, and 51.3% on Multi-SWE-Bench, all within 0.6 percentage points of Claude Opus 4.6, at 1/20th the price.
The model is available as open weights on Hugging Face under a modified MIT license. It runs on a 230B parameter MoE architecture, activating only 10B at inference time. Running the 100 TPS (tokens per second) Lightning variant continuously for one hour costs about $1.
This post analyzes M2.5's architecture, training methodology, benchmark performance, and pricing structure, and examines what it means for the AI industry.
Architecture: 230B Total, 10B Active
MiniMax M2.5 uses a Mixture of Experts (MoE) architecture.
| Spec | Value |
|---|---|
| Total Parameters | 230B |
| Active Parameters | 10B (~4.3% of total) |
| Context Window | 204,800 tokens (~205K) |
| Training Languages | 13 (Python, Go, C, C++, TypeScript, Rust, Kotlin, Java, JavaScript, PHP, Lua, Dart, Ruby) |
The core idea behind MoE: for each input token, a router selects only a small subset of "expert" sub-networks to run. This preserves the knowledge capacity of a 230B model while keeping per-token compute at the level of a 10B model. That is the secret behind the price and speed.
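The routing idea can be shown in a few lines. This is a toy top-k MoE forward pass, not M2.5's actual implementation; the sizes, router, and expert matrices are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration only (M2.5's real config is much larger).
d_model, n_experts, top_k = 16, 8, 2

W_router = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ W_router
    top = np.argsort(logits)[-top_k:]   # indices of the k highest-scoring experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                # softmax over the selected experts only
    # Only top_k of n_experts matrices are ever multiplied: that is the compute saving.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.normal(size=d_model)
y = moe_forward(x)
print(y.shape)  # (16,) -- full-width output, but only 2 of 8 experts did any work
```

The output has the same shape as a dense layer's would; the saving is that 6 of the 8 expert matrices are never touched for this token, which is the same mechanism that lets M2.5 carry 230B parameters while computing with only 10B.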
It ships in two variants:
| Variant | Speed | Input (1M tokens) | Output (1M tokens) |
|---|---|---|---|
| M2.5 (Standard) | 50 TPS | $0.15 | $1.20 |
| M2.5-Lightning | 100 TPS | $0.30 | $2.40 |
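The "$1 per hour" figure follows directly from the table above. A quick back-of-envelope check, assuming continuous output generation at the Lightning rate and ignoring input-token costs:

```python
# Sanity-check the "$1/hour" claim for M2.5-Lightning using the
# output price from the pricing table ($2.40 per 1M output tokens).
tps = 100                    # tokens per second (Lightning variant)
price_per_m_output = 2.40    # USD per 1M output tokens

tokens_per_hour = tps * 3600                      # 360,000 tokens in one hour
cost_per_hour = tokens_per_hour / 1e6 * price_per_m_output
print(f"${cost_per_hour:.2f}/hour")               # ≈ $0.86, i.e. about $1
```

Input tokens add to the bill in practice, which is presumably why the headline number is rounded up to "about $1."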