MiniMax M3’s 1M‑Token Context Fixes Long‑Document AI Limits

MiniMax released M3 on June 1, 2026 with a new sparse attention design called MSA that gives the model a one‑million‑token context window while sharply cutting compute. The biggest pain point for developers working on long codebases or large documents is the quadratic cost of traditional attention: doubling the input length roughly quadruples the attention compute, which makes processing beyond a few thousand tokens slow and expensive. M3 addresses this by partitioning the key‑value cache into blocks and using what MiniMax calls a “KV outer gather Q” approach. The reported result is more than nine times faster prefill and over fifteen times faster decoding at full length, with per‑token compute only one twentieth that of the previous M2 model.
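To make the block idea concrete, here is a minimal sketch of generic block‑sparse attention for a single query. It is not MiniMax’s MSA implementation, whose details are not given here. The function name, the mean‑key block scoring, and the `block_size` and `top_k` parameters are all assumptions chosen for illustration. The point is only that each query attends to a few selected blocks of the key‑value cache rather than to every cached token.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def block_sparse_attention(query, keys, values, block_size, top_k):
    """Hypothetical sketch: attend only to the top_k most relevant KV blocks."""
    scale = 1.0 / math.sqrt(len(query))
    # Partition the cached token positions into contiguous blocks.
    blocks = [range(start, min(start + block_size, len(keys)))
              for start in range(0, len(keys), block_size)]

    # Assumed scoring rule: query dotted with the mean key of each block.
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block)
                    for d in range(len(query))]
        return dot(query, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])  # keep selected blocks in cache order
    idx = [i for b in chosen for i in blocks[b]]

    # Ordinary softmax attention, restricted to the selected positions.
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    return [sum(w * values[i][d] for w, i in zip(weights, idx))
            for d in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
query = [1.0, 0.0]
sparse_out = block_sparse_attention(query, keys, values, block_size=2, top_k=1)
full_out = block_sparse_attention(query, keys, values, block_size=2, top_k=2)
```

With `top_k` equal to the number of blocks, the sketch reduces to ordinary dense attention. With a smaller `top_k`, the work per query scales with `top_k * block_size` instead of the full cache length, which is where the savings in this style of design come from.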

Another common frustration is the need to stitch together separate models for text, image and video understanding. M3 was trained from step 0 on interleaved text‑image‑video data, scaling to roughly one hundred trillion tokens, so it natively handles multimodal input without extra adapters. This enables real‑world workflows such as reading a research paper that contains formulas and figures, extracting the relevant data, and running experiments autonomously, all inside a single model session.

For coding agents, the benchmark results show M3 surpassing GPT‑5.5 and Gemini 3.1 Pro on SWE‑Bench Pro with a score of 59%, leading on Terminal‑Bench 2.1 at 66%, and achieving the highest Claw‑Eval score among tested models. It also posts a 70.06% success rate on OSWorld‑Verified, a computer‑use benchmark in which the model controls a desktop, opens applications and manipulates files across systems without human intervention.

The model is available now through the MiniMax API, with a tiered token plan starting at $20 per month for roughly 1.7 billion tokens. Thinking mode can be toggled per request without changing the price, and priority access will open to all users soon. By removing the context‑length bottleneck, unifying multimodal understanding, and delivering strong coding and agentic performance, M3 directly addresses the core obstacles that slow down complex software development, research reproduction and large‑scale automation projects.
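For a rough sense of what the entry tier works out to, the arithmetic can be sketched as follows. This assumes the quoted figures ($20 per month, roughly 1.7 billion tokens) are a flat allowance and ignores any difference between input and output tokens, which is not specified here. The function names are illustrative only.

```python
def usd_per_million_tokens(monthly_price_usd, monthly_tokens):
    # Effective price per one million tokens under a flat monthly allowance.
    return monthly_price_usd / monthly_tokens * 1_000_000

def full_context_requests(monthly_tokens, context_tokens=1_000_000):
    # How many maximum-length (1M-token) prompts fit in the allowance.
    return monthly_tokens // context_tokens

price = usd_per_million_tokens(20, 1_700_000_000)
requests = full_context_requests(1_700_000_000)
print(f"about ${price:.4f} per million tokens")      # about $0.0118 per million tokens
print(f"about {requests} full 1M-token prompts per month")  # about 1700
```

Under those assumptions, the entry tier comes to a little over one cent per million tokens, or on the order of 1,700 full‑length prompts a month.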

#AI #Product #LLM #Multimodal #CodingAgents #APIAccess