Open-weight models
MiniMax M3: the open-weight frontier, with an asterisk on 'open'
A genuinely ambitious model — frontier coding, 1M context, multimodal — shipped before its weights did. Here is what that means.
The answer
MiniMax M3, launched 1 June 2026, is an open-weight frontier coding model with 1M context.
MiniMax's pitch for M3 is the kind of sentence that stops a scroll: the first open-weight model to put frontier-grade coding, a million-token context window and native image-and-video understanding into a single architecture, at a fraction of the price of the closed frontier. If it holds up, it compresses years of US lab investment into something that anyone can, in principle, run themselves. The catch — and it is a genuine one — is that on 1 June 2026, the day it launched, you couldn't.
The architecture and what it claims to do
The technical spine is MiniMax Sparse Attention (MSA), the company's proprietary attention mechanism, which MiniMax says delivers meaningful speed-ups for both prefill and decoding at very long contexts. That matters practically: a 1M-token context window is only useful if retrieving from it doesn't cost you three seconds of wall-clock latency per call. On agentic workloads — where a coding agent is reading an entire repository and writing back patches — that efficiency gap is often the difference between a tool people actually use and one they benchmark and shelve.
The claimed multimodality covers images and video natively, not via a bolt-on vision encoder after the fact, which is MiniMax's stated design choice — architecture, not a feature flag.
On coding specifically, MiniMax reports 59.0% on SWE-Bench Pro, a benchmark that asks models to write real patches for real GitHub issues. That figure, if it holds, would place M3 ahead of both GPT-5.5 and Gemini 3.1 Pro on that test. But here is the first asterisk: the number is MiniMax's own, run on MiniMax's infrastructure, on a model whose weights the public did not have. SWE-Bench Pro results are not reproducible without the weights, so the figure is a claim, not a finding — a distinction that matters when you're building a system atop it.
MiniMax M3 is billed as the first open-weight model to combine frontier coding, a 1M-token context window and native multimodality — though the weights were not published at launch and the headline benchmarks are vendor-reported.
The open-weight IOU
The bigger issue is structural. On 1 June, MiniMax offered the model through its API and on OpenRouter. What it did not offer was the weights — the files you'd need to run the model yourself, audit its behaviour, fine-tune it for your use case, or reproduce the benchmark numbers. Those were promised on Hugging Face within about ten days of launch.
'Open-weight' has a clear community meaning: the weights are available, you can self-host, and you can verify the lab's claims for yourself. On launch day, M3 didn't meet that bar. The honest framing is that MiniMax gave a dated public commitment rather than an open release. That is a different thing, and conflating the two sets a bad precedent — both for builders making stack decisions and for the broader ecosystem's trust in the open-source label.
Price: the part that actually changes builder behaviour
Set aside the open-weight question for a moment, and the pricing story is real regardless. The launch listing on OpenRouter placed M3 at roughly $0.30 input / $1.20 output per million tokens — a promotional rate, but one that undercuts the closed frontier by close to an order of magnitude. For context:
| Model | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| MiniMax M3 (promo) | ~$0.30 | ~$1.20 |
| GPT-5.5 | ~$2.50–$5.00 | ~$10.00–$15.00 |
| Claude Opus 4.x | ~$3.00–$5.00 | ~$15.00 |
| Gemini 3.1 Pro | ~$1.25 | ~$5.00 |
The figures above use approximate published rates at the time of M3's launch; exact pricing varies by tier and usage. The point is the magnitude: if M3's coding performance holds at independent testing, a team running 100M tokens a month sees the bill fall from several thousand dollars to a few hundred. That doesn't just save money; it removes the incentive to gate certain requests or optimise aggressively for token count — a productivity change as much as a cost one.
The capability gap worth keeping in frame
There is a number in the M3 release that the press release did not headline, and it matters: M3 reportedly scores under 12% on ARC-AGI-2, the abstract-reasoning benchmark where Western frontier models still lead. That is not unusual for Chinese frontier models — Qwen 3.x and DeepSeek V4 Pro show a similar profile — and it is not necessarily disqualifying for coding work, which is more about pattern application and structured generation than novel abstract reasoning. But it is a real constraint, and a user planning to deploy M3 for open-ended research synthesis or complex multi-step planning (rather than pure code tasks) should weight it accordingly.
Read together, M3's profile is: strong on coding and long-context multimodal retrieval, competitive on cost, behind on raw reasoning — a coherent specialisation, not a universal crown. MiniMax's Hong Kong shares appeared to process the same ambiguity on the day: the stock reportedly swung up around 5% before closing sharply lower, which is market shorthand for 'exciting but unresolved'.
MiniMax M3 launches with frontier coding claims and a 1M context window built on MiniMax Sparse Attention — offering a low-cost API alternative while weights remain pending on Hugging Face.
What to watch and what to do now
The ten-day weight window is the first gate. If the weights land on schedule, the benchmark conversation immediately changes — independent engineers can reproduce the SWE-Bench Pro run, and the open-source ecosystem can begin fine-tuning and deployment work in earnest. If the weights slip, the open-weight marketing claim will take a credibility hit that will be hard to walk back.
For builders, the calculus is: the API is live, the pricing is real and disruptive, and the risk of building on it before the weights land is modest if your workload is standard coding or document retrieval. The risk is higher if you need to audit the model, customise weights, or stake a compliance argument on self-hosting. The sensible move is to test it on the API now and gate any production commitment on the weight release and independent benchmark confirmation.
Frequently asked questions
Can I download and run MiniMax M3 myself?
Is MiniMax M3 really better than GPT-5.5 at coding?
What is MiniMax Sparse Attention and why does it matter?
Why did MiniMax's stock price swing on launch day?
What is MiniMax M3 weak at?
Sources
- MiniMax M3 Open-Weight Coding Model: Frontier Claims, Unverified Benchmarks — Tech Times, 1 June 2026
- MiniMax launches M3, an open-weight frontier model with 1M context — DataNorth, 1 June 2026
- What Is MiniMax M3? The First Open-Weight Frontier Coding Model — Apidog, 2 June 2026