Open-weight models
DeepSeek V4: what the architecture shift actually signals
A 1.6-trillion-parameter open model at a fraction of the closed-frontier price — but the engineering story beneath it is more durable than the benchmark.
The answer
DeepSeek V4-Pro, released 24 April 2026, is a 1.6T-parameter open-weight mixture-of-experts (MoE) model released under the MIT licence.
The headline number for DeepSeek V4 is 1.6 trillion parameters. The number that actually matters is 27%: at a 1M-token context, that is the share of V3.2's single-token inference compute that V4-Pro requires, thanks to a redesigned sparse-attention architecture. It is the engineering fact that explains how DeepSeek can price V4-Pro at a small fraction of U.S. closed-frontier rates ($1.74/M input tokens at list, $0.435/M during its launch promotion) while shipping a model that, on most benchmarks, sits within a few points of the best closed-frontier systems from the United States.
What DeepSeek actually shipped
On 24 April 2026 DeepSeek released two models under the MIT licence, with weights on Hugging Face and both available via its API the same day:
| | V4-Pro | V4-Flash |
|---|---|---|
| Total parameters | 1.6 trillion | 284 billion |
| Active parameters | 49 billion | 13 billion |
| Context window | 1 million tokens | 1 million tokens |
| Max output | 384K tokens | 384K tokens |
| API input price (standard) | $1.74 / M tokens | $0.14 / M tokens |
| API output price (standard) | $3.48 / M tokens | $0.28 / M tokens |
| Inference FLOPs vs V3.2 | 27% | ~10% |
| KV-cache vs V3.2 | 10% | 7% |
Those are the standard list prices reported by Artificial Analysis; at launch DeepSeek ran a 75% discount that put V4-Pro at $0.435/M input and $0.87/M output. V4-Pro is the largest open-weight model currently available, ahead of Moonshot's Kimi K2.6 (1.1T) and MiniMax M3 (456B). V4-Flash is the smaller, faster tier, positioned to replace deepseek-chat and deepseek-reasoner, both of which are scheduled for retirement on 24 July 2026.
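To make the list-versus-promotion gap concrete, here is a small cost calculator built only from the per-million-token prices in the table above. The workload size (10M input tokens, 2M output tokens) is a hypothetical example, and because the article only reports the 75% promotion for V4-Pro, the sketch applies the discount to V4-Pro alone; it ignores any caching discounts or off-peak pricing DeepSeek may offer.

```python
# Per-million-token list prices in USD, taken from the table above.
LIST_PRICES = {
    "V4-Pro": {"input": 1.74, "output": 3.48},
    "V4-Flash": {"input": 0.14, "output": 0.28},
}
LAUNCH_DISCOUNT = 0.75  # 75% launch promotion reported for V4-Pro


def workload_cost(model, input_tokens, output_tokens, discount=0.0):
    """Return the USD cost of a workload, optionally applying a fractional discount."""
    prices = LIST_PRICES[model]
    base = (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000
    return base * (1 - discount)


# Hypothetical workload: 10M input tokens and 2M output tokens.
list_cost = workload_cost("V4-Pro", 10_000_000, 2_000_000)
promo_cost = workload_cost("V4-Pro", 10_000_000, 2_000_000, LAUNCH_DISCOUNT)
flash_cost = workload_cost("V4-Flash", 10_000_000, 2_000_000)

print(f"V4-Pro at list:   ${list_cost:.2f}")
print(f"V4-Pro at promo:  ${promo_cost:.2f}")
print(f"V4-Flash at list: ${flash_cost:.2f}")
```

For this workload the arithmetic gives about $24.36 for V4-Pro at list, $6.09 under the promotion, and $1.96 for V4-Flash, which shows why the Flash tier is the natural default for high-volume, latency-sensitive traffic.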
DeepSeek reported that in the 1M-token context setting, V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.
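One way to read those two ratios is as ceilings on serving capacity. The sketch below assumes two idealised regimes: if per-token FLOPs were the only bottleneck, throughput could rise by at most 1/0.27; if the memory reserved for KV cache were the only constraint, the same budget would hold 1/0.10 times as many 1M-token sequences. The baseline of four concurrent sequences is a hypothetical figure, and real deployments are also limited by memory bandwidth, expert routing, and batching, so these are upper bounds rather than measured gains.

```python
import math

FLOPS_RATIO = 0.27  # V4-Pro single-token FLOPs relative to V3.2 at 1M context (reported)
KV_RATIO = 0.10     # V4-Pro KV-cache size relative to V3.2 at 1M context (reported)


def compute_bound_speedup(flops_ratio):
    """Throughput ceiling if per-token FLOPs were the only bottleneck."""
    return 1 / flops_ratio


def sequences_per_kv_budget(baseline_sequences, kv_ratio):
    """How many 1M-token sequences fit in the KV memory that held
    `baseline_sequences` under V3.2 (floored; small epsilon guards float error)."""
    return math.floor(baseline_sequences / kv_ratio + 1e-9)


print(f"Compute-bound speedup ceiling: {compute_bound_speedup(FLOPS_RATIO):.1f}x")
# Hypothetical baseline: a V3.2 deployment whose KV budget held 4 such sequences.
print(f"1M-token sequences in the same KV budget: {sequences_per_kv_budget(4, KV_RATIO)}")
```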
The architecture: why efficiency is the actual story
The core change in V4 is how it handles attention across very long contexts. The new architecture combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA): instead of attending to every token at full resolution, it selectively compresses representations of older text while keeping nearby content at full resolution. As a result, the per-token cost of attending over a million-token prompt grows far more slowly with context length than it did in V3.2. This is the engineering primitive that made the 1M context window economically feasible at DeepSeek's price points.
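The general idea described above (exact attention over recent tokens, compressed attention over older ones) can be sketched in a few lines. This is not DeepSeek's CSA or HCA, whose internals are not public in detail: it is a generic illustration that mean-pools older keys and values into fixed-size blocks and omits any sparse token-selection step. The pooling method and the `window` and `block` sizes are hypothetical choices made for clarity.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def compress_kv(keys, values, window, block):
    """Keep the last `window` positions exact; mean-pool older ones in blocks of `block`."""
    split = max(0, len(keys) - window)
    old_k, old_v = keys[:split], values[:split]
    comp_k = [mean_pool(old_k[i:i + block]) for i in range(0, split, block)]
    comp_v = [mean_pool(old_v[i:i + block]) for i in range(0, split, block)]
    return comp_k + keys[split:], comp_v + values[split:]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query over the given entries."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def attended_entries(n, window, block):
    """Number of KV entries one query attends to after compression."""
    old = max(0, n - window)
    return math.ceil(old / block) + min(n, window)


n = 1_000_000
kept = attended_entries(n, window=4096, block=64)
print(f"{kept} of {n} entries attended ({kept / n:.2%})")
```

With these illustrative settings a query over a million-token context attends to 19,657 entries, roughly 2% of the uncompressed count. The design trade-off is that pooled blocks lose token-level detail from older text, which is why schemes of this kind keep a full-resolution window for nearby content.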
DeepSeek has not published a full technical paper accompanying this release; the architecture details come from its API documentation. Third-party benchmarking from Artificial Analysis placed V4-Pro at #2 on its Intelligence Index among open-weight reasoning models, behind Kimi K2.6, and reported a GDPval-AA agentic score of 1,554, the highest among open-weight models. DeepSeek's own release materials also claim a LiveCodeBench score of 93.5, versus 88.8 for Claude Opus 4.6. That figure is self-reported and has not been independently confirmed, so it should be read as a vendor claim; on the composite, independently administered Intelligence Index, V4-Pro still ranks second.
The Huawei pivot: chips as the geopolitical subtext
V4 is the first DeepSeek model engineered for Chinese domestic silicon. According to MIT Technology Review, DeepSeek gave early access only to Chinese chipmakers, and Huawei announced "day zero" support for V4 on its Ascend 950 series. This is not coincidental: the U.S. export controls that prevent China from acquiring advanced Nvidia H100s and H200s have forced a fork in the supply chain, and V4 is the first DeepSeek product built from the outset on the assumption that the fork is permanent.
CFR fellow Michael Horowitz observed that 'second-best models carry enormous competitive value when they are cheap and open, which makes them easy to widely diffuse', reframing the competitive question from benchmark supremacy to adoption scale.
CFR's Chris McGuire put the U.S. lead over China at roughly seven months, though the CFR analysis notes that the gap in deployment reach is a separate and potentially more consequential question. For a developer in Southeast Asia or Africa choosing between an open-weight model they can self-host, or call through DeepSeek's API for $0.435/M input tokens during the launch promotion ($1.74 at list), and a closed model priced at $25/M tokens that they cannot self-host, the performance delta may be secondary. That is the strategic question the Western AI community has not yet worked out how to answer.
What to watch next
Three things are worth tracking in the months that follow. First: whether DeepSeek publishes a technical paper. The architecture claims in the API docs are plausible and consistent with third-party benchmark behaviour, but a preprint would allow independent replication of the efficiency numbers. Second: whether the chip partnership with Huawei produces a verifiable inference cost advantage on Ascend versus Nvidia — if it does, that removes a remaining U.S. leverage point. Third: whether V4's open weights accelerate fine-tuning adoption outside China at a scale that the closed frontier labs, with their access controls, struggle to match. The adoption race is the one that matters for the next 18 months.
Frequently asked questions
What is DeepSeek V4-Pro and when was it released?
How does DeepSeek V4 compare to GPT-5 and Claude?
What is the difference between V4-Pro and V4-Flash?
Is DeepSeek V4 truly open-source?
What chips does DeepSeek V4 run on?
Sources
- DeepSeek V4 Preview Release — DeepSeek, 24 April 2026
- Three reasons why DeepSeek's new model matters — MIT Technology Review, 24 April 2026
- DeepSeek is back among the leading open-weights models with V4 Pro and V4 Flash — Artificial Analysis, 27 April 2026
- DeepSeek V4 signals a new phase in the US–China AI rivalry — Council on Foreign Relations, 29 April 2026
- DeepSeek previews new AI model that 'closes the gap' with frontier models — TechCrunch, 24 April 2026