Open-weight models

DeepSeek V4: what the architecture shift actually signals

A 1.6-trillion-parameter open model at a fraction of the closed-frontier price — but the engineering story beneath it is more durable than the benchmark.

WireRead Editorial · 24 April 2026 · Verified April 2026

DeepSeek · Open-weight models · Model launches

The answer

DeepSeek V4-Pro, released 24 April 2026, is a 1.6T-parameter open-weight MoE under an MIT licence.

TL;DR — the 20-second read

DeepSeek released two open-weight MoE models on 24 April 2026 — V4-Pro (1.6T / 49B active parameters) and V4-Flash (284B / 13B active) — both MIT-licensed and published on Hugging Face.
A redesigned sparse-attention mechanism cuts V4-Pro's single-token inference FLOPs to 27% of V3.2's and its KV cache to 10% at 1M-token context, making 1M-token context windows viable at the prices DeepSeek is charging.
Standard API pricing (Artificial Analysis) is $1.74/$3.48 per million tokens for V4-Pro and $0.14/$0.28 for V4-Flash; DeepSeek launched V4-Pro on a 75% promo at $0.435/$0.87 — still a fraction of U.S. closed-frontier rates.
V4 is the first DeepSeek model built for domestic Huawei Ascend 950 chips — DeepSeek reportedly gave early access only to Chinese chipmakers.
CFR's Chris McGuire rates the U.S. as holding roughly a seven-month AI lead, but CFR fellow Michael Horowitz notes that 'second-best models carry enormous competitive value when they are cheap and open'.

The headline number for DeepSeek V4 is 1.6 trillion parameters. The number that actually matters is 27%: the share of V3.2's single-token inference compute that V4-Pro needs at a 1M-token context, thanks to a redesigned sparse-attention architecture. That engineering fact explains how DeepSeek can price V4-Pro at a small fraction of U.S. closed-frontier rates ($1.74/M input at list, $0.435/M during its launch promotion) while running a model that, on most benchmarks, sits within a few points of the best closed-frontier systems from the United States.

What DeepSeek actually shipped

On 24 April 2026 DeepSeek released two models under the MIT licence, with weights on Hugging Face and both available via its API the same day:

	V4-Pro	V4-Flash
Total parameters	1.6 trillion	284 billion
Active parameters	49 billion	13 billion
Context window	1 million tokens	1 million tokens
Max output	384K tokens	384K tokens
API input price (standard)	$1.74 / M tokens	$0.14 / M tokens
API output price (standard)	$3.48 / M tokens	$0.28 / M tokens
Inference FLOPs vs V3.2	27%	~10%
KV-cache vs V3.2	10%	7%

Those are the standard list prices reported by Artificial Analysis; at launch DeepSeek ran a 75% promotion that put V4-Pro at $0.435/M input and $0.87/M output. V4-Pro is the largest open-weight model currently available, ahead of Moonshot's Kimi K2.6 (1.1T) and MiniMax M3 (456B). V4-Flash is the smaller, faster tier — positioned where deepseek-chat and deepseek-reasoner used to be, both of which will be retired on 24 July 2026.
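A small cost estimator makes the list-versus-promo gap concrete. It is a sketch under stated assumptions: it uses the standard prices from the table, models the launch promotion as a flat 75% discount applied to V4-Pro only, and prices a hypothetical workload of one full 1M-token prompt with a 10K-token answer. `Pricing` and `request_cost` are illustrative names, not part of any DeepSeek SDK.

```python
"""Rough per-request cost for DeepSeek V4 at the prices quoted in this article.

Assumptions: standard list prices as reported by Artificial Analysis; the
launch promotion is modelled as a flat 75% discount on V4-Pro input and output.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


V4_PRO = Pricing(input_per_m=1.74, output_per_m=3.48)
V4_FLASH = Pricing(input_per_m=0.14, output_per_m=0.28)
LAUNCH_PROMO = 0.75  # V4-Pro launch discount (assumed flat on both directions)


def request_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 discount: float = 0.0) -> float:
    """USD cost of one request; `discount` is a fraction in [0, 1)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    base = (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000
    return base * (1.0 - discount)


if __name__ == "__main__":
    # Hypothetical workload: one full 1M-token prompt plus a 10K-token answer.
    prompt, answer = 1_000_000, 10_000
    print(f"V4-Pro list:   ${request_cost(V4_PRO, prompt, answer):.4f}")
    print(f"V4-Pro promo:  ${request_cost(V4_PRO, prompt, answer, LAUNCH_PROMO):.4f}")
    print(f"V4-Flash list: ${request_cost(V4_FLASH, prompt, answer):.4f}")
```

For that hypothetical request, V4-Pro comes to roughly $1.77 at list and $0.44 on the promotion, and V4-Flash to about $0.14 at list. The input side dominates, which is why long-context pricing hinges on the per-million input rate.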

DeepSeek reported that in the 1M-token context setting, V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.

Source: DeepSeek · 24 April 2026
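The ratios in the spec table can be turned into rough headroom figures. This is a back-of-envelope sketch, not a serving benchmark: it assumes each ratio is the only bottleneck (so 1 / ratio reads as headroom versus V3.2), and the V4-Flash FLOPs figure is only reported as approximately 10%. The dictionary and function names are illustrative.

```python
"""Back-of-envelope multipliers from the ratios in the spec table.

Assumption: each ratio is treated as if it were the only bottleneck, so
1 / ratio reads as headroom versus V3.2. Real serving mixes compute,
memory bandwidth and batching effects, so these are rough intuitions only.
"""

# model: (single-token FLOPs vs V3.2, KV cache vs V3.2), as reported by DeepSeek
REPORTED_RATIOS = {
    "V4-Pro": (0.27, 0.10),
    "V4-Flash": (0.10, 0.07),  # the FLOPs figure is reported as roughly 10%
}

# model: (total parameters, active parameters per token)
PARAMS = {
    "V4-Pro": (1.6e12, 49e9),
    "V4-Flash": (284e9, 13e9),
}


def headroom(flops_ratio: float, kv_ratio: float) -> tuple[float, float]:
    """Return (compute headroom, KV-memory headroom) relative to V3.2."""
    return 1.0 / flops_ratio, 1.0 / kv_ratio


def active_fraction(total: float, active: float) -> float:
    """Share of the MoE's weights used for one token's forward pass."""
    return active / total


if __name__ == "__main__":
    for model, (flops_ratio, kv_ratio) in REPORTED_RATIOS.items():
        compute_x, memory_x = headroom(flops_ratio, kv_ratio)
        total, active = PARAMS[model]
        print(
            f"{model}: per-token compute ~{compute_x:.1f}x lower, "
            f"~{memory_x:.1f}x more long contexts per GB of KV cache, "
            f"{active_fraction(total, active):.1%} of weights active per token"
        )
```

For V4-Pro this works out to roughly 3.7x lower per-token compute, about 10x as many 1M-token contexts in the same KV-cache memory, and about 3.1% of weights active per token: the attention savings and the Mixture-of-Experts sparsity stack on top of each other.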

The architecture: why efficiency is the actual story

The core change in V4 is how it handles attention across very long contexts. The new architecture combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA): instead of attending to every earlier token at full resolution, it selectively compresses representations of older text while keeping nearby content at full resolution. As a result, each new token no longer pays the full price of attending over a million prior tokens at full resolution, and that is what drives the FLOPs and KV-cache reductions DeepSeek reports relative to V3.2. This is the engineering primitive that made the 1M-token context window economically feasible at DeepSeek's price points.
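DeepSeek has not published how CSA and HCA work internally, so the following is a generic stand-in, not DeepSeek's method. It sketches the compress-old / keep-recent idea: a hypothetical `window` of recent tokens stays at full resolution, older keys and values are mean-pooled into blocks of `block` tokens, and ordinary softmax attention then runs over the shorter list. All names and sizes are illustrative assumptions.

```python
"""Toy long-context attention: recent tokens at full resolution, older tokens
mean-pooled into blocks. A generic illustration of the compress-old /
keep-recent idea, NOT DeepSeek's CSA or HCA (whose internals are unpublished).
"""

import math
import random
from typing import List, Tuple

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Element-wise average of a block of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def compress_kv(keys: List[Vector], values: List[Vector],
                window: int, block: int) -> Tuple[List[Vector], List[Vector]]:
    """Keep the last `window` positions verbatim; pool older ones in chunks of `block`."""
    split = max(0, len(keys) - window)
    old_k, old_v = keys[:split], values[:split]
    pooled_k = [mean_pool(old_k[i:i + block]) for i in range(0, split, block)]
    pooled_v = [mean_pool(old_v[i:i + block]) for i in range(0, split, block)]
    return pooled_k + keys[split:], pooled_v + values[split:]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product softmax attention for a single query."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def attended_entries(n: int, window: int, block: int) -> int:
    """KV entries one query reads after compression (matches compress_kv)."""
    recent = min(n, window)
    return recent + (n - recent + block - 1) // block  # integer ceiling


if __name__ == "__main__":
    random.seed(0)
    dim, n, window, block = 8, 4096, 256, 64  # hypothetical sizes
    keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    ck, cv = compress_kv(keys, values, window, block)
    out = attend(keys[-1], ck, cv)
    print(f"{n} tokens -> {len(ck)} attended entries; output dim {len(out)}")
    for ctx in (131_072, 1_000_000):
        print(f"{ctx:>9,} tokens -> {attended_entries(ctx, window, block):,} entries per query")
```

Pooling only shrinks the older part of the context by a factor of `block`, so the count still grows with length. Published sparse-attention designs generally pair compression with sparse selection of which blocks to read; DeepSeek's exact mechanism for V4 is not described in the sources cited here.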

DeepSeek has not published a full technical paper alongside this release; the architecture details come from its API documentation. Third-party benchmarking from Artificial Analysis placed V4-Pro at #2 on its Intelligence Index for open-weight reasoning models, behind Kimi K2.6, and gave it the highest score among open-weight models (1,554) on GDPval-AA, Artificial Analysis's agentic real-world-tasks evaluation. DeepSeek's own release materials also claim a LiveCodeBench score of 93.5, versus 88.8 for Claude Opus 4.6; that figure is self-reported and has not been independently confirmed, so it should be read as a vendor claim rather than a measured result.

The Huawei pivot: chips as the geopolitical subtext

V4 is the first DeepSeek model engineered for Chinese domestic silicon. Per MIT Technology Review, DeepSeek reportedly gave early access only to Chinese chipmakers; Huawei announced same-day 'day zero' support on its Ascend 950 series. This is not coincidental: the U.S. export controls that prevent China from acquiring advanced Nvidia H100s and H200s have forced a fork in the supply chain, and V4 is the first DeepSeek product built from the outset on the assumption that the fork is permanent.

CFR fellow Michael Horowitz observed that 'second-best models carry enormous competitive value when they are cheap and open, which makes them easy to widely diffuse', reframing the competitive question from benchmark supremacy to adoption scale.

Source: Council on Foreign Relations · 29 April 2026

In the CFR analysis, CFR's Chris McGuire put the U.S. lead over China at roughly seven months, while noting that the gap in deployment reach is a separate and potentially more consequential question. For a developer in Southeast Asia or Africa choosing between an open-weight model they can self-host or call at $0.435 per million input tokens on the launch promo ($1.74 at list), and a closed model at $25 per million tokens that they cannot self-host, the performance delta may be secondary. That is the strategic question the Western AI community is still working out how to answer.

What to watch next

Three things are worth tracking in the months that follow. First: whether DeepSeek publishes a technical paper. The architecture claims in the API docs are plausible and consistent with third-party benchmark behaviour, but a preprint would allow independent replication of the efficiency numbers. Second: whether the chip partnership with Huawei produces a verifiable inference cost advantage on Ascend versus Nvidia — if it does, that removes a remaining U.S. leverage point. Third: whether V4's open weights accelerate fine-tuning adoption outside China at a scale that the closed frontier labs, with their access controls, struggle to match. The adoption race is the one that matters for the next 18 months.

Frequently asked questions

What is DeepSeek V4-Pro and when was it released?

DeepSeek V4-Pro is a 1.6-trillion-parameter open-weight Mixture-of-Experts language model released on 24 April 2026 under an MIT licence. It has 49 billion active parameters, a 1M-token context window, and weights published on Hugging Face.

How does DeepSeek V4 compare to GPT-5 and Claude?

On most benchmarks V4-Pro sits within a few points of U.S. frontier models. DeepSeek's own release materials claim a LiveCodeBench score of 93.5 (vs 88.8 for Claude Opus 4.6), though that figure is self-reported and unconfirmed, and the CFR put the overall U.S. AI lead at roughly seven months. The key differentiator is cost: V4-Pro lists at $1.74/M input tokens (and launched on a 75% promo at $0.435/M), far cheaper than leading U.S. closed models, and it can be self-hosted.

What is the difference between V4-Pro and V4-Flash?

V4-Pro has 1.6 trillion total / 49 billion active parameters and is optimised for complex reasoning and coding. V4-Flash has 284 billion total / 13 billion active parameters, is faster and cheaper ($0.14/M input), and takes the slot previously held by deepseek-chat and deepseek-reasoner, both of which retire on 24 July 2026.

Is DeepSeek V4 truly open-source?

DeepSeek released the weights under an MIT licence, making it open-weight and self-hostable. The full training code and dataset are not published — so 'open-weight' is accurate; 'fully open-source' is a stretch. Weights are on Hugging Face.

What chips does DeepSeek V4 run on?

V4 is the first DeepSeek model engineered for Chinese domestic chips — specifically Huawei's Ascend 950 series. Per MIT Technology Review, DeepSeek reportedly gave early access only to Chinese chipmakers, and Huawei announced same-day Ascend 950 support.

Sources

DeepSeek V4 Preview Release — DeepSeek, 24 April 2026
Three reasons why DeepSeek's new model matters — MIT Technology Review, 24 April 2026
DeepSeek is back among the leading open-weights models with V4 Pro and V4 Flash — Artificial Analysis, 27 April 2026
DeepSeek V4 signals a new phase in the US–China AI rivalry — Council on Foreign Relations, 29 April 2026
DeepSeek previews new AI model that 'closes the gap' with frontier models — TechCrunch, 24 April 2026
