# DeepSeek V4: what the architecture shift actually signals

> DeepSeek V4-Pro, released 24 April 2026, is a 1.6T-parameter open-weight MoE under an MIT licence.

*A 1.6-trillion-parameter open model at a fraction of the closed-frontier price — but the engineering story beneath it is more durable than the benchmark.*

By WireRead Editorial · WireRead
Canonical: https://wireread.com/news/deepseek-v4-open-weight-architecture-analysis

The headline number for DeepSeek V4 is 1.6 trillion parameters. The number that actually matters is 27%. That is the share of V3.2's single-token inference compute that V4-Pro requires in the 1M-token context setting, thanks to a redesigned sparse-attention architecture. It is also the engineering fact that explains how DeepSeek can price V4-Pro at a small fraction of U.S. closed-frontier rates ($1.74/M input at list, $0.435/M during its launch promotion) while running a model that, on most benchmarks, sits within a few points of the best U.S. closed-frontier systems.

## What DeepSeek actually shipped

On **24 April 2026** DeepSeek released two models under the MIT licence, with weights on Hugging Face and both available via its API the same day:

| | **V4-Pro** | **V4-Flash** |
| --- | --- | --- |
| **Total parameters** | 1.6 trillion | 284 billion |
| **Active parameters** | 49 billion | 13 billion |
| **Context window** | 1 million tokens | 1 million tokens |
| **Max output** | 384K tokens | 384K tokens |
| **API input price (standard)** | $1.74 / M tokens | $0.14 / M tokens |
| **API output price (standard)** | $3.48 / M tokens | $0.28 / M tokens |
| **Inference FLOPs vs V3.2** | 27% | ~10% |
| **KV-cache vs V3.2** | 10% | 7% |

Those are the standard list prices reported by Artificial Analysis; at launch DeepSeek ran a 75% promotion that put V4-Pro at **$0.435/M input and $0.87/M output**. V4-Pro is the largest open-weight model currently available, ahead of Moonshot's Kimi K2.6 (1.1T) and MiniMax M3 (456B). V4-Flash is the smaller, faster tier — positioned where `deepseek-chat` and `deepseek-reasoner` used to be, both of which will be retired on **24 July 2026**.
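
The pricing arithmetic is easy to check by hand. The sketch below is a minimal illustration, not DeepSeek's billing logic: it applies the 75% launch discount to the list prices in the table and prices one request. The `Price` class and the 200K-input / 8K-output workload are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens, taken from the table above."""
    input_per_m: float
    output_per_m: float

    def discounted(self, fraction_off: float) -> "Price":
        """Return a new price with `fraction_off` (0..1) taken off both rates."""
        if not 0.0 <= fraction_off <= 1.0:
            raise ValueError("fraction_off must be between 0 and 1")
        keep = 1.0 - fraction_off
        return Price(self.input_per_m * keep, self.output_per_m * keep)

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost in USD of one request with the given token counts."""
        total = input_tokens * self.input_per_m + output_tokens * self.output_per_m
        return total / 1_000_000


V4_PRO_LIST = Price(input_per_m=1.74, output_per_m=3.48)
V4_FLASH_LIST = Price(input_per_m=0.14, output_per_m=0.28)

# 75% off list reproduces the quoted promo prices of $0.435 and $0.87.
V4_PRO_PROMO = V4_PRO_LIST.discounted(0.75)

# Hypothetical workload (an assumption, not from the article).
INPUT_TOKENS, OUTPUT_TOKENS = 200_000, 8_000

if __name__ == "__main__":
    for name, price in [
        ("V4-Pro (list)", V4_PRO_LIST),
        ("V4-Pro (promo)", V4_PRO_PROMO),
        ("V4-Flash (list)", V4_FLASH_LIST),
    ]:
        print(f"{name}: ${price.cost(INPUT_TOKENS, OUTPUT_TOKENS):.4f} per request")
```

Because output tokens are billed at twice the input rate on both tiers, long generations shift the bill more than long prompts do.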

> DeepSeek reported that in the 1M-token context setting, V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.
> — [DeepSeek](https://api-docs.deepseek.com/news/news260424), 2026-04-24
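
Percentages of a baseline are easier to reason about as reduction factors. The short sketch below inverts the reported fractions from the table. It assumes the V4-Flash figures were measured in the same 1M-token setting as the V4-Pro ones, which the table does not state, and the `reduction_factor` helper is purely illustrative.

```python
# Fractions of V3.2's per-token cost, as reported in the table above.
# The V4-Flash FLOPs figure is approximate (listed as ~10%).
RELATIVE_TO_V32 = {
    "V4-Pro": {"flops": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}


def reduction_factor(fraction_of_baseline: float) -> float:
    """Convert 'X% of the baseline' into 'baseline is N times larger'."""
    if not 0.0 < fraction_of_baseline <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction_of_baseline


if __name__ == "__main__":
    for model, fractions in RELATIVE_TO_V32.items():
        flops = reduction_factor(fractions["flops"])
        kv = reduction_factor(fractions["kv_cache"])
        print(f"{model}: {flops:.1f}x fewer FLOPs, {kv:.1f}x smaller KV cache than V3.2")
```

Read this way, the KV-cache reduction is the larger of the two effects for both models, which matters because cache memory is what limits how many long-context requests a server can hold at once.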

## The architecture: why efficiency is the actual story

The core change in V4 is how it handles attention across very long contexts. The new architecture combines **Compressed Sparse Attention (CSA)** and **Heavily Compressed Attention (HCA)**: instead of attending to all tokens equally, it selectively compresses representations of older text while maintaining full resolution on nearby content. The result is that compute and memory per generated token grow far more slowly with context length than they did in V3.2, so a million-token prompt no longer carries a proportionally larger bill. This is the engineering primitive that made the 1M context window economically feasible at DeepSeek's price points.
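
The general idea of keeping nearby tokens at full resolution while compressing older ones can be shown with a toy cache. The sketch below is not DeepSeek's CSA or HCA, whose details have not been published in full. It uses plain mean pooling as a stand-in, and the window size, block size, and function names are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def mean_pool(block: List[Vector]) -> Vector:
    """Average a non-empty block of equal-length vectors into one vector."""
    dim = len(block[0])
    return [sum(vec[i] for vec in block) / len(block) for i in range(dim)]


def compress_cache(entries: List[Vector], local_window: int, block_size: int) -> List[Vector]:
    """Keep the last `local_window` entries as-is; pool older ones per block."""
    if local_window < 0 or block_size < 1:
        raise ValueError("local_window must be >= 0 and block_size >= 1")
    split = max(0, len(entries) - local_window)
    older, recent = entries[:split], entries[split:]
    pooled = [
        mean_pool(older[i:i + block_size])
        for i in range(0, len(older), block_size)
    ]
    return pooled + recent


def cache_entries(context_len: int, local_window: int, block_size: int) -> int:
    """Number of cache entries left after compression, without building them."""
    recent = min(context_len, local_window)
    older = context_len - recent
    compressed = -(-older // block_size)  # ceiling division
    return recent + compressed


if __name__ == "__main__":
    # Hypothetical settings: 4,096-token local window, 16 older tokens per entry.
    kept = cache_entries(1_000_000, local_window=4096, block_size=16)
    print(f"{kept} entries instead of 1000000 ({kept / 1_000_000:.1%} of the original)")
```

With these made-up settings the cache shrinks to a few percent of its uncompressed size. The point is the shape of the saving, not the number: the older the context, the less memory each token costs.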

DeepSeek has not published a full technical paper accompanying this release; the architecture details come from its API documentation. Third-party benchmarking from Artificial Analysis placed V4-Pro at **#2 on the Intelligence Index for open-weight reasoning models** (behind Kimi K2.6) and reported a leading **GDPval-AA agentic-coding score of 1,554** among open-weight models. DeepSeek's own release materials also claim a **LiveCodeBench score of 93.5** (versus Claude Opus 4.6's 88.8), but that figure is self-reported and not independently confirmed, so it should be read as a vendor claim rather than a measured result.

> **Key:** **The throughline.** Efficiency, not raw parameter count, is DeepSeek's recurring moat. V4 is not the most capable model in the world; it's the most capable model that can be run cheaply. Artificial Analysis put the cost of running its full benchmark suite on V4-Pro at $1,071; what matters is how that compares with the bill for equivalent closed models, not the absolute figure.

## The Huawei pivot: chips as the geopolitical subtext

V4 is the **first DeepSeek model engineered for Chinese domestic silicon**. Per MIT Technology Review, DeepSeek reportedly gave early access only to Chinese chipmakers; Huawei announced same-day 'day zero' support on its Ascend 950 series. This is not coincidental: the U.S. export controls that prevent China from acquiring advanced Nvidia H100s and H200s have forced a fork in the supply chain, and V4 is the first DeepSeek product built from the outset on the assumption that the fork is permanent.

> CFR fellow Michael Horowitz observed that 'second-best models carry enormous competitive value when they are cheap and open, which makes them easy to widely diffuse', reframing the competitive question from benchmark supremacy to adoption scale.
> — [Council on Foreign Relations](https://www.cfr.org/articles/deepseek-v4-signals-a-new-phase-in-the-u-s-china-ai-rivalry), 2026-04-29

The CFR analysis put the U.S. lead over China at roughly **seven months** (CFR's Chris McGuire), while noting that the gap in *deployment reach* is a separate and potentially more consequential question. For a developer in Southeast Asia or Africa choosing between an open-weight model they can self-host, or call through the API at $0.435/M input tokens on launch promo ($1.74 at list), and a $25/M-token closed model they cannot self-host, the performance delta may be secondary. That is the strategic framing the Western AI community is still working out how to answer.

## What to watch next

Three things are worth tracking in the months that follow. First: whether DeepSeek publishes a technical paper. The architecture claims in the API docs are plausible and consistent with third-party benchmark behaviour, but a preprint would allow independent replication of the efficiency numbers. Second: whether the chip partnership with Huawei produces a verifiable inference cost advantage on Ascend versus Nvidia — if it does, that removes a remaining U.S. leverage point. Third: whether V4's open weights accelerate fine-tuning adoption outside China at a scale that the closed frontier labs, with their access controls, struggle to match. The adoption race is the one that matters for the next 18 months.

## Key takeaways

- DeepSeek released two open-weight MoE models on 24 April 2026 — V4-Pro (1.6T / 49B active parameters) and V4-Flash (284B / 13B active) — both MIT-licensed and published on Hugging Face.
- A redesigned sparse-attention mechanism cuts V4-Pro's inference FLOPs to 27% of V3.2 and KV-cache to 10%, making 1M-token context windows viable at the pricing DeepSeek is charging.
- Standard API pricing (Artificial Analysis) is $1.74/$3.48 per million tokens for V4-Pro and $0.14/$0.28 for V4-Flash; DeepSeek launched V4-Pro on a 75% promo at $0.435/$0.87 — still a fraction of U.S. closed-frontier rates.
- V4 is the first DeepSeek model built for domestic Huawei Ascend 950 chips — DeepSeek reportedly gave early access only to Chinese chipmakers.
- CFR analysts put the U.S. AI lead at roughly seven months, while arguing that 'second-best models carry enormous competitive value when they are cheap and open'.

## FAQ

### What is DeepSeek V4-Pro and when was it released?
DeepSeek V4-Pro is a 1.6-trillion-parameter open-weight Mixture-of-Experts language model released on 24 April 2026 under an MIT licence. It has 49 billion active parameters, a 1M-token context window, and weights published on Hugging Face.

### How does DeepSeek V4 compare to GPT-5 and Claude?
On most benchmarks V4-Pro sits within a few points of U.S. frontier models. DeepSeek claims a leading LiveCodeBench score of 93.5 (vs Claude Opus 4.6's 88.8), though that figure is self-reported and unconfirmed, and CFR put the U.S. AI lead at roughly seven months overall. The key differentiator is cost: V4-Pro lists at $1.74/M input (and launched on a 75% promo at $0.435/M), far cheaper than leading U.S. closed models, and it can be self-hosted.

### What is the difference between V4-Pro and V4-Flash?
V4-Pro has 1.6 trillion total / 49 billion active parameters and is optimised for complex reasoning and coding. V4-Flash has 284 billion total / 13 billion active parameters, is faster and cheaper ($0.14/M input), and takes over the slot previously held by `deepseek-chat` and `deepseek-reasoner`, both due to be retired on 24 July 2026.
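
The gap between total and active parameters is what makes a Mixture-of-Experts model of this size tractable: only a small share of the weights is used for any one token. The sketch below works out that share from the figures above. The `active_share` helper is illustrative, and the calculation ignores everything except the two parameter counts.

```python
# Parameter counts as stated in the article.
MODELS = {
    "V4-Pro": {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9, "active": 13e9},
}


def active_share(total: float, active: float) -> float:
    """Fraction of all parameters that participate in processing one token."""
    if total <= 0 or not 0 < active <= total:
        raise ValueError("need 0 < active <= total")
    return active / total


if __name__ == "__main__":
    for name, params in MODELS.items():
        share = active_share(params["total"], params["active"])
        print(f"{name}: {share:.2%} of parameters active per token")
```

Both models activate only a few percent of their weights per token. The full set of weights still has to be stored and served, though, so self-hosting V4-Pro remains a large-memory undertaking even if per-token compute is modest.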

### Is DeepSeek V4 truly open-source?
DeepSeek released the weights under an MIT licence, making it open-weight and self-hostable. The full training code and dataset are not published — so 'open-weight' is accurate; 'fully open-source' is a stretch. Weights are on Hugging Face.

### What chips does DeepSeek V4 run on?
V4 is the first DeepSeek model engineered for Chinese domestic chips — specifically Huawei's Ascend 950 series. Per MIT Technology Review, DeepSeek reportedly gave early access only to Chinese chipmakers, and Huawei announced same-day Ascend 950 support.

## Sources

- [DeepSeek V4 Preview Release](https://api-docs.deepseek.com/news/news260424) — DeepSeek, 2026-04-24
- [Three reasons why DeepSeek's new model matters](https://www.technologyreview.com/2026/04/24/1136422/why-deepseeks-v4-matters/) — MIT Technology Review, 2026-04-24
- [DeepSeek is back among the leading open-weights models with V4 Pro and V4 Flash](https://artificialanalysis.ai/articles/deepseek-is-back-among-the-leading-open-weights-models-with-v4-pro-and-v4-flash) — Artificial Analysis, 2026-04-27
- [DeepSeek V4 signals a new phase in the US–China AI rivalry](https://www.cfr.org/articles/deepseek-v4-signals-a-new-phase-in-the-u-s-china-ai-rivalry) — Council on Foreign Relations, 2026-04-29
- [DeepSeek previews new AI model that 'closes the gap' with frontier models](https://techcrunch.com/2026/04/24/deepseek-previews-new-ai-model-that-closes-the-gap-with-frontier-models/) — TechCrunch, 2026-04-24