# AI is doing real mathematics — what that actually means

> In 2026 AI began contributing original, verifiable mathematics, not just summarising it.

*Not summarising maths — contributing it. Here's how, why it's verifiable, and where the limits are.*

By WireRead Editorial · WireRead
Canonical: https://wireread.com/news/ai-doing-real-mathematics-meaning

For years, AI 'doing maths' meant solving problems it had effectively seen before — well-trodden competition questions with memorised solution patterns. In 2026 that changed: systems began producing genuinely *new* results on problems that had stumped human mathematicians for decades. Understanding what actually happened — mechanism, guarantee, and limit — matters more than the headline count.

## Why Erdős problems became the testbed

The late mathematician Paul Erdős posed hundreds of precise, famously hard open problems over his career, now catalogued online. They're a near-perfect AI proving ground for three reasons. First, they're *easy to state*: a non-specialist can read the question in a sentence. Second, they're *impossible to fake*: the problems are hard enough that no amount of fluent writing substitutes for a correct solution. Third, and most important: they're *impossible to have memorised*. Because no answer existed, any AI that produces one has done something beyond recall — it has reasoned to a novel conclusion. That's the diagnostic power of the Erdős testbed.

When OpenAI reported a model disproving the 1946 planar unit-distance conjecture, and DeepMind's AlphaProof Nexus solved nine open Erdős problems plus 44 conjectures from the integer-sequence catalogue, the significance lay less in the number of results than in the explanation they ruled out: recall. No lookup table held the answers, so the systems had to derive them.

## The trick that makes it trustworthy — and two types of 'solved'

The breakthrough that matters most is *verification*. AI can sound rigorous while being subtly wrong: a convincing-sounding proof with a gap on line 47 is a real failure mode. The strongest systems, such as DeepMind's AlphaProof Nexus, address this by pairing a language model with a **proof assistant called Lean**, software that refuses to accept a proof unless every logical step checks out against mathematical axioms. The AI proposes; the checker certifies. That's what separates 'an AI claimed a proof' from 'the proof is correct'.
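To make 'the AI proposes; the checker certifies' concrete, here is a minimal Lean 4 sketch. It is not taken from AlphaProof Nexus or any cited result, and the theorem name `sum_order_irrelevant` is a hypothetical one chosen for illustration. It only shows the kind of accept-or-reject behaviour the article describes, on a deliberately trivial statement.

```lean
-- Illustrative only: `sum_order_irrelevant` is a hypothetical name, not from any cited system.
-- Lean accepts this because `Nat.add_comm a b` is a proof of exactly `a + b = b + a`.
theorem sum_order_irrelevant (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Lean also accepts this: both sides compute to the same number.
example : 2 + 2 = 4 := rfl

-- Lean rejects the line below if it is uncommented, because the two sides
-- do not compute to the same number. Fluent wording cannot get it past the checker.
-- example : 2 + 2 = 5 := rfl
```

The point of the sketch is the asymmetry: writing a candidate proof can be creative and error-prone, while checking it is mechanical.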

But not all results carry the same guarantee. The two May 2026 milestones differ precisely here:

| | OpenAI | Google DeepMind |
| --- | --- | --- |
| **Method** | General reasoning model; algebraic number theory | Gemini 3.1 Pro paired with Lean proof assistant |
| **Verification** | Human-checked (incl. Timothy Gowers) | Machine-checked, every step, in Lean |
| **Status** | Formal peer review pending | Each accepted proof is formally certified |
| **Guarantee strength** | Credible — expert eyes, not infallible | Strong — machine rejects any flawed step |

The difference matters. A Lean-certified proof has survived a rigorous, automated check against axioms; a human-verified proof is very credible but depends on the fallibility of expert eyes — the whole reason peer review exists is that those eyes sometimes miss things. Understanding which type of guarantee a given AI-maths result carries is the single most important question to ask when a headline says 'AI solved X'.

> AlphaProof Nexus addresses AI hallucination by pairing an AI model's generative capacity with formal proof-checking through the Lean proof assistant. The AI proposes a proof, and a separate verification system checks every logical step.
> — [Crypto Briefing](https://cryptobriefing.com/deepmind-alphaproof-nexus-erdos-problems/), 2026-05-26

## What this means — and what it doesn't

> **Key:** AI can now generate ideas a formal checker will certify, in narrow, hard domains: a real research instrument. That doesn't mean general intelligence. DeepMind's own leadership said the system is 'still not AGI', and the problems solved are a small fraction of mathematics. The right frame is 'powerful new tool', not 'machines out-think mathematicians'.

The limitation is structural, not temporary. The generate-then-verify loop works because, in maths, 'correct' is mechanically checkable: you can write a proof in Lean's formal language, and Lean's checker either accepts or rejects it. Most domains don't offer that. Strategy, taste, ethics and the messier corners of science have no equivalent of a proof checker. So the achievement is sharply bounded: exceptional inside a mechanically checkable box, untested and probably far weaker outside it.

> OpenAI described the result as the first time a prominent open problem, central to a subfield of mathematics, has been solved autonomously by AI.
> — [OpenAI](https://openai.com/index/model-disproves-discrete-geometry-conjecture/), 2026-05-20

## The durable implication: a template for science

The bigger implication of May 2026 is not any single proof; it's the pipeline. A reliable loop of *AI generates, formal system verifies* is a template that could extend beyond pure mathematics to anywhere claims can be mechanically checked: formally verified software, certain classes of physical modelling, computational biology with checkable outputs. The key is that the formal checker does the trusting so humans don't have to. It neutralises AI's worst failure mode (confident wrongness) in exactly the contexts where that failure mode is most expensive. Whether that template travels depends on how many domains can be formalised. Maths was the natural first case because 'correct' is precisely defined there; success here shows the loop works where a checker exists, not that it will work everywhere.
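The shape of that loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any cited system works: the 'generator' is a hypothetical stand-in that deliberately emits wrong guesses, the domain is finding a nontrivial factor of an integer, and all function names are invented for illustration. What it shows is that the loop's output is only as trustworthy as the checker, never the generator.

```python
from typing import Callable, Iterable, Optional


def untrusted_proposals(n: int) -> Iterable[int]:
    """Hypothetical stand-in for a generative model: yields guesses, many wrong."""
    for d in range(2, n):
        yield d + 1  # a deliberately unreliable guess
        yield d


def is_certified_factor(n: int, d: int) -> bool:
    """Trusted checker: accepts d only if it is a nontrivial factor of n."""
    return 1 < d < n and n % d == 0


def generate_then_verify(
    n: int,
    propose: Callable[[int], Iterable[int]],
    verify: Callable[[int, int], bool],
    budget: int = 1000,
) -> Optional[int]:
    """Return the first candidate the checker accepts, or None within the budget."""
    for attempt, candidate in enumerate(propose(n)):
        if attempt >= budget:
            break
        if verify(n, candidate):
            return candidate  # certified by the checker, not by the generator
    return None


if __name__ == "__main__":
    print(generate_then_verify(91, untrusted_proposals, is_certified_factor))
    print(generate_then_verify(97, untrusted_proposals, is_certified_factor))
```

Swapping in a different domain means swapping `verify`. Where no such mechanical checker can be written, the loop has nothing to anchor its guarantee to.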

Watch for two things in the year ahead. First, whether the Lean-certified results survive post-publication scrutiny from the broader maths community — the arXiv preprint is not the same as a refereed journal acceptance. Second, whether other scientific fields begin building analogous generate-then-verify pipelines. That second development, if it happens, will matter far more than any individual result. The field to watch isn't maths; it's the infrastructure being built around it.

## Key takeaways

- In 2026 AI shifted from summarising maths to contributing original results, solving Erdős problems that had been open for decades.
- Erdős's hundreds of open problems became a fair, hard testbed: easy to state, impossible to fake, and impossible to have memorised an answer to.
- Pairing an AI with the Lean proof assistant means each proof step is formally verified against axioms — distinct from human-checked results still pending peer review.
- The generate-then-verify loop is the durable innovation: a template that could extend beyond maths to any domain where claims can be mechanically checked.
- It's a powerful, narrow tool — DeepMind's own leadership said it is 'still not AGI'.

## FAQ

### Can AI really do original maths now?
In narrow, hard domains, yes — in 2026 AI systems produced new results on long-open Erdős problems. DeepMind's are machine-verified in the Lean proof checker; OpenAI's is human-verified with formal peer review pending. It's a genuine research instrument for specific problem classes, not a general mathematician.

### How do we know the AI's maths is correct?
The most trustworthy systems pair the AI with a proof assistant (Lean) that formally checks every step against mathematical axioms. If the checker accepts it, the proof is machine-certified — a stronger guarantee than human review alone, which can miss subtle gaps.

### What's the difference between 'verified in Lean' and 'human-checked'?
Lean verifies every logical step automatically against axioms; any flaw causes rejection. Human checking is done by expert mathematicians — credible, but fallible. DeepMind's May 2026 results are Lean-certified; OpenAI's was human-checked and awaits formal peer review.

### Will AI replace mathematicians?
No. DeepMind's own leadership says the system is 'still not AGI'. These are narrow, powerful tools for verifiable problems: impressive in their domain, far from human-level reasoning across mathematics broadly.

### Could this approach work outside maths?
In principle, yes — anywhere claims can be mechanically checked. Formally verified software is the most obvious adjacent domain. The constraint is that most fields lack an equivalent of Lean: a machine that can definitively reject a flawed step. Maths is unusually well-suited because 'correct' is precisely defined.

## Sources

- [An OpenAI model has disproved a central conjecture in discrete geometry](https://openai.com/index/model-disproves-discrete-geometry-conjecture/) — OpenAI, 2026-05-20
- [Solving open problems with AlphaProof Nexus (preprint, arXiv:2605.22763)](https://arxiv.org/abs/2605.22763) — Google DeepMind / arXiv, 2026-05-21
- [OpenAI's milestone math breakthrough played to AI's strengths](https://www.understandingai.org/p/openais-milestone-math-breakthrough) — Understanding AI, 2026-05-22
- [Google DeepMind's AlphaProof Nexus solves 9 Erdős problems and 44 conjectures](https://cryptobriefing.com/deepmind-alphaproof-nexus-erdos-problems/) — Crypto Briefing, 2026-05-26