AI is doing real mathematics — what that actually means
Not summarising maths — contributing it. Here's how, why it's verifiable, and where the limits are.
The answer
In 2026 AI began contributing original, verifiable mathematics, not just summarising it.
For years, AI 'doing maths' meant solving problems it had effectively seen before — well-trodden competition questions with memorised solution patterns. In 2026 that changed: systems began producing genuinely new results on problems that had stumped human mathematicians for decades. Understanding what actually happened — mechanism, guarantee, and limit — matters more than the headline count.
Why Erdős problems became the testbed
The late mathematician Paul Erdős posed hundreds of precise, famously hard open problems over his career, now catalogued online. They're a near-perfect AI proving ground for three reasons. First, they're easy to state: a non-specialist can read the question in a sentence. Second, they're hard to fake: no amount of fluent writing substitutes for a correct solution. Third, and most important, the open ones can't have been memorised. Because no answer existed, any AI that produces a correct one has done something beyond recall: it has reasoned to a novel conclusion. That's the diagnostic power of the Erdős testbed.
When OpenAI reported a model disproving the 1946 planar unit-distance conjecture, and DeepMind's AlphaProof Nexus solved nine open Erdős problems plus 44 conjectures from the integer-sequence catalogue, the significance wasn't just the number of results. It was what those results ruled out as an explanation: there was no lookup table to consult, so the AI had to derive its answers.
The trick that makes it trustworthy — and two types of 'solved'
The breakthrough that matters most is verification. AI can sound rigorous while being subtly wrong: a convincing-sounding proof with a gap on line 47 is a real failure mode. The strongest systems, such as DeepMind's AlphaProof Nexus, address this by pairing a language model with Lean, a proof assistant that refuses to accept a proof unless every logical step checks out against mathematical axioms. The AI proposes; the checker certifies. That's what separates 'an AI claimed a proof' from 'the proof is correct'.
But not all results carry the same guarantee. The two May 2026 milestones differ precisely here:
| | OpenAI | Google DeepMind |
|---|---|---|
| Method | General reasoning model; algebraic number theory | Gemini 3.1 Pro paired with Lean proof assistant |
| Verification | Human-checked (incl. Timothy Gowers) | Machine-checked, every step, in Lean |
| Status | Formal peer review pending | Each accepted proof is formally certified |
| Guarantee strength | Credible — expert eyes, not infallible | Strong — machine rejects any flawed step |
The difference matters. A Lean-certified proof has survived a rigorous, automated check against axioms; a human-verified proof is very credible but only as reliable as the experts who checked it, and the whole reason peer review exists is that expert eyes sometimes miss things. Which type of guarantee a given AI-maths result carries is the single most important question to ask when a headline says 'AI solved X'.
AlphaProof Nexus addresses AI hallucination by pairing an AI model's generative capacity with formal proof-checking through the Lean proof assistant. The AI proposes a proof, and a separate verification system checks every logical step.
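To make 'the checker certifies' concrete, here is a minimal Lean 4 sketch. The statements are toy examples chosen for illustration, and the theorem names are hypothetical; none of this is taken from the AlphaProof Nexus proofs. It shows only the accept-or-reject behaviour described above.

```lean
-- Toy statements, not taken from any of the results discussed here.

-- Accepted: `n + 0` reduces to `n` by the definition of addition on Nat,
-- so the kernel can confirm the equation by computation (`rfl`).
theorem add_zero_toy (n : Nat) : n + 0 = n := rfl

-- Accepted: the proof term builds `q ∧ p` from the two halves of `h`.
theorem and_swap_toy (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩

-- Rejected if uncommented: the statement is false, so no proof term
-- type-checks and Lean reports an error instead of certifying it.
-- theorem bogus_toy (n : Nat) : n + 1 = n := rfl
```

However persuasive the prose around a proof is, it plays no part in this check: only the formal proof term is examined.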
What this means — and what it doesn't
The limitation is structural, not temporary. The generate-then-verify loop works because, in maths, 'correct' is mechanically checkable: you can write a proof in Lean's formal language, and the compiler either accepts or rejects it. Most domains don't offer that — strategy, taste, ethics, the messier corners of science have no equivalent of a proof checker. So the achievement is sharply bounded: exceptional inside a mechanically checkable box, untested and probably far weaker outside it.
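The generate-then-verify loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any of the systems above work: integer factoring stands in for theorem proving because a proposed answer can be checked by multiplication, and `propose_factors`, `verify`, and `generate_then_verify` are hypothetical names invented for the sketch.

```python
import random
from itertools import islice
from typing import Iterator, Optional, Tuple

Candidate = Tuple[int, int]


def propose_factors(n: int, rng: random.Random) -> Iterator[Candidate]:
    """Hypothetical stand-in for an unreliable generator.

    It guesses a divisor at random and pairs it with n // a, so most
    proposals are wrong. Assumes n >= 3.
    """
    while True:
        a = rng.randint(2, n - 1)
        yield a, n // a


def verify(n: int, candidate: Candidate) -> bool:
    """Mechanical check: accept only a genuine non-trivial factorisation."""
    a, b = candidate
    return 1 < a < n and 1 < b < n and a * b == n


def generate_then_verify(
    n: int, proposals: Iterator[Candidate], max_attempts: int = 10_000
) -> Optional[Candidate]:
    """Return the first proposal the checker accepts, or None if none passes."""
    for candidate in islice(proposals, max_attempts):
        if verify(n, candidate):
            return candidate
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # 91 = 7 * 13, so some proposal eventually passes the check.
    print(generate_then_verify(91, propose_factors(91, rng)))
    # 97 is prime, so every proposal is rejected and the loop gives up.
    print(generate_then_verify(97, propose_factors(97, rng)))
```

The generator is allowed to be wrong almost all the time, because nothing it says is trusted until `verify` accepts it. The loop has no analogue in a domain where no such `verify` function can be written.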
OpenAI described the result as the first time a prominent open problem, central to a subfield of mathematics, has been solved autonomously by AI.
The durable implication: a template for science
The bigger implication of May 2026 is not any single proof; it's the pipeline. A reliable loop in which the AI generates and a formal system verifies is a template that could extend beyond pure mathematics to anywhere claims can be mechanically checked: formally verified software, certain classes of physical modelling, computational biology with checkable outputs. The key is that the formal checker does the trusting so humans don't have to: it neutralises AI's worst failure mode (confident wrongness) in exactly the contexts where that failure mode is most expensive. Whether that template travels depends on how many domains can be formalised. Maths shows the loop can work on genuinely hard problems; it does not show that it will work where no checker exists.
Watch for two things in the year ahead. First, whether the Lean-certified results survive post-publication scrutiny from the broader maths community — the arXiv preprint is not the same as a refereed journal acceptance. Second, whether other scientific fields begin building analogous generate-then-verify pipelines. That second development, if it happens, will matter far more than any individual result. The field to watch isn't maths; it's the infrastructure being built around it.
Sources
- An OpenAI model has disproved a central conjecture in discrete geometry — OpenAI, 20 May 2026
- Solving open problems with AlphaProof Nexus (preprint, arXiv:2605.22763) — Google DeepMind / arXiv, 21 May 2026
- OpenAI's milestone math breakthrough played to AI's strengths — Understanding AI, 22 May 2026
- Google DeepMind's AlphaProof Nexus solves 9 Erdős problems and 44 conjectures — Crypto Briefing, 26 May 2026