The month AI started doing real mathematics
Two labs, days apart, used AI to produce genuinely new maths. How they did it matters more than that they did it.
The answer
In May 2026, AI systems from OpenAI and Google DeepMind produced new mathematics that could be independently checked.
Inside a single week in May, two of the largest AI labs claimed something that would have read as science fiction a year ago: that their systems had contributed original mathematics — not summarised it, not retrieved it, but produced results human mathematicians had not. The temptation is to score it as a race. The more useful question is how each result was checked, because that is what separates a genuine advance from a confident-sounding paragraph.
What OpenAI did
On 20 May, OpenAI said a general-purpose reasoning model found a construction disproving the planar unit-distance conjecture, a problem Paul Erdős posed in 1946. The surprise was not only the answer but the route. Instead of refining the long-assumed grid arrangement, whose count of unit distances exceeds linear growth only by a vanishing exponent, the model reached into algebraic number theory and established a lower bound on the order of n^1.014, a fixed power above linear. That is the kind of move that signals reasoning rather than recall: there was no prior answer to memorise, because none existed.
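For readers who want the statement behind that paragraph, here is a brief sketch in standard notation. It assumes the usual definition of u(n) as the largest number of pairs at distance exactly 1 among n points in the plane. The symbols c and δ are labels introduced for this sketch, and the value of δ is read off the "n^1.014" figure quoted above rather than taken from the paper itself.

```latex
% u(n): maximum number of unit-distance pairs among n points in the plane.
\begin{align*}
  % Erdős's 1946 grid construction, for some constant c > 0:
  u(n) &\ge n^{1 + c/\log\log n} \\
  % The conjecture: the grid is essentially optimal.
  u(n) &= n^{1 + o(1)} \\
  % Best known upper bound (Spencer, Szemerédi and Trotter, 1984):
  u(n) &= O\!\left(n^{4/3}\right) \\
  % The bound as reported here, with \delta \approx 0.014 (assumed from "n^{1.014}"):
  u(n) &\ge n^{1 + \delta}
\end{align*}
```

A bound of the last form, with δ fixed and positive, cannot coexist with an exponent of 1 + o(1), which is why a single construction is enough to refute the conjecture.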
Crucially, this was a general reasoning model, not a maths-specific engine. External mathematicians, among them Fields medallist Timothy Gowers, checked the write-up before it reached arXiv. That human review is real evidence, but it is a softer guarantee than a machine-checked proof: expert eyes can miss a subtle gap, which is why the still-pending formal peer review matters.
OpenAI described the result as the first time a prominent open problem, central to a subfield of mathematics, has been solved autonomously by AI.
What DeepMind did
Days later, Google DeepMind's AlphaProof Nexus — pairing Gemini 3.1 Pro with the Lean proof assistant — reported solving nine open Erdős problems (two unsolved for 56 years) plus 44 conjectures from the integer-sequence encyclopedia (OEIS), at a few hundred dollars of compute each. The system builds on DeepMind's earlier AlphaProof, which reached silver-medal level at the 2024 International Mathematical Olympiad; the jump here is from competition problems, solvable by talented humans in hours, to research problems with no such guarantee.
The two efforts took different paths to the same testbed, and the press framed it as a scoreboard — nine to one. That framing misses the point. The results are not the same kind of object, and the difference is the whole story.
OpenAI vs DeepMind, side by side
The two milestones differ on every axis that matters for how much you should trust them. Set against each other:
| | OpenAI | Google DeepMind |
|---|---|---|
| What was claimed | Disproved Erdős's 1946 unit-distance conjecture | Solved 9 open Erdős problems + 44 OEIS conjectures |
| Approach | General reasoning model; algebraic number theory | Gemini 3.1 Pro paired with the Lean proof assistant |
| Verification | Human-checked (incl. Timothy Gowers) | Machine-checked, every step, in Lean |
| What is proven | One construction; peer review pending | Each accepted proof is formally certified |
| Main caveat | Formal review still to come | Narrow domain; Hassabis: 'still not AGI' |
Read the bottom rows, not the count. A Lean certificate is a stronger object than a human read-through — which is why DeepMind's nine, individually less glamorous than disproving a famous conjecture, are in one sense the more solid result.
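To give a sense of what "machine-checked" means in practice, here is a toy Lean 4 sketch. It is purely illustrative and is not taken from AlphaProof Nexus; the theorem name `add_comm_toy` is made up for this example. The point is only that Lean's kernel accepts a statement when every step of the supplied proof checks, and rejects it otherwise.

```lean
-- Toy illustration only; not drawn from any DeepMind proof.

-- Accepted: the proof term is checked against the statement by Lean's kernel.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Accepted: both sides compute to the same value.
example : 2 + 2 = 4 := rfl

-- Rejected if uncommented: no amount of confident prose gets this past the checker.
-- example : 2 + 2 = 5 := rfl
```

A research-level proof is vastly longer, but the acceptance rule is the same, which is what makes a Lean certificate a different kind of object from a human read-through.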
Demis Hassabis, Google DeepMind's chief executive, moved quickly to temper expectations, saying the system is 'still not AGI' even as it points toward a more practical role for AI in verified mathematical research.
Why now, and what to watch
Why this month? Because Erdős's hundreds of open problems, catalogued at erdosproblems.com, have become the field's favourite proving ground: easy to state, impossible to fake, with no memorised answer to crib. The honest read is that this is a genuine step, with AI generating ideas a checker can certify, not a machine replacing mathematicians. The requirement that every claim be checkable is what makes the results trustworthy, and it is also what confines them, for now, to problems where such checking is possible.
What to watch next is whether the generate-then-verify loop travels. A reliable pipeline of 'AI proposes, formal system certifies' neutralises AI's worst failure mode — confident wrongness — anywhere a claim can be mechanically checked. If that template spreads from maths to other formalisable corners of science, the lasting result of May 2026 will be the machinery, not any single proof.
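The 'AI proposes, formal system certifies' loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not either lab's pipeline: the names `Candidate`, `propose`, `verify` and `generate_then_verify` are hypothetical, the "model" is a deliberately sloppy guesser, and integer factoring stands in for any claim that can be checked mechanically.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass(frozen=True)
class Candidate:
    """A proposed answer: the claim that n = p * q."""
    p: int
    q: int


def propose(n: int) -> Iterator[Candidate]:
    """Hypothetical stand-in for a model: emits confident, unvetted guesses.

    Integer division rounds down, so most of these guesses are simply wrong.
    """
    for p in range(2, n):
        yield Candidate(p, n // p)


def verify(n: int, c: Candidate) -> bool:
    """Mechanical check: cheap, deterministic, independent of the proposer."""
    return c.p > 1 and c.q > 1 and c.p * c.q == n


def generate_then_verify(
    n: int,
    proposer: Callable[[int], Iterator[Candidate]],
    checker: Callable[[int, Candidate], bool],
) -> Optional[Candidate]:
    """Return the first candidate the checker certifies, or None if none pass."""
    for candidate in proposer(n):
        if checker(n, candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(generate_then_verify(221, propose, verify))
    print(generate_then_verify(13, propose, verify))
```

The design choice that matters is that nothing reaches the caller without passing `verify`: wrong guesses cost compute but never become results, and a problem with no valid answer yields `None` rather than a plausible-sounding claim.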
Sources
- An OpenAI model has disproved a central conjecture in discrete geometry — OpenAI, 20 May 2026
- Advancing Mathematics Research with AI-Driven Formal Proof Search (AlphaProof Nexus preprint, arXiv:2605.22763) — Google DeepMind / arXiv, 21 May 2026
- OpenAI's milestone math breakthrough played to AI's strengths — Understanding AI, 22 May 2026
- Google DeepMind's AlphaProof Nexus Solves Erdős Problems as AI Math Race Moves Beyond Benchmarks — WinBuzzer, 26 May 2026