The month AI started doing real mathematics

Two labs, days apart, used AI to produce genuinely new maths. How they did it matters more than that they did it.

WireRead Editorial · 20 May 2026 · Verified May 2026

The answer

In May 2026, AI systems from OpenAI and Google DeepMind produced genuinely new, checkable mathematics.

TL;DR — the 20-second read

OpenAI (20 May) said a general reasoning model — not a maths-specific system — disproved a conjecture open since 1946, using algebraic number theory rather than the long-assumed grid.
Days later, Google DeepMind's AlphaProof Nexus solved nine open Erdős problems plus 44 conjectures — a different method, for a few hundred dollars of compute each.
The DeepMind result is machine-verified in Lean: a proof counts only if the formal checker accepts every step — exactly the discipline fluent-but-wrong models lack.
OpenAI's was human-checked (Timothy Gowers among the readers) and still awaits formal peer review — a real but weaker guarantee than a Lean certificate.
Both labs are careful: peer review is pending and Demis Hassabis called the system 'still not AGI'. The constraint that makes it trustworthy is what keeps it grounded.

Inside a single week in May, two of the largest AI labs claimed something that would have read as science fiction a year ago: that their systems had contributed original mathematics — not summarised it, not retrieved it, but produced results human mathematicians had not. The temptation is to score it as a race. The more useful question is how each result was checked, because that is what separates a genuine advance from a confident-sounding paragraph.

What OpenAI did

On 20 May, OpenAI said a general-purpose reasoning model had found a construction disproving the planar unit-distance conjecture, a problem Paul Erdős posed in 1946. The surprise was not only the answer but the route. Instead of refining the long-assumed grid arrangement, the model reached into algebraic number theory, establishing a lower bound on the order of n^1.014, a fixed power above the near-linear growth the conjecture predicted. That is the kind of move that signals reasoning rather than recall: there was no prior answer to memorise, because none existed.
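
Why an exponent as small as 1.014 is enough becomes clearer with the bounds written side by side. This is a sketch using the standard statement of the conjecture and the exponent reported above. The notation u(n), for the maximum number of unit distances among n points in the plane, and the unspecified positive constant c are introduced here for illustration and do not come from OpenAI's write-up.

```latex
\begin{align*}
  % Grid construction (Erdős, 1946): super-linear, but the exponent tends to 1.
  u(n) &\ge n^{\,1 + c/\log\log n} \\
  % The conjecture: the grid is essentially best possible.
  u(n) &\le n^{\,1 + o(1)} \\
  % Reported result (exact form not given in this article): a fixed exponent above 1.
  u(n) &\gtrsim n^{1.014}
\end{align*}
```

Because c / log log n shrinks to zero as n grows, the conjecture requires u(n) to fall below n^(1+ε) for every fixed ε > 0 once n is large. Any ε smaller than 0.014 then contradicts the reported bound, so the two cannot both hold.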

Crucially, this was a general reasoning model, not a maths-specific engine. External mathematicians — among them Fields medallist Timothy Gowers — checked the write-up before it reached arXiv. That human review is real evidence, but it is a softer guarantee than a machine proof: expert eyes can miss a subtle gap, which is precisely why formal peer review is still pending.

OpenAI described the result as the first time a prominent open problem, central to a subfield of mathematics, has been solved autonomously by AI.

Source: OpenAI · 20 May 2026

What DeepMind did

Days later, Google DeepMind's AlphaProof Nexus — pairing Gemini 3.1 Pro with the Lean proof assistant — reported solving nine open Erdős problems (two unsolved for 56 years) plus 44 conjectures from the integer-sequence encyclopedia (OEIS), at a few hundred dollars of compute each. The system builds on DeepMind's earlier AlphaProof, which reached silver-medal level at the 2024 International Mathematical Olympiad; the jump here is from competition problems, solvable by talented humans in hours, to research problems with no such guarantee.

The two efforts took different paths to the same testbed, and the press framed it as a scoreboard — nine to one. That framing misses the point. The results are not the same kind of object, and the difference is the whole story.

OpenAI vs DeepMind, side by side

The two milestones differ on every axis that matters for how much you should trust them. Set against each other:

	OpenAI	Google DeepMind
What was claimed	Disproved Erdős's 1946 unit-distance conjecture	Solved 9 open Erdős problems + 44 OEIS conjectures
Approach	General reasoning model; algebraic number theory	Gemini 3.1 Pro paired with the Lean proof assistant
Verification	Human-checked (incl. Timothy Gowers)	Machine-checked, every step, in Lean
What is proven	One construction; peer review pending	Each accepted proof is formally certified
Main caveat	Formal review still to come	Narrow domain; Hassabis: 'still not AGI'

Read the bottom rows, not the count. A Lean certificate is a stronger object than a human read-through — which is why DeepMind's nine, individually less glamorous than disproving a famous conjecture, are in one sense the more solid result.

Hassabis moved quickly to temper expectations, saying the system is 'still not AGI' even as it points toward a more practical role for AI in verified mathematical research.

Source: WinBuzzer · 26 May 2026

Why now, and what to watch

Why this month? Because Paul Erdős's hundreds of open problems, catalogued at erdosproblems.com, have become the field's favourite proving ground: easy to state, impossible to fake, with no memorised answer to crib. The honest read is that this is a genuine step, AI generating ideas a checker can certify, not a machine replacing mathematicians. The requirement that a checker sign off is what makes the results trustworthy, and it is also what confines them to problems where such a check exists.

What to watch next is whether the generate-then-verify loop travels. A reliable pipeline of 'AI proposes, formal system certifies' neutralises AI's worst failure mode — confident wrongness — anywhere a claim can be mechanically checked. If that template spreads from maths to other formalisable corners of science, the lasting result of May 2026 will be the machinery, not any single proof.
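
The shape of that loop can be shown with a toy example. The sketch below is not either lab's pipeline: the function names (generate_then_verify, propose_factor_pairs, check_factorisation) are hypothetical, and integer factorisation stands in for theorem proving only because a proposed answer is cheap to check mechanically.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Pair = Tuple[int, int]


def generate_then_verify(
    candidates: Iterable[Pair],
    verify: Callable[[Pair], bool],
) -> Optional[Pair]:
    """Return the first candidate the checker accepts, or None if none pass."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


def propose_factor_pairs(n: int) -> Iterator[Pair]:
    """Hypothetical unreliable proposer: most of its guesses are wrong.

    It yields (a, n // a) for every a, which is only a true factor pair
    when a happens to divide n.
    """
    for a in range(2, n):
        yield (a, n // a)


def check_factorisation(n: int) -> Callable[[Pair], bool]:
    """Build a deterministic checker: accept only genuine non-trivial factorisations."""
    def verify(pair: Pair) -> bool:
        a, b = pair
        return a > 1 and b > 1 and a * b == n
    return verify


# The proposer is noisy, but only a checked answer gets through.
composite_result = generate_then_verify(propose_factor_pairs(91), check_factorisation(91))
prime_result = generate_then_verify(propose_factor_pairs(97), check_factorisation(97))
```

For 91 the loop returns (7, 13) after discarding five wrong guesses. For the prime 97 it returns None instead of a plausible-looking wrong answer. The proposer's error rate affects how long the search takes, not whether the accepted result is correct.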

Frequently asked questions

Did AI really solve maths problems humans couldn't?

It resolved problems that had been open for decades: OpenAI with a construction disproving a 1946 Erdős conjecture, DeepMind with nine more Erdős problems plus 44 OEIS conjectures. OpenAI's result is human-checked with formal peer review pending; DeepMind's are machine-verified in the Lean proof assistant.

What is the Lean proof assistant, and why does it matter?

Lean is software that checks each logical step of a proof against mathematical axioms. It matters because AI can sound convincing while being wrong — a Lean-certified proof has survived a rigorous automated check, not just human intuition.
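
A minimal Lean 4 sketch gives a feel for what acceptance by the checker means. It is not taken from AlphaProof Nexus, and the theorem name add_comm_example is made up for illustration; Nat.add_comm is a lemma from Lean's core library.

```lean
-- Accepted: both sides compute to the same number, so `rfl` type-checks.
example : 2 + 2 = 4 := rfl

-- Accepted: the proof is an existing core-library lemma applied to a and b.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Rejected if uncommented: the statement is false, so no proof can type-check.
-- example : 2 + 2 = 5 := rfl
```

The last line is the important one. However confident the wording around it, a false statement does not compile, which is the property a human read-through cannot offer.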

Who actually 'won', OpenAI or DeepMind?

Wrong frame. OpenAI disproved one famous conjecture; DeepMind solved nine others by a different, machine-verified method. They are not the same task, so 'nine to one' is a headline, not a result — and DeepMind's formal certificates are arguably the sturdier object.

Is this AGI?

No. DeepMind's Demis Hassabis said the system is 'still not AGI', as reported by WinBuzzer (26 May 2026). It is a narrow instrument for verifiable problems, not general intelligence: impressive in its domain, far from human-level across the board.

How much did it cost to run?

DeepMind reported solving each Erdős problem for a few hundred dollars of compute, according to coverage of the arXiv preprint. That is a striking efficiency point, given that two of the problems had been open for 56 years.

Sources

An OpenAI model has disproved a central conjecture in discrete geometry — OpenAI, 20 May 2026
Advancing Mathematics Research with AI-Driven Formal Proof Search (AlphaProof Nexus preprint, arXiv:2605.22763) — Google DeepMind / arXiv, 21 May 2026
OpenAI's milestone math breakthrough played to AI's strengths — Understanding AI, 22 May 2026
Google DeepMind's AlphaProof Nexus Solves Erdős Problems as AI Math Race Moves Beyond Benchmarks — WinBuzzer, 26 May 2026

