AI research

AI is doing real mathematics — what that actually means

Not summarising maths — contributing it. Here's how, why it's verifiable, and where the limits are.

WireRead Editorial · 22 May 2026 · Verified May 2026

The answer

In 2026 AI began contributing original, verifiable mathematics, not just summarising it.

For years, AI 'doing maths' meant solving problems it had effectively seen before — well-trodden competition questions with memorised solution patterns. In 2026 that changed: systems began producing genuinely new results on problems that had stumped human mathematicians for decades. Understanding what actually happened — mechanism, guarantee, and limit — matters more than the headline count.

Why Erdős problems became the testbed

The mathematician Paul Erdős (1913–1996) posed hundreds of precise, famously hard open problems over his career, and many are now catalogued online, for example on the Erdős Problems website (erdosproblems.com). They make a near-perfect AI proving ground for three reasons. First, they are easy to state: a non-specialist can often read the question in a single sentence. Second, they are hard to fake: the problems are difficult enough that no amount of fluent writing substitutes for a correct solution. Third, and most important, their solutions cannot have been memorised. For a genuinely open problem there is no known answer to recall, so an AI that produces a correct one has done something beyond recall: it has reasoned to a novel conclusion. That is the diagnostic power of the Erdős testbed.

When OpenAI reported that a model had disproved the 1946 planar unit-distance conjecture, and DeepMind reported that AlphaProof Nexus had solved nine open Erdős problems plus 44 conjectures from the integer-sequence catalogue, the significance lay less in the number of results than in what those results ruled out as an explanation. With no known answer to retrieve, there was no lookup table. The AI had to derive.

The trick that makes it trustworthy — and two types of 'solved'

The breakthrough that matters most is verification. AI can sound rigorous while being subtly wrong: a convincing-sounding proof with a gap on line 47 is a real failure mode. The strongest systems, such as DeepMind's AlphaProof Nexus, address this by pairing a language model with Lean, a proof assistant that refuses to accept a proof unless every logical step checks out against its axioms and previously verified results. The AI proposes; the checker certifies. That is what separates 'an AI claimed a proof' from 'the proof is correct'.
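To make 'the checker certifies' concrete, here is a toy Lean 4 example. It is illustrative only, unrelated to either lab's results and not taken from AlphaProof Nexus, and it assumes core Lean 4 with no Mathlib. It 'disproves' the false claim that n * n < 50 for every natural number n by exhibiting the counterexample n = 8; Lean accepts the file only because it checks for itself that 8 * 8 < 50 is false.

```lean
-- Toy claim (false): every natural number n satisfies n * n < 50.
-- The theorem states the claim's negation, so proving it is a disproof.
theorem toy_disproof : ¬ ∀ n : Nat, n * n < 50 := by
  intro h                         -- assume the claim holds for every n
  -- Specialise it to the counterexample n = 8, giving 8 * 8 < 50,
  -- then let `decide` evaluate that inequality and confirm it is false.
  exact absurd (h 8) (by decide)
```

Had the proposed witness been 7 instead (7 * 7 = 49, which is less than 50), `decide` would fail and Lean would reject the proof outright, however plausible the surrounding argument looked.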

But not all results carry the same guarantee. The two May 2026 milestones differ precisely here:

	OpenAI	Google DeepMind
Method	General reasoning model; algebraic number theory	Gemini 3.1 Pro paired with Lean proof assistant
Verification	Human-checked (incl. Timothy Gowers)	Machine-checked, every step, in Lean
Status	Formal peer review pending	Each accepted proof is formally certified
Guarantee strength	Credible — expert eyes, not infallible	Strong — machine rejects any flawed step

The difference matters. A Lean-certified proof has survived a rigorous, automated check of every step against axioms; a human-verified proof is highly credible but rests on expert readers, who are fallible, and the whole reason peer review exists is that they sometimes miss things. Lean's guarantee has one boundary worth knowing: it certifies that the proof establishes the formal statement it was given, so that statement must faithfully capture the original problem. Which type of guarantee a given AI-maths result carries is the single most important question to ask when a headline says 'AI solved X'.
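One practical way to tell a complete Lean development from one with gaps: Lean lets a proof skip a step with the placeholder `sorry`, which still compiles but with a warning, and the `#print axioms` command lists everything a theorem ultimately relies on, including `sorryAx` if a gap remains. The toy theorems below are illustrative only, assume core Lean 4, and are not drawn from either result.

```lean
-- A "proof" with a deliberate gap: `sorry` asks Lean to take the goal on trust.
-- The file still compiles, but Lean warns that the declaration uses 'sorry'.
theorem gappy (a b : Nat) : a + b = b + a := by
  sorry

-- The same statement with a complete proof, via a core library lemma.
theorem complete (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- Audit what each theorem depends on.
#print axioms gappy     -- the list includes `sorryAx`: a step was assumed, not checked
#print axioms complete  -- no `sorryAx`: every step was checked
```

In a real development the same audit would be run on the main theorem; any `sorryAx` in its list means at least one step was assumed rather than proved.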

AlphaProof Nexus addresses AI hallucination by pairing an AI model's generative capacity with formal proof-checking through the Lean proof assistant. The AI proposes a proof, and a separate verification system checks every logical step.

Source: Crypto Briefing · 26 May 2026

What this means — and what it doesn't

The limitation is structural, not temporary. The generate-then-verify loop works because, in maths, 'correct' is mechanically checkable: you can write a proof in Lean's formal language, and the compiler either accepts or rejects it. Most domains don't offer that — strategy, taste, ethics, the messier corners of science have no equivalent of a proof checker. So the achievement is sharply bounded: exceptional inside a mechanically checkable box, untested and probably far weaker outside it.
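The generate-then-verify loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not either lab's pipeline: the hypothetical `proposer` stands in for a generative model and simply guesses at random, and the 'checker' is an exact primality test applied to a classic false conjecture, that n * n + n + 41 is prime for every n ≥ 0 (it holds for n = 0 to 39 and fails at n = 40, where the value is 41 squared). Only candidates the checker accepts are ever reported.

```python
import random
from typing import Callable, Iterable, Optional


def is_prime(n: int) -> bool:
    """Exact trial-division primality test; the core of the mechanical checker."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def euler_poly(n: int) -> int:
    """Toy conjecture (known to be false): euler_poly(n) is prime for every n >= 0."""
    return n * n + n + 41


def checker(n: int) -> bool:
    """Accept n only if it is a genuine counterexample to the toy conjecture."""
    return n >= 0 and not is_prime(euler_poly(n))


def proposer(rng: random.Random, budget: int) -> Iterable[int]:
    """Hypothetical stand-in for a generative model: cheap guesses, mostly wrong."""
    for _ in range(budget):
        yield rng.randint(0, 60)


def generate_then_verify(
    candidates: Iterable[int], check: Callable[[int], bool]
) -> Optional[int]:
    """Return the first candidate the checker certifies, or None if none pass."""
    for candidate in candidates:
        if check(candidate):
            return candidate  # trusted because it was checked, not because it was proposed
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    found = generate_then_verify(proposer(rng, budget=200), checker)
    print("checker-certified counterexample:", found)
```

The design choice is the point: the proposer can be as unreliable as it likes, because nothing it emits is trusted until `checker` confirms it. In the systems described here, the checker is a proof assistant such as Lean rather than a primality test.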

OpenAI described the result as the first time a prominent open problem, central to a subfield of mathematics, has been solved autonomously by AI.

Source: OpenAI · 20 May 2026

The durable implication: a template for science

The bigger implication of May 2026 is not any single proof but the pipeline. A reliable 'AI generates, formal system verifies' loop is a template that could extend beyond pure mathematics to anywhere claims can be mechanically checked: formally verified software, certain classes of physical modelling, computational biology with checkable outputs. The key is that the formal checker does the trusting so humans don't have to. It neutralises AI's worst failure mode (confident wrongness) in exactly the contexts where that failure mode is most expensive. Whether the template travels depends on how many domains can be formalised. Maths is the most favourable case, because 'correct' is precisely defined there, so success in maths is a proof of concept rather than a guarantee that the template works elsewhere.
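A minimal sketch of the same pattern applied to software, under loud assumptions: the three `CANDIDATES` are hypothetical stand-ins for machine-generated code, and the 'checker' is a written specification for integer square root tested over a finite set of inputs. That is ordinary exhaustive testing, weaker than formal verification (which would prove the specification for every input); it is shown only to illustrate how a checkable specification filters out plausible but wrong output.

```python
import math
from typing import Callable, Dict, Iterable, List


def meets_isqrt_spec(n: int, r: int) -> bool:
    """Specification: r is the integer square root of n iff r*r <= n < (r+1)*(r+1)."""
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)


# Hypothetical candidate implementations standing in for generated code.
CANDIDATES: Dict[str, Callable[[int], int]] = {
    "float_sqrt": lambda n: int(math.sqrt(n)),  # right for small n, breaks for large n
    "halving": lambda n: n // 2,                # plausible-looking, wrong (fails at n = 1)
    "library": math.isqrt,                      # exact integer square root (Python 3.8+)
}

# Small inputs plus one large input where floating-point rounding matters.
DOMAIN: List[int] = list(range(1000)) + [10**18 - 1]


def accepted(
    candidates: Dict[str, Callable[[int], int]], domain: Iterable[int]
) -> List[str]:
    """Keep only the candidates that satisfy the spec on every input in the domain."""
    inputs = list(domain)
    return [
        name
        for name, f in candidates.items()
        if all(meets_isqrt_spec(n, f(n)) for n in inputs)
    ]


if __name__ == "__main__":
    print(accepted(CANDIDATES, DOMAIN))
```

Note that `float_sqrt` passes every input below 1000 and is caught only by the large one: a small instance of the confident wrongness described above, and a reminder that a finite test set can miss what a proof would not.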

Watch for two things in the year ahead. First, whether the Lean-certified results survive post-publication scrutiny from the broader maths community — the arXiv preprint is not the same as a refereed journal acceptance. Second, whether other scientific fields begin building analogous generate-then-verify pipelines. That second development, if it happens, will matter far more than any individual result. The field to watch isn't maths; it's the infrastructure being built around it.

Frequently asked questions

Can AI really do original maths now?

In narrow, hard domains, yes — in 2026 AI systems produced new results on long-open Erdős problems. DeepMind's are machine-verified in the Lean proof checker; OpenAI's is human-verified with formal peer review pending. It's a genuine research instrument for specific problem classes, not a general mathematician.

How do we know the AI's maths is correct?

The most trustworthy systems pair the AI with a proof assistant (Lean) that formally checks every step against mathematical axioms. If the checker accepts it, the proof is machine-certified — a stronger guarantee than human review alone, which can miss subtle gaps.

What's the difference between 'verified in Lean' and 'human-checked'?

Lean verifies every logical step automatically against axioms; any flaw causes rejection. Human checking is done by expert mathematicians — credible, but fallible. DeepMind's May 2026 results are Lean-certified; OpenAI's was human-checked and awaits formal peer review.

Will AI replace mathematicians?

Not on current evidence. Researchers, including DeepMind's leadership, describe these systems as 'not AGI'. They are narrow, powerful tools for verifiable problems: impressive in their domain, far from human-level reasoning across mathematics broadly.

Could this approach work outside maths?

In principle, yes — anywhere claims can be mechanically checked. Formally verified software is the most obvious adjacent domain. The constraint is that most fields lack an equivalent of Lean: a machine that can definitively reject a flawed step. Maths is unusually well-suited because 'correct' is precisely defined.

Sources

An OpenAI model has disproved a central conjecture in discrete geometry — OpenAI, 20 May 2026
Solving open problems with AlphaProof Nexus (preprint, arXiv:2605.22763) — Google DeepMind / arXiv, 21 May 2026
OpenAI's milestone math breakthrough played to AI's strengths — Understanding AI, 22 May 2026
Google DeepMind's AlphaProof Nexus solves 9 Erdős problems and 44 conjectures — Crypto Briefing, 26 May 2026
