For decades, mathematicians thought their jobs were safe from the AI boom. Sure, AlphaGo humbled the Go grandmasters, and protein folders won Nobel prizes, but creative proofs just felt like a human sanctuary. Last week that illusion evaporated faster than an integral under a well-placed substitution.
Google DeepMind’s Gemini Deep Think and an experimental OpenAI model walked out of the 66th International Mathematical Olympiad (IMO) with gold-level scores (35/42 points or five of six problems solved).
The news whipsawed Polymarket: odds on its “will AI win the IMO gold medal in 2025?” market spiked to 67% “yes” on July 19th, only to crash to 14% a week later.
Turns out the market only resolves when the models are open-sourced and the results fully reproducible, a bar neither Google nor OpenAI cleared. Fine, I suppose we have to wait until the 67th IMO next July in Shanghai for China to open-source math superintelligence.
Still, there is no doubt: AIs are getting better at math. In just two years we’ve gone from ChatGPT failing basic math to this OpenAI and Gemini victory. All of it made possible by longer context windows, reasoning, and reinforcement learning.
The wild part is that OpenAI’s model wasn’t even optimized for geometry or number theory. It simply burned gargantuan test-time compute, thinking for the entire 4.5-hour window every human contestant faces, until a clean proof survived grading by three former IMO medalists. Google’s gold run is estimated at $1.5 million of TPU time; OpenAI’s costs are unknown.
The two labs are now arguing about who announced first and who used the official rubric. Basically: whose gold is shinier. That drama on X is fun, but the scoreboard reads Machines 2 – Humans 0.
Additionally, IMO President Prof. Dr. Gregor Dolinar said the IMO graders found the AIs’ answers to be clear, precise, and mostly easy to follow.
Which is impressive, because these problems are complex and demand creative leaps. Take Problem 3:
Let N denote the set of positive integers. A function f : N → N is said to be bonza if
f(a) divides b^a − f(b)^f(a)
for all positive integers a and b. Determine the smallest real constant c such that f(n) ⩽ cn for all bonza functions f and all positive integers n.
The official answer is c = 4: every bonza function satisfies f(n) ⩽ 4n, and the bound is tight. Examples of bonza functions include:
The constant 1 function
The identity function
The function that is 1 for all odd n and 2 for all even n
The function that is 1 for all odd n, 2 for all even n other than 4, and 16 at n = 4 (so f(4) = 16 = 4·4, which is exactly what forces the constant up to 4)
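If you want to sanity-check those examples yourself, here is a minimal Python sketch (my own illustration, not part of the official solution): it brute-forces the divisibility condition for small a and b and reports the largest ratio f(n)/n each function attains.

```python
# Illustrative only: verify that each example satisfies
# f(a) | b^a - f(b)^f(a) for small a, b, and report max f(n)/n.

def is_bonza(f, limit=40):
    """Check the divisibility condition for all 1 <= a, b <= limit."""
    return all(
        (b**a - f(b)**f(a)) % f(a) == 0
        for a in range(1, limit + 1)
        for b in range(1, limit + 1)
    )

examples = {
    "constant 1":                  lambda n: 1,
    "identity":                    lambda n: n,
    "1 on odds, 2 on evens":       lambda n: 1 if n % 2 else 2,
    "1 on odds, 2 on evens, 16@4": lambda n: 16 if n == 4 else (1 if n % 2 else 2),
}

for name, f in examples.items():
    ratio = max(f(n) / n for n in range(1, 41))
    print(f"{name:30s} bonza={is_bonza(f)}  max f(n)/n = {ratio:g}")
```

The last example prints a ratio of 4, matching the lower bound; proving that no bonza function ever exceeds 4n is the hard part of the official solution.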
The 68th IMO will be held in 2027, but the location and exact dates have yet to be announced. Does it matter? Probably not. The real game has already moved beyond high school competitions to the frontiers of math research.
Some in the field are pivoting to “explain why the model is wrong” meta-questions. And math AIs could flood grant committees with generated lemmas, which may force disclosure of AI contributions. As mathematicians start offloading boilerplate proofs, a new type of startup is emerging.
Like Harmonic.ai, co-founded by Robinhood CEO Vlad Tenev, which raised $100M from Kleiner Perkins and another $75M from Sequoia to build mathematical superintelligence. Or Lean FRO, a nonprofit rallying top formalization talent to build open, verifiable AI mathematicians on top of Lean, the proof language originally developed at Microsoft Research, so that the AI’s outputs are provably correct.
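To make “provably correct” concrete, here is a toy Lean snippet of my own (not Lean FRO code): the file compiles only if the proof actually checks, so a hallucinated step simply fails to build.

```lean
-- Illustrative only: Lean's kernel accepts this only if the proof is valid.
example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```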
In the end, these algorithmic improvements are generalizable mechanisms. The IMO is just another milestone, like AlphaGo or AlphaFold, on AI’s inexorable march into every domain of human intellect. The drama and the caveats are temporary distractions. The fundamental truth is that a tool for automating discovery is here, and science will probably never be the same.