IMO Gold Achieved by New OpenAI Reasoning Techniques (Digital AGI?): Society Is Not Ready for This
The result closes a long‑standing gap between symbolic manipulation by machines and creative proof writing by humans, marking a genuine inflection point for AI reasoning (Rohan Paul)
In a post today, OpenAI’s Noam Brown announced that one of their new models (not GPT-5, though that is coming soon) has achieved Gold on International Math Olympiad problems (35/42). For this year's competition, only 67 of the 630 total contestants received gold medals, roughly 10 percent.
The previous best was Gemini 2.5 Pro (13/42; 19/42 is required for a Bronze). Since each of the six problems is worth 7 points, 35/42 corresponds to full marks on five of the six. The model generated its solutions under standard competition conditions: two 4.5-hour sessions, no outside help, all answers written in natural language, and no tool use.
This is significant for a number of reasons.
(1) These problems are incredibly difficult, requiring deep reasoning, creativity, and careful argument development to solve. This is a sample problem from 2025.
While AI traditionally excels at processing large datasets and automating repetitive tasks, it has historically struggled with problems requiring sophisticated reasoning and creative problem-solving.
According to OpenAI researcher Alexander Wei (see below), this breakthrough demonstrates that AI can now "craft intricate, watertight arguments at the level of human mathematicians" when approaching complex mathematical challenges that require deep analytical thinking rather than mere computational power.
(2) The score was not achieved through any IMO- or math-specific training, but through advances in LLM reasoning that can be applied to other areas.
It’s not just a narrow application…This, he suggests, may help enable scientific discovery…a key criterion for artificial general intelligence for many (and something with enormous implications regardless of whether we want to call it AGI).
Why does it apply to other areas? Because the researchers implemented novel verification mechanisms: the model must explicitly articulate each step in its reasoning, and those steps are then systematically validated to identify logical inconsistencies. This verification layer transforms abstract reasoning into a transparent, sequential process that evaluators can examine at every stage. That recipe can travel to chemistry proofs, code correctness, or physics derivations without needing domain-specific plug-ins (a minimal sketch of the loop follows the examples below).
For instance —
Chemistry: Researchers could use this approach to systematically verify each stage of complex chemical reaction predictions or molecular simulations, ensuring accurate identification of intermediate compounds and reaction conditions, thereby accelerating drug discovery processes.
Computer Science (Code Correctness): The method can systematically validate every step of software algorithms, identifying and isolating logical errors or security vulnerabilities before deployment, significantly improving software reliability.
Physics: Physicists could apply this sequential validation process to intricate theoretical derivations, verifying each mathematical transition explicitly. This would help quickly pinpoint incorrect assumptions or calculation errors, expediting breakthroughs in fields like quantum mechanics or cosmology.
Because the verification methodology is abstract and step-based rather than tied to domain-specific knowledge, it seamlessly translates into various fields, empowering faster, more transparent, and rigorous scientific discoveries.
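To make that recipe concrete, here is a minimal sketch of a generate-then-verify loop in Python. Everything here is hypothetical and illustrative, not OpenAI’s actual pipeline: verify_chain and arithmetic_check are invented names, and the “steps” are toy arithmetic claims standing in for proof steps.

```python
# A minimal, hypothetical sketch of the "articulate every step, then
# verify each one" recipe described above. None of these names come
# from OpenAI's system; they only illustrate the pattern.
from typing import Callable, Optional

def verify_chain(steps: list[str], check: Callable[[str], bool]) -> Optional[int]:
    """Validate each explicit reasoning step in order.

    Returns the index of the first step that fails the checker,
    or None if the whole chain passes.
    """
    for i, step in enumerate(steps):
        if not check(step):
            return i
    return None

def arithmetic_check(step: str) -> bool:
    # Toy domain: each step is a claimed equality "lhs == rhs".
    # eval() is acceptable for this toy; never use it on untrusted input.
    lhs, rhs = step.split("==")
    return eval(lhs) == eval(rhs)

candidate_chain = [
    "2 + 3 == 5",
    "5 * 4 == 20",
    "20 - 1 == 18",  # deliberate error the verifier should catch
]

print(verify_chain(candidate_chain, arithmetic_check))  # -> 2
```

The design point is the separation of roles: the generator only has to make every step explicit, and any field where a single step can be checked independently (a unit test for code, a dimensional-analysis check in physics, a mass-balance check in chemistry) can swap in its own checker and reuse the same loop.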
(3) This was done without tools.
(4) Erik Brynjolfsson, an economist who studies the impact of AI on jobs and the economy, says it’s a significant advance and that society is not ready for AI’s emerging impact.
I’ve always found Professor Brynjolfsson’s predictions to be very conservative, so if he’s saying this is a signal that progress is fast and that society isn’t ready, you can’t just write it off as “hype.”
And, well, the prediction markets thought the chance of this occurring this year was about 20%.
I haven’t believed society is ready for a while.
(5) Yes, progress is quick. I just removed a prominent grade-school math benchmark from a presentation because, even though it represented the state of the art in 2024, it is no longer relevant.
(6) This improvement was achieved with very few people. So now we know why key people are getting paid tens of millions of dollars, if not $100+ million.
Based on the post, it seems we don’t have AGI yet, but we are close (and who knows what abilities labs have that haven’t been released).
(7) We can expect fast, unpredictable advances. OpenAI’s Noam Brown adds that the results surprised even people inside OpenAI, calling it "a milestone that many considered years away."
(8) It’s not surprising that OpenAI won’t release this for months. ChatGPT was able to identify some specific dangers of the technology:

Dangerous Chemicals or Biological Agents:
Someone without advanced expertise could ask the AI to provide detailed, step-by-step reasoning for synthesizing dangerous compounds (e.g., explosives, poisons, or pathogens). By transparently verifying each step, the AI inadvertently simplifies the process for malicious actors.

Cybersecurity Threats:
Attackers could leverage the AI’s explicit step-by-step reasoning to systematically identify and exploit vulnerabilities in software, network architectures, or critical infrastructure, greatly amplifying their capabilities.

International Security:
An openly accessible, advanced reasoning AI could trigger global security concerns, as adversaries quickly replicate military technologies or exploit transparent AI-assisted research to build weapons, spy systems, or destabilizing technologies without needing domain-specific expertise.