Google DeepMind AI system reaches milestone in global math contest

Updated Jul 25, 2024, 11:30am EDT

The News

Researchers at Google DeepMind have trained a new type of AI system capable of solving complex math problems from this year’s International Mathematical Olympiad, reaching a score equivalent to a silver medalist for the first time, the company said on Thursday.

The IMO is the world’s hardest math competition for high school students. Teams of six of the brightest young mathematicians from around the world are given six problems, which each contestant tries to crack over two days. Each problem is worth a maximum of 7 points, for a perfect score of 42. The US team won first place this year, with five of its six teenagers scoring at least 29 points, the minimum required for a gold medal.

Google DeepMind’s latest system, dubbed AlphaProof, wasn’t too far behind. Although it didn’t solve all six IMO problems within the competition’s time limit, it managed to score 28 points – the highest an AI system has reached so far. Companies like Google DeepMind believe that solving difficult abstract math problems is necessary for developing the logic and reasoning skills required for artificial general intelligence, the point at which the technology is better than humans at most tasks.


“No AI system has ever achieved a high success rate in these types of problems,” Pushmeet Kohli, VP of Research, AI for Science at Google DeepMind, said in a briefing. “A lot of reasoning is required to prove results.”

The latest system is an improvement over the company’s previous AlphaGeometry model: it has solved 83% of all IMO geometry problems set over the past 25 years, up from the previous rate of 53% – including the hardest geometry problem ever set in the competition. And whereas AlphaGeometry could only handle geometry, AlphaProof can tackle other areas of math, such as number theory, algebra, and combinatorics.


Katyanna’s view

It’s difficult to assess how smart AlphaProof really is, or how deep its reasoning goes, just as ChatGPT’s performance on the LSAT doesn’t offer a clear measure of its intelligence.


But arriving at correct answers to complex mathematical and reasoning problems may be a critical step toward the science fiction-like levels of AGI that companies like Google and OpenAI are chasing. To be sure, AlphaProof doesn’t solve math problems the way humans do; in some cases it relies more on brute force and guesswork, generating proofs by searching through combinations of possible mathematical steps until it arrives at the best solution.

Alex Davies, a research engineer at Google DeepMind, told Semafor that AlphaProof works a bit like the company’s previous model, AlphaGo, designed to play the strategy-based board game Go. Instead of searching over possible moves, however, AlphaProof tries to find the right sequence of proof steps. “In any given attempt at a problem, [it’s] limited to fewer moves than would be explored in chess or Go systems,” he said.
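To make that search analogy concrete, here is a minimal, hypothetical sketch of a best-first search over proof steps. None of it is DeepMind’s code, which has not been released; the `apply_tactic`, `tactics`, and `score` callables are stand-ins for whatever a real system would use to propose and evaluate proof moves.

```python
import heapq

def best_first_proof_search(initial_goal, apply_tactic, tactics, score,
                            max_expansions=10_000):
    """Illustrative only: search for a sequence of tactics that closes a goal.

    apply_tactic(goal, tactic) -> list of remaining sub-goals (empty = solved),
                                  or None if the tactic doesn't apply.
    score(goals)               -> lower is more promising, e.g. a learned value.
    Returns the list of tactics that finished the proof, or None on failure.
    """
    counter = 0  # unique tie-breaker so the heap never compares goal objects
    frontier = [(score([initial_goal]), counter, [initial_goal], [])]
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, goals, path = heapq.heappop(frontier)
        if not goals:               # nothing left to prove: proof found
            return path
        current, rest = goals[0], goals[1:]
        for tactic in tactics:
            new_goals = apply_tactic(current, tactic)
            if new_goals is None:   # this tactic doesn't apply to this goal
                continue
            counter += 1
            remaining = new_goals + rest
            heapq.heappush(frontier,
                           (score(remaining), counter, remaining, path + [tactic]))
    return None                     # search budget exhausted
```

DeepMind has described the real system as coupling a pre-trained language model with AlphaZero-style reinforcement learning; the sketch above only shows the search skeleton that the AlphaGo analogy implies.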

But human IMO participants rely on their knowledge of theorems and develop an intuition for how to attack a problem. They don’t have to consider nearly as many possibilities and, in many ways, are more efficient and intelligent than machines. In one case, AlphaProof took three days to finally arrive at a solution.


Still, the general large language models that power AI chatbots often struggle with basic math. They hallucinate and sometimes can’t even tell which of two numbers is larger. AlphaProof avoids these problems by generating its answers in code: if it manages to write a program that runs on a computer without errors, its solution may be correct; if it can’t, its answer is definitely wrong.
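In practice that code is a formal proof rather than an ordinary program: Google DeepMind says AlphaProof works in the Lean proof language, where a proof that compiles is machine-verified. The toy Lean 4 snippet below is written here purely as an illustration of that principle and is not AlphaProof output.

```lean
-- Toy illustration: a statement together with a proof that the Lean
-- compiler can check mechanically.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A flawed proof attempt would simply fail to compile, so an error-free
-- build is the machine's certificate that the argument is sound.
```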

The researchers generated synthetic data to train the system: it learned the geometric relationships in various shapes by inspecting 100 million randomly drawn diagrams and their accompanying proofs, and used those to figure out how to solve new problems.

The system, however, revealed glimmers of brilliance, according to Sir Timothy Gowers, a combinatorics professor at the Collège de France, an IMO gold medalist, and a winner of the Fields Medal, the top prize in math that’s often compared to a Nobel Prize in science.

Gowers said that in order to find a solution, mathematicians have to find clever tricks like “magic keys” that unlock the problem. Initially, he thought that finding these magic keys was “probably a bit beyond” AlphaProof’s abilities, but was surprised to see it had found them once or twice in its solutions.

“I find it very impressive, and a significant jump from what was previously possible,” he said.


Room for Disagreement

Although AI has gotten much better at math, it’s not clear how these skills can be applied in the real world yet. Google DeepMind is looking at how AlphaProof may aid researchers as a “proof assistant.”

David Silver, VP, Reinforcement Learning at Google DeepMind, said that good mathematicians do a lot more than just solve problems. They also pose questions and come up with fresh ways of thinking to invent new fields of mathematics. He said that AlphaProof was “not adding to the mathematical body of knowledge that humans have created” so far. No AI can do that yet.

Silver told me that it’d take a lot more breakthroughs in math and technology to reach that point. Maybe this should be the true test for whether we’ve reached AGI or not.


Notable

  • How Google DeepMind was convinced to launch a new Alpha-type project to build AI that might one day win the International Mathematical Olympiad.
  • The list of problems in this year’s competition; check out the fifth problem, about Turbo the snail.