The Scoop
Google has found a cheaper way to run AI models, one of the tricks up its sleeve that could give it a long-term edge in the high-stakes race between the largest tech companies, DeepMind co-founder Demis Hassabis said in an interview with Semafor.
For years, the compute power used in generative artificial intelligence was concentrated in the “pre-training” phase, when a raw AI model is initially created. But as models have evolved, the demands of running them — known as inference — have grown.
If an AI model were a brain, inference would be akin to thinking. And it turns out thinking longer can drastically increase a model’s capabilities. That means the compute power available to AI companies today isn’t sufficient to extract the full value of the technology.
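To make "thinking longer" concrete: one simple test-time technique (a generic illustration, not a description of how Gemini or OpenAI's models actually work) is to sample a model several times on the same question and take a majority vote. The toy simulation below uses a fake "model" that is right 60% of the time; spending more inference compute, i.e., drawing more samples, pushes accuracy up.

```python
import random
from collections import Counter

def noisy_model(correct: int, accuracy: float = 0.6) -> int:
    """Toy stand-in for an AI model: answers a 10-choice question,
    correctly 60% of the time, otherwise picks a wrong option."""
    if random.random() < accuracy:
        return correct
    return random.choice([a for a in range(10) if a != correct])

def think_longer(correct: int, num_samples: int) -> int:
    """Spend more inference compute: sample the model num_samples
    times and return the most common answer (majority vote)."""
    votes = Counter(noisy_model(correct) for _ in range(num_samples))
    return votes.most_common(1)[0][0]

random.seed(0)
trials = 1_000
for n in (1, 5, 25):  # each extra sample is another full model run
    wins = sum(think_longer(correct=7, num_samples=n) == 7 for _ in range(trials))
    print(f"{n:>2} samples per question -> {wins / trials:.0%} accuracy")
```

Real reasoning models search over long chains of thought rather than voting on single answers, but the economics are the same: every extra "thought" is another pass through the model, billed in compute.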
Hassabis said new processors — known as “light chips” — are in the works that could make it more cost-effective to run the models.
“Sometimes you have the ‘victim of your own success’ problem,” Hassabis said. “If you build a very performant model, like [Google’s Gemini] 2.0 Flash, everyone wants it, which is great. But then suddenly, you only have a set amount of chips. You need more for serving.”
He said the new Google chips are based on the same architecture as the company's Tensor Processing Units, the custom-designed AI accelerators Google has been developing for around a decade.
Step Back
Hassabis’ comments come as the processing power of AI chips is front and center in both Silicon Valley and Washington. On Wednesday, President Donald Trump announced a $500 billion financial commitment from OpenAI, Oracle, and SoftBank to create a joint venture called Stargate to build AI compute clusters in Texas.
Amazon spent $75 billion last year on similar efforts and plans to dole out even more next year. Microsoft said it plans to spend $80 billion this year alone.
Hassabis said Google’s approach is unique in that it is building every component of the AI technology stack on its own. “We’re probably the only company that goes from the bare metal, the chips to the data centers,” he said. “There’s a feedback loop between knowing where the algorithms are going, and then what chips you would design.”
Google has also been using AI to design those processors, through a DeepMind project known as AlphaChip.
Know More
Hassabis is both a scientist (he shared the Nobel Prize in Chemistry last year for AlphaFold, DeepMind's protein-structure prediction AI) and a competitive entrepreneur who relishes the race.
The runaway success of ChatGPT in 2022 reoriented the industry around generative AI. Google's efforts, which had resembled university research labs and were spread across different divisions, were brought under one roof with Hassabis at the helm.
The goal is to create AI that can reason as well as a human, or what some people call “Artificial General Intelligence,” or AGI. But unlike some of his competitors, Hassabis believes the finish line is years down the road, giving the company time to make long-term decisions that will pay dividends later.
For instance, when Google set out to compete with ChatGPT, Hassabis' team chose the more difficult approach of building a "natively multimodal" AI model called Gemini, trained from the start on images, audio, and video as well as text.
“We always felt [multimodal] was a key part of the model understanding the world,” he said. “Because ultimately we want a world model, not just a language model.”
DeepMind has also prioritized research into making AI models better at "remembering," so that they can take on longer and more complex tasks. The company has grown Gemini's so-called context window, the amount of text a model can consider at once, to 1 million tokens (a token is a fragment of a word).
“That’s the sort of equivalent to working memory for us, just a ginormous one,” he said. “But I think we probably also need a type of episodic memory.”
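A rough sketch of what a context window does (heavily simplified: real tokenizers like Gemini's split text into subword pieces rather than whole words, and the real window is 1 million tokens, not the toy 20 used here):

```python
def tokenize(text: str) -> list[str]:
    """Crude whitespace tokenizer. Real models use subword tokenizers,
    so a token is usually a fragment of a word, not a whole word."""
    return text.split()

def fit_to_context(history: list[str], context_window: int) -> list[str]:
    """Keep only the most recent messages that fit in the token budget.
    Everything older is simply dropped: the model's working-memory limit."""
    kept, used = [], 0
    for message in reversed(history):          # newest first
        cost = len(tokenize(message))
        if used + cost > context_window:
            break                              # older messages no longer fit
        kept.append(message)
        used += cost
    return list(reversed(kept))                # restore chronological order

history = [
    "User: my name is Ada",
    "Assistant: nice to meet you, Ada",
    "User: summarize this report " + "blah " * 30,
    "User: what is my name?",
]
# Toy 20-token window for the demo; Gemini's is 1,000,000.
print(fit_to_context(history, context_window=20))
# Only the last message survives: the name "Ada" has scrolled out of
# the window, so the model could no longer answer the question.
```

Anything that falls outside the window is simply gone, which is why Hassabis distinguishes this working memory from the episodic memory that models still lack.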
DeepMind's early focus on long context windows and memory research could pay off down the road, as available compute increases and costs come down.
For now, the expense of the most advanced AI techniques can be staggering. For instance, OpenAI's most advanced model, called o3, recently achieved an impressive score on ARC-AGI, a benchmark consisting of puzzles designed to be relatively easy for humans but hard for AI systems. The catch? Answering 400 puzzle questions with the most compute-intensive version of o3 cost over $1 million.
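For a sense of scale, here is the per-puzzle arithmetic implied by those two reported figures (both approximate):

```python
# Back-of-the-envelope, using only the figures reported above.
total_cost_usd = 1_000_000  # "over $1 million" for the benchmark run
num_puzzles = 400           # puzzle questions answered
print(f"${total_cost_usd / num_puzzles:,.0f}+ per puzzle")  # $2,500+ per puzzle
```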
Reed’s view
If Hassabis turns out to be right, and the AI race is actually a much longer-term game than some in the industry believe, Google DeepMind is in a better position than it sometimes appears.
Before we get to AGI, there may be a long-ish period where AI models begin to reason well enough to be incredibly useful.
If Hassabis and others at DeepMind are right that a multimodal approach leads to better reasoning and a "world model," Gemini could ultimately extend beyond chat into robotics.
But multimodality and long context windows demand more compute. The efficiencies that come from custom silicon optimized for Google's own software would help the company scale that compute, allowing it to deploy its AI products to billions of people.
Those hypothetical products would also be a great source of training data for better AI models, creating a virtuous loop.
If you want a taste of just how bottlenecked AI is by current compute limits, try building a basic piece of software with a model like Claude 3.5 Sonnet. You'll more than likely hit token limits and get very frustrated.
You’ll also get a sense of how AI with practically unlimited tokens and long context would be exponentially more powerful.