

Amazon announces new ‘Rainier’ AI compute cluster with Anthropic

Dec 3, 2024, 1:11pm EST

The News

Amazon and Anthropic are teaming up to build a massive compute cluster containing hundreds of thousands of powerful AI chips designed to train the next generation of models.

Dubbed “Rainier” after the 14,000-foot peak that looms to the south of Amazon’s Seattle headquarters, the undertaking dwarfs the processing power used to build current state-of-the-art models, including Anthropic’s “Claude,” Amazon said Tuesday during its annual cloud computing conference in Las Vegas.

The project is unique in another way: The compute cluster is not housed in a single facility, but spread out across multiple locations and connected to act as one gigantic computer, according to Gadi Hutt, the company’s senior director of business development.


Most powerful AI models are trained under one roof because the speed at which data travels through the system is crucial to its efficiency. Connections between facilities could act as bottlenecks that slow down the process.

Hutt told Semafor the company solved the problem with innovations like its Elastic Fabric Adapter, a custom network interface built for high-throughput, low-latency data transfer between servers that Amazon has been developing for years.
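To see why inter-facility links matter so much, consider that in synchronous data-parallel training, every step ends with all accelerators exchanging gradients before the next step can begin. The sketch below models this with made-up numbers — the bandwidth, latency, and model-size figures are illustrative assumptions, not anything Amazon has disclosed:

```python
# Illustrative only: why inter-site bandwidth can bottleneck synchronous
# training. Each step = compute, then a gradient exchange whose cost
# depends on link bandwidth and latency. All numbers are made up.

def step_time(compute_s, grad_gb, bandwidth_gbps, latency_s):
    """Rough per-step wall time: compute, then gradient transfer."""
    transfer_s = (grad_gb * 8) / bandwidth_gbps  # GB -> gigabits
    return compute_s + transfer_s + latency_s

# Suppose a large model produces ~200 GB of gradient traffic per step.
in_building = step_time(compute_s=1.0, grad_gb=200,
                        bandwidth_gbps=3200, latency_s=1e-5)
cross_site = step_time(compute_s=1.0, grad_gb=200,
                       bandwidth_gbps=400, latency_s=5e-4)

print(f"in-building step: {in_building:.2f} s")
print(f"cross-site step:  {cross_site:.2f} s")
```

With these assumed numbers, the slower cross-site link more than triples the step time, which is why a faster interconnect like the one Hutt describes is the enabling piece for a multi-site cluster.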

Physically splitting up massive clusters could help solve some big problems facing AI development. Spreading things out makes it easier to find adequate energy to power the operation and mitigate cooling challenges that arise when AI data centers get too large.


The Rainier cluster will use Amazon’s custom-built AI chips, called Trainium 2, which compete with Nvidia’s graphics processors.

The company didn’t say exactly how much processing power Rainier will be capable of producing, so it’s impossible to compare it head-to-head with other massive compute clusters, such as Elon Musk’s Colossus, located in Memphis and believed to be the largest supercomputer in the world so far.



Know More

The AI race is turning into an industrial-scale scramble to build larger and larger data centers capable of producing ever more powerful models.

In recent weeks, the AI industry has been dogged by speculation that models may have reached a plateau, and could stop improving. Amazon’s decision to build Rainier suggests it believes the so-called “scaling laws” of AI will continue for the foreseeable future.
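The “scaling laws” the article refers to are the empirical observation that model loss falls smoothly as a power law in training compute. A minimal sketch, with purely illustrative constants (the real fitted exponents and coefficients vary by study and are not Amazon’s or Anthropic’s numbers):

```python
# Hedged sketch of a compute scaling law: loss = L_min + a * C^(-alpha).
# Constants here are invented for illustration; published fits (e.g.
# Kaplan et al. 2020) report small exponents of roughly this magnitude.

def loss(compute_flops, l_min=1.7, a=7.0, alpha=0.05):
    """Power-law loss curve as a function of training compute."""
    return l_min + a * compute_flops ** -alpha

for c in (1e23, 1e24, 1e25):
    print(f"{c:.0e} FLOPs -> loss {loss(c):.3f}")
```

The bet behind Rainier is that this curve keeps bending down rather than flattening: each order of magnitude of compute should still buy a measurable drop in loss.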


Hutt dismissed the notion that AI model improvement is slowing down. “That is the opposite of what we see,” he told Semafor.

Amazon said it would spend $75 billion on capital expenditures in 2024, much of that going to AI data centers. It expects to spend even more next year.

The Rainier project pulls back the curtain on what that spending looks like. These data centers are among the most ambitious and expensive bets the big tech companies have ever made.

They are so large and the technology so potentially transformative that they may eventually become part of a “Manhattan Project for AI,” as Trump administration officials have called it. The Trump administration may even appoint an “AI Czar” to oversee the technology.

If the US takes on the task of building something on this scale, it will no doubt have to work with the private sector. Right now, Musk’s position as the “first buddy” puts his company, xAI, in position to benefit from those contracts.

But Amazon’s Rainier announcement shows that it could also be in a good position to work on the project. That’s complicated by Amazon founder Jeff Bezos’s relationship with Trump, who attacked him in the past because of his ownership of The Washington Post, often accused of bias by the right.


Reed’s view

Amazon was caught off-guard by the “ChatGPT moment” a couple of years ago, but this phase of the AI race may actually favor it over rivals. Amazon is a huge legacy tech company, but compared to most that fit that description, it’s capable of mobilizing quickly and innovating. That’s largely due to a corporate culture that Jeff Bezos put in place when the company was still a startup.

After dominating the cloud computing market for nearly two decades, the company has built up the muscles to solve the hard logistical hurdles standing in the way of the next AI breakthroughs.

I recently interviewed Dave Brown, the head of the company’s chip efforts, at Amazon’s headquarters in Seattle.

Amazon had started planning for the AI boom several years ago, when it embarked on a project to build its own AI chips, similar to Nvidia’s graphics processors. Those chips, called Trainium, are a piece of the puzzle. But Amazon’s bread and butter as the cloud computing leader is the ability to innovate on how those chips are deployed.

I asked Brown about the challenge of spreading model training out over multiple locations and it was clear Amazon was pushing for breakthroughs there.

“There’s a whole new area of science that’s going to have to come up,” he said. “It’s just a physics problem. You can only fit so much land and power into a certain given area. Let’s say — and I’m just making up numbers — a model is going to require millions of accelerators in the fullness of time. You’re not going to get that in a few city blocks.”

Nvidia has become a household name for its powerful graphics processors used by the top AI companies, but the chips themselves are only part of the equation. Figuring out how to connect them together and get them to work in unison is a big hurdle. Each connection between the GPUs is like a tax on performance.

If Amazon has found a way to speed up those connections and even link multiple facilities, the chips don’t need to be quite as fast because Amazon can just use more of them.
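That tradeoff can be sketched with a toy model: treat the communication “tax” as a per-chip efficiency penalty, and compare a small cluster of fast chips against a larger cluster of slower ones on a better network. Every number below is an assumption for illustration, not a real Trainium or Nvidia figure:

```python
# Illustrative only: more, slower chips can out-produce fewer, faster
# ones if the per-chip communication overhead is low enough. The
# "overhead" term is a crude stand-in for the interconnect tax.

def cluster_throughput(n_chips, chip_tflops, overhead):
    """Aggregate throughput with a linear per-chip communication tax."""
    efficiency = max(0.0, 1.0 - overhead * n_chips)
    return n_chips * chip_tflops * efficiency

fast_small = cluster_throughput(n_chips=100, chip_tflops=1000,
                                overhead=0.002)
slow_big = cluster_throughput(n_chips=250, chip_tflops=500,
                              overhead=0.0005)

print(f"100 fast chips, slow network:    {fast_small:,.0f} TFLOPs")
print(f"250 slower chips, fast network:  {slow_big:,.0f} TFLOPs")
```

Under these assumed numbers, the bigger, better-connected cluster wins despite each chip being half as fast — the logic behind betting on interconnect rather than raw per-chip speed.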

Amazon is also less likely to face a shortage of chips, thanks to its decision more than a decade ago to build its own. It’s now one of TSMC’s biggest customers.

By investing in Anthropic, Amazon has a guaranteed customer that can put its Rainier project to good use. In some capabilities, models built by Anthropic are on par with those of market leader OpenAI. Anthropic is also helping Amazon design its chips to more efficiently run its cutting-edge AI models. That benefits both companies.

Amazon wants to build the most powerful AI data centers in the world and it wants Anthropic to win the race to “artificial general intelligence” by training its models on those clusters. Of course, so does Microsoft, which has hitched its wagon to OpenAI. So does Oracle, which is also working with OpenAI on a massive AI data center in Texas. And so does xAI.

Soon, we’ll be referring to these data centers by their nicknames. We’ve got Colossus and Rainier. Microsoft and Oracle’s marketing teams are no doubt hard at work coming up with a name for their versions.

And in the end, it may all become part of the “Manhattan project.”
