The Scoop
Elon Musk’s new xAI data center in Memphis reached a major milestone this week, bringing all 100,000 of its advanced Nvidia chips online at the same time, according to people with knowledge of the matter.
The feat makes the data center, nicknamed “Colossus,” the most powerful known computer ever built and represents a significant technical achievement for xAI, a relatively young company that made the massive facility operational in less than six months.
While Musk has tweeted about the facility, calling it the largest in the world, industry experts have questioned whether xAI has the required energy or the technical ability to run so many GPUs — in this case, Nvidia’s H100 chips — all at once.
“Musk may be overstating how many of the GPUs are actually operating in a single cluster,” The Information reported earlier this month. “No other company has been able to successfully string together 100,000 GPUs, due to the limitations of networking technology that connects chips to each other so they can act like a single computer.”
The achievement, reached earlier this week, allows the company to train an AI model with more computing power behind it than any other known model in history. xAI is using the data center to train the AI model behind Grok, the company’s chatbot, which bills itself as an uncensored version of ChatGPT.
Musk didn’t immediately respond to a request for comment.
Know More
xAI has aggressively pursued its goals, going so far as to connect natural gas turbines to supplement grid power, a stopgap measure that lets the company keep iterating while utility officials work to bring more electricity to the facility.
Energy has become one of the major challenges in the effort to build more powerful AI models. Bloomberg reported that OpenAI CEO Sam Altman had asked US government officials for help in building data centers that would require five gigawatts of power, roughly the output of five nuclear power plants.
Microsoft, BlackRock and Abu Dhabi’s MGX are collaborating on a $30 billion investment fund targeted at infrastructure projects for massive data centers used for AI.
The race to build bigger data centers is the reason OpenAI is pursuing billions of dollars in new funding and looking to change its corporate structure to allow for larger investments.
Compute power alone does not guarantee a better AI model, but one school of thought in the industry is that more of it generally equates to more capable models.
It’s also possible to train multiple AI models and then combine them into one larger model, sometimes referred to as a “mixture of experts.”
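For readers unfamiliar with the term, the sketch below illustrates the basic idea behind a mixture of experts: a small gating network scores each input and routes it to a handful of specialist sub-networks, whose outputs are blended together by the gate’s weights. This is a generic, minimal illustration in PyTorch, not xAI’s or Grok’s actual architecture; every name, size, and number in it is made up for the example.

```python
# Minimal mixture-of-experts sketch (illustrative only; not any company's
# actual architecture). A gating network scores each input and routes it to
# its top-k "expert" sub-networks, whose outputs are blended by the gate's weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfExperts(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # The gate decides which experts handle each input.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = F.softmax(self.gate(x), dim=-1)           # (batch, num_experts)
        weights, picked = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                routed = picked[:, slot] == e               # inputs assigned to expert e
                if routed.any():
                    out[routed] += weights[routed, slot, None] * expert(x[routed])
        return out


# Example: a batch of 8 feature vectors, 4 experts, 2 experts active per input.
moe = MixtureOfExperts(dim=16)
print(moe(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```

The appeal of this design is that only a fraction of the model’s parameters are active for any given input, so total model size can grow faster than the compute needed to run it.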
Historically, the more GPUs connected under one roof, the more powerful the resulting models.