Semafor


Revealing the mysteries of ChatGPT

Jul 28, 2023, 1:17pm EDT
tech

The Scene

OpenAI’s industry-leading large language model has been a black box. That’s why research and consulting firm SemiAnalysis’ exhaustive deep dive into the technical details of GPT-4 caught the attention of many in the industry. Its paper, revealing new details about the LLM, was a bombshell in the small but growing world of people who obsess about the technology.

SemiAnalysis covers the semiconductor and AI industries, and has five analysts across Asia, Europe, and the U.S. We talked to the company’s chief analyst Dylan Patel about its findings in an edited conversation below.

The View From Dylan Patel

Q: You say in your research that ChatGPT is amazing, but that eventually open source and other competitors are going to surpass GPT-4. What led you to believe that?

A: One of the most complex and costly things in AI is all the failures you have to go through to get to the point where it’s right. OpenAI and Google went through a lot of these failures already. Not all of them are in the open, but many of them are. So the cost of all these failures to get to a working AI is significantly reduced.

There are quite a few startups and quite a few large companies that will be able to do this. The question is, what has OpenAI been working on? Will they break through barriers at some point again?

Q: You try to dissect GPT-4. What was the most surprising thing that you learned?

A: The first time I used it, I was like, ‘wow, this is magic. How were they able to do this?’ As we dissected it, it was like, ‘wow, that’s actually quite logical. It’s not magic and there are so many engineering tradeoffs.’

Q: What are some examples?

A: In some cases, it’s, ‘hey, we don’t have enough data, so what am I doing to augment my data?’ Or ‘hey, the cost of running inference [the term for what happens when you prompt ChatGPT and other models] is so high, how do I reduce it?’ They created a mixture of experts, which is like a baby model within a model.

Q: What’s an expert and how did that help?

A: Let’s take it even a step further back. People always talk about these parameter counts, which is the size of the model. It’s basically how many different numbers am I multiplying, adding, doing these mathematical operations. When you input something to the language model, you tokenize it. Tokens are basically four letters [on average, a token is roughly four characters of English text].

You feed that through the model, which is just predicting what the next token is going to be. When it’s done predicting that token, it feeds that token back into the prompt and it runs it through the model again, and so on and so on. And that’s how it generates the response.
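
The loop Patel describes can be sketched in a few lines of Python. This is only an illustration, not OpenAI's code: the `generate` function, the `toy_predict` stand-in for the model, and the `<end>` stop marker are all hypothetical names invented for this example, and a real model predicts from the whole prompt rather than just the last token.

```python
from typing import Callable, List

def generate(
    prompt: List[str],
    predict_next: Callable[[List[str]], str],
    max_new_tokens: int = 10,
    stop: str = "<end>",  # hypothetical stop marker for this sketch
) -> List[str]:
    """Predict one token, append it to the prompt, and run the model again."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)  # one full pass through the "model"
        if next_token == stop:
            break
        tokens.append(next_token)  # feed the prediction back in
    return tokens

# Toy stand-in for a language model: it looks only at the last token.
TOY_NEXT = {"sky": "is", "is": "blue", "blue": "<end>"}

def toy_predict(tokens: List[str]) -> str:
    return TOY_NEXT.get(tokens[-1], "<end>")

if __name__ == "__main__":
    print(generate(["the", "sky"], toy_predict))
```

Each new token costs one more pass through the model, which is why the cost of a single pass matters so much for response speed.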

The special thing about these experts is that instead of every single parameter being used every single time you run the model, you only use certain parameters. You have these experts specialized on a specific task. It comes down to, ‘hey, this expert is actually just really good at knowing prepositions. And this expert is really good at knowing animals and concepts about wildlife.’ It’s an oversimplification, but that’s basically what’s happening.

And it’ll change every single time, so if you ask it ‘what color is the sky?’ it might use these two experts to generate ‘sky’ and might use these two experts to generate ‘is blue.’ GPT-4 has 16 experts.
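
A minimal sketch of that routing idea, assuming a common top-2 design: a router scores all 16 experts for each token, only the two highest-scoring experts run, and their outputs are blended. The 16 and the two-at-a-time figures come from the interview. Everything else here (the function names, the softmax weighting, the experts that merely scale their input) is an assumption made for illustration and says nothing about how GPT-4 actually implements it.

```python
import math
from typing import Callable, List, Tuple

NUM_EXPERTS = 16  # figure quoted in the interview
TOP_K = 2         # two experts per step, as in the "sky" example

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_scores: List[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and normalize weights over just those."""
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))

def make_expert(scale: float) -> Callable[[List[float]], List[float]]:
    # Hypothetical expert: a real one is a neural network block, not a scaling.
    def expert(x: List[float]) -> List[float]:
        return [scale * v for v in x]
    return expert

EXPERTS = [make_expert(float(i + 1)) for i in range(NUM_EXPERTS)]

def moe_layer(x: List[float], router_scores: List[float]) -> List[float]:
    """Run only the selected experts and blend their outputs."""
    out = [0.0] * len(x)
    for idx, weight in route(router_scores):
        y = EXPERTS[idx](x)  # the other 14 experts are never evaluated
        out = [o + weight * v for o, v in zip(out, y)]
    return out

if __name__ == "__main__":
    scores = [0.0] * NUM_EXPERTS
    scores[3] = scores[7] = 5.0  # pretend the router favors experts 3 and 7
    print(route(scores))
    print(moe_layer([1.0, 2.0], scores))
```

In this sketch only 2 of the 16 expert blocks do any work for a given token, so most of the expert parameters sit idle on each pass.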

Q: And they do this because it would be slow and expensive to run the whole thing every time?

A: Exactly. If you try to run a trillion parameters, the cost would be so high, and you couldn’t generate responses at human reading speed.

Q: What does that tell us about the future of large language models and how good they will ultimately get?

A: We’ve only scratched the surface. Most people, when they’re building a supercomputer, have been only doing it on the scale of a billion dollars. Now we’ve kind of unlocked a reason to go way, way past that value. Why don’t we build a $10 billion supercomputer? Should we build a $50 billion one? The value could be that high.

Q: As this has become demystified for you, what’s your take on how close we are to AGI?

A: I don’t even know the definition of AGI. If you asked me a few years ago, ‘what’s AGI?’ I would have said ‘it’s something that can pass the LSAT and can also pass some medical exams and can also write poetry.’ There’s a higher bar now.

Q: Do you think it is dangerous or is on the verge of becoming dangerous?

A: They could be dangerous. There will be societal upheaval from a lot of people’s jobs being automated and there are questions about who profits from that and how that gets distributed.

But also, it’s the most positive thing that has happened to humanity since the invention of the internal combustion engine. It will be able to unlock better lives for humanity. I’m an optimist. AI will not kill us and AI will actually do really great things.
