The News
Competition is heating up between big technology companies building smaller AI models, which can be used in a broader array of devices like smartphones, cameras, and sensors, potentially allowing them to tap more users.
This week, Microsoft and Apple launched Phi-3 and OpenELM, respectively, new families of smaller language models that use less compute than the likes of OpenAI’s GPT-4. The moves come as the AI industry realizes that the size of models should be tailored to different applications, and keeps finding ways to make smaller, cheaper LLMs more capable.
“The approach that we’re taking in the Phi series is different,” Sébastien Bubeck, Microsoft’s vice president of generative AI research, told Semafor. “The rest of the industry seems to be mostly about scaling up, trying to add more data and keep making the model bigger.” Bubeck, however, wants to squeeze as much performance out of small models as possible.
For Microsoft, investing in smaller models means it can give customers more options beyond the larger systems it offers from its partnership with OpenAI. Those that can’t afford to use top tier models can use smaller alternatives like Phi-3.
For Apple, OpenELM is relatively slow and limited, which still leaves the company behind in the AI race. But the models can run on iPhones, an ecosystem Apple is keen to develop.
The trick to building small but mighty models lies in the quality of the text used to train them. Researchers at Apple filtered text from publicly available datasets, keeping sentences that use a wider variety of words and have more complex structures.
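To make the idea concrete, here is a minimal sketch of that kind of filtering, assuming a simple lexical-diversity heuristic; Apple’s actual OpenELM data pipeline and thresholds aren’t detailed here, so the cutoffs below are illustrative only.

```python
# Toy illustration of quality filtering: keep sentences that are long
# enough and use a wide variety of words. The thresholds are assumptions,
# not Apple's actual criteria.

def keep_sentence(sentence: str,
                  min_words: int = 8,
                  min_unique_ratio: float = 0.7) -> bool:
    """Return True for sentences that are reasonably long and lexically varied."""
    words = sentence.lower().split()
    if len(words) < min_words:
        return False
    unique_ratio = len(set(words)) / len(words)
    return unique_ratio >= min_unique_ratio

corpus = [
    "The cat sat on the mat on the mat on the mat.",
    "Researchers filtered publicly available text to build a cleaner training set.",
]
filtered = [s for s in corpus if keep_sentence(s)]
print(filtered)  # only the second, more varied sentence survives
```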
Microsoft used a mixture of real data scraped from the web and synthetic data generated by AI to train Phi-3. Prompting models to produce data means developers can better control the text used for training. “The reason why Phi-3 is so good for its given size is because we have crafted the data much more carefully,” Bubeck said.
It’s unclear how much data is needed to make a model as powerful as possible, and what capabilities might arise as they improve. These small AI models show that there are a lot of performance gains to be made by training on higher quality data than just scraping from the internet.
“This kind of iterative process of finding the right complexity of data for model size is a journey that the community hasn’t really embarked on yet,” Bubeck said. “And this is why we released these models. It’s to empower all of the developers to use them to see how far you can go once you get into this data optimal regime.”
Know More
Not every user needs the most advanced, cutting-edge LLM capable of ingesting hundreds of documents or analyzing scientific research. For many simple tasks, smaller models will do just fine and are more efficient, faster, and cheaper to use. In fact, Microsoft’s latest Phi-3 systems are as good as or even better than larger models despite being a fraction of their size.
Phi-3-mini, made up of 3.8 billion parameters, isn’t far behind GPT-3.5, a model containing 175 billion parameters, in its ability to answer questions and generate basic code, according to researchers at Microsoft. In industry benchmarking tests, it beats the 7-billion-parameter LLMs built by Google and Mistral, as well as Meta’s smallest Llama 3 model, which has 8 billion parameters.
In a previous project, Microsoft tested AI’s ability to generate good synthetic data, with researchers asking an LLM to write children’s stories using only a single noun, verb, and adjective from a list of 3,000 words. Over time, it managed to generate millions of short stories. Bubeck said the company now has “hundreds of tricks” to get AI to generate “textbook-quality” data.
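A rough sketch of that recipe is below: sample one noun, verb, and adjective from a fixed word list and prompt a model to write a story that uses all three. The short word lists and the query_llm() helper are hypothetical stand-ins; Microsoft’s actual prompts, word list, and generation model aren’t public here.

```python
import random

# Stand-ins for the ~3,000-word list described in the article.
NOUNS = ["garden", "boat", "lantern"]
VERBS = ["whisper", "climb", "share"]
ADJECTIVES = ["curious", "gentle", "shiny"]

def build_prompt() -> str:
    """Build a story prompt around one randomly chosen noun, verb, and adjective."""
    noun = random.choice(NOUNS)
    verb = random.choice(VERBS)
    adjective = random.choice(ADJECTIVES)
    return (
        "Write a short children's story that uses the noun "
        f"'{noun}', the verb '{verb}', and the adjective '{adjective}'."
    )

def query_llm(prompt: str) -> str:
    # Placeholder: call whichever LLM you use for synthetic-data generation.
    raise NotImplementedError

if __name__ == "__main__":
    print(build_prompt())
```

Repeating this loop with different word combinations is what yields a large, varied pool of simple training text under the developer’s control.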
“You’re not going to create new knowledge, but you’re going to be able to explain the existing knowledge, and distill it in a much cleaner way,” he said. Bubeck declined to say what model Microsoft used to generate the synthetic data to train Phi-3. Researchers previously used ChatGPT to produce text to train Microsoft’s older Phi-1 and Phi-2 models.
Meanwhile, the tiniest version of Apple’s OpenELM family has just 1.1 billion parameters, and appears to outperform models of comparable sizes while being trained on less data.
Katyanna’s view
AI has yet to make a huge impact on smartphones, and tech companies are quickly moving to explore the possibilities. There’s no better way to see what new AI products and apps can be built than opening the models up for other developers to tinker with.
Even Apple, which is notoriously secretive about its technology, has released the source code and training instructions for its OpenELM system. In a paper, Apple researchers explained that the reproducibility and transparency of LLMs was vital to advance AI, and investigate its potential biases and risks. It might be a while yet before the technology arrives, however, given the memory and power constraints on mobile hardware.
Room for Disagreement
How good are these small models though, really? The benchmarks used to evaluate AI’s performance aren’t always reliable and it’s difficult to accurately compare them, according to one of the main takeaways from Stanford University’s latest AI Index Report.
Nestor Maslej, the research manager and editor-in-chief of the report, previously told Semafor that some of the benchmarks don’t always reflect how people actually use LLMs. The industry “is testing these models on competition level mathematics. But companies in the real world aren’t solving [these types of] questions,” he said. Most users don’t care too much about solving math equations and would rather a model be more accurate.
Small models are limited. Phi-3 is poor at language translation since it was only trained on English text, and it can’t store as much factual knowledge, meaning it won’t be good at answering questions that require the most up-to-date information.
Notable
- There’s a lot of hype in AI and even Meta’s CEO says it will take years before the company makes any real money from it.