Yusuf Mehdi is Microsoft's chief consumer marketing officer and recently took over some of the duties of the company's former lead product executive.

Q: GPT-5 will be more multimodal with audio and video. What's your vision for how multimodal will progress?

A: We're doing a lot of it already today. I had this scenario with my wife the other day. She sent me a picture of a piece of art. She was like, 'hey, maybe we'll buy it for birthdays or something. You remember it, right?' And I wasn't sure. So I took the picture and loaded it into Bing Chat. It came back and told me, this is the artist, this is the title, here's a bit about the story behind this piece. So then I was able to go back and say, 'yeah, I remember that.' That was a cool multimodal experience.

It takes extra compute power, so it costs more money to serve those GPT queries. And then there's a lot of work to ground it with the search engine, so that we look at both the image and the text. You'll see more and more people using it to walk around the world and learn things.

Q: How do you forecast the capabilities of these models so you can plan your products down the line?

A: We have a tight relationship with OpenAI. And the model isn't the only thing; there's actually a lot of things on top. At our February event, we announced our Prometheus model, which is our proprietary way of working with an LLM. You get better results if you get better prompts. So when someone does a search, rather than just putting that in there, we run AI on it and compare it against the big index, because we do a lot of query disambiguation to say, 'what's a smart way to put that into the LLM?' The LLM runs it and comes back and says, 'I think this is the answer.' We check that against the web. And sometimes we're like, 'no, this is probably not going to be a great answer.' So we're doing grounding and training at multiple levels of the technology stack, even before you do anything special in the model. We can spend quite a long time innovating, writing code, and shipping things to make it better even before we get to one of the next models.

Q: There is kind of an art to prompting. How does that change consumer behavior and the language that we use?

A: We're having to retrain our brains, because we were taught to dumb down our searches. Fewer keywords give you better search results; the average today is 2.6 keywords, across 10, 20 billion queries every single day. In AI, you're so much better off saying, 'Give me a picture of a sunset at the edge of the equator, in autumn, where the moonlight is reflecting off and the sun is halfway down. And I want to see lots of blues and oranges.' You can't put any of that in the search engine. But that makes your AI prompt better. So we're having to retrain our brains to feel comfortable asking for more specific things.

Q: I wonder if the younger generation will be better at that?

A: My daughter is a budding artist. She had to write her artist statement for her class at school. She asked, 'Dad, do you think I can use an AI?' She's of that generation that is very good with technology. But she's an artist, not a techie. She came back and said, 'I got something great. But it took me a long time to get the prompts right.' She couldn't believe how much work she had to do, telling it what she's focusing on, what her passion is, what she wanted it to do. It forced her to think before writing: What am I really about? Because she's more of an artist, she's not great at writing long docs about herself.
But she was great about putting her ideas down. There'll be a class of people, maybe it'll be a younger generation, who will understand these prompts: what points do I want to make? That's why it's not surprising that the people who are really using them today are writers, artists, coders, because it's people who know how to create.

For the rest of the conversation, read here.