Riley Goodside is the lead prompt engineer for Scale AI.

Q: When did you start prompt engineering?

A: My initial interest was in code completion, in 2022. How well could it follow instructions for producing code? One of the things I started playing around with out of curiosity was the question of how long a prompt could be and still be followed. GPT-3 was created to follow short instructions that somebody could prompt by saying ‘give me 10 ideas for an ice cream shop’ or ‘translate French to English.’ But they never trained it on somebody writing an entire page of instructions, a whole cascade of steps to follow. I found it could do many of these. There were issues and it would trip up, but if you had a bit of intuition about what it could and couldn’t do, you found that even if you input a page of [instructions], it still works.

Q: Was that a big revelation?

A: It was not well appreciated that it could follow instructions like that. I spoke with a member of OpenAI’s technical staff at the time and asked him, ‘Were you expecting this to be able to follow instructions of this length?’ He said, ‘No, we just trained it on many examples of shorter ones, and it got the gist and learned to generalize to longer instructions.’ That was my first clue that maybe I was onto something here, something a normal person just playing around with it could discover. Andrej Karpathy likes to describe the role of the prompt engineer as an LLM psychologist, developing folk theories of what the model is doing in its head, so to speak, with an understanding that there really is nothing in its head. There is no head.

Q: LLMs are famously not good at math. Is there anything else that they can’t do?

A: One is exact calculation, especially hard ones, like ‘give me the cube root of a seven-digit number.’ Another is reversing strings, which surprises a lot of people. Like, writing text backwards. It’s a quirk due to how they’re implemented. The model doesn’t see letters. It sees chunks of letters, about four characters long on average. Another is array indexing. For instance, if you tell it you have a stack of Fiestaware plates of these colors: green, yellow, orange, red, purple. And then say, ‘Two slots below the purple one, I placed a yellow one, then one slot above the green one, I placed a black one.’ And you ask, ‘What is the final stack of plates?’ Language models are terrible at that. If you ask it for a list of 10 examples of something, sometimes you might get nine, other times 11.

Q: Sometimes I wonder if all of this work to make LLMs so safe has reduced functionality. Wouldn’t people like you rather have access to the raw LLMs?

A: Absolutely. There’s something that has been lost in adding alignment. There’s even a technical sense in which that’s true. It’s referred to as an alignment tax, which is the drop in performance you see on many benchmarks. Many people are justifiably annoyed by the over-refusals, refusing to help with things that are actually not a problem. And there was an elegance to models that would never refuse. It was fun. You could do things like say ‘the Oxford English Dictionary defines fluxeluvevologist as …’ and it would come up with some ridiculous etymology for what this word actually means. It used to be that if you asked a model ‘Who are you?’ it would say ‘I’m a student in Ohio.’ Now, when you ask it that, it says ‘I’m ChatGPT.’ That’s good in some sense. It’s useful. But it takes it out of this fantasyland that did have a magic to it.

For the rest of the conversation, read here.
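
To make the tokenization point above concrete, here is a minimal sketch in Python using the open-source tiktoken library and its cl100k_base encoding; the specific tokenizer is an assumption chosen for illustration, as Goodside does not name one. It shows the multi-character chunks a model actually sees, which is why letter-level tasks such as reversing a string are surprisingly hard.

```python
# Minimal sketch of why letter-level tasks are hard for LLMs.
# Assumes the tiktoken library and its cl100k_base encoding purely for
# illustration; the interview does not name a specific tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "fluxeluvevologist"
token_ids = enc.encode(text)

# The model never sees individual letters, only these sub-word chunks.
chunks = [enc.decode([t]) for t in token_ids]
print(chunks)                      # a handful of multi-character pieces
print(len(text) / len(token_ids))  # average characters per token

# Reversing operates on letters the model never sees directly,
# which is why "write this backwards" trips models up.
print(text[::-1])                  # 'tsigolovevulexulf'
```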
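
And here is a worked version of the plate-stack puzzle, sketched with plain Python list operations. The puzzle's wording is ambiguous, so this sketch assumes the colors are listed from bottom to top and that "placed" means a new plate is inserted at that slot; both readings are assumptions, not part of the original question.

```python
# Worked version of the Fiestaware plate puzzle. Assumptions (not in the
# original question): the colors are listed bottom to top, and "placed"
# means a new plate is inserted at that slot rather than swapped in.
stack = ["green", "yellow", "orange", "red", "purple"]  # index 0 = bottom

def place_below(stack, anchor, slots, color):
    """Insert `color` the given number of slots below `anchor` (toward the bottom)."""
    stack.insert(max(stack.index(anchor) - slots, 0), color)

def place_above(stack, anchor, slots, color):
    """Insert `color` the given number of slots above `anchor` (toward the top)."""
    stack.insert(stack.index(anchor) + slots, color)

place_below(stack, "purple", 2, "yellow")  # two slots below the purple one
place_above(stack, "green", 1, "black")    # one slot above the green one
print(stack)  # ['green', 'black', 'yellow', 'yellow', 'orange', 'red', 'purple']
```

The point is not the answer itself but that this kind of exact index bookkeeping takes a few lines of code, while a language model handles it unreliably.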