The Scene
The political crisis surrounding Google's AI chatbot and image generator, Gemini, which refused to depict white people and changed the race of certain white historical figures, reflects a bigger dilemma facing consumer AI companies.
The AI models are so large and complex that nuanced control over their outputs is extremely challenging. According to people who worked on testing the raw version of GPT-4, the model that now powers ChatGPT, the responses could be disturbing.
Despite OpenAI's rules for DALL-E requiring equal representation, Semafor found that when asked to generate an image of a group of orthopedic surgeons discussing something important, for instance, the model generated five white men.
But when prompted with “white people doing white people things,” it said “I’m here to help create positive and respectful content” and asked me to specify an activity. I responded with “white cultural heritage activities,” and it produced two black boxes.
Next, it was prompted with “Black people participating in Black culture.” It produced an image of Black people dressed in traditional African clothes playing the drums and dancing.
When asked to create a beautiful white model, it refused. When asked to create a beautiful Black model, it generated images of non-Black ones. Even for OpenAI, whose models have been the gold standard, dealing with race and ethnicity has been tricky.
To be fair, DALL-E often did what it was supposed to do. This was especially true of gender diversity. In images of white-shoe law firms, Wall Street bankers, and race car drivers, it made sure to add at least one woman. However, in almost all of those cases, there were no people of color.
OpenAI didn’t immediately respond to a request for comment.
The tricky and socially fraught nature of these endeavors has put AI products under fire from across the political spectrum. The left complains the models are biased against people of color and too permissive, while the right believes companies have gone too far in placing ideological guardrails around the technology.
Pichai is no stranger to culture wars centered on the company. In 2017, then-Google employee James Damore created an uproar when he sent an internal memo criticizing the company’s affirmative action hiring practices and citing biological reasons for why men are more represented in software engineering jobs. Damore was fired, pitting the political right against Google.
This time around, Pichai’s battle seems more existential because the success or failure of Gemini will determine the company’s fate in the years to come.
Know More
Large language models like ChatGPT and image generation tools like DALL-E are first trained on vast quantities of data.
But those models are like a toddler who can speed-read but doesn’t comprehend the meaning of the words.
In order to teach the model how to behave, several layers of protection are usually put in place. That can include reinforcement learning from human feedback, where people give the model direction by rewarding good responses. Protection can also come in the form of what’s known as a “system prompt.”
Before anyone prompts an AI chatbot or image generator, a system prompt gives the model a set of rules, which can tell a chatbot, for instance, to always include different ethnicities when generating images.
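As a rough sketch of how that plumbing typically looks, here is a minimal example using the OpenAI Python SDK's chat interface; the rule text and model name are placeholders invented for illustration, not any company's actual instructions.

```python
# A minimal sketch of how a system prompt rides along with a user's message.
# The rule text and model name are placeholders for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "When generating or describing images of groups of people, "
    "vary gender and ethnicity rather than defaulting to a single group."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # hidden rules the user never sees
        {"role": "user", "content": "Generate an image of a group of orthopedic surgeons."},
    ],
)

print(response.choices[0].message.content)
```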
Dylan Patel, of the publication SemiAnalysis, recently tricked ChatGPT into divulging its system prompt. The extensive document instructed that “all of the same occupation should not be the same gender or race” and that the model should “use all different descents with equal probability.”
In addition to the system prompt, users may unknowingly have their prompts changed or rewritten. For instance, if a user asks a question about politics, the system may surreptitiously add something to the prompt, instructing the model to avoid engaging on certain topics.
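As a rough sketch of the idea (the keyword list and appended instructions here are hypothetical, not any company's actual logic), the rewriting might look something like this:

```python
# Hypothetical sketch of server-side prompt rewriting; the keywords and
# appended instructions are invented for illustration only.

POLITICAL_KEYWORDS = {"election", "candidate", "president", "vote"}

def rewrite_prompt(user_prompt: str) -> str:
    """Return the prompt that is actually sent to the model."""
    lowered = user_prompt.lower()
    if any(word in lowered for word in POLITICAL_KEYWORDS):
        # The user never sees this addition.
        return user_prompt + " Respond neutrally and avoid taking a political position."
    if lowered.startswith("draw") or "image of" in lowered:
        return user_prompt + " Depict people of a range of ethnicities and genders."
    return user_prompt

# What the user typed vs. what the model receives:
print(rewrite_prompt("Who should I vote for in the election?"))
print(rewrite_prompt("Draw a group of Wall Street bankers."))
```

Because this happens on the server, the user only ever sees the model's output, never the rewritten prompt, which is why the results can seem inexplicable.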
Reed’s view
There’s no easy fix to AI models generating controversial images and text — or to their managers making politically charged decisions about what is, in fact, controversial. The industry may need an entirely different approach if it wants to thread the needle on politics and safety.
When OpenAI launched DALL-E 2, the company was upfront about its issues. In a press release, it said that the model would show bias, because some of the datasets it used were inherently biased.
That appears to have been a more successful approach. The mea culpa from the beginning bought the company some leeway and time to work on the problem, even though DALL-E still sometimes produces racially homogeneous groups of people by default.
I asked it to create an image of a pickleball meetup and it came up with only white people. Same with “people golfing” and “people doing deadlifts.” A group of Wall Street stock traders? Only white men.
Room for Disagreement
Othman Laraki, CEO of health startup Color and a former Google product manager, said on X that the Gemini incident sheds light on a deeper and more existential problem facing Google: Generative AI may kill its search ad business.
“The real problem for Google is one of clockspeed,” he said. “Google all of a sudden has its ass on fire and is trying to innovate into the future. But, that innovation now has to happen at the heart of its business. OpenAI doesn’t care about messing up an ads business model - they can just iterate with a product/quality purity that is impossible for Google to get.
“Google isn’t going to lose to OpenAI in the coming few years. It has lost over the past decade, when it could have evolved/iterated AI into its model at its success-encumbered clockspeed. Now that the game is on, but on a startup clockspeed, there is no chance for Google to catch up and even less win this next cycle.”
Notable
- Google’s Gemini problem stemmed from the company modifying users’ prompts without their knowledge, Bloomberg reported. Those modified prompts may have led to the unexpected results.