Semafor Technology
February 24, 2023

Louise Matsakis

Hi, and welcome to Semafor Tech, a twice-weekly newsletter from Reed Albergotti and me. In today’s edition, Semafor’s Executive Editor Gina Chua looks at how chatbot-assisted search engines could be improved, and I report on what Microsoft missed during early testing of its bot in India. Plus, some thoughts about whether social media is responsible for the uptick in mood disorders among teenage girls.

Reed will be back from vacation next week. In the meantime, we would be so grateful if you took a few minutes to answer our new reader survey. The feedback makes our reporting stronger and helps us focus on what you want to read about the most. Don’t be shy! We’re looking for constructive criticism.

Are you enjoying Semafor Tech? Help us spread the word!

Move Fast/Break Things

➚ MOVE FAST: Transparency. TikTok built a new free API for outside researchers, making it much easier for academics to collect data from the platform and study how it works. The move won’t stop growing opposition to the Chinese-owned app in the U.S. and Europe, but it’s still a welcome gesture, especially after Twitter put its own API behind a paywall.

➘ BREAK THINGS: Secrecy. Companies can’t require laid-off workers to sign NDAs as part of their severance agreements, according to a new ruling by the U.S. National Labor Relations Board. That means we may be hearing a lot more trash talking from the thousands of tech workers who were recently let go.

Semafor Stat

The number of chip companies in China that pulled or canceled their licenses in 2022, up nearly 70% from the previous year, according to Chinese news outlet TMTPost. The global slowdown in the semiconductor industry is partly to blame, but the data suggests new U.S. limits on chip exports to the People’s Republic are also taking their toll.

Gina Chua

Why don’t search engines integrate chatbots in more helpful ways?

Getty/Jovelle Tamayo for The Washington Post

THE NEWS

Chatbot-infused information systems are not off to a good start.

Microsoft’s ChatGPT-assisted Bing Chat service is being throttled to reduce oddball replies, and Google’s experimental Bard system managed to bungle an answer in a marketing demo, costing the company billions in market value.  (Bing got things wrong too.)

Tech behemoths — and the public — have been so focused on the chatbots’ ability to hold human-like conversations with users that the core purpose of a search engine, which is to find useful and, ideally, accurate information, seems to have been overshadowed. Instead, the public has seized upon professions of love, angry denials of basic realities, and many more mundane “hallucinations” of incorrect facts.

GINA’S VIEW

It didn’t have to be this way.

At its heart, a search engine does — at least to lay users like me — three things: take in a query (e.g., “how effective are COVID-19 vaccines?”) and turn it into a search term; hunt for information on the internet and make some kind of judgment about what’s credible; and then present the results back to users. Sometimes that comes as a simple, authoritative answer (“The population of New York City was 8.468 million in 2021”) and sometimes as a list of links.

Google — the king of search engines — does that second part extremely well, thanks to PageRank and other proprietary algorithms it has developed over the decades; it’s doing better on the first part, although it’s still a long way from providing a conversational interface.

And it does less well on the third part, often presenting a list of links that users have to plow through, although it’s getting better at synthesizing the information all the time. Chatbots, on the other hand, are terrible at the second thing — because, bluntly, they’re optimized for language output and not for fact-finding or fact-checking. When they try to aggregate disparate information into a single definitive answer, they often get things wrong, or “hallucinate.”

And the lack of citations or links in their authoritative-sounding answers means it’s nearly impossible to check the facts for yourself. On the other hand, the chatbots are pretty good at parsing and generating language, because they’re, well, language models. D’oh.

So why are tech companies enamored with integrating them into the entire search process — even the parts they’re not good at? Why not marry the two capabilities? Why not have a chatbot take a normal human question and turn it into a search term (that’s a language skill), use a conventional search system to find and rank relevant web pages (that’s a search-and-ranking skill), and then have the chatbot summarize the results (another language skill)?
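To make the idea concrete, here’s a minimal sketch of what that hybrid pipeline could look like. The call_llm and search_index helpers are hypothetical stand-ins, not any vendor’s real API:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a call to any large language model."""
    raise NotImplementedError("connect a real LLM provider here")


def search_index(query: str, limit: int = 5) -> list[dict]:
    """Stand-in for a conventional search engine that returns ranked
    results as {"title", "url", "text"} dictionaries."""
    raise NotImplementedError("connect a real search backend here")


def answer(question: str) -> str:
    # Step 1 (language skill): turn a conversational question
    # into a concise search term.
    search_term = call_llm(
        f"Rewrite this question as a short web search query: {question}"
    )

    # Step 2 (search-and-ranking skill): let the conventional engine
    # find and rank relevant pages; the chatbot sits this step out.
    results = search_index(search_term)

    # Step 3 (language skill): summarize each result, keeping its link
    # alongside the summary so readers can verify the claims themselves.
    lines = []
    for result in results:
        summary = call_llm(
            f"Summarize this page in two sentences: {result['text']}"
        )
        lines.append(f"- {summary} (source: {result['url']})")
    return "\n".join(lines)
```

Each component handles only the step it’s demonstrably good at, and every summary keeps its source link attached.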

Which is what I tried to do, by hand.

I used Claude — an AI chatbot built by Anthropic, in which Google just invested $300 million — to ask a simple question: Did Donald Trump pay adult film star Stormy Daniels for her silence? (I couldn’t ask an up-to-date question, because Claude’s training data doesn’t extend to the present day.)

Here’s what I got. First, I simply asked the question and received the standard summary:

That’s a pretty decent response, and essentially accurate, at least as far as 2021 sources are concerned. But I can only judge that because I already knew the answer. If I didn’t, how could I check? The reply provides no citations, offers no links, and gives users no real chance to verify the information for themselves.

So then I asked it for links to stories, as a Google search might have turned up:

That’s helpful — and I’m sure a dedicated search engine would have provided even better links.  But that’s a lot of reading to do, and how would I know which ones to dig into? I asked it to summarize the articles it linked to:

Much easier to digest, and actually gives a sense of the issues surrounding the question.

What if we had simply skipped all those steps, and my original query had just returned those summaries, with links? Not unlike a Google search, but with more useful answers that require less clicking and reading.

To put it another way, why do tech companies seem so intent on blowing up the entire search experience when incremental changes could yield significant improvements?

ROOM FOR DISAGREEMENT

Google has made a long list of iterative — and impressive — improvements to its search product over the years, in many cases focused on ensuring that the pages it surfaces are in fact authoritative and relevant, but also on better understanding the natural language queries that users type in.

It’s also improved the output, and many queries now return a list of likely questions and answers lifted from web pages verbatim, saving readers the effort of digging through a host of links. Most of its AI improvements lie under the hood, so to speak, rather than in the flashier user experiences that chatbots promise.

And Microsoft says it’s doing similar work, using language models both to better understand queries and to generate summaries of the information its search engine surfaces, complete with links and citations to sources.

As for questions where the data is clearly defined and constrained — airline fares, or prices for comparison shopping, for example — and where the purpose is less to discover nuanced ideas and insights and more to find specific information (booking a trip from A to Z on a given day), chatbots could significantly improve the search experience.

Louise Matsakis

Early testers reported problems with Microsoft’s chatbot ‘Sydney’

THE NEWS

Microsoft’s new chatbot-assisted search service was spitting out bizarre and inaccurate replies long before it was released to the public earlier this month.

During company testing in India last year, it told a user she was “irrelevant and doomed. You are wasting your time and energy,” according to a post on a Microsoft forum, echoing the same kind of belligerent rhetoric people encountered when they tried the program in the U.S. over the last two weeks.

“I think this bot either needs to be removed or fixed completely,” wrote another user. A customer service representative tasked with answering questions on the forum seemed to have no idea what was going on. “I’m not quite sure I understand your post. Is this regarding a Bing issue?” she asked, referring to Microsoft’s search engine, which works in tandem with the chatbot.

Mikhail Parakhin, Microsoft’s CEO for advertising and web services, recently said he was unaware that early testers had these types of problems. “Sydney was running in several markets for a year with no one complaining,” he tweeted last Sunday, adding that there was “literally zero negative feedback of this type.” He later acknowledged that his team had apparently missed some cases in their analysis.

Sydney is the code name that Microsoft gave to its chatbot several years ago. The moniker recently resurfaced in the U.S. when New York Times journalist Kevin Roose wrote about his disturbing conversations with the program, during which it referred to itself as Sydney.

A spokesperson for Microsoft emphasized that the company was still testing the chatbot and collecting feedback from users about how it could be improved.

“We began testing a chat feature based on earlier models in India in late 2020. The insights we gathered as part of that have helped to inform our work with the new Bing preview,” they said in a statement.

Unsplash/Volodymyr Hryshchenko

LOUISE’S VIEW

It’s bewildering that Microsoft didn’t catch some of the more obvious flaws in its new chatbot before they became headlines. Parakhin and other executives can be forgiven for failing to read a few forum posts, but they should have been aware that chatbots have a tendency to go off the rails, especially when provoked by users.

In 2016, Microsoft released another bot, Tay, on Twitter; it quickly began using the N-word and calling feminism a “cancer,” leading the software giant to take it offline after less than 24 hours.

Similar incidents have occurred regularly at other tech companies since then, to the point that stories about rogue chatbots have become a familiar trope in the industry. Most recently, Meta took down a chatbot trained on scientific research papers after it generated things like a fake study about the benefits of eating crushed glass.

In a statement last year, an AI research director at Meta noted that large language models — the type of technology newer chatbots use — have a propensity to “generate text that may appear authentic, but is inaccurate.” Given that well-documented reality, Microsoft should be wary about integrating them into tools like search engines, which millions of people have learned to instinctually trust.

That doesn’t mean tech companies need to take down their chatbots when they misbehave, or even try to censor many of their weirder outputs. But they should be loudly talking about the shortcomings of these programs and disclosing as much as possible about how they were created — ideas Microsoft itself has advocated in the past.

Instead, research shows Big Tech is becoming more closed in its approach to artificial intelligence research, guarding its breakthroughs like trade secrets. Microsoft didn’t initially reveal that it had tested its chatbot in India, nor what it might have found collecting feedback there. Sharing that kind of basic information is the bare minimum to live up to one of Microsoft’s own responsible AI principles: transparency.

ROOM FOR DISAGREEMENT

Microsoft has historically been a corporate leader in the field of artificial intelligence safety and ethics. Last year, the company’s Office of Responsible AI released a 27-page document describing in detail how it would implement its principles throughout Microsoft’s products.

As part of that work, Brad Smith, Microsoft’s vice chairman and president, said the company thoroughly assessed the AI technology powering its new chatbot before it was released. “Our researchers, policy experts and engineering teams joined forces to study the potential harms of the technology, build bespoke measurement pipelines and iterate on effective mitigation strategies,” he said in a blog post. “Much of this work was without precedent and some of it challenged our existing thinking.”

Release Notes

  • TikTok’s parent company, ByteDance, released a new app called Lemon8 in the U.K. and U.S. and is paying creators to post on it, Insider reported. The platform appears to be a clone of the popular Chinese social media app Xiaohongshu, or Little Red Book.
  • WhatsApp is reportedly working on a newsletter feature. The news comes just months after the app’s parent company, Meta, shuttered a separate Facebook newsletter program featuring writers like Malcolm Gladwell and Mitch Albom.
Obsessions

Daria Nepriakhina/Unsplash

More than half (57%) of teenage girls in the U.S. now say they experience persistent feelings of sadness or hopelessness, up from 36% a decade ago, according to data released by the Centers for Disease Control and Prevention earlier this month. Social psychologist Jon Haidt argued in a Substack post that there’s an obvious cause for this alarming trend: social media.

He points to dozens of studies that found a link between the use of apps like Instagram and the rise of mood disorders among teenage girls. But I worry that the kind of quantitative, survey-focused research Haidt favors can’t tell the full story.

When I was a teenage girl in 2012 — the year Haidt estimates that the mental health epidemic among my age group began — I used social media for a wide range of activities that would be misleading to lump together, as many studies on the subject have. (That’s because it’s easier to collect survey responses than, say, monitor everything a teenager does on their smartphone.)

I did sometimes spend “hours a day taking and editing selfies,” or scrolling through posts from “fabulously wealthy female celebrities with (seemingly) vastly superior bodies and lives,” the only things Haidt imagines young women doing on social media. But I also talked to long-distance friends who provided enormous comfort, created and shared artwork, and read news articles.

I’m not arguing that Instagram or TikTok don’t often have a negative impact on teenage girls — I also regularly looked at pro-anorexia content on Tumblr and lost sleep to Facebook drama. But flattening the complex and varied digital lives of young women into the number of hours they spend on Instagram will never be enough to explain why so many of them are struggling.

—Louise

How Are We Doing?

Are you enjoying Semafor Tech? The more people read us, the better we’ll get. So please share it with your family, friends and colleagues to get those network effects rolling.

And hey, we can’t inform you on what’s happening in tech from inside your spam folder. Be sure to add reed.albergotti@semafor.com (you can always reach me by replying to these emails) and lmatsakis@semafor.com to your contacts. In Gmail, drag this newsletter over to your ‘Primary’ tab.

Thanks for reading.

Want more Semafor? Explore all our newsletters at semafor.com/newsletters

— Reed and Louise
