THE SCOOP

Researchers should not be using tools like ChatGPT to automatically peer review papers, warned organizers of top AI conferences and academic publishers worried about maintaining intellectual integrity.

With recent advances in large language models, researchers have increasingly been using them to write peer reviews — a time-honored academic tradition that examines new research and assesses its merits, showing that a person's work has been vetted by other experts in the field. Asking ChatGPT to analyze manuscripts and critique the research without reading the papers undermines that process.

To tackle the problem, AI and machine learning conferences are now considering updating their policies, since some guidelines don't explicitly ban the use of AI to process manuscripts and the language can be fuzzy. The Conference and Workshop on Neural Information Processing Systems (NeurIPS) is considering setting up a committee to determine whether it should update its policies around using LLMs for peer review, a spokesperson told Semafor.

At NeurIPS, for example, researchers should not "share submissions with anyone without prior approval," while the ethics code at the International Conference on Learning Representations (ICLR), whose annual confab kicked off Tuesday, states that "LLMs are not eligible for authorship." Representatives from NeurIPS and ICLR said "anyone" includes AI, and that authorship covers both papers and peer review comments.

A spokesperson for Springer Nature, an academic publishing company best known for its top research journal Nature, said that experts are required to evaluate research and that leaving it to AI is risky. "Peer reviewers are accountable for the accuracy and views expressed in their reports and their expert evaluations help ensure the integrity, reproducibility and quality of the scientific record," they said. "Their in-depth knowledge and expertise is irreplaceable and despite rapid progress, generative AI tools can lack up-to-date knowledge and may produce nonsensical, biased or false information."

Other major scientific publishing companies, such as Taylor & Francis and Sage, told Semafor they prohibit reviewers from using AI, citing concerns like transparency and confidentiality.

KATYANNA'S VIEW

It's not surprising that more researchers are turning to AI in a rush to meet deadlines on top of their already demanding workloads. Using it to improve one's writing and thinking is broadly acceptable; using it to replace the real work of reviewing is not. I'd be annoyed if I spent time and effort on a research paper only to have it rejected by a machine without anyone reading it.

Been Kim, the general chair of this year's ICLR conference and a research scientist at Google DeepMind, told me that no formal complaints have been filed by researchers upset about LLMs reviewing their work. But conferences should be vigilant and more explicit in their policies around using AI for academic writing. Inappropriate use of LLMs is difficult to crack down on, since it's tricky to determine whether something was written by AI or by a human. But if the technology continues to degrade the research process, public trust in academia will weaken, too.