
Examining how ChatGPT and other large language models (LLMs) are affecting scientific papers, Cornell University researchers found a mixed bag of results. While LLMs boost paper production, especially for non-native English speakers, everyone from reviewers to funders to policymakers is having a hard time distinguishing real papers from “AI slop.”
For their study, published in Science, assistant professor of information science Yian Yin and his team looked at over 2 million papers posted between January 2018 and June 2024 on three popular online preprint websites—arXiv, bioRxiv and Social Science Research Network (SSRN).
The researchers compared presumably human-authored papers posted before 2023 to AI-written text to develop an “AI detector.” With this detector, they could identify which scientists were probably using the technology for writing, count how many papers they published before and after adopting AI, and see whether those papers were ultimately published in scientific journals.
The analysis showed a big AI-powered productivity bump. On the arXiv site, scientists who appeared to use LLMs posted about one-third more papers than scientists who weren’t getting an assist from AI. The increase was more than 50% for bioRxiv and SSRN.
Scientists whose first language is not English benefited the most from LLMs. Researchers from Asian institutions, for example, posted between 43.0% and 89.3% more papers after the AI detector indicated a switch to using LLMs compared with similar scientists not using the technology.
But, while LLMs make it easier for individuals to produce papers, they also make it harder for others to evaluate their quality. For human-written work, clear yet complex language—with big words and long sentences—is usually a reliable indicator of quality research. Across all three preprint sites, papers likely written by humans that scored high on a writing complexity test were most likely to be accepted to a scientific journal. But high-scoring papers probably written by LLMs were less likely to be accepted, suggesting that despite the convincing language, reviewers deemed many of these papers to have little scientific value.
Because these findings are observational, they show associations rather than cause and effect; the team hopes to perform causal analysis next.
Data from Cornell University