Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review.

Research integrity and peer review (IF 7.2, JCR Q1, Ethics). Pub Date: 2023-05-18. DOI: 10.1186/s41073-023-00133-5

Mohammad Hosseini, Serge P J M Horbach

Research integrity and peer review, 8(1): 4. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10191680/pdf/

Abstract

Background: The emergence of systems based on large language models (LLMs) such as OpenAI's ChatGPT has prompted a range of discussions in scholarly circles. Since LLMs generate grammatically correct and mostly relevant (yet sometimes outright wrong, irrelevant or biased) outputs in response to provided prompts, using them in various writing tasks, including writing peer review reports, could improve productivity. Given the significance of peer review in the existing scholarly publication landscape, exploring the challenges and opportunities of using LLMs in peer review seems urgent. Now that the first scholarly outputs have been generated with LLMs, we anticipate that peer review reports will also be generated with the help of these systems. However, there are currently no guidelines on how these systems should be used in review tasks.

Methods: To investigate the potential impact of using LLMs on the peer review process, we used five core themes within discussions about peer review suggested by Tennant and Ross-Hellauer. These include 1) reviewers' role, 2) editors' role, 3) functions and quality of peer reviews, 4) reproducibility, and 5) the social and epistemic functions of peer reviews. We provide a small-scale exploration of ChatGPT's performance regarding identified issues.

Results: LLMs have the potential to substantially alter the role of both peer reviewers and editors. By supporting both actors in efficiently writing constructive reports or decision letters, LLMs can facilitate higher-quality review and address issues of review shortage. However, the fundamental opacity of LLMs' training data, inner workings, data handling, and development processes raises concerns about potential biases, confidentiality and the reproducibility of review reports. Additionally, as editorial work has a prominent function in defining and shaping epistemic communities, as well as negotiating normative frameworks within such communities, partly outsourcing this work to LLMs might have unforeseen consequences for social and epistemic relations within academia. Regarding performance, we identified major enhancements in a short period and expect LLMs to continue developing.

Conclusions: We believe that LLMs are likely to have a profound impact on academia and scholarly communication. While LLMs are potentially beneficial to the scholarly communication system, many uncertainties remain and their use is not without risks. In particular, concerns about the amplification of existing biases and inequalities in access to appropriate infrastructure warrant further attention. For the moment, we recommend that if LLMs are used to write scholarly reviews and decision letters, reviewers and editors should disclose their use and accept full responsibility for data security and confidentiality, and for their reports' accuracy, tone, reasoning and originality.