Reviewer Experience Detecting and Judging Human Versus Artificial Intelligence Content: The Stroke Journal Essay Contest.

IF 7.8 | CAS Region 1 (Medicine) | JCR Q1 (Clinical Neurology) | Stroke | Pub Date: 2024-10-01 | Epub: 2024-09-03 | DOI: 10.1161/STROKEAHA.124.045012
Gisele S Silva, Rohan Khera, Lee H Schwamm
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529699/pdf/
Citations: 0

Abstract

Artificial intelligence (AI) large language models (LLMs) now produce human-like general text and images. LLMs' ability to generate persuasive scientific essays that undergo evaluation under traditional peer review has not been systematically studied. To measure perceptions of quality and the nature of authorship, we conducted a competitive essay contest in 2024 with both human and AI participants. Human authors and 4 distinct LLMs generated essays on controversial topics in stroke care and outcomes research. A panel of Stroke Editorial Board members (mostly vascular neurologists), blinded to author identity and with varying levels of AI expertise, rated the essays for quality, persuasiveness, best in topic, and author type. Among 34 submissions (22 human and 12 LLM) scored by 38 reviewers, human and AI essays received mostly similar ratings, though AI essays were rated higher for composition quality. Author type was accurately identified only 50% of the time, with prior LLM experience associated with improved accuracy. In multivariable analyses adjusted for author attributes and essay quality, only persuasiveness was independently associated with odds of a reviewer assigning AI as author type (adjusted odds ratio, 1.53 [95% CI, 1.09-2.16]; P=0.01). In conclusion, a group of experienced editorial board members struggled to distinguish human versus AI authorship, with a bias against best in topic for essays judged to be AI generated. Scientific journals may benefit from educating reviewers on the types and uses of AI in scientific writing and developing thoughtful policies on the appropriate use of AI in authoring manuscripts.
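The reported adjusted odds ratio can be tied back to the underlying logistic-regression arithmetic: on the log-odds scale, a Wald 95% CI is the coefficient plus or minus 1.96 standard errors, so the coefficient and its standard error can be recovered from the reported interval. A minimal sketch (not the authors' analysis code; it only reverse-engineers these quantities from the abstract's reported values, aOR 1.53 [95% CI, 1.09-2.16]):

```python
import math

# Reported 95% CI bounds for the adjusted odds ratio (from the abstract).
ci_lo, ci_hi = 1.09, 2.16

# On the log-odds scale the Wald CI is beta +/- 1.96*SE, so beta is the
# midpoint of the log-scale CI and SE is its half-width divided by 1.96.
beta = (math.log(ci_lo) + math.log(ci_hi)) / 2
se = (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)

odds_ratio = math.exp(beta)                 # recovers the reported ~1.53
wald_z = beta / se                          # Wald test statistic
p_value = math.erfc(wald_z / math.sqrt(2))  # two-sided normal tail

print(f"OR={odds_ratio:.2f}, z={wald_z:.2f}, p={p_value:.3f}")
```

Running this reproduces the abstract's figures: an odds ratio of about 1.53 with a two-sided p-value near 0.01, consistent with the reported P=0.01.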

Source Journal
Stroke (Medicine, Clinical Neurology)
CiteScore: 13.40
Self-citation rate: 6.00%
Articles published per year: 2021
Review time: 3 months
Journal introduction: Stroke is a monthly publication that collates reports of clinical and basic investigation of any aspect of the cerebral circulation and its diseases. The publication covers a wide range of disciplines including anesthesiology, critical care medicine, epidemiology, internal medicine, neurology, neuro-ophthalmology, neuropathology, neuropsychology, neurosurgery, nuclear medicine, nursing, radiology, rehabilitation, speech pathology, vascular physiology, and vascular surgery. The audience of Stroke includes neurologists, basic scientists, cardiologists, vascular surgeons, internists, interventionalists, neurosurgeons, nurses, and physiatrists. Stroke is indexed in Biological Abstracts, BIOSIS, CAB Abstracts, Chemical Abstracts, CINAHL, Current Contents, Embase, MEDLINE, and Science Citation Index Expanded.
Latest articles from this journal:
Anti-Inflammatory Thrombolytic JX10 (TMS-007) in Late Presentation of Acute Ischemic Stroke.
Menstruation: An Important Indicator for Assessing Stroke Risk and Its Outcomes.
Can Genetics Improve Prediction of Poststroke Epilepsy?
Care Quality and Outcomes of Ischemic Stroke in Patients With Premorbid Dementia: Get With The Guidelines-Stroke Registry.
Genetic Testing for Monogenic Stroke.