Human versus artificial intelligence-generated arthroplasty literature: A single-blinded analysis of perceived communication, quality, and authorship source

International Journal of Medical Robotics and Computer Assisted Surgery (IF 2.1; Region 3, Medicine; JCR Q2, Surgery) · Pub Date: 2024-02-13 · DOI: 10.1002/rcs.2621

Kyle W. Lawrence, Akram A. Habibi, Spencer A. Ward, Claudette M. Lajam, Ran Schwarzkopf, Joshua C. Rozell


Citations: 0

Abstract

Background

Large language models (LLMs) have unknown implications for medical research. This study assessed whether LLM-generated abstracts are distinguishable from human-written abstracts and compared their perceived quality.

Methods

The LLM ChatGPT was used to generate 20 arthroplasty abstracts (AI-generated) based on full-text manuscripts, which were compared to originally published abstracts (human-written). Six blinded orthopaedic surgeons rated abstracts on overall quality, communication, and confidence in the authorship source. Authorship-confidence scores were compared to a test value representing complete inability to discern authorship.
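
The abstract names neither the statistical test nor the rating scale. One common way to compare a set of scores to a fixed test value is a one-sample t-test, sketched below as an illustration only. The function name `one_sample_t`, the 0-100 scale, the midpoint of 50 standing for "cannot discern authorship", and the ratings themselves are all hypothetical and are not taken from the study.

```python
import math
import statistics


def one_sample_t(scores, test_value):
    """Return (t statistic, degrees of freedom) for a one-sample t-test.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("scores have zero variance; t is undefined")
    mean = statistics.mean(scores)
    standard_error = sd / math.sqrt(n)
    return (mean - test_value) / standard_error, n - 1


# Made-up ratings on an assumed 0-100 scale, where 50 is assumed to mean
# "complete inability to discern authorship".
hypothetical_scores = [40, 50, 60]
t_stat, df = one_sample_t(hypothetical_scores, test_value=50)
print(t_stat, df)
```

A t statistic near zero means the mean rating sits close to the test value, which is the pattern the abstract describes as consistent with an inability to discern authorship.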

Results

Modestly increased confidence in human authorship was observed for human-written abstracts compared with AI-generated abstracts (p = 0.028), though AI-generated abstract authorship-confidence scores were statistically consistent with inability to discern authorship (p = 0.999). Overall abstract quality was higher for human-written abstracts (p = 0.019).

Conclusions

Absolute authorship-confidence ratings indicated that raters had difficulty discerning the authorship of AI-generated abstracts, but these abstracts did not achieve the perceived quality of human-written abstracts. Caution is warranted when implementing LLMs in scientific writing.




Source journal

International Journal of Medical Robotics and Computer Assisted Surgery (Medicine, Surgery)

CiteScore: 4.50

Self-citation rate: 12.00%

Articles published: 131

Review time: 6-12 weeks

Journal description: The International Journal of Medical Robotics and Computer Assisted Surgery provides a cross-disciplinary platform for presenting the latest developments in robotics and computer assisted technologies for medical applications. The journal publishes cutting-edge papers and expert reviews, complemented by commentaries, correspondence and conference highlights that stimulate discussion and exchange of ideas. Areas of interest include robotic surgery aids and systems, operative planning tools, medical imaging and visualisation, simulation and navigation, virtual reality, intuitive command and control systems, haptics and sensor technologies. In addition to research and surgical planning studies, the journal welcomes papers detailing clinical trials and applications of computer-assisted workflows and robotic systems in neurosurgery, urology, paediatric, orthopaedic, craniofacial, cardiovascular, thoraco-abdominal, musculoskeletal and visceral surgery. Articles providing critical analysis of clinical trials, assessment of the benefits and risks of the application of these technologies, commenting on ease of use, or addressing surgical education and training issues are also encouraged. The journal aims to foster a community that encompasses medical practitioners, researchers, and engineers and computer scientists developing robotic systems and computational tools in academic and commercial environments, with the intention of promoting and developing these exciting areas of medical technology.