Comparison of the Usability and Reliability of Answers to Clinical Questions: AI-Generated ChatGPT versus a Human-Authored Resource.

Impact Factor: 1.0 | CAS Region 4 (Medicine) | JCR Q3 (Medicine, General & Internal) | Southern Medical Journal | Publication date: 2024-08-01 | DOI: 10.14423/SMJ.0000000000001715
Farrin A Manian, Katherine Garland, Jimin Ding
{"title":"Comparison of the Usability and Reliability of Answers to Clinical Questions: AI-Generated ChatGPT versus a Human-Authored Resource.","authors":"Farrin A Manian, Katherine Garland, Jimin Ding","doi":"10.14423/SMJ.0000000000001715","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>Our aim was to compare the usability and reliability of answers to clinical questions posed of Chat-Generative Pre-Trained Transformer (ChatGPT) compared to those of a human-authored Web source (www.Pearls4Peers.com) in response to \"real-world\" clinical questions raised during the care of patients.</p><p><strong>Methods: </strong>Two domains of clinical information quality were studied: usability, based on organization/readability, relevance, and usefulness, and reliability, based on clarity, accuracy, and thoroughness. The top 36 most viewed real-world questions from a human-authored Web site (www.Pearls4Peers.com [P4P]) were posed to ChatGPT 3.5. Anonymized answers by ChatGPT and P4P (without literature citations) were separately assessed for usability by 18 practicing physicians (\"clinician users\") in triplicate and for reliability by 21 expert providers (\"content experts\") on a Likert scale (\"definitely yes,\" \"generally yes,\" or \"no\") in duplicate or triplicate. Participants also directly compared the usability and reliability of paired answers.</p><p><strong>Results: </strong>The usability and reliability of ChatGPT answers varied widely depending on the question posed. ChatGPT answers were not considered useful or accurate in 13.9% and 13.1% of cases, respectively. In within-individual rankings for usability, ChatGPT was inferior to P4P in organization/readability, relevance, and usefulness in 29.6%, 28.3%, and 29.6% of cases, respectively, and for reliability, inferior to P4P in clarity, accuracy, and thoroughness in 38.1%, 34.5%, and 31% of cases, respectively.</p><p><strong>Conclusions: </strong>The quality of ChatGPT responses to real-world clinical questions varied widely, with nearly one-third or more answers considered inferior to a human-authored source in several aspects of usability and reliability. Caution is advised when using ChatGPT in clinical decision making.</p>","PeriodicalId":22043,"journal":{"name":"Southern Medical Journal","volume":null,"pages":null},"PeriodicalIF":1.0000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Southern Medical Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.14423/SMJ.0000000000001715","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Citations: 0

Abstract

Objectives: Our aim was to compare the usability and reliability of answers to "real-world" clinical questions raised during patient care, as provided by Chat Generative Pre-trained Transformer (ChatGPT) versus a human-authored Web source (www.Pearls4Peers.com).

Methods: Two domains of clinical information quality were studied: usability (organization/readability, relevance, and usefulness) and reliability (clarity, accuracy, and thoroughness). The 36 most viewed real-world questions from a human-authored Web site (www.Pearls4Peers.com [P4P]) were posed to ChatGPT 3.5. Anonymized answers from ChatGPT and P4P (without literature citations) were assessed separately for usability by 18 practicing physicians ("clinician users") in triplicate and for reliability by 21 expert providers ("content experts") in duplicate or triplicate, each on a Likert scale ("definitely yes," "generally yes," or "no"). Participants also directly compared the usability and reliability of paired answers.

Results: The usability and reliability of ChatGPT answers varied widely depending on the question posed. ChatGPT answers were not considered useful or accurate in 13.9% and 13.1% of cases, respectively. In within-individual rankings for usability, ChatGPT was inferior to P4P in organization/readability, relevance, and usefulness in 29.6%, 28.3%, and 29.6% of cases, respectively, and for reliability, inferior to P4P in clarity, accuracy, and thoroughness in 38.1%, 34.5%, and 31.0% of cases, respectively.

Conclusions: The quality of ChatGPT responses to real-world clinical questions varied widely, with nearly one-third or more of the answers considered inferior to a human-authored source in several aspects of usability and reliability. Caution is advised when using ChatGPT in clinical decision making.

Source Journal
Southern Medical Journal (Medicine: General & Internal)
CiteScore: 1.40
Self-citation rate: 9.10%
Annual articles: 222
Review time: 4-8 weeks
Journal description: As the official journal of the Birmingham, Alabama-based Southern Medical Association (SMA), the Southern Medical Journal (SMJ) has for more than 100 years provided the latest clinical information in areas that affect patients' daily lives. Now delivered to individuals exclusively online, the SMJ has a multidisciplinary focus that covers a broad range of topics relevant to physicians and other healthcare specialists in all relevant aspects of the profession, including medicine and medical specialties; surgery and surgical specialties; child and maternal health; mental health; emergency and disaster medicine; public health and environmental medicine; bioethics and medical education; and quality health care, patient safety, and best practices. Each month, articles span the spectrum of medical topics, providing timely, up-to-the-minute information for both primary care physicians and specialists. Contributors include leaders in the healthcare field from across the country and around the world. The SMJ enables physicians to provide the best possible care to patients in this age of rapidly changing modern medicine.