Doctor Versus Artificial Intelligence: Patient and Physician Evaluation of Large Language Model Responses to Rheumatology Patient Questions in a Cross-Sectional Study

Arthritis & Rheumatology | IF 10.9 | CAS Tier 1 (Medicine) | JCR Q1 (Rheumatology) | Publication date: 2023-10-30 | DOI: 10.1002/art.42737
Carrie Ye, Elric Zweck, Zechen Ma, Justin Smith, Steven Katz
{"title":"Doctor Versus Artificial Intelligence: Patient and Physician Evaluation of Large Language Model Responses to Rheumatology Patient Questions in a Cross-Sectional Study","authors":"Carrie Ye,&nbsp;Elric Zweck,&nbsp;Zechen Ma,&nbsp;Justin Smith,&nbsp;Steven Katz","doi":"10.1002/art.42737","DOIUrl":null,"url":null,"abstract":"<div>\n \n <section>\n \n <h3> Objective</h3>\n \n <p>The objective of the current study was to assess the quality of large language model (LLM) chatbot versus physician-generated responses to patient-generated rheumatology questions.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>We conducted a single-center cross-sectional survey of rheumatology patients (n = 17) in Edmonton, Alberta, Canada. Patients evaluated LLM chatbot versus physician-generated responses for comprehensiveness and readability, with four rheumatologists also evaluating accuracy by using a Likert scale from 1 to 10 (1 being poor, 10 being excellent).</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>Patients rated no significant difference between artificial intelligence (AI) and physician-generated responses in comprehensiveness (mean 7.12 ± SD 0.99 vs 7.52 ± 1.16; <i>P =</i> 0.1962) or readability (7.90 ± 0.90 vs 7.80 ± 0.75; <i>P =</i> 0.5905). Rheumatologists rated AI responses significantly poorer than physician responses on comprehensiveness (AI 5.52 ± 2.13 vs physician 8.76 ± 1.07; <i>P</i> &lt; 0.0001), readability (AI 7.85 ± 0.92 vs physician 8.75 ± 0.57; <i>P =</i> 0.0003), and accuracy (AI 6.48 ± 2.07 vs physician 9.08 ± 0.64; <i>P</i> &lt; 0.0001). The proportion of preference to AI- versus physician-generated responses by patients and physicians was 0.45 ± 0.18 and 0.15 ± 0.08, respectively (<i>P =</i> 0.0106). After learning that one answer for each question was AI generated, patients were able to correctly identify AI-generated answers at a lower proportion compared to physicians (0.49 ± 0.26 vs 0.97 ± 0.04; <i>P =</i> 0.0183). The average word count of AI answers was 69.10 ± 25.35 words, as compared to 98.83 ± 34.58 words for physician-generated responses (<i>P =</i> 0.0008).</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>Rheumatology patients rated AI-generated responses to patient questions similarly to physician-generated responses in terms of comprehensiveness, readability, and overall preference. However, rheumatologists rated AI responses significantly poorer than physician-generated responses, suggesting that LLM chatbot responses are inferior to physician responses, a difference that patients may not be aware of.</p>\n \n <div>\n <figure>\n <div><picture>\n <source></source></picture><p></p>\n </div>\n </figure>\n </div>\n </section>\n </div>","PeriodicalId":129,"journal":{"name":"Arthritis & Rheumatology","volume":"76 3","pages":"479-484"},"PeriodicalIF":10.9000,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/art.42737","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Arthritis & Rheumatology","FirstCategoryId":"3","ListUrlMain":"https://acrjournals.onlinelibrary.wiley.com/doi/10.1002/art.42737","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RHEUMATOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Objective

The objective of the current study was to assess the quality of large language model (LLM) chatbot versus physician-generated responses to patient-generated rheumatology questions.

Methods

We conducted a single-center cross-sectional survey of rheumatology patients (n = 17) in Edmonton, Alberta, Canada. Patients evaluated LLM chatbot versus physician-generated responses for comprehensiveness and readability, with four rheumatologists also evaluating accuracy by using a Likert scale from 1 to 10 (1 being poor, 10 being excellent).
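The abstract does not describe how the blinded per-question Likert ratings were tabulated or summarized; purely as an illustrative sketch, the snippet below shows one way such ratings could be organized and reduced to the "mean ± SD" summaries reported later. All column names and values here are hypothetical, not the study data.

```python
# Illustrative sketch only: hypothetical per-question Likert ratings (1-10),
# one row per (question, rater group, response source). Values are made up.
import pandas as pd

ratings = pd.DataFrame(
    {
        "question_id": [1, 1, 1, 1, 2, 2, 2, 2],
        "rater_type": ["patient", "patient", "rheumatologist", "rheumatologist"] * 2,
        "source": ["AI", "physician"] * 4,
        "comprehensiveness": [7, 8, 5, 9, 6, 7, 6, 9],
        "readability": [8, 8, 8, 9, 7, 8, 8, 9],
    }
)

# Mean and standard deviation per rater group and response source,
# mirroring the "mean ± SD" style of summary reported in the abstract.
summary = (
    ratings.groupby(["rater_type", "source"])[["comprehensiveness", "readability"]]
    .agg(["mean", "std"])
)
print(summary)
```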

Results

Patients rated no significant difference between artificial intelligence (AI) and physician-generated responses in comprehensiveness (mean 7.12 ± SD 0.99 vs 7.52 ± 1.16; P = 0.1962) or readability (7.90 ± 0.90 vs 7.80 ± 0.75; P = 0.5905). Rheumatologists rated AI responses significantly poorer than physician responses on comprehensiveness (AI 5.52 ± 2.13 vs physician 8.76 ± 1.07; P < 0.0001), readability (AI 7.85 ± 0.92 vs physician 8.75 ± 0.57; P = 0.0003), and accuracy (AI 6.48 ± 2.07 vs physician 9.08 ± 0.64; P < 0.0001). The proportion of answers for which raters preferred the AI-generated over the physician-generated response was 0.45 ± 0.18 among patients and 0.15 ± 0.08 among physicians (P = 0.0106). After learning that one answer to each question was AI generated, patients correctly identified the AI-generated answers at a lower rate than physicians (0.49 ± 0.26 vs 0.97 ± 0.04; P = 0.0183). AI answers averaged 69.10 ± 25.35 words, compared with 98.83 ± 34.58 words for physician-generated responses (P = 0.0008).
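The abstract reports group means with standard deviations and P values but does not name the statistical test used. As a rough sketch only, a paired comparison of per-question ratings between the two response sources could be run as shown below; both a paired t-test and a Wilcoxon signed-rank test are included, and whether either matches the authors' actual analysis is not stated in the abstract. The arrays are hypothetical placeholders, not the study data.

```python
# Sketch of a paired comparison between AI and physician ratings for the
# same set of questions. Values below are hypothetical, not the study data.
import numpy as np
from scipy import stats

ai_scores = np.array([7, 6, 8, 5, 7, 6, 8, 7])          # hypothetical AI ratings
physician_scores = np.array([8, 8, 9, 7, 8, 9, 8, 9])   # hypothetical physician ratings

# Paired t-test on the per-question differences.
t_stat, p_ttest = stats.ttest_rel(ai_scores, physician_scores)

# Non-parametric alternative often preferred for ordinal Likert data.
w_stat, p_wilcoxon = stats.wilcoxon(ai_scores, physician_scores)

print(f"paired t-test: t = {t_stat:.2f}, P = {p_ttest:.4f}")
print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, P = {p_wilcoxon:.4f}")
```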

Conclusion

Rheumatology patients rated AI-generated responses to patient questions similarly to physician-generated responses in terms of comprehensiveness, readability, and overall preference. However, rheumatologists rated AI responses significantly poorer than physician-generated responses, suggesting that LLM chatbot responses are inferior to physician responses, a difference that patients may not be aware of.

Source Journal
Arthritis & Rheumatology
CiteScore: 20.90
Self-citation rate: 3.00%
Articles published: 371
Journal description: Arthritis & Rheumatology is the official journal of the American College of Rheumatology and focuses on the natural history, pathophysiology, treatment, and outcome of rheumatic diseases. It is a peer-reviewed publication that aims to publish the highest-quality basic and clinical research in this field. The journal covers a wide range of investigative areas and also includes review articles, editorials, and educational material for researchers and clinicians. Recognized as a leading research journal in rheumatology, Arthritis & Rheumatology serves the global community of rheumatology investigators and clinicians.