Vignette-based comparative analysis of ChatGPT and specialist treatment decisions for rheumatic patients: results of the Rheum2Guide study.

Rheumatology International (IF 3.2; Zone 3, Medicine; JCR Q2, Rheumatology). Pub Date: 2024-10-01. Epub Date: 2024-08-10. DOI: 10.1007/s00296-024-05675-5
Hannah Labinsky, Lea-Kristin Nagler, Martin Krusche, Sebastian Griewing, Peer Aries, Anja Kroiß, Patrick-Pascal Strunz, Sebastian Kuhn, Marc Schmalzing, Michael Gernert, Johannes Knitza
Publication type: Journal Article. Pages: 2043-2053. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11392980/pdf/
Citation count: 0

Abstract

Background: The complex nature of rheumatic diseases poses considerable challenges for clinicians when developing individualized treatment plans. Large language models (LLMs) such as ChatGPT could enable treatment decision support.

Objective: To compare treatment plans generated by ChatGPT-3.5 and GPT-4 to those of a clinical rheumatology board (RB).

Design/methods: Fictional patient vignettes were created, and GPT-3.5, GPT-4, and the RB were each asked to provide first- and second-line treatment plans with underlying justifications. Four rheumatologists from different centers, blinded to the origin of the treatment plans, selected the overall preferred treatment concept. Using a 5-point Likert scale, they rated each plan's safety, EULAR guideline adherence, medical adequacy, overall quality, justification, and completeness, as well as the difficulty of each patient vignette.

Results: Twenty fictional vignettes covering various rheumatic diseases and varying difficulty levels were assembled, and a total of 160 ratings were assessed. In 68.8% (110/160) of cases, raters preferred the RB's treatment plans over those generated by GPT-4 (16.3%; 26/160) and GPT-3.5 (15.0%; 24/160). GPT-4's plans were chosen more frequently than GPT-3.5's for first-line treatments. No significant safety differences were observed between the RB's and GPT-4's first-line treatment plans. Rheumatologists' plans received significantly higher ratings in guideline adherence, medical appropriateness, completeness and overall quality. Ratings did not correlate with vignette difficulty. LLM-generated plans were notably longer and more detailed.
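
The preference shares quoted above can be re-derived from the reported counts. The following is a minimal sketch for checking that arithmetic, not the authors' analysis code; the names `preference_counts` and `share_percent` are hypothetical, and half-up rounding to one decimal place is an assumption that happens to reproduce the published figures.

```python
from decimal import Decimal, ROUND_HALF_UP

# Preference counts as reported in the abstract (out of 160 ratings).
preference_counts = {"RB": 110, "GPT-4": 26, "GPT-3.5": 24}


def share_percent(count: int, total: int) -> Decimal:
    """Return count/total as a percentage, rounded half-up to one decimal.

    Decimal is used instead of round() because Python's built-in round()
    rounds exact ties to the nearest even digit (16.25 would become 16.2).
    """
    pct = Decimal(count) * 100 / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# The three counts should add up to the 160 ratings mentioned in the text.
total_ratings = sum(preference_counts.values())

shares = {
    source: share_percent(n, total_ratings)
    for source, n in preference_counts.items()
}

for source, pct in shares.items():
    print(f"{source}: {preference_counts[source]}/{total_ratings} = {pct}%")
```

Run as written, this prints 68.8% for the RB, 16.3% for GPT-4, and 15.0% for GPT-3.5, and the three counts sum to the 160 ratings reported.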

Conclusion: GPT-4 and GPT-3.5 generated safe, high-quality treatment plans for rheumatic diseases, demonstrating promise in clinical decision support. Future research should investigate detailed standardized prompts and the impact of LLM usage on clinical decisions.

Source journal: Rheumatology International (Medicine, Rheumatology)
CiteScore: 7.30
Self-citation rate: 5.00%
Articles published: 191
Review time: 16. months
About the journal: RHEUMATOLOGY INTERNATIONAL is an independent journal reflecting world-wide progress in the research, diagnosis and treatment of the various rheumatic diseases. It is designed to serve researchers and clinicians in the field of rheumatology. RHEUMATOLOGY INTERNATIONAL will cover all modern trends in clinical research as well as in the management of rheumatic diseases. Special emphasis will be given to public health issues related to rheumatic diseases, applying rheumatology research to clinical practice, epidemiology of rheumatic diseases, diagnostic tests for rheumatic diseases, patient reported outcomes (PROs) in rheumatology and evidence on education of rheumatology. Contributions to these topics will appear in the form of original publications, short communications, editorials, and reviews. "Letters to the editor" will be welcome as an enhancement to discussion. Submission of basic science research, including in vitro or animal studies, is discouraged, as we will only review studies on humans with an epidemiological or clinical perspective. Case reports without a proper review of the literature (Case-based Reviews) will not be published. Every effort will be made to ensure speed of publication while maintaining a high standard of contents and production. Manuscripts submitted for publication must contain a statement to the effect that all human studies have been reviewed by the appropriate ethics committee and have therefore been performed in accordance with the ethical standards laid down in an appropriate version of the 1964 Declaration of Helsinki. It should also be stated clearly in the text that all persons gave their informed consent prior to their inclusion in the study. Details that might disclose the identity of the subjects under study should be omitted.