Medical language matters: impact of clinical summary composition on a generative artificial intelligence's diagnostic accuracy.

IF 2.2 Q2 MEDICINE, GENERAL & INTERNAL Diagnosis Pub Date : 2024-12-12 DOI:10.1515/dx-2024-0167
Cassandra Skittle, Eliana Bonifacino, Casey N McQuade

Abstract

Objectives: Evaluate the impact of problem representation (PR) characteristics on Generative Artificial Intelligence (GAI) diagnostic accuracy.

Methods: Internal medicine attendings and residents from two academic medical centers were given a clinical vignette and instructed to write a PR. Deductive content analysis described the characteristics comprising each PR. Individual PRs were input into ChatGPT-4 (OpenAI, September 2023), which was prompted to generate a ranked three-item differential. The ranked differential and the top-ranked diagnosis were each scored on a 3-point scale: incorrect, partially correct, or correct. Logistic regression evaluated the impact of individual PR characteristics on ChatGPT accuracy.

Results: For a three-item differential, accuracy was associated with including fewer comorbidities (OR 0.57, p=0.010), fewer past historical items (OR 0.60, p=0.019), and more physical examination items (OR 1.66, p=0.015). For ChatGPT's ability to rank the true diagnosis as the single best diagnosis, accuracy correlated with using temporal semantic qualifiers (OR 3.447, p=0.046), using more semantic qualifiers overall (OR 1.300, p=0.005), and adhering to a typical 3-part PR format (OR 3.577, p=0.020).
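To make the reported odds ratios easier to read, the sketch below converts an OR into a change in probability. It is only an illustration under stated assumptions: the baseline accuracy of 0.50 is hypothetical and does not come from the study, the helper name `apply_odds_ratio` is invented here, and the abstract does not state the unit each OR refers to (for example, per additional item) or how the three-level score entered the model.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float, steps: int = 1) -> float:
    """Return the probability after multiplying the baseline odds by odds_ratio ** steps."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    odds = baseline_prob / (1.0 - baseline_prob)   # probability -> odds
    new_odds = odds * odds_ratio ** steps          # an OR scales the odds multiplicatively
    return new_odds / (1.0 + new_odds)             # odds -> probability


# Odds ratios reported in the abstract for the three-item differential.
REPORTED_ORS = {
    "comorbidities": 0.57,
    "past historical items": 0.60,
    "physical examination items": 1.66,
}

# Hypothetical baseline accuracy, chosen only for illustration (NOT from the study).
ASSUMED_BASELINE = 0.50

for name, or_value in REPORTED_ORS.items():
    p = apply_odds_ratio(ASSUMED_BASELINE, or_value)
    print(f"{name}: OR {or_value:.2f} -> probability {p:.3f} (from {ASSUMED_BASELINE:.2f})")
```

An OR below 1 lowers the odds of an accurate differential and an OR above 1 raises them. How far the probability moves depends on the baseline, which is why the baseline is kept as an explicit, adjustable assumption.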

Conclusions: Several distinct PR factors improved ChatGPT diagnostic accuracy. These factors have previously been associated with expertise in creating PRs. Future studies should prospectively explore how the qualities of clinical input affect GAI diagnostic accuracy.

Source journal: Diagnosis (MEDICINE, GENERAL & INTERNAL)
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 41
About the journal: Diagnosis focuses on how diagnosis can be advanced, how it is taught, and how and why it can fail, leading to diagnostic errors. The journal welcomes both fundamental and applied works, improvement initiatives, opinions, and debates to encourage new thinking on improving this critical aspect of healthcare quality.
Topics:
- Factors that promote diagnostic quality and safety
- Clinical reasoning
- Diagnostic errors in medicine
- The factors that contribute to diagnostic error: human factors, cognitive issues, and system-related breakdowns
- Improving the value of diagnosis: eliminating waste and unnecessary testing
- How culture and removing blame promote awareness of diagnostic errors
- Training and education related to clinical reasoning and diagnostic skills
- Advances in laboratory testing and imaging that improve diagnostic capability
- Local, national and international initiatives to reduce diagnostic error