Evaluation of GPT-4 ability to identify and generate patient instructions for actionable incidental radiology findings.

IF 4.7 | CAS Zone 2 (Medicine) | Q1 (Computer Science, Information Systems) | Journal of the American Medical Informatics Association | Pub Date: 2024-09-01 | DOI: 10.1093/jamia/ocae117
Kar-Mun C Woo, Gregory W Simon, Olumide Akindutire, Yindalon Aphinyanaphongs, Jonathan S Austrian, Jung G Kim, Nicholas Genes, Jacob A Goldenring, Vincent J Major, Chloé S Pariente, Edwin G Pineda, Stella K Kang
Pages: 1983-1993 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339516/pdf/
Citations: 0

Abstract

Objectives: To evaluate the proficiency of a HIPAA-compliant version of GPT-4 in identifying actionable, incidental findings from unstructured radiology reports of Emergency Department patients. To assess appropriateness of artificial intelligence (AI)-generated, patient-facing summaries of these findings.

Materials and methods: Radiology reports extracted from the electronic health record of a large academic medical center were manually reviewed to identify non-emergent, incidental findings with a high likelihood of requiring follow-up, further sub-stratified as "definitely actionable" (DA) or "possibly actionable-clinical correlation" (PA-CC). Instruction prompts to GPT-4 were developed and iteratively optimized using a validation set of 50 reports. The optimized prompt was then applied to a test set of 430 unseen reports. GPT-4 performance was primarily graded on accuracy identifying either DA or PA-CC findings, then secondarily for DA findings alone. Outputs were reviewed for hallucinations. AI-generated patient-facing summaries were assessed for appropriateness via a Likert scale.
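The grading step described above amounts to comparing model labels against reviewer labels and scoring the "actionable" class. A minimal sketch of that scoring logic is below; the function name, label strings, and toy data are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of the grading step: model outputs (DA / PA-CC / none)
# are compared against manual reviewer labels, and recall/precision are
# computed for the positive class (DA or PA-CC). Toy data is illustrative.

def score_actionable(gold, predicted, positive=("DA", "PA-CC")):
    """Return (recall, precision) treating DA or PA-CC as the positive class."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g_pos = g in positive
        p_pos = p in positive
        if g_pos and p_pos:
            tp += 1          # actionable finding correctly flagged
        elif p_pos:
            fp += 1          # flagged, but reviewers found nothing actionable
        elif g_pos:
            fn += 1          # actionable finding missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Toy example over 4 reports:
gold = ["DA", "none", "PA-CC", "none"]
pred = ["DA", "PA-CC", "PA-CC", "none"]
recall, precision = score_actionable(gold, pred)
print(recall, precision)  # 1.0 recall, ~0.667 precision on this toy set
```

This mirrors the paper's emphasis on recall for the primary outcome: a false negative (a missed actionable finding) is clinically costlier than a false positive that a human reviewer can discard.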

Results: For the primary outcome (DA or PA-CC), GPT-4 achieved 99.3% recall, 73.6% precision, and 84.5% F-1. For the secondary outcome (DA only), GPT-4 demonstrated 95.2% recall, 77.3% precision, and 85.3% F-1. No findings were "hallucinated" outright. However, 2.8% of cases included generated text about recommendations that were inferred without specific reference. The majority of True Positive AI-generated summaries required no or minor revision.
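The reported F-1 scores can be checked directly from the precision and recall figures, since F-1 is their harmonic mean:

```python
# Sanity check of the reported F-1 scores: F1 = 2PR / (P + R),
# the harmonic mean of precision and recall.

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Primary outcome (DA or PA-CC): 99.3% recall, 73.6% precision
print(round(f1(0.736, 0.993), 3))  # 0.845 -> matches the reported 84.5%
# Secondary outcome (DA only): 95.2% recall, 77.3% precision
print(round(f1(0.773, 0.952), 3))  # 0.853 -> matches the reported 85.3%
```

Note the large gap between recall and precision in the primary outcome: the harmonic mean is pulled toward the lower value, which is why F-1 (84.5%) sits much closer to the 73.6% precision than to the 99.3% recall.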

Conclusion: GPT-4 demonstrates proficiency in detecting actionable, incidental findings after refined instruction prompting. AI-generated patient instructions were most often appropriate, but rarely included inferred recommendations. While this technology shows promise to augment diagnostics, active clinician oversight via "human-in-the-loop" workflows remains critical for clinical implementation.

Journal of the American Medical Informatics Association (Medicine / Computer Science, Interdisciplinary Applications)
CiteScore: 14.50 | Self-citation rate: 7.80% | Articles per year: 230 | Review time: 3-8 weeks
About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.