Can GPT-3.5 generate and code discharge summaries?

IF 4.7 2区 医学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Journal of the American Medical Informatics Association Pub Date : 2024-09-14 DOI:10.1093/jamia/ocae132
Matúš Falis, Aryo Pradipta Gema, Hang Dong, Luke Daines, Siddharth Basetti, Michael Holder, Rose S Penfold, Alexandra Birch, Beatrice Alex
{"title":"Can GPT-3.5 generate and code discharge summaries?","authors":"Matúš Falis, Aryo Pradipta Gema, Hang Dong, Luke Daines, Siddharth Basetti, Michael Holder, Rose S Penfold, Alexandra Birch, Beatrice Alex","doi":"10.1093/jamia/ocae132","DOIUrl":null,"url":null,"abstract":"Objectives The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels. Materials and Methods Employing GPT-3.5 we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (or generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on an MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, generation codes, and their families. Weak Hierarchical Confusion Matrices determined within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents. Results Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including 1 absent from the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlight the correctness of generated concepts while suffering in variety, supporting information, and narrative. Discussion and Conclusion While GPT-3.5 alone given our prompt setting is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety, and authenticity in narratives.","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":"18 1","pages":""},"PeriodicalIF":4.7000,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1093/jamia/ocae132","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Objectives The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels. Materials and Methods Employing GPT-3.5 we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (or generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on an MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, generation codes, and their families. Weak Hierarchical Confusion Matrices determined within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents. Results Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including 1 absent from the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlight the correctness of generated concepts while suffering in variety, supporting information, and narrative. Discussion and Conclusion While GPT-3.5 alone given our prompt setting is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety, and authenticity in narratives.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
GPT-3.5 能否生成出院摘要并对其进行编码?
目的 本研究旨在调查 GPT-3.5 在生成和编码带有国际疾病分类 (ICD) -10 代码的医疗文件时的作用,以便在低资源标签上进行数据扩增。材料与方法 利用 GPT-3.5,我们根据 MIMIC-IV 数据集中具有不常用(或生成)代码的患者的 ICD-10 代码描述列表,生成并编码了 9606 份出院摘要。这与基线训练集相结合,形成了一个增强训练集。神经编码模型在基线数据和增强数据上进行训练,并在 MIMIC-IV 测试集上进行评估。我们报告了完整代码集、一代代码及其族的微观和宏观 F1 分数。弱层次混淆矩阵(Weak Hierarchical Confusion Matrices)确定了后一种代码集的族内和族外编码错误。GPT-3.5 的编码性能在提示引导下的自我生成数据和真实的 MIMIC-IV 数据上进行了评估。临床医生对生成文件的临床可接受性进行了评估。结果 数据扩增导致模型的整体性能略有下降,但提高了生成候选代码及其族的性能,包括基线训练数据中缺少的 1 个代码。增强模型的族外错误率较低。GPT-3.5 可通过提示描述识别 ICD-10 代码,但在真实数据上表现不佳。评估人员强调了生成概念的正确性,但在多样性、支持信息和叙述方面存在不足。讨论与结论 虽然根据我们的提示设置,GPT-3.5 本身并不适合 ICD-10 编码,但它支持用于训练神经模型的数据增强。扩增对生成代码族有积极影响,但主要有利于已有示例的代码。扩增可减少族外错误。GPT-3.5 生成的文档能正确表述提示概念,但缺乏多样性和叙述的真实性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Journal of the American Medical Informatics Association
Journal of the American Medical Informatics Association 医学-计算机:跨学科应用
CiteScore
14.50
自引率
7.80%
发文量
230
审稿时长
3-8 weeks
期刊介绍: JAMIA is AMIA''s premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA''s articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.
期刊最新文献
The substance-exposed birthing person-infant/child dyad and health information exchange in the United States. Reducing diagnostic delays in acute hepatic porphyria using health records data and machine learning. Implementation and impact of an electronic patient reported outcomes system in a phase II multi-site adaptive platform clinical trial for early-stage breast cancer. Increasing adherence and collecting symptom-specific biometric signals in remote monitoring of heart failure patients: a randomized controlled trial. Correction to: Are medical history data fit for risk stratification of patients with chest pain in emergency care? Comparing data collected from patients using computerized history taking with data documented by physicians in the electronic health record in the CLEOS-CPDS prospective cohort study.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1