Integrating a host transcriptomic biomarker with a large language model for diagnosis of lower respiratory tract infection

medRxiv - Infectious Diseases. Pub Date: 2024-08-29. DOI: 10.1101/2024.08.28.24312732

Hoang Van Phan, Natasha Spottiswoode, Emily C. Lydon, Victoria T. Chu, Adolfo Cuesta, Alexander D. Kazberouk, Natalie L. Richmond, Carolyn S. Calfee, Charles R. Langelier



Abstract

Lower respiratory tract infections (LRTIs) are a leading cause of mortality worldwide. Despite this, diagnosing LRTI remains challenging, particularly in the intensive care unit, where non-infectious respiratory conditions can present with similar features. Here, we tested a new method for LRTI diagnosis that combines the transcriptomic biomarker FABP4 with assessment of text from the electronic medical record (EMR) using the large language model Generative Pre-trained Transformer 4 (GPT-4). We evaluated this methodology in a prospective cohort of critically ill adults with acute respiratory failure, in which we measured pulmonary FABP4 expression and identified patients with LRTI or non-infectious conditions using retrospective adjudication. A diagnostic classifier combining FABP4 and GPT-4 achieved an area under the receiver operating characteristic curve (AUC) of 0.92 ± 0.06 by five-fold cross-validation (CV), outperforming classifiers based on FABP4 expression alone (AUC 0.83) or GPT-4 alone (AUC 0.84). At the threshold given by Youden’s index within each CV fold, the combined classifier achieved a mean sensitivity of 92% ± 7%, specificity of 90% ± 17%, and accuracy of 91% ± 8%. Taken together, our findings suggest that combining a host transcriptional biomarker with interpretation of EMR data using artificial intelligence is a promising new approach to infectious disease diagnosis.
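The two evaluation quantities named in the abstract, AUC and the Youden's index operating point, can be illustrated with a minimal sketch. This is not the authors' pipeline: the labels and scores below are made-up toy values, the function names `roc_auc` and `youden_threshold` are hypothetical, and the study's classifier training and five-fold cross-validation are not reproduced here.

```python
from typing import List, Tuple


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_threshold(labels: List[int], scores: List[float]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing
    Youden's J = sensitivity + specificity - 1.
    A case is called positive when score >= threshold; on ties in J,
    the lowest such threshold is kept."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_j, best_thr, best_sens, best_spec = -1.0, 0.0, 0.0, 0.0
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < thr)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_thr, best_sens, best_spec = j, thr, sens, spec
    return best_thr, best_sens, best_spec


# Toy data (assumption, not from the study): 1 = LRTI, 0 = non-infectious.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical classifier outputs, e.g. predicted probability of LRTI.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
threshold, sensitivity, specificity = youden_threshold(labels, scores)
print(f"AUC={auc:.4f} threshold={threshold} "
      f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

On these toy values the AUC is 15/16 = 0.9375, and the Youden-optimal threshold is 0.4, giving sensitivity 1.00 and specificity 0.75. In a cross-validated setting such as the one the abstract describes, these quantities would be computed within each held-out fold and then averaged.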


