NEJM AI: Latest Articles

Large Language Models for More Efficient Reporting of Hospital Quality Measures.
Pub Date: 2024-10-24; Epub Date: 2024-10-21; DOI: 10.1056/aics2400420
Aaron Boussina, Rishivardhan Krishnamoorthy, Kimberly Quintero, Shreyansh Joshi, Gabriel Wardi, Hayden Pour, Nicholas Hilbert, Atul Malhotra, Michael Hogarth, Amy M Sitapati, Chad VanDenBerg, Karandeep Singh, Christopher A Longhurst, Shamim Nemati

Hospital quality measures are a vital component of a learning health system, yet they can be costly to report, statistically underpowered, and inconsistent due to poor interrater reliability. Large language models (LLMs) have recently demonstrated impressive performance on health care-related tasks and offer a promising way to provide accurate abstraction of complete charts at scale. To evaluate this approach, we deployed an LLM-based system that ingests Fast Healthcare Interoperability Resources data and outputs a completed Severe Sepsis and Septic Shock Management Bundle (SEP-1) abstraction. We tested the system on a sample of 100 manual SEP-1 abstractions that University of California San Diego Health reported to the Centers for Medicare & Medicaid Services in 2022. The LLM system achieved agreement with manual abstractors on the measure category assignment in 90 of the abstractions (90%; κ=0.82; 95% confidence interval, 0.71 to 0.92). Expert review of the 10 discordant cases identified four that were mistakes introduced by manual abstraction. This pilot study suggests that LLMs using interoperable electronic health record data may perform accurate abstractions for complex quality measures. (Funded by the National Institute of Allergy and Infectious Diseases [1R42AI177108-1] and others.)
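The abstract reports raw agreement (90 of 100) alongside Cohen's κ, which discounts the agreement two raters would reach by chance. As a minimal sketch of how κ is computed from two raters' category labels, the snippet below uses ten hypothetical labels; the label names, the data, and the function name `cohens_kappa` are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each label, multiply the two raters' marginal rates, then sum.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical measure-category labels for ten charts (NOT study data).
manual = ["pass"] * 4 + ["fail"] * 3 + ["excluded"] * 3
llm = ["pass", "pass", "pass", "fail",
       "fail", "fail", "fail",
       "excluded", "excluded", "pass"]

kappa = cohens_kappa(manual, llm)
print(f"raw agreement = 0.80, kappa = {kappa:.2f}")
```

In this toy example the raters agree on 8 of 10 charts, yet κ comes out lower than 0.80 because some of that agreement would be expected by chance alone.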

Journal: NEJM AI, volume 1, issue 11. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658346/pdf/
Citations: 0
Prospective Multi-Site Validation of AI to Detect Tuberculosis and Chest X-Ray Abnormalities.
Pub Date: 2024-10-01; Epub Date: 2024-09-26; DOI: 10.1056/aioa2400018
Sahar Kazemzadeh, Atilla P Kiraly, Zaid Nabulsi, Nsala Sanjase, Minyoi Maimbolwa, Brian Shuma, Shahar Jamshy, Christina Chen, Arnav Agharwal, Charles T Lau, Andrew Sellergren, Daniel Golden, Jin Yu, Eric Wu, Yossi Matias, Katherine Chou, Greg S Corrado, Shravya Shetty, Daniel Tse, Krish Eswaran, Yun Liu, Rory Pilgrim, Monde Muyoyeta, Shruthi Prabhakara

Background: Using artificial intelligence (AI) to interpret chest X-rays (CXRs) could support accessible triage tests for active pulmonary tuberculosis (TB) in resource-constrained settings.

Methods: The performance of two cloud-based CXR AI systems - one to detect TB and the other to detect CXR abnormalities - in a population with a high TB and human immunodeficiency virus (HIV) burden was evaluated. We recruited 1978 adults who had TB symptoms, were close contacts of known TB patients, or were newly diagnosed with HIV at three clinical sites. The TB-detecting AI (TB AI) scores were converted to binary using two thresholds: a high-sensitivity threshold and an exploratory threshold designed to resemble radiologist performance. Ten radiologists reviewed images for signs of TB, blinded to the reference standard. Primary analysis measured AI detection noninferiority to radiologist performance. Secondary analysis evaluated AI detection as compared with the World Health Organization (WHO) targets (90% sensitivity, 70% specificity). Both used an absolute margin of 5%. The abnormality-detecting AI (abnormality AI) was evaluated for noninferiority to a high-sensitivity target suitable for triaging (90% sensitivity, 50% specificity).
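A noninferiority comparison against a fixed target with an absolute margin of 5% can be sketched as follows: the AI is called noninferior when the lower confidence bound of its observed proportion lies above the target minus the margin. The abstract does not state which interval method the authors used, so the Wilson score bound, the `z=1.96` default, the function names, and the counts below are all assumptions made for illustration.

```python
import math


def wilson_lower_bound(successes, total, z=1.96):
    """Lower limit of the Wilson score interval for a binomial proportion."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    spread = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - spread) / denom


def noninferior_to_target(successes, total, target, margin=0.05, z=1.96):
    """True when the lower bound exceeds the target minus the absolute margin."""
    return wilson_lower_bound(successes, total, z) > target - margin


# Hypothetical counts (NOT study data): correctly flagged cases out of 200.
print(noninferior_to_target(185, 200, target=0.90))  # observed 92.5%
print(noninferior_to_target(170, 200, target=0.90))  # observed 85.0%
```

With a 90% target and a 5% margin, the bar the lower bound must clear is 85%, so a point estimate slightly below the target can still be declared noninferior if the sample is large enough.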

Results: Of the 1910 patients analyzed, 1827 (96%) had conclusive TB status, of whom 649 (36%) were HIV positive and 192 (11%) were TB positive. The TB AI's sensitivity and specificity were 87% and 70%, respectively, at the high-sensitivity threshold and 78% and 82%, respectively, at the balanced threshold. Radiologists' mean sensitivity was 76% and mean specificity was 82%. At the high-sensitivity threshold, the TB AI was noninferior to average radiologist sensitivity (P<0.001) but not to average radiologist specificity (P=0.99) and was higher than the WHO target for specificity but not sensitivity. At the balanced threshold, the TB AI was comparable to radiologists. The abnormality AI's sensitivity and specificity were 97% and 79%, respectively, with both meeting the prespecified targets.
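The paired sensitivity and specificity figures above come from applying different thresholds to the same continuous AI score. A minimal sketch of that step is shown here, assuming a score in [0, 1] where a case is called positive when the score is at or above the threshold; the scores, reference labels, threshold values, and the function name `sensitivity_specificity` are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(scores, has_tb, threshold):
    """Binarize scores at `threshold` and compare with the reference standard."""
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, has_tb):
        predicted = score >= threshold
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical AI scores and reference labels for eight patients (NOT study data).
scores = [0.95, 0.80, 0.55, 0.30, 0.70, 0.45, 0.20, 0.10]
has_tb = [True, True, True, True, False, False, False, False]

for name, threshold in [("lower threshold", 0.25), ("higher threshold", 0.50)]:
    sens, spec = sensitivity_specificity(scores, has_tb, threshold)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold flags more patients, which raises sensitivity at the cost of specificity; this is the same trade-off visible between the two operating points reported for the TB AI.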

Conclusions: The CXR TB AI was noninferior to radiologists for active pulmonary TB triaging in a population with a high TB and HIV burden. Neither the TB AI nor the radiologists met WHO recommendations for sensitivity in the study population. AI can also be used to detect other CXR abnormalities in the same population.

Journal: NEJM AI, volume 1, issue 10. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11737584/pdf/
Citations: 0