Automated identification of incidental hepatic steatosis on Emergency Department imaging using large language models
Tyrus Vong, Nicholas Rizer, Vedant Jain, Valerie L Thompson, Mark Dredze, Eili Y Klein, Jeremiah S Hinson, Tanjala Purnell, Stephen Kwak, Tinsay Woreta, Alexandra T Strauss
Hepatology Communications, 9(3). Journal Article. Published 2025-02-19. DOI: 10.1097/HC9.0000000000000638 (https://doi.org/10.1097/HC9.0000000000000638)
Abstract
Background: Hepatic steatosis is a precursor to more severe liver disease, increasing morbidity and mortality risks. In the Emergency Department, routine abdominal imaging often reveals incidental hepatic steatosis that goes undiagnosed due to the acute nature of encounters. Imaging reports in the electronic health record contain valuable information not easily accessible as discrete data elements. We hypothesized that large language models could reliably detect hepatic steatosis from reports without extensive natural language processing training.
Methods: We identified 200 adults who had CT abdominal imaging in the Emergency Department between August 1, 2016, and December 31, 2023. Using text from imaging reports and structured prompts, 3 Azure OpenAI models (ChatGPT 3.5, 4, 4o) identified patients with hepatic steatosis. We evaluated model performance regarding accuracy, inter-rater reliability, sensitivity, and specificity compared to physician reviews.
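The comparison against physician review described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `binary_metrics` and the toy labels are hypothetical, and it assumes each report receives one yes/no steatosis label from the physician reference and one from the model.

```python
from typing import Dict, Sequence


def binary_metrics(reference: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity of predictions against a reference standard."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("reference and predicted must be non-empty and of equal length")

    # Tally the four cells of the 2x2 confusion matrix.
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    tn = sum(1 for r, p in zip(reference, predicted) if not r and not p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)

    return {
        "accuracy": (tp + tn) / len(reference),
        # Sensitivity: share of reference-positive reports the model flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of reference-negative reports the model cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical toy labels (True = hepatic steatosis reported), not study data.
physician = [True, True, False, False, False]
model = [True, False, False, False, True]
metrics = binary_metrics(physician, model)
print(metrics)
```

In the study's setting, `physician` would hold the physician review labels and `model` the labels returned by one model for one prompt iteration.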
Results: The accuracy for the models was 96.2% for v3.5, 98.3% for v4, and 98.8% for v4o. Inter-rater reliability ranged from 0.99 to 1.00 across 10 iterations. Mean model confidence scores were 2.9 (SD 0.8) for v3.5, 3.9 (SD 0.3) for v4, and 4.0 (SD 0.07) for v4o. Incorrect evaluations were 76 (3.8%) for v3.5, 34 (1.7%) for v4, and 25 (1.3%) for v4o. All models showed sensitivity and specificity above 0.9.
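The reported error counts and accuracies can be cross-checked with a short Python sketch. It rests on an assumption the abstract does not state explicitly: that each model produced 200 reports × 10 iterations = 2000 evaluations. The variable and function names are illustrative.

```python
# Assumption: 200 reports evaluated in each of 10 iterations per model.
REPORTS = 200
ITERATIONS = 10
TOTAL_EVALUATIONS = REPORTS * ITERATIONS

# Incorrect-evaluation counts as reported in the Results.
incorrect = {"v3.5": 76, "v4": 34, "v4o": 25}


def error_and_accuracy(n_incorrect: int, total: int = TOTAL_EVALUATIONS):
    """Return (error %, accuracy %) for a given count of incorrect evaluations."""
    error_pct = 100 * n_incorrect / total
    return error_pct, 100 - error_pct


summary = {name: error_and_accuracy(n) for name, n in incorrect.items()}
for name, (err, acc) in summary.items():
    print(f"{name}: error {err:.2f}%, accuracy {acc:.2f}%")
```

Under that assumption the counts give error rates of 3.80%, 1.70%, and 1.25% and accuracies of 96.20%, 98.30%, and 98.75%, which match the reported figures after rounding to one decimal place.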
Conclusions: Large language models can help identify incidental conditions in imaging reports that would otherwise represent missed opportunities for early disease intervention. They democratize natural language processing by enabling user-friendly, expansive analyses of electronic medical records without requiring the development of complex natural language processing models.
About the journal:
Hepatology Communications is a peer-reviewed, online-only, open access journal for fast dissemination of high-quality basic, translational, and clinical research in hepatology. It maintains high standards and rigorous peer review. Because the journal is open access, authors retain the copyright to their works, all articles are immediately available and free to read and share, and the journal is fully compliant with funder and institutional mandates. The journal is committed to fast publication and author satisfaction.