Pub Date: 2025-12-01; Epub Date: 2025-10-28; DOI: 10.1016/j.mcpdig.2025.100300
Jiwoong Jeong, MS; Chieh-Ju Chao, MD; Reza Arsanjani, MD; Chadi Ayoub, MBBS, PhD; Steven J. Lester, MD; Milagros Pereyra, MD; Ebram F. Said, MD; Michael Roarke, BS; Cecilia Tagle-Cornell, MS; Laura M. Koepke, MSN; Yi-Lin Tsai, MD; Chen Jung-Hsuan, MD; Chun-Chin Chang, MD; Juan M. Farina, MD; Hari Trivedi, MD; Bhavik N. Patel, MD, MBA; Imon Banerjee, PhD
Objective
To create an opportunistic screening model to predict coronary calcium burden and associated cardiovascular risk using only commonly available frontal chest x-rays (CXR) and patient demographics.
Patients and Methods
We proposed a novel multitask learning framework and trained a model internally on 2121 Mayo Clinic patients with paired gated computed tomography (CT) scans and CXR images acquired from January 1, 2012, to December 31, 2022, using coronary artery calcification (CAC) score categories (0, 1-99, and 100+) as ground truth. The internally trained model was validated on multiple external datasets (Emory University Healthcare and Taipei Veterans General Hospital, January 1, 2012, to December 31, 2022) whose populations differ significantly in racial and ethnic composition.
Results
Three-class classification of CAC scores (0, 1-99, and 100+) showed moderate performance on both the internal test set and the external datasets, with average F1 scores of 0.71±0.04 for Mayo, 0.65±0.02 for Emory University Healthcare, and 0.70±0.06 for Taipei Veterans General Hospital. For clinically relevant risk identification, the model achieved areas under the receiver operating characteristic curve of 0.86±0.02, 0.77±0.03, and 0.82±0.03 for CAC 0 versus 400+ on the internal and 2 external datasets, respectively. For 0 versus 100+, the corresponding areas under the receiver operating characteristic curve were 0.83±0.03, 0.71±0.02, and 0.78±0.01. Prospective evaluation across 3 Mayo Clinic sites was on par with the external validations and showed only minimal temporal drift.
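The three-class task and the two binary comparisons above can be made concrete with a short Python sketch. It is not the authors' code. The class labels 0/1/2, the rule that "0 versus 400+" keeps only patients with CAC 0 or CAC ≥400 and drops everyone in between, and the generic `risk_scores` input (any per-patient model output where higher means more calcium) are all assumptions. The sketch computes a macro-averaged F1, which is one common reading of "average F1"; the ± values in the abstract may instead reflect variation across runs or folds.

```python
from __future__ import annotations

# Hypothetical helpers illustrating the evaluation setup; not the study's pipeline.


def cac_category(score: float) -> int:
    """Map a CAC (Agatston) score to 0 (CAC 0), 1 (1-99) or 2 (100+)."""
    if score < 0:
        raise ValueError("CAC score cannot be negative")
    if score == 0:
        return 0
    return 1 if score < 100 else 2


def macro_f1(y_true: list[int], y_pred: list[int], labels=(0, 1, 2)) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_subset(cac_scores, risk_scores, positive_threshold=400):
    """Positives: CAC >= threshold; negatives: CAC == 0; all others are dropped."""
    pos = [r for c, r in zip(cac_scores, risk_scores) if c >= positive_threshold]
    neg = [r for c, r in zip(cac_scores, risk_scores) if c == 0]
    return pos, neg


def auroc(pos_scores, neg_scores) -> float:
    """AUROC as P(random positive scores above random negative), ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    cac = [0, 50, 450, 0, 120]
    risk = [0.1, 0.5, 0.9, 0.2, 0.7]
    pos, neg = binary_subset(cac, risk)  # pos=[0.9], neg=[0.1, 0.2]
    print(auroc(pos, neg))  # 1.0
```

Passing `positive_threshold=100` gives the "0 versus 100+" comparison under the same assumptions.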
Conclusion
The open-source fusion artificial intelligence (AI)-CXR model outperformed existing state-of-the-art models for predicting CAC scores only on the internal cohort, while showing robust performance on external datasets. The proposed model may be useful as a robust, first-pass opportunistic screening method for cardiovascular risk from routine CXRs.
Title: Artificial Intelligence Chest X-Ray Opportunistic Screening Model for Coronary Artery Calcium Deposition: A Multi-Objective Model With Multimodal Data Fusion
Journal: Mayo Clinic Proceedings. Digital health, vol. 3, no. 4, Article 100300
Pub Date: 2025-12-01; Epub Date: 2025-09-09; DOI: 10.1016/j.mcpdig.2025.100264
Laura C. Zwiers, MPhil; Duco Veen, PhD; Marianna Mitratza, PhD; Timo B. Brakenhoff, PhD; Brianna M. Goodale, PhD; Paul Klaver, MSc; Kay Y. Hage, MSc; Marcel van Willigen, PhD; George S. Downward, PhD; Peter Lugtig, PhD; Leendert van Maanen, PhD; Stefan Van der Stigchel, PhD; Peter van der Heijden, PhD; Maureen Cronin, PhD; Diederick E. Grobbee, PhD; COVID-RED Consortium
Objective
To present the retention strategies implemented in the coronavirus disease 2019 (COVID-19) rapid early detection trial, a decentralized trial investigating the use of a wearable device to detect severe acute respiratory syndrome coronavirus 2 infection, to provide insights into study retention, and to investigate determinants of discontinuation.
Patients and Methods
The COVID-19 rapid early detection trial collected data from 17,825 participants from February 22, 2021, to November 18, 2021. Participants wore a wearable device overnight and synchronized it with a mobile application on waking. Retention strategies included both common and personalized activities. Multivariable logistic regression was used to identify participants at high risk of discontinuation after 6 months in the trial, and these results were combined with insights from behavioral theory to target participants for additional telephone calls.
Results
A total of 14,326 participants (80.4%) remained in the trial after 6 months, and 12,208 (68.5%) remained until the end of the trial. Multivariable logistic regression identified age, employment situation, living situation, and COVID-19 vaccination status as predictors of discontinuation. Subgroups at high risk of discontinuation were identified, and behavioral assessments indicated that vaccinated pensioners should be targeted with additional telephone calls. After these calls, the dropout rate in this subgroup was 11.4%.
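The retention percentages follow directly from the participant counts. The short sketch below reproduces them and adds one derived figure, retention from month 6 to the end of the trial, which the abstract does not state but which follows from the same counts.

```python
# Reproduces the reported retention percentages from the raw counts.
ENROLLED = 17_825


def retention_pct(retained: int, enrolled: int = ENROLLED) -> float:
    """Percentage of `enrolled` participants still in the trial, rounded to 0.1."""
    if not 0 <= retained <= enrolled:
        raise ValueError("retained must be between 0 and enrolled")
    return round(100 * retained / enrolled, 1)


at_6_months = retention_pct(14_326)  # 80.4
at_end = retention_pct(12_208)  # 68.5
# Derived (not in the abstract): share of the 6-month cohort that stayed to the end.
conditional_end = retention_pct(12_208, enrolled=14_326)  # 85.2
lost_after_6_months = 14_326 - 12_208  # 2118 participants
```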
Conclusion
This study describes how innovative and targeted data-driven retention strategies can be applied in a large decentralized clinical trial and presents the implemented retention strategies and discontinuation rates. Results can serve as a starting point for designing retention strategies in future decentralized trials.
Title: Increasing Retention in a Large-Scale Decentralized Clinical Trial: Learnings From the COVID-RED Trial
Journal: Mayo Clinic Proceedings. Digital health, vol. 3, no. 4, Article 100264
Pub Date: 2025-12-01; Epub Date: 2025-12-16; DOI: 10.1016/j.mcpdig.2025.100287
Rahul Gomes, PhD; William L. Jerome, BS; Sushil K. Garg, MBBS
Title: Integrating U-Net in a LLM Supervisor Agent Pipeline for Pancreatic Ductal Adenocarcinoma Diagnosis
Journal: Mayo Clinic Proceedings. Digital health, vol. 3, no. 4, Article 100287
Pub Date: 2025-12-01; Epub Date: 2025-12-16; DOI: 10.1016/j.mcpdig.2025.100285
Mohammad Mehdi Hosseini; Meghdad Sabouri Rad; Junze (Vincent) Huang; Rakesh Choudhary; Saverio J. Carello; Ola El-Zammar; Michel Nasr; Bardia Rodd
Title: Graph-Based Deep Ensemble Learning to Enhance Diagnostic Efficiency in Lung Adenocarcinoma H&E-Stained Histopathological Subtyping
Journal: Mayo Clinic Proceedings. Digital health, vol. 3, no. 4, Article 100285
Pub Date: 2025-12-01; Epub Date: 2025-09-25; DOI: 10.1016/j.mcpdig.2025.100270
Stephanie M. Helman, PhD; Nathan T. Riek, PhD; Susan M. Sereika, PhD; Ahmad P. Tafti, PhD; Robert Olsen, BS; J. William Gaynor, MD; Amy Jo Lisanti, PhD; Salah S. Al-Zaiti, PhD
Objective
To identify distinct postoperative temperature trajectories in neonates with congenital heart defects after cardiopulmonary bypass (CPB) using advanced unsupervised machine learning clustering methods, to corroborate the findings across methods, and to evaluate the prognostic value of these trajectories for outcomes.
Patients and Methods
A secondary cohort analysis was performed on prospectively collected data from a single pediatric referral center’s CardioAccess data registry, consisting of neonates who underwent CPB between January 1, 2015, and January 1, 2019. Postoperative temperatures over 48 hours were extracted from medical records. The performance of group-based trajectory modeling (GBTM) was compared with that of self-organizing maps (SOM) and k-means clustering. Cluster membership and model fit were optimized for 3 temperature clusters per method. The primary outcome was a composite of postoperative complications. Cluster assignments from the 3 techniques were compared with one another and related to outcomes using adjusted multivariable binary logistic regression.
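Of the 3 clustering methods compared, k-means is the simplest to illustrate. The following minimal Lloyd's k-means groups fixed-length temperature trajectories. It assumes each neonate's 48-hour record has already been resampled to the same time points, and its deterministic seeding (first k trajectories) is a simplification for reproducibility. It does not implement GBTM or SOM and is not the study's pipeline.

```python
from __future__ import annotations


def _sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(t: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid closest to trajectory t."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(t, centroids[j]))


def kmeans(trajectories: list[list[float]], k: int, max_iter: int = 100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    if not 0 < k <= len(trajectories):
        raise ValueError("k must be between 1 and the number of trajectories")
    # Deterministic seeding (an assumption); k-means++ or several random
    # restarts are the usual choices in practice.
    centroids = [list(t) for t in trajectories[:k]]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each trajectory joins its nearest centroid.
        new_labels = [_nearest(t, centroids) for t in trajectories]
        if new_labels == labels:
            break  # converged: no trajectory changed cluster
        labels = new_labels
        # Update step: each centroid becomes the pointwise mean of its members.
        for j in range(k):
            members = [t for t, lab in zip(trajectories, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because k-means treats every time point as an independent coordinate, it can group trajectories differently from a trajectory model such as GBTM, which is one plausible reason for the weaker agreement reported for k-means.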
Results
A total of 450 neonates of ≥34 weeks’ gestation underwent CPB. GBTM, SOM, and k-means each assigned membership to 3 groups: (1) persistent hypothermia (n=38 [9%]; n=49 [11%]; and n=40 [9%], respectively); (2) resolving hypothermia (n=233 [51%]; n=227 [50%]; and n=147 [33%], respectively); and (3) normothermia (n=179 [40%]; n=174 [39%]; and n=263 [58%], respectively). Agreement was strong between GBTM and SOM (κ=0.92) and weak between GBTM and k-means (κ=0.41). After adjustment, persistent hypothermia, compared with normothermia, was associated with higher odds of the composite complication outcome in the GBTM (odds ratio [OR], 2.8; 95% CI, 1.0-7.3; P=.04) and SOM (OR, 2.3; 95% CI, 1.0-5.4; P=.04) models, but not in the k-means model (OR, 1.4; 95% CI, 0.7-3.1; P=.38).
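Agreement between clustering methods is summarized above with Cohen's κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance. Cluster IDs from unsupervised methods are arbitrary, so they must be mapped to shared group names (for example, persistent hypothermia, resolving hypothermia, normothermia) before κ is computed. The sketch below assumes that mapping is chosen to maximize raw agreement by brute force, which is cheap for 3 groups; the abstract does not say how the study aligned clusters.

```python
from collections import Counter
from itertools import permutations


def cohens_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal frequency per label.
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def align_labels(reference: list, other: list) -> list:
    """Rename `other`'s arbitrary cluster IDs to the `reference` names that
    maximize raw agreement (brute force over permutations)."""
    ref_names = sorted(set(reference))
    other_ids = sorted(set(other))
    if len(other_ids) > len(ref_names):
        raise ValueError("other has more clusters than reference")
    best_map, best_agree = {}, -1
    for perm in permutations(ref_names, len(other_ids)):
        mapping = dict(zip(other_ids, perm))
        agree = sum(mapping[o] == r for r, o in zip(reference, other))
        if agree > best_agree:
            best_map, best_agree = mapping, agree
    return [best_map[o] for o in other]
```

For instance, `align_labels(["a", "a", "b", "c"], [2, 2, 0, 1])` renames the IDs so the sequences match exactly, after which `cohens_kappa` on the aligned labels is 1.0.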
Conclusion
Concordance across different machine learning techniques shows that temperature in neonates after CPB follows distinct trajectories. Neonates with persistent hypothermia are at higher risk of adverse outcomes.
Title: Exploring Novel Data-Driven Clustering Methods for Uncovering Patterns in Longitudinal Neonatal Postoperative Temperature Measurements
Journal: Mayo Clinic Proceedings. Digital health, vol. 3, no. 4, Article 100270
Pub Date: 2025-12-01; Epub Date: 2025-09-03; DOI: 10.1016/j.mcpdig.2025.100260
Joseph P. Deason, MBA; Scott J. Adams, MD, PhD, FRCPC; Ahmad Rahman, MSc; Stacey Lovo, PhD; Ivar Mendez, MD, PhD, FRCSC
Objective
To develop and pilot a technology selection tool (TST) designed to evaluate and recommend virtual care technologies tailored to specific community clinical needs.
Patients and Methods
Developed through collaboration among clinicians, software developers, technology experts, and health administrators, the TST uses a multiple criteria decision analysis framework to recommend technologies based on clinical relevance and technical quality. Its functionality was tested in a pilot project that assessed 5 technologies for virtual wound care in support of a remote community in Saskatchewan, Canada. The pilot study ran from March 7, 2025, through July 28, 2025.
Results
The TST identified the TeleVU Glass View as the optimal technology for virtual wound care. It generated product scores for the TeleVU Glass View (71.67), Teladoc Xpress (70.10), 19 Labs GALE (50.67), and TytoCare TytoKit (47.00), and disqualified the Teladoc Lite Cart for failing the pass–fail portability criterion. The TeleVU Glass View's high product score resulted primarily from its technological attribute quality scores for Telestration (10), Audio (9), Video (9), and Share Content (9), all of which were deemed clinically relevant for virtual wound care. The pilot enabled real-time wound care support by connecting local clinicians with virtual teams.
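The scoring logic described above (clinical relevance, technical quality, and a pass-fail gate) can be illustrated with a weighted-sum sketch, the most common multiple criteria decision analysis aggregation. The abstract does not specify the TST's weighting or scaling, so the 0-10 quality scale, the relevance weights, the rescaling to 0-100, and the product names below are hypothetical. The sketch does not reproduce the reported product scores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Product:
    name: str
    quality: dict[str, float]  # attribute -> quality score on an assumed 0-10 scale
    passes: dict[str, bool]  # pass-fail criterion -> met?


def product_score(product: Product, relevance: dict[str, float]) -> float | None:
    """Relevance-weighted attribute quality rescaled to 0-100, or None if any
    pass-fail criterion is unmet (the product is disqualified)."""
    if not all(product.passes.values()):
        return None
    total_weight = sum(relevance.values())
    if total_weight <= 0:
        raise ValueError("relevance weights must sum to a positive number")
    # Attributes the product lacks contribute a quality of 0.
    weighted = sum(w * product.quality.get(attr, 0.0) for attr, w in relevance.items())
    return 100 * weighted / (10 * total_weight)


def rank_products(products: list[Product], relevance: dict[str, float]):
    """Qualified products sorted by descending score."""
    scored = [(p.name, product_score(p, relevance)) for p in products]
    qualified = [(n, s) for n, s in scored if s is not None]
    return sorted(qualified, key=lambda item: item[1], reverse=True)
```

Setting a weight to 0 excludes an attribute judged clinically irrelevant for the use case, which is one way a tool like this could be adapted to different clinical applications.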
Conclusion
The TST offers a practical and adaptable tool to support evidence-based decision making for selecting technologies for specific clinical applications.
Title: A Technology Selection Tool Applying Multiple Criteria Decision Analysis for Virtual Care Implementation
Journal: Mayo Clinic Proceedings. Digital health, vol. 3, no. 4, Article 100260
Pub Date: 2025-12-01; Epub Date: 2025-10-14; DOI: 10.1016/j.mcpdig.2025.100296
Donald U. Apakama, MD, MS; Kim-Anh-Nhi Nguyen, MS; Daphnee Hyppolite, MPA, RHIA; Shelly Soffer, MD; Aya Mudrik, BS; Emilia Ling, MD, MBA, MS; Akini Moses, MD; Ivanka Temnycky, MS; Allison Glasser, MBA; Rebecca Anderson, MPH; Prathamesh Parchure, MS; Evajoyce Woullard, MS; Masoud Edalati, PhD; Lili Chan, MD, MS; Clair Kronk, PhD; Robert Freeman, RN; Arash Kia, MD; Prem Timsina, MD, PhD; Matthew A. Levin, MD; Rohan Khera, MD, MS; Girish N. Nadkarni, MD, MPH
Objective
To evaluate, against human-adjudicated gold-standard labels, whether generative pretrained transformer (GPT)-4 can detect and revise biased language in emergency department (ED) notes, and to identify modifiable factors associated with biased documentation.
Patients and Methods
We randomly sampled 50,000 ED medical and nursing notes from the Mount Sinai Health System (January 1, 2023, to December 31, 2023) and 500 discharge notes from the Medical Information Mart for Intensive Care IV database. GPT-4 flagged 4 types of bias: discrediting, stigmatizing/labeling, judgmental, and stereotyping. Two human reviewers verified the model's detections. We used multivariable logistic regression to examine associations between biased language and health care utilization, presenting problem (eg, substance use), shift timing, and provider type. Physicians then rated GPT-4’s proposed language revisions on a 10-point scale.
Results
GPT-4 showed 97.6% sensitivity and 85.7% specificity compared with human review. Biased language appeared in 6.5% (3229 of 50,000) of Mount Sinai notes and 7.4% (37 of 500) of Medical Information Mart for Intensive Care IV notes. In adjusted models, frequent health care utilization (adjusted odds ratio [aOR], 2.85; 95% CI, 1.95-4.17), substance use presentations (aOR, 3.09; 95% CI, 2.51-3.80), and overnight shifts (aOR, 1.37; 95% CI, 1.23-1.52) were associated with elevated odds of biased documentation. Physicians were more likely than nurses to use biased language (aOR, 2.26; 95% CI, 2.07-2.46). GPT-4’s recommended revisions received mean physician ratings above 9 of 10.
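Sensitivity and specificity are standard confusion-matrix ratios: TP / (TP + FN) and TN / (TN + FP). The sketch below defines them and adds Bayes' rule for positive predictive value (PPV), which the abstract does not report. As an illustration only, assuming the reported sensitivity (0.976) and specificity (0.857) held across all notes at the observed 6.5% prevalence, `ppv(0.976, 0.857, 0.065)` comes to roughly 0.32.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of truly biased notes that the model flags."""
    if tp + fn == 0:
        raise ValueError("no positive cases")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of unbiased notes that the model leaves unflagged."""
    if tn + fp == 0:
        raise ValueError("no negative cases")
    return tn / (tn + fp)


def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule:
    sens * prev / (sens * prev + (1 - spec) * (1 - prev))."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)
```

Because PPV depends on prevalence, the same detector yields a different share of correct flags in settings where biased language is more or less common.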
Conclusion
The study showed that GPT-4 accurately detects biased language in clinical notes, identifies modifiable contributors to that bias, and delivers physician-endorsed revisions. This approach may help mitigate documentation bias and reduce disparities in care.
Title: Identifying Bias at Scale in Clinical Notes Using Large Language Models
Journal: Mayo Clinic Proceedings. Digital health, vol. 3, no. 4, Article 100296
Pub Date: 2025-12-01; Epub Date: 2025-12-16; DOI: 10.1016/j.mcpdig.2025.100282
Jeffrey R. Fetzer, PhD; Saghir A. Al-Fasly, PhD; Cadman L. Leggett, MD; Nayantara Coelho-Prabhu, MD; Shounak Majumder, MD; John B. League III, MMIS; Shradha Shalini, MS; Ghazal Alabtah; Christine V. Dvorak, MAOL; Hamid R. Tizhoosh, PhD
Title: Foundation Models and Their Applications in Gastrointestinal Endoscopy
Journal: Mayo Clinic Proceedings. Digital health, vol. 3, no. 4, Article 100282
Pub Date: 2025-12-01; Epub Date: 2025-12-16; DOI: 10.1016/j.mcpdig.2025.100283
Ellen L. Larson, MD; Erik Jessen, PhD; Dong-Gi Mun, PhD; Jennifer L. Tomlinson, MD; Amro M. Abdelrahman, MBBS, MS; Danielle M. Carlson; Hojjat Salehinejad, PhD; Rory L. Smoot, MD
Title: Feature Selection and Machine Learning Strategies Optimize a Minimal Molecular Assay for Cholangiocarcinoma Subtype
Journal: Mayo Clinic Proceedings. Digital health, vol. 3, no. 4, Article 100283