
International Journal of Medical Informatics: Latest publications

Reproducing real-world clinical prediction models using the DIVE platform: A comparative validation study across three chronic diseases
IF 4.1 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-22 DOI: 10.1016/j.ijmedinf.2026.106303
Francesco Lapi , Ettore Marconi , Marco Gorini , Lorenzo Nuti , Gerardo Medea , Iacopo Cricelli

Objectives

The aim of this analysis is to evaluate the performance and reproducibility of the Data Insight Validation Engine (DIVE), a modular Python-based analytics interface designed to facilitate real-world evidence (RWE) generation from clinical (e.g., primary care) data. The platform was used to replicate three previously published studies focused on chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), and severe asthma, each originally developed using conventional statistical environments.

Methods

Using a primary care data source, DIVE was employed to replicate three studies on the development and validation of prediction scores using machine learning (ML) and traditional inferential analyses: an ML-based Generalized Additive² Model (GA²M) predicting CKD, and two Cox regression models for COPD exacerbations (CEX-HScore) and severe asthma (AS-HScore). The data covered over one million patients under the care of approximately 800 general practitioners (GPs) in Italy. Although the initial studies covered the period 2013–2021, the DIVE-based investigations extended from 2013 to 2022, thereby also providing “external” temporal validation. Results obtained via DIVE were compared with the original findings.

Results

DIVE demonstrated high fidelity in replicating published results. The CKD model achieved largely consistent discrimination (AUC: 89.2% vs. 89.3%) and average precision (22.1% vs. 22.4%) using GA²M. The COPD model showed an AUC of 65.5%, a pseudo-R² of 12.7%, and a calibration slope of 1.01 (p = 0.317), consistent with the original CEX-HScore (AUC: 66%; pseudo-R²: 13%; calibration slope: 1.03, p = 0.345). For severe asthma, the prediction model exhibited an AUC of 71.9%, a pseudo-R² of 17.6%, and a calibration slope of 1.09 (p = 0.211), again aligned with the original AS-HScore (AUC: 72.5%; pseudo-R²: 18%; calibration slope: 1.12, p = 0.182).
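The discrimination and calibration metrics reported above (AUC, average precision, and the calibration slope with its p-value) can be reproduced in outline with standard Python tooling. The sketch below is illustrative only and is not DIVE code; it uses simulated outcomes and predicted risks.

```python
# Illustrative sketch (not DIVE itself): computing the kinds of metrics reported
# above from observed binary outcomes and predicted probabilities.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
p_pred = rng.uniform(0.01, 0.99, 2000)        # hypothetical predicted risks
y_obs = rng.binomial(1, p_pred)               # hypothetical observed outcomes

auc = roc_auc_score(y_obs, p_pred)            # discrimination (AUC)
ap = average_precision_score(y_obs, p_pred)   # average precision

# Calibration slope: logistic regression of the outcome on the log-odds of the
# predicted risk; a slope near 1 indicates well-calibrated predictions.
logit = np.log(p_pred / (1 - p_pred))
fit = sm.Logit(y_obs, sm.add_constant(logit)).fit(disp=0)
slope = fit.params[1]
z = (slope - 1.0) / fit.bse[1]                # test H0: slope = 1
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

print(f"AUC={auc:.3f}  AP={ap:.3f}  slope={slope:.2f} (p={p_value:.3f})")
```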

Conclusion

DIVE represents a reliable, scalable, and interoperable solution for RWE analytics, demonstrating equivalence with traditional analytic methods and aligning with best practices in data reproducibility. Continued development toward integrating federated (multi-database) analysis protocols and broader interoperability might expand its utility across several clinical domains.
{"title":"Reproducing real-world clinical prediction models using the DIVE platform: A comparative validation study across three chronic diseases","authors":"Francesco Lapi ,&nbsp;Ettore Marconi ,&nbsp;Marco Gorini ,&nbsp;Lorenzo Nuti ,&nbsp;Gerardo Medea ,&nbsp;Iacopo Cricelli","doi":"10.1016/j.ijmedinf.2026.106303","DOIUrl":"10.1016/j.ijmedinf.2026.106303","url":null,"abstract":"<div><h3>Objectives</h3><div>The aim of this analysis is to evaluate the performance and reproducibility of the Python-based Data Insight Validation Engine (DIVE), a modular analytics interface implemented in Python to facilitate real-world evidence (RWE) generation from clinical (e.g. primary care) data. The platform was used to replicate three previously published studies focused on chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), and severe asthma, each originally developed using conventional statistical environments.</div></div><div><h3>Methods</h3><div>Using a primary care data source, DIVE was employed to replicate three studies on development and validation of prediction scores using machine learning (ML) and traditional inferential analyses. Namely, a ML-based Generalized Additive<sup>2</sup> Model (GA<sup>2</sup>M) predicting CKD, and two Cox-based regression models for COPD exacerbations (CEX-HScore) and severe asthma (AS-HScore). Data referred to over one million patients under the care of approximately 800 general practitioners (GPs) in Italy. Although the initial studies were carried out between 2013 and 2021, the DIVE-based investigations extended from 2013 to 2022, thereby also demonstrating “external” temporal validation. Results obtained via DIVE were compared to the “original” prior findings.</div></div><div><h3>Results</h3><div>DIVE demonstrated high fidelity in replicating published results. The CKD model achieved largely consistent discrimination (AUC: 89.2% vs. 89.3%) and average precision (22.1% vs. 22.4%) using GA<sup>2</sup>M. The COPD model showed AUC of 65.5%, pseudo-R<sup>2</sup> of 12.7%, and calibration slope of 1.01 (p = 0.317) which were consistent with original CEX-HScore (AUC: 66%; pseudo-R<sup>2</sup>: 13%; calibration slope: 1.03 (p = 0.345)). For severe asthma, the prediction model exhibited an AUC equals to 71.9%, pseudo-R<sup>2</sup> of 17.6%, and calibration slope of 1.09 (p = 0.211), still aligned with the original AS-HScore (AUC: 72.5%; pseudo-R<sup>2</sup>: 18%; calibration slope: 1.12 (p = 0.182)).</div></div><div><h3>Conclusion</h3><div>DIVE represents a reliable, scalable, and interoperable solution for RWE analytics, demonstrating equivalence with traditional analytic methods and aligning with best practices in data reproducibility. Continued development toward integrating federated (multi-database) analyses protocols and broader interoperability might expand its utility across several clinical domains.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"210 ","pages":"Article 106303"},"PeriodicalIF":4.1,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146068320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A comparative evaluation of handling missing data points and modalities in electronic health records
IF 4.1 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-21 DOI: 10.1016/j.ijmedinf.2026.106302
Aashish Bhandari , Sonika Tyagi
Background: Healthcare data, generally available as electronic health records (EHR), provide rich insights for predictive modelling. A common challenge in using EHR data is missing information, which may be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). A typical way to deal with missingness is imputation, which can be statistical or learning-based. However, imputation carries the risk of changing the original data distribution. This can lead to serious issues, as even small changes in healthcare data can negatively impact clinical accuracy and decision-making. Alternative approaches are required.
Objective: This study examines machine learning strategies that address missing data directly within models. The goal is to assess how models preserve data structure and performance across different missingness patterns and rates in single vs multi-modal datasets.
Methods: We evaluate multiple machine learning architectures across three datasets. Two experimental setups are designed: one introduces missing data points in time-series records at the feature level, and the other masks complete or partial modalities in a multimodal dataset. Synthetic missingness is applied using established mechanisms and rates, with all experiments repeated across five random seeds. Results are benchmarked against imputation-based baselines to assess differences in data distribution and model performance.
Results: Direct modelling approaches preserved the underlying data structure better than imputation, which introduced distributional shifts. Embedding visualisations showed clearer label-based clustering in non-imputed settings. Models were more sensitive to missing text than missing measurements, underlining the contextual importance of clinical notes.
Conclusion: We provide a comparative analysis of different modelling strategies for handling missingness. We demonstrate that direct modelling approaches maintain clinical patterns more effectively than imputation. This emphasises the importance of integrating missingness handling into the modelling pipeline and selecting models based on missingness type and modality to ensure reliability in clinical applications.
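As a rough illustration of the masking-and-benchmarking setup described in the Methods above, the sketch below (not the authors' pipeline; the data, missingness rate, and models are stand-ins) introduces MCAR missingness at a fixed rate and compares an imputation-based baseline with a gradient-boosting model that handles missing values natively.

```python
# Illustrative sketch (not the study's code): MCAR masking at a fixed rate,
# then imputation-based vs. direct (NaN-aware) modelling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=3000, n_features=20, random_state=42)

# MCAR: every cell is masked independently with the same probability.
rate = 0.3
X_miss = X.copy()
X_miss[rng.random(X.shape) < rate] = np.nan

X_tr, X_te, y_tr, y_te = train_test_split(X_miss, y, test_size=0.3, random_state=0)

imputed = make_pipeline(SimpleImputer(strategy="mean"),
                        LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
direct = HistGradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)  # NaN-aware

for name, model in [("imputation baseline", imputed), ("direct modelling", direct)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```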
{"title":"A comparative evaluation of handling missing data points and modalities in electronic health records","authors":"Aashish Bhandari ,&nbsp;Sonika Tyagi","doi":"10.1016/j.ijmedinf.2026.106302","DOIUrl":"10.1016/j.ijmedinf.2026.106302","url":null,"abstract":"<div><div>Background: Healthcare data, generally available as electronic health records (EHR), provide rich insights for predictive modelling. A common challenge in using EHR data is the missing information, which may occur completely at random (MCAR), at random (MAR) or not at random (MNAR). A typical measure to deal with missingness is through imputation, which could be statistical or learning-based. However, with imputation, we run the risk of changing the original data distribution. This can lead to serious issues, as even small changes in healthcare data can negatively impact clinical accuracy and decision-making. Alternative approaches are required.</div><div>Objective: This study examines machine learning strategies that address missing data directly within models. The goal is to assess how models preserve data structure and performance across different missingness patterns and rates in single vs multi-modal datasets.</div><div>Methods: We evaluate multiple machine learning architectures across three datasets. Two experimental setups are designed: one introduces missing data points in time-series records at the feature level, and the other masks complete or partial modalities in a multimodal dataset. Synthetic missingness is applied using established mechanisms and rates, with all experiments repeated across five random seeds. Results are benchmarked against imputation-based baselines to assess differences in data distribution and model performance.</div><div>Results: Direct modelling approaches preserved the underlying data structure better than imputation, which introduced distributional shifts. Embedding visualisations showed clearer label-based clustering in non-imputed settings. Models were more sensitive to missing text than missing measurements, underlining the contextual importance of clinical notes.</div><div>Conclusion: We provide a comparative analysis of different modelling strategies for handling missingness. We demonstrate that direct modelling approaches maintain clinical patterns more effectively than imputation. This emphasises the importance of integrating missingness handling into the modelling pipeline and selecting models based on missingness type and modality to ensure reliability in clinical applications.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"210 ","pages":"Article 106302"},"PeriodicalIF":4.1,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The association of environmental exposure with multiple sclerosis severity score: A study based on sequential data modeling
IF 4.1 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-21 DOI: 10.1016/j.ijmedinf.2026.106295
Mahin Vazifehdan , Pietro Bosoni , Erica Tavazzi , Enrico Longato , Eleonora Tavazzi , Roberto Bergamaschi , Barbara Di Camillo , Ameen Abu-Hanna , Riccardo Bellazzi , Arianna Dagliati
Introduction: Multiple Sclerosis (MS) is a chronic neuroinflammatory disease influenced by clinical, demographic, and environmental factors. Predicting MS progression is challenging due to the disease's heterogeneity. This study aimed to apply Artificial Intelligence (AI) methods to investigate the association between the MS Severity Score (MSSS), a measure that integrates disability level and disease duration, and patients' longitudinal clinical data and environmental exposures.
Methods: We integrated long-term clinical records from the Mondino MS Center (Pavia, Italy) with environmental data, including air pollution and weather conditions, based on patients' residential locations. To address missing data, we applied a hybrid imputation strategy combining exponentially weighted moving averages and linear mixed-effect models. Automated Machine Learning (AutoML) was used for feature selection. We evaluated multiple Deep Learning (DL) architectures, including recurrent neural networks, long short-term memory, and Gated Recurrent Units (GRU), using varying historical window lengths to predict the MSSS class at the next follow-up.
Results: The final retrospective dataset comprised 4022 visits from 535 MS patients. AutoML identified both clinical and environmental variables as important features for prediction. Models incorporating environmental data performed comparably to or better than those using only clinical variables. The GRU model achieved the most stable performance, with an average Area Under the Curve of 0.814 when environmental data were included with four prior visits. Moreover, SHAP-based feature importance ranked environmental variables such as PM10, PM2.5, nitrogen dioxide, precipitation, and humidity among the top predictors.
Conclusion: Incorporating environmental exposures into DL models can improve MSSS prediction, highlighting the value of diverse real-world data for MS monitoring. Prediction performance across different historical window lengths was comparable, suggesting that using data from two prior follow-ups (approximately one year of monitoring) may be sufficient to provide clinically meaningful predictions of MS progression.
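A minimal sketch of the kind of GRU classifier described above, not the authors' implementation: it takes a fixed-length window of prior visits, each represented by a vector of clinical plus environmental features, and outputs logits for the MSSS class at the next follow-up. All dimensions (a window of 4 visits, 30 features, 3 classes) are illustrative assumptions.

```python
# Illustrative sketch (not the study's model): a GRU over a window of prior
# visits predicting the MSSS class at the next follow-up.
import torch
import torch.nn as nn

class VisitGRU(nn.Module):
    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, window, n_features)
        _, h_last = self.gru(x)           # h_last: (1, batch, hidden)
        return self.head(h_last.squeeze(0))

# Hypothetical shapes: 4 prior visits, 30 clinical + environmental features,
# 3 MSSS classes, batch of 16 patients.
model = VisitGRU(n_features=30, n_classes=3)
x = torch.randn(16, 4, 30)
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 3, (16,)))
loss.backward()
```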
{"title":"The association of environmental exposure with multiple sclerosis severity score: A study based on sequential data modeling","authors":"Mahin Vazifehdan ,&nbsp;Pietro Bosoni ,&nbsp;Erica Tavazzi ,&nbsp;Enrico Longato ,&nbsp;Eleonora Tavazzi ,&nbsp;Roberto Bergamaschi ,&nbsp;Barbara Di Camillo ,&nbsp;Ameen Abu-Hanna ,&nbsp;Riccardo Bellazzi ,&nbsp;Arianna Dagliati","doi":"10.1016/j.ijmedinf.2026.106295","DOIUrl":"10.1016/j.ijmedinf.2026.106295","url":null,"abstract":"<div><div><em>Introduction</em> Multiple Sclerosis (MS) is a chronic neuroinflammatory disease influenced by clinical, demographic, and environmental factors. Predicting MS progression is challenging due to the disease’s heterogeneity. This study aimed to apply Artificial Intelligence (AI) methods to investigate the association between the MS Severity Score (MSSS), which is a measure that integrates disability level and disease duration, and patients’ longitudinal clinical data and environmental exposures. <em>Methods</em> We integrated long-term clinical records from the Mondino MS Center (Pavia, Italy) with environmental data, including air pollution and weather conditions, based on patients’ residential locations. To address missing data, we applied a hybrid imputation strategy combining exponentially weighted moving average and linear mixed-effect models. Automated Machine Learning (AutoML) was used for feature selection. We evaluated multiple Deep Learning (DL) architectures, including recurrent neural network, long short-term memory, and Gated Recurrent Unit (GRU), using varying historical window lengths to predict the MSSS class at the next follow-up. <em>Results</em> The final retrospective dataset comprised 4022 visits from 535 MS patients. AutoML identified both clinical and environmental variables as important features for prediction. Models incorporating environmental data performed comparably to or better than those using only clinical variables. The GRU model achieved the most stable performance, with an average Area Under the Curve of 0.814 when environmental data were included with four prior visits. Moreover, SHAP-based feature importance ranked environmental variables like PM<sub>10</sub>, PM<sub>2.5</sub>, nitrogen dioxide, precipitation, and humidity among the top predictors. <em>Conclusion</em> Incorporating environmental exposures into DL models can improve MSSS prediction, highlighting the value of diverse real-world data for MS monitoring. Prediction performance across different historical window lengths was comparable, suggesting that using data from two prior follow-ups (approximately one year of monitoring) may be sufficient to provide clinically meaningful predictions of MS progression.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"210 ","pages":"Article 106295"},"PeriodicalIF":4.1,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146080696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mapping clinical terms to standard terminology for multi-institutional research platform: Mapping principles and system deployment
IF 4.1 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-20 DOI: 10.1016/j.ijmedinf.2026.106294
Hannah Kang , Youngsun Park , Yukyeong Son , Ho-Young Lee , Soo-Yong Shin

Objective

To develop and implement a multi-institutional research platform by standardizing and integrating clinical terms with international terminologies.

Materials and Methods

This study introduces the Health data Research Suite (HRS) platform, designed to standardize and connect electronic medical record (EMR) data across institutions for efficient multi-institutional research. A hybrid mapping process—combining automated and manual methods—ensures semantic equivalency, consistency, and compliance with international standards like SNOMED CT, LOINC, and RxNorm. Key strategies included domain-specific semantic restrictions, prioritized attributes for post-coordination, and tailored mapping approaches. Cross-validation and expert consultations resolved mapping discrepancies and inactive concept issues, ensuring reliable data alignment.
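The hybrid automated-plus-manual mapping step can be sketched roughly as follows. This is an illustration only: the local terms, the tiny SNOMED CT candidate dictionary, and the map_term helper are hypothetical and do not reflect the HRS implementation.

```python
# Illustrative sketch (not the HRS platform): exact match first, then a fuzzy
# candidate above a threshold, everything else routed to manual review.
from difflib import get_close_matches

# Hypothetical target dictionary: preferred term -> concept code.
snomed_terms = {
    "myocardial infarction": "22298006",
    "type 2 diabetes mellitus": "44054006",
    "essential hypertension": "59621000",
}

def map_term(local_term: str, threshold: float = 0.85):
    term = local_term.strip().lower()
    if term in snomed_terms:                                  # automated exact match
        return snomed_terms[term], "auto-exact"
    candidates = get_close_matches(term, list(snomed_terms), n=1, cutoff=threshold)
    if candidates:                                            # automated fuzzy candidate
        return snomed_terms[candidates[0]], "auto-fuzzy (verify)"
    return None, "manual review"                              # falls back to human mapping

for t in ["Type 2 Diabetes Mellitus", "essential hypertention", "asthma, severe"]:
    print(t, "->", map_term(t))
```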

Results

The study enhanced mapping accuracy in SNOMED CT, LOINC, and RxNorm by utilizing semantic tags and attribute prioritization, with expert consultations addressing any discrepancies. The HRS platform, designed with advanced code search capabilities and user-friendly interfaces, improved cohort generation and facilitated multi-institutional research.

Discussion

Challenges in maintaining inter-institutional consistency and addressing SNOMED CT’s hierarchical limitations were mitigated with a detailed mapping manual, systematic validation, and expert consensus-building. However, a national shortage of trained terminology specialists in Korea underscores the need for educational programs to enhance workforce expertise. Future enhancements include advanced search options and attribute-based retrieval to further improve usability and research support.

Conclusion

This study presents a mapping strategy to align institutional clinical terms with international standards, addressing challenges in semantic consistency and system implementation. The approach enhances multi-institutional research efficiency and fosters innovation in integrated healthcare research, with potential to advance global health outcomes.
{"title":"Mapping clinical terms to standard terminology for multi-institutional research platform: Mapping principles and system deployment","authors":"Hannah Kang ,&nbsp;Youngsun Park ,&nbsp;Yukyeong Son ,&nbsp;Ho-Young Lee ,&nbsp;Soo-Yong Shin","doi":"10.1016/j.ijmedinf.2026.106294","DOIUrl":"10.1016/j.ijmedinf.2026.106294","url":null,"abstract":"<div><h3>Objective</h3><div>To develop and implement a multi-institutional research platform by standardizing and integrating clinical terms with international terminologies.</div></div><div><h3>Materials and Methods</h3><div>This study introduces the Health data Research Suite (HRS) platform, designed to standardize and connect electronic medical record (EMR) data across institutions for efficient multi-institutional research. A hybrid mapping process—combining automated and manual methods—ensures semantic equivalency, consistency, and compliance with international standards like SNOMED CT, LOINC, and RxNorm. Key strategies included domain-specific semantic restrictions, prioritized attributes for post-coordination, and tailored mapping approaches. Cross-validation and expert consultations resolved mapping discrepancies and inactive concept issues, ensuring reliable data alignment.</div></div><div><h3>Results</h3><div>The study enhanced mapping accuracy in SNOMED CT, LOINC, and RxNorm by utilizing semantic tags and attribute prioritization, with expert consultations addressing any discrepancies. The HRS platform, designed with advanced code search capabilities and user-friendly interfaces, improved cohort generation and facilitated multi-institutional research.</div></div><div><h3>Discussion</h3><div>Challenges in maintaining inter-institutional consistency and addressing SNOMED CT’s hierarchical limitations were mitigated with a detailed mapping manual, systematic validation, and expert consensus-building. However, a national shortage of trained terminology specialists in Korea underscores the need for educational programs to enhance workforce expertise. Future enhancements include advanced search options and attribute-based retrieval to further improve usability and research support.</div></div><div><h3>Conclusion</h3><div>This study presents a mapping strategy to align institutional clinical terms with international standards, addressing challenges in semantic consistency and system implementation. The approach enhances multi-institutional research efficiency and fosters innovation in integrated healthcare research, with potential to advance global health outcomes.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"209 ","pages":"Article 106294"},"PeriodicalIF":4.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146020826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Development and validation of an Interpretable Machine learning model for Discriminating between benign and malignant breast cancer
IF 4.1 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-20 DOI: 10.1016/j.ijmedinf.2026.106300
Zhichun Wang , Weixiang Liu , Lin Hua , Xiang Li , Guohui Xue

Objective

Breast cancer prognosis depends on early detection. We developed and externally validated a model using routine, readily available clinical and laboratory variables to discriminate malignant from benign breast lesions, aiming to reduce unnecessary biopsies and support early decision-making.

Methods

This retrospective two-center study included a development cohort (cohort 1) from Jiujiang First People's Hospital (N = 745; malignant 573, benign 172) and an external cohort (cohort 2) from the First Affiliated Hospital of Nanchang University (N = 221; malignant 161, benign 60). Cohort 1 was randomly split into a 70:30 training and test set. Five-fold cross-validation was used to compare multiple algorithms and lock the model and hyperparameters; the locked model was evaluated on the fixed test set and the external cohort. The primary metric was AUC, with sensitivity, specificity, F1, Brier score, calibration curve, decision curve analysis (DCA), and SHAP for explanation.
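A hedged sketch of the split / cross-validate / lock / test workflow described above, using synthetic stand-ins for the five routine predictors; it is not the study's code.

```python
# Illustrative sketch of the split, cross-validate, lock, and test workflow.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

# Synthetic stand-in for the five routine predictors (Age, TT, APTT, CEA, Ca).
X, y = make_classification(n_samples=745, n_features=5, n_informative=5,
                           n_redundant=0, weights=[0.23], random_state=1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30,
                                          stratify=y, random_state=1)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_auc = cross_val_score(model, X_tr, y_tr, cv=5, scoring="roc_auc")
print("5-fold CV AUC:", round(cv_auc.mean(), 3))

model.fit(X_tr, y_tr)                      # lock the model on the full training set
p_te = model.predict_proba(X_te)[:, 1]     # single pass over the fixed test set
print("test AUC:", round(roc_auc_score(y_te, p_te), 3),
      "Brier:", round(brier_score_loss(y_te, p_te), 3))
```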

Results

Logistic regression was selected, using Age, TT, APTT, CEA, and Ca. Cross-validated AUCs were 0.910 (training) and 0.905 (internal validation). The fixed test set yielded an AUC of 0.865 (sensitivity 0.802; specificity 0.712; F1 0.849; Brier 0.112). External validation achieved an AUC of 0.861, specificity of 0.883, and PPV of 0.934. DCA showed a net benefit over the “treat-all/none” strategies across threshold probabilities of 20%–95%. SHAP identified Age, TT, CEA, APTT, and Ca as the dominant contributors.
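Net benefit in a decision curve analysis follows a standard formula, net benefit = TP/n − FP/n × pt/(1 − pt) at threshold probability pt. The sketch below computes it for a model and for the treat-all strategy across the 20%–95% threshold range mentioned above, on simulated data (not the study's).

```python
# Illustrative sketch: net benefit of a model vs. the treat-all strategy
# across decision thresholds, as used in a decision curve analysis (DCA).
import numpy as np

def net_benefit(y, p, threshold):
    treat = p >= threshold
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    n = len(y)
    return tp / n - fp / n * threshold / (1 - threshold)

rng = np.random.default_rng(3)
p = rng.uniform(0, 1, 1000)            # hypothetical predicted risks
y = rng.binomial(1, p)                 # hypothetical outcomes

prevalence = y.mean()
for t in np.arange(0.20, 0.96, 0.25):
    nb_model = net_benefit(y, p, t)
    nb_all = prevalence - (1 - prevalence) * t / (1 - t)   # treat-all baseline
    print(f"threshold {t:.2f}: model {nb_model:+.3f}, treat-all {nb_all:+.3f}")
```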

Conclusions

A logistic model based on routine laboratory variables effectively distinguishes malignant from benign breast lesions, with robust external performance and a clear clinical net benefit, enabling early risk stratification and fewer unnecessary biopsies. This study proposes a tool that quantifies breast tumor malignancy risk using only objective indicators, without subjective factors. Online tool: prediction-for-bc.shinyapps.io/dynnomapp/.
{"title":"Development and validation of an Interpretable Machine learning model for Discriminating between benign and malignant breast cancer","authors":"Zhichun Wang ,&nbsp;Weixiang Liu ,&nbsp;Lin Hua ,&nbsp;Xiang Li ,&nbsp;Guohui Xue","doi":"10.1016/j.ijmedinf.2026.106300","DOIUrl":"10.1016/j.ijmedinf.2026.106300","url":null,"abstract":"<div><h3>Objective</h3><div>Breast cancer prognosis depends on early detection. We developed and externally validated a model using routine, readily available clinical and laboratory variables to discriminate malignant from benign breast lesions, aiming to reduce unnecessary biopsies and support early decision-making.</div></div><div><h3>Methods</h3><div>This retrospective two-center study included a development cohort 1from Jiujiang First People’s Hospital (N = 745; malignant 573, benign 172) and an external cohort2 from the First Affiliated Hospital of Nanchang University (N = 221; malignant 161, benign 60).Cohort 1 was randomly split into a 70:30 training and test set. Five-fold cross-validation was used to compare multiple algorithms and lock the model and hyperparameters; the locked model was evaluated on a fixed test set and the external cohort. The primary metric was AUC, with sensitivity, specificity, F1, Brier score, calibration curve, decision curve analysis (DCA), and SHAP for explanation.</div></div><div><h3>Results</h3><div>Logistic regression was selected, using Age, TT, APTT, CEA, and Ca. Cross-validated AUCs were 0.910 (training) and 0.905 (internal validation). The fixed test set yielded AUC 0.865 (sensitivity 0.802; specificity 0.712; F1 0.849; Brier 0.112). External validation achieved AUC 0.861, specificity 0.883, and PPV 0.934. DCA showed net benefit over “treat-all/none” across 20 %–95 % threshold probabilities. SHAP identified Age, TT, CEA, APTT and Ca as the dominant contributors.</div></div><div><h3>Conclusions</h3><div>A logistic model based on routine laboratory variables effectively distinguishes malignant from benign breast lesions, with robust external performance and clear clinical net benefit, enabling early risk stratification and fewer unnecessary biopsies.This study proposes a tool that quantifies breast tumor malignancy risk using only objective indicators, without subjective factors. Online tool: <span><span>prediction-for-bc.shinyapps.io/dynnomapp/</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"210 ","pages":"Article 106300"},"PeriodicalIF":4.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146067914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Case-based reasoning for clinical trial recruitment tools in oncology: When you need patients to find patients.
IF 4.1 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-19 DOI: 10.1016/j.ijmedinf.2026.106301
Lou-Anne Guillotel, Thierry Lesimple, Oussama Zekri, Marc Cuggia, Boris Campillo-Gimenez

Background: Patient recruitment for clinical trials remains a major challenge, with 86% of trials failing to meet enrollment targets on time. In over 77% of cases, recruitment difficulties stem from matching problems between trials and patients. Case-Based Reasoning (CBR) offers a distinct patient-to-patient approach by determining eligibility through comparison with previously enrolled patients, yet this methodology remains underexplored in contemporary oncology trial matching despite its potential advantages.

Objective: To compare the performance of two CBR approaches, random forest (RF) and target patient similarity (TPS), in predicting patient eligibility for recent oncology clinical trials using real-world electronic health record data.

Methods: We selected three breast cancer clinical trials (2019-2022) from our institutional registry. Patient data were extracted from our clinical data warehouse, including structured data (laboratory results, diagnosis codes, procedures, treatments) and unstructured clinical narratives processed using natural language processing. For each trial, we trained RF classifiers and TPS models using repeated hold-out validation (25 splits, 70/30 train-test). Performance was evaluated using discriminative metrics (AUC, positive precision, recall, F1-score) and ranking metrics (P@5, P@10, MAP, MRR, NDCG@5, NDCG@10). We analyzed model performance across varying numbers of eligible patients in training datasets (2 to 70% of the total number of eligible patients).
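The target-patient-similarity idea and the ranking metrics used above can be sketched as follows (illustrative only, with random feature vectors): candidates are ranked by cosine similarity to the centroid of already-enrolled eligible patients, and the ordering is scored with precision@k and MRR.

```python
# Illustrative sketch: rank candidates by similarity to enrolled (eligible)
# patients, then score the ranking with precision@k and MRR.
import numpy as np

def rank_by_similarity(enrolled, candidates):
    centroid = enrolled.mean(axis=0)
    sims = candidates @ centroid / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(centroid) + 1e-12)
    return np.argsort(-sims)                      # best first

def precision_at_k(order, labels, k):
    return labels[order[:k]].mean()

def mrr(order, labels):
    ranks = np.where(labels[order] == 1)[0]
    return 1.0 / (ranks[0] + 1) if len(ranks) else 0.0

rng = np.random.default_rng(7)
enrolled = rng.normal(1.0, 1.0, size=(10, 50))    # hypothetical feature vectors
eligible = rng.normal(1.0, 1.0, size=(30, 50))    # unseen eligible patients
ineligible = rng.normal(0.0, 1.0, size=(270, 50))
candidates = np.vstack([eligible, ineligible])
labels = np.array([1] * 30 + [0] * 270)

order = rank_by_similarity(enrolled, candidates)
print("P@5:", precision_at_k(order, labels, 5), "MRR:", round(mrr(order, labels), 3))
```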

Results: Both approaches demonstrated strong discriminative performance across three trials, with average AUCs of 84.1 % for RF and 76.4 % for TPS, driven primarily by high recall (82.3 % and 77.7 %, respectively). However, positive precision remained low (13.3 % and 9.9 %), reflecting high false-positive rates due to class imbalance. RF showed superior ranking performance, particularly for the trial with the largest eligible cohort (n = 542; P@5 = 78.6 %, MRR = 88.0 %), compared to TPS (P@5 = 47.9 %, MRR = 69.2 %). Both approaches reached performance plateaus with only around 10 eligible patients in training datasets. Variable importance analysis revealed that treatment-related features, diagnostic codes, and procedures were consistently the most important predictors, with relevant patterns identified even with minimal training data.

Conclusions: CBR approaches can effectively support patient pre-screening for oncology clinical trials, with RF demonstrating moderately superior performance over TPS. Both methods show robust discriminative performance with small training datasets, though ranking performance varies substantially across trials. Our findings suggest that CBR approaches may benefit from integration with query-based or prompt-based methods during early recruitment phases when training data is scarce.

背景:临床试验的患者招募仍然是一个重大挑战,86%的试验未能按时达到入组目标。在超过77%的病例中,招募困难源于试验和患者之间的匹配问题。基于病例的推理(CBR)提供了一种独特的患者对患者的方法,通过与先前入组的患者进行比较来确定资格,然而,尽管这种方法具有潜在的优势,但在当代肿瘤试验匹配中仍未得到充分的探索。目的:比较随机森林(RF)和目标患者相似性(TPS)两种CBR方法在使用真实世界电子健康记录数据预测近期肿瘤临床试验患者资格方面的性能。方法:我们从我们的机构注册表中选择了三项乳腺癌临床试验(2019-2022)。从我们的临床数据仓库中提取患者数据,包括结构化数据(实验室结果、诊断代码、程序、治疗)和使用自然语言处理的非结构化临床叙述。对于每个试验,我们使用重复的保留验证(25次分割,70/30训练测试)训练RF分类器和TPS模型。使用判别指标(AUC、正准度、召回率、f1得分)和排名指标(P@5、P@10、MAP、MRR、NDCG@5、NDCG@10)对性能进行评估。我们分析了训练数据集中不同数量的合格患者(占合格患者总数的2 - 70%)的模型性能。结果:两种方法在三个试验中都表现出很强的判别性能,RF的平均auc为84.1%,TPS的平均auc为76.4%,主要是由于高召回率(分别为82.3%和77.7%)。然而,阳性准确率仍然很低(13.3%和9.9%),反映了由于类别不平衡导致的高假阳性率。与TPS (P@5 = 47.9%, MRR = 69.2%)相比,RF表现出更优越的排名表现,特别是对于最大符合条件的队列(n = 542; P@5 = 78.6%, MRR = 88.0%)的试验。这两种方法在训练数据集中只有大约10名符合条件的患者时达到了性能平台。变量重要性分析显示,与治疗相关的特征、诊断代码和程序始终是最重要的预测因素,即使使用最少的训练数据也能识别出相关模式。结论:CBR方法可以有效地支持肿瘤临床试验的患者预筛查,RF的表现略优于TPS。这两种方法在小型训练数据集上都显示出稳健的判别性能,尽管在不同的试验中排名性能差异很大。我们的研究结果表明,在培训数据稀缺的早期招聘阶段,CBR方法可能受益于与基于查询或基于提示的方法的集成。
{"title":"Case-based reasoning for clinical trial recruitment tools in oncology: When you need patients to find patients.","authors":"Lou-Anne Guillotel, Thierry Lesimple, Oussama Zekri, Marc Cuggia, Boris Campillo-Gimenez","doi":"10.1016/j.ijmedinf.2026.106301","DOIUrl":"https://doi.org/10.1016/j.ijmedinf.2026.106301","url":null,"abstract":"<p><strong>Background: </strong>Patient recruitment for clinical trials remains a major challenge, with 86% of trials failing to meet enrollment targets on time. In over 77% of cases, recruitment difficulties stem from matching problems between trials and patients. Case-Based Reasoning (CBR) offers a distinct patient-to-patient approach by determining eligibility through comparison with previously enrolled patients, yet this methodology remains underexplored in contemporary oncology trial matching despite its potential advantages.</p><p><strong>Objective: </strong>To compare the performance of two CBR approaches-random forest (RF) and target patient similarity (TPS)-in predicting patient eligibility for recent oncology clinical trials using real-world electronic health record data.</p><p><strong>Methods: </strong>We selected three breast cancer clinical trials (2019-2022) from our institutional registry. Patient data were extracted from our clinical data warehouse, including structured data (laboratory results, diagnosis codes, procedures, treatments) and unstructured clinical narratives processed using natural language processing. For each trial, we trained RF classifiers and TPS models using repeated hold-out validation (25 splits, 70/30 train-test). Performance was evaluated using discriminative metrics (AUC, positive precision, recall, F1-score) and ranking metrics (P@5, P@10, MAP, MRR, NDCG@5, NDCG@10). We analyzed model performance across varying numbers of eligible patients in training datasets (2 to 70% of the total number of eligible patients).</p><p><strong>Results: </strong>Both approaches demonstrated strong discriminative performance across three trials, with average AUCs of 84.1 % for RF and 76.4 % for TPS, driven primarily by high recall (82.3 % and 77.7 %, respectively). However, positive precision remained low (13.3 % and 9.9 %), reflecting high false-positive rates due to class imbalance. RF showed superior ranking performance, particularly for the trial with the largest eligible cohort (n = 542; P@5 = 78.6 %, MRR = 88.0 %), compared to TPS (P@5 = 47.9 %, MRR = 69.2 %). Both approaches reached performance plateaus with only around 10 eligible patients in training datasets. Variable importance analysis revealed that treatment-related features, diagnostic codes, and procedures were consistently the most important predictors, with relevant patterns identified even with minimal training data.</p><p><strong>Conclusions: </strong>CBR approaches can effectively support patient pre-screening for oncology clinical trials, with RF demonstrating moderately superior performance over TPS. Both methods show robust discriminative performance with small training datasets, though ranking performance varies substantially across trials. 
Our findings suggest that CBR approaches may benefit from integration with query-based or prompt-based methods during early recruitment phases when training data is scarce.</p>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"211 ","pages":"106301"},"PeriodicalIF":4.1,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146137631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Interpretable machine learning model for predicting in-hospital mortality in elderly acute pancreatitis: Development and validation in a multicenter cohort
IF 4.1 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-18 DOI: 10.1016/j.ijmedinf.2026.106299
Hao He , Li Luo , Lei Bai , Lei Luo , Kunming Tian , Xiaoyun Fu , Bao Fu

Background

Elderly acute pancreatitis (AP) patients face significantly higher in-hospital all-cause mortality, highlighting the need for effective risk stratification to support timely clinical decision-making.

Methods

We conducted a multicenter retrospective study enrolling 2,728 elderly AP patients and used this cohort to develop and validate a robust machine learning (ML) model for predicting in-hospital all-cause mortality. Predictors of mortality were first selected using LASSO regression and the random forest–based Boruta algorithm. Seven ML models incorporating the selected predictors were then trained and evaluated using the area under the receiver operating characteristic curve (AUC).
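A hedged sketch of the predictor-selection and model-comparison steps on synthetic data: L1-penalised logistic regression keeps predictors with non-zero coefficients, and candidate classifiers are then compared by cross-validated AUC. The Boruta step is omitted, and scikit-learn's gradient boosting stands in for XGBoost; none of this is the study's code.

```python
# Illustrative sketch: LASSO-style predictor selection, then AUC-based
# comparison of candidate classifiers (Boruta step omitted).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2728, n_features=40, n_informative=8,
                           weights=[0.9], random_state=0)
Xs = StandardScaler().fit_transform(X)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(Xs, y)
selected = np.flatnonzero(lasso.coef_[0])          # predictors with non-zero weight
print("selected predictors:", selected)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, clf in candidates.items():
    auc = cross_val_score(clf, Xs[:, selected], y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: CV AUC = {auc:.3f}")
```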

Results

XGBoost demonstrated the highest predictive performance, achieving an AUC of 0.884 (95% CI: 0.823–0.945) in the external validation test and outperforming the conventional Ranson score in predicting in-hospital mortality. Shapley additive explanations ranked vasoactive drug use, hospital length of stay, leukocyte count, noninvasive ventilation, and invasive mechanical ventilation as the five key predictors. An interactive web-based tool based on the optimal XGBoost model is available at https://appredction.shinyapps.io/acutepancreatitis_xgb/ to generate real-time risk predictions.

Conclusions

This study proposed a validated and interpretable ML model to support in-hospital risk stratification for elderly patients with AP, thereby facilitating clinical decision-making and optimizing intensive care unit resource allocation.
{"title":"Interpretable machine learning model for predicting in-hospital mortality in elderly acute pancreatitis: Development and validation in a multicenter cohort","authors":"Hao He ,&nbsp;Li Luo ,&nbsp;Lei Bai ,&nbsp;Lei Luo ,&nbsp;Kunming Tian ,&nbsp;Xiaoyun Fu ,&nbsp;Bao Fu","doi":"10.1016/j.ijmedinf.2026.106299","DOIUrl":"10.1016/j.ijmedinf.2026.106299","url":null,"abstract":"<div><h3>Background</h3><div>Elderly acute pancreatitis (AP) patients face significantly higher in-hospital all-cause mortality, highlighting the need for effective risk stratification to support timely clinical decision-making.</div></div><div><h3>Methods</h3><div>We conducted a multicenter retrospective study that enrolled 2,728 elderly AP patients, with which we developed and validated a robust machine learning (ML) model for predicting in-hospital all-cause mortality. We first selected predictors of mortality using LASSO regression and random forest–based Boruta algorithms. Then, seven ML models incorporating the selected predictors were trained and evaluated using the area under the receiver operating characteristic curve (AUC).</div></div><div><h3>Results</h3><div>XGBoost demonstrated the highest predictive performance, achieving an AUC of 0.884 (95% CI: 0.823–0.945) in the external validation test, outperforming the conventional Ranson score in predicting in-hospital mortality. Shapley additive explanations ranked vasoactive drug, hospital length of stay, leukocyte count, noninvasive ventilation, and invasive mechanical ventilation as five key predictors. An interactive web-based tool based on the optimal XGBoost model has been available at <span><span>https://appredction.shinyapps.io/acutepancreatitis_xgb/</span><svg><path></path></svg></span> to generate real-time risk predictions.</div></div><div><h3>Conclusions</h3><div>This study proposed a validated and interpretable ML model to support in-hospital risk stratification for elderly patients with AP, thereby facilitating clinical decision-making and optimizing intensive care unit resource allocation.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"209 ","pages":"Article 106299"},"PeriodicalIF":4.1,"publicationDate":"2026-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146031675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A structured decision-support framework for selecting imputation methods in clinical structured datasets: A secondary analysis
IF 4.1 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-18 DOI: 10.1016/j.ijmedinf.2026.106298
Marziyeh Afkanpour , Mehri Momeni , Hamed Tabesh

Objective

Missing values are a common challenge in healthcare data analysis, and inadequate handling can introduce bias and undermine the validity of findings. Imputation methods offer a practical solution, but selecting an appropriate approach depends on multiple dataset-specific factors. This study proposes a structured decision-support framework that defines key prerequisites for choosing suitable imputation methods during the preprocessing of clinically structured datasets.

Methods

A secondary analysis of a previous systematic review was conducted, covering 69 studies to identify factors influencing imputation method selection. Domain experts evaluated assumptions regarding missing data characteristics and dataset structure, reaching consensus on the most relevant factors. These factors were synthesized into a structured framework designed to guide systematic and transparent imputation method selection in clinical data preprocessing workflows.

Results

Nine key factors were identified as essential for determining an appropriate imputation method. These include missing-data characteristics (mechanism, pattern, and missingness ratio) and dataset attributes (data type, variable role, distribution, and correlation). The missingness ratio was the most influential factor, followed by variable role and missing-value mechanism. Most studies emphasized the combined importance of both missing-data properties and dataset features in imputation selection.
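To make the idea of a factor-driven checklist concrete, the sketch below encodes a few hypothetical decision rules keyed on missingness ratio, mechanism, and variable role. These rules are illustrative placeholders only; they are not the framework proposed in the study.

```python
# Hypothetical illustration only: a checklist-style helper that turns dataset
# characteristics into a suggested imputation family. The rules below are NOT
# the study's framework; they merely show how such a checklist could be encoded.
from dataclasses import dataclass

@dataclass
class DatasetProfile:
    missing_ratio: float      # fraction of missing cells for the variable
    mechanism: str            # "MCAR", "MAR", or "MNAR"
    variable_role: str        # "outcome" or "predictor"
    data_type: str            # "numeric" or "categorical"

def suggest_imputation(p: DatasetProfile) -> str:
    if p.variable_role == "outcome":
        return "avoid imputing the outcome; consider excluding those records"
    if p.missing_ratio > 0.5:
        return "consider dropping the variable or using models that tolerate NaNs"
    if p.mechanism == "MNAR":
        return "use sensitivity analyses / pattern-mixture approaches"
    if p.mechanism == "MAR":
        return "multiple imputation (e.g. chained equations)"
    return "simple or model-based single imputation may be acceptable (MCAR)"

print(suggest_imputation(DatasetProfile(0.12, "MAR", "predictor", "numeric")))
```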

Conclusions

Understanding the characteristics of missing values and dataset structure is crucial for selecting appropriate imputation methods. The proposed structured decision-support framework provides an evidence-based checklist to enhance transparency, reproducibility, and reliability in preprocessing clinical datasets within medical informatics workflows.
{"title":"A structured decision-support framework for selecting imputation methods in clinical structured datasets: A secondary analysis","authors":"Marziyeh Afkanpour ,&nbsp;Mehri Momeni ,&nbsp;Hamed Tabesh","doi":"10.1016/j.ijmedinf.2026.106298","DOIUrl":"10.1016/j.ijmedinf.2026.106298","url":null,"abstract":"<div><h3>Objective</h3><div>Missing values are a common challenge in healthcare data analysis, and inadequate handling can introduce bias and undermine the validity of findings. Imputation methods offer a practical solution, but selecting an appropriate approach depends on multiple dataset-specific factors. This study proposes a structured decision-support framework that defines key prerequisites for choosing suitable imputation methods during the preprocessing of clinically structured datasets.</div></div><div><h3>Methods</h3><div>A secondary analysis of a previous systematic review was conducted, covering 69 studies to identify factors influencing imputation method selection. Domain experts evaluated assumptions regarding missing data characteristics and dataset structure, reaching consensus on the most relevant factors. These factors were synthesized into a structured framework designed to guide systematic and transparent imputation method selection in clinical data preprocessing workflows.</div></div><div><h3>Results</h3><div>Nine key factors were identified as essential for determining an appropriate imputation method. These include missing data characteristics, mechanism, pattern, and ratio and dataset attributes such as data type, variable role, distribution, and correlation. The ratio of missingness was the most influential factor, followed by variable role and missing value mechanism. Most studies emphasized the combined importance of both missing data properties and dataset features in imputation selection.</div></div><div><h3>Conclusions</h3><div>Understanding the characteristics of missing values and dataset structure is crucial for selecting appropriate imputation methods. The proposed structured decision-support framework provides an evidence-based checklist to enhance transparency, reproducibility, and reliability in preprocessing clinical datasets within medical informatics workflows.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"210 ","pages":"Article 106298"},"PeriodicalIF":4.1,"publicationDate":"2026-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Clinician preferences for explainable AI in critical care: a comparative study of interpretable models and visualizations for intubation decision support
IF 4.1 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-18 DOI: 10.1016/j.ijmedinf.2026.106287
Tiantian Xian , Nikolay Mehandjiev , Panos Constantinides , Yu-wang Chen , Qudamah Quboa , Gareth Kitchen

Background:

The complexity of many AI models hinders their clinical adoption because the clinicians using them do not regard them as transparent. This study addresses the lack of clinician-centered explainable AI (XAI) interfaces by designing and evaluating intuitive visual explanations for intubation prediction, testing the hypothesis that workflow-compatible designs enhance acceptance.

Objective:

This study compares three time-aware visual explanations for XAI-based intubation prediction and evaluates their acceptance, comprehension, and perceived utility among clinicians.

Methods:

We developed machine learning models using ICU time-series data to estimate the near-term risk of deterioration in a patient’s condition that may lead to intubation and mechanical ventilation. We generated global and local explanations using SHAP and designed three customized visual formats—a temporal force plot, a temporal bar chart, and a dual-encoded SHAP heatmap. Clinicians (n = 206) evaluated comprehension and usability using objective questions and a Likert-based survey.
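The dual-encoded heatmap idea can be sketched roughly as follows: per-feature, per-time-step attributions (standing in for SHAP values, here random placeholders) are drawn as a features-by-time heatmap with a diverging colour scale plus numeric annotation. The variable names and layout are assumptions, not the study's interface.

```python
# Illustrative sketch: a features-by-time attribution heatmap with a diverging
# colour scale plus numeric annotation ("dual encoding"). The attribution values
# are random placeholders standing in for real per-step SHAP values.
import numpy as np
import matplotlib.pyplot as plt

features = ["HR", "RR", "SpO2", "FiO2", "GCS"]        # hypothetical variables
hours = [f"t-{h}h" for h in range(6, -1, -1)]          # 7 hourly time steps
rng = np.random.default_rng(5)
shap_vals = rng.normal(0.0, 0.05, size=(len(features), len(hours)))

fig, ax = plt.subplots(figsize=(7, 3))
lim = np.abs(shap_vals).max()
im = ax.imshow(shap_vals, cmap="coolwarm", vmin=-lim, vmax=lim, aspect="auto")
ax.set_xticks(range(len(hours)))
ax.set_xticklabels(hours)
ax.set_yticks(range(len(features)))
ax.set_yticklabels(features)
for i in range(len(features)):                         # numeric annotation layer
    for j in range(len(hours)):
        ax.text(j, i, f"{shap_vals[i, j]:+.2f}", ha="center", va="center", fontsize=7)
fig.colorbar(im, ax=ax, label="contribution to predicted intubation risk")
fig.tight_layout()
plt.show()
```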

Results:

Using data from 4608 critically ill patients, each with 10 clinical variables recorded over 7 hours, the Random Forest (RF) model achieved the highest area under the curve (AUC) of 0.94. The local explanations were then customized and evaluated by 206 clinicians through a survey conducted on the Prolific platform. The customized heatmap representation was rated as the visualization with the highest perceived clinical utility and the best alignment with clinical workflows.

Discussion:

The reported findings support the need for explanation formats to be tailored to clinical reasoning and task context, supporting the concept of cognitive fit. The heatmap’s close alignment with clinicians’ mental models and its graphical integrity enhances interpretability and trust. This study demonstrates that explanation effectiveness depends on contextual relevance, rather than a universal standard, and that the presentation format itself significantly shapes clinicians’ trust in XAI systems.

Conclusion:

This study advances clinical XAI by introducing a time-aware explanation framework for ICU intubation decisions. By integrating temporal trends with model reasoning, our visualizations closely align with clinicians’ cognitive workflows. Rigorous clinician-centered evaluation identified the dual-encoded SHAP heatmap as the most useful and workflow-compatible visualization, highlighting the importance of explanation design alongside predictive accuracy for clinical adoption.
{"title":"Clinician preferences for explainable AI in critical care: a comparative study of interpretable models and visualizations for intubation decision support","authors":"Tiantian Xian ,&nbsp;Nikolay Mehandjiev ,&nbsp;Panos Constantinides ,&nbsp;Yu-wang Chen ,&nbsp;Qudamah Quboa ,&nbsp;Gareth Kitchen","doi":"10.1016/j.ijmedinf.2026.106287","DOIUrl":"10.1016/j.ijmedinf.2026.106287","url":null,"abstract":"<div><h3>Background:</h3><div>The complexity of many AI models hinders their clinical adoption because the clinicians using them do not regard them as transparent. This study addresses the lack of clinician-centered explainable AI (XAI) interfaces by designing and evaluating intuitive visual explanations for intubation prediction, testing the hypothesis that workflow-compatible designs enhance acceptance.</div></div><div><h3>Objective:</h3><div>This study compares three, time-aware, visual explanations for XAI-based intubation prediction and evaluate their acceptance, comprehension, and perceived utility among clinicians.</div></div><div><h3>Methods:</h3><div>We developed machine learning models to estimate the near-term risk of deterioration in the patient’s condition which may lead to mechanical intubation using ICU time-series data. We generated global and local explanations using SHAP and designed three customized visual formats—a temporal force plot, a temporal bar chart, and a dual-encoded SHAP heatmap. Clinicians (<em>n</em> = 206) evaluated comprehension and usability using objective questions and a Likert-based survey.</div></div><div><h3>Results:</h3><div>Based on 4608 critically ill patients with 10 medical variables over 7 hours of data for each patient, the Random Forest (RF) model achieved the highest area under the curve (AUC): 0.94. Furthermore, the local explanations were customized and evaluated by 206 clinicians through a survey conducted on the Prolific platform. A customized heatmap representation was selected as the visualization with the highest perceived clinical utility and alignment with clinical workflows.</div></div><div><h3>Discussion:</h3><div>The reported findings support the need for explanation formats to be tailored to clinical reasoning and task context, supporting the concept of cognitive fit. The heatmap’s close alignment with clinicians’ mental models and its graphical integrity enhances interpretability and trust. This study demonstrates that explanation effectiveness depends on contextual relevance, rather than a universal standard, and that the presentation format itself significantly shapes clinicians’ trust in XAI systems.</div></div><div><h3>Conclusion:</h3><div>This study advances clinical XAI by introducing a time-aware explanation framework for ICU intubation decisions. By integrating temporal trends with model reasoning, our visualizations closely align with clinicians’ cognitive workflows. 
Rigorous clinician-centered evaluation identified the dual-encoded SHAP heatmap as the most useful and workflow-compatible visualization, highlighting the importance of explanation design alongside predictive accuracy for clinical adoption.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"210 ","pages":"Article 106287"},"PeriodicalIF":4.1,"publicationDate":"2026-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Rule-augmented constraint learning for semantic error detection in MIMIC-III knowledge graph 基于规则增强约束学习的MIMIC-III知识图语义错误检测
IF 4.1 2区 医学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-01-17 DOI: 10.1016/j.ijmedinf.2026.106297
Özge Noben, Ömer Durukan Kılıç, Tjitze Rienstra, Michel Dumontier, Remzi Celebi
High-quality, error-free data is essential for developing reliable data-driven models, particularly in clinical decision support systems where inaccurate predictions can have serious consequences. While knowledge graphs (KGs) offer a structured and semantically rich representation for clinical data, ensuring their consistency and correctness remains a challenge. Existing rule mining techniques provide solutions for the automatic extraction of logical constraints from KGs, but they often produce redundant or clinically irrelevant rules, especially when dealing with numeric or categorical literals such as age or lab values. KG constraints (rules intended to capture implausible or conflicting facts in the KG) can be used to spot semantic errors: facts that might conform to the underlying schema but contradict domain knowledge. In this work, we propose a novel framework for constraint learning in clinical KGs that identifies and transforms high-confidence rules into clinically plausible constraints. We introduce two approaches, based on class disjointness and on literal clustering combined with rule mining. We validate the clinical relevance of the generated rules using expert-curated constraints and large language models (LLMs). The results on the MIMIC-III clinical dataset show that rule-filtering-based constraint learning effectively preserves clinically meaningful rules that align with established medical knowledge. For numeric data, we achieve reliable value groupings through our clustering-based method, and the rules derived from these groupings were validated by LLMs, whose outputs confirm the clinical relevance of a portion of the discovered rules. By providing interpretable and scalable solutions to semantic inconsistencies in KGs, this study contributes to increasing KG trustworthiness and clinical usability.
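The literal-clustering idea in this abstract can be pictured with a small sketch: cluster the numeric literals attached to one predicate, keep only clusters with enough support (a stand-in for rule confidence), and flag triples whose value falls outside every retained range as candidate semantic errors. Everything below, including the hasAge predicate, the toy triples, and the 30% support threshold, is a hypothetical illustration under those assumptions rather than the paper's actual pipeline.

```python
# Sketch of literal clustering for constraint learning: dense value clusters
# become plausible ranges; literals outside every range are candidate errors.
import numpy as np
from sklearn.cluster import KMeans

# toy KG triples: (subject, predicate, numeric literal); 999 plays the role of
# a placeholder/typo-style outlier that a learned constraint should catch
triples = [("p1", "hasAge", 34), ("p2", "hasAge", 61), ("p3", "hasAge", 58),
           ("p4", "hasAge", 29), ("p5", "hasAge", 72), ("p6", "hasAge", 999)]

values = np.array([v for _, _, v in triples], dtype=float).reshape(-1, 1)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(values)

# keep only clusters supported by a minimum share of the data, analogous to a
# support/confidence threshold in rule mining
min_support = 0.3
ranges = []
for c in range(km.n_clusters):
    members = values[km.labels_ == c].ravel()
    if len(members) / len(values) >= min_support:
        ranges.append((members.min(), members.max()))

def violates(value, ranges):
    """True if the literal lies outside every high-support value range."""
    return not any(lo <= value <= hi for lo, hi in ranges)

for s, p, v in triples:
    if violates(v, ranges):
        # with this toy data, the extreme outlier (p6, hasAge, 999) is flagged
        print(f"candidate semantic error: ({s}, {p}, {v})")
```

In the study itself, such value groupings feed a rule miner and the resulting constraints are checked against expert-curated constraints and LLM judgments; the sketch omits those validation steps.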
{"title":"Rule-augmented constraint learning for semantic error detection in MIMIC-III knowledge graph","authors":"Özge Noben,&nbsp;Ömer Durukan Kılıç,&nbsp;Tjitze Rienstra,&nbsp;Michel Dumontier,&nbsp;Remzi Celebi","doi":"10.1016/j.ijmedinf.2026.106297","DOIUrl":"10.1016/j.ijmedinf.2026.106297","url":null,"abstract":"<div><div>High-quality, error-free data is essential for developing reliable data-driven models, particularly in clinical decision support systems where inaccurate predictions can have serious consequences. While KGs offer a structured and semantically rich representation for clinical data, ensuring their consistency and correctness remains a challenge. Existing rule mining techniques provide solutions for the automatic extraction of logical constraints from KGs, but they often produce redundant or clinically irrelevant rules, especially when dealing with numeric or categorical literals such as age or lab values. KG constraints—rules intended to capture implausible or conflicting facts in the KG—can be used to spot semantic errors: facts that might conform to the underlying schema but contradict domain knowledge. In this work, we propose a novel framework for constraint learning in clinical KGs that identifies and transforms high-confidence rules into clinically plausible constraints. We propose two approaches, based on class disjointness and literal clustering combined with rule mining. We validate the clinical relevance of these generated rules using expert-curated constraints and large language models (LLMs). The results on the MIMIC-III clinical dataset show that rule filtering based constraint learning effectively preserves clinically meaningful rules that align with established medical knowledge. For numeric data, we achieve reliable value groupings through our clustering-based method, and the rules derived from these groupings were validated by LLMs. Their outputs confirm the clinical relevance of a portion of those discovered rules. By providing interpretable and scalable solutions to semantic inconsistencies in KGs, this study contributes to increasing the KG trustworthiness and its clinical usability.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"210 ","pages":"Article 106297"},"PeriodicalIF":4.1,"publicationDate":"2026-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0