Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.

IEEE International Conference on Healthcare Informatics. Pub Date: 2024-06-01. Epub Date: 2024-08-22. DOI: 10.1109/ichi61247.2024.00030

Ibna Kowsar, Shourav B Rabbani, Manar D Samad

Volume 2024, pages 177-182. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463999/pdf/


Abstract

The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods have been developed in biostatistics and, more recently, in machine learning, deep learning-based solutions have shown limited success in learning from tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values by leveraging between-feature attention (self-attention) or between-sample attention. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and by 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness rates (10% to 50%) when the values are missing completely at random.
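The evaluation setting named in the abstract (values missing completely at random at rates of 10% to 50%, scored by normalized root mean squared error) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `mcar_mask`, `mean_impute`, and `nrmse` are hypothetical, the data are synthetic, column-mean imputation stands in as a simple baseline rather than the proposed attention model, and normalizing the RMSE by the standard deviation of the held-out true values is an assumption, since the abstract does not state the normalizer.

```python
import numpy as np


def mcar_mask(shape, missing_rate, rng):
    """Boolean mask; True marks an entry removed completely at random (MCAR)."""
    return rng.random(shape) < missing_rate


def mean_impute(x_missing):
    """Baseline imputer: replace each NaN with the mean of its column."""
    col_means = np.nanmean(x_missing, axis=0)
    return np.where(np.isnan(x_missing), col_means, x_missing)


def nrmse(x_true, x_imputed, mask):
    """RMSE over the masked entries, normalized by the std of the true masked values.

    The choice of normalizer is an assumption made for this sketch.
    """
    diff = x_true[mask] - x_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)) / np.std(x_true[mask]))


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 8))  # synthetic stand-in for a tabular data set

scores = {}
for rate in (0.1, 0.3, 0.5):
    mask = mcar_mask(x.shape, rate, rng)
    x_missing = np.where(mask, np.nan, x)   # hide the selected entries
    x_imputed = mean_impute(x_missing)      # swap in any imputer here
    scores[rate] = nrmse(x, x_imputed, mask)

for rate, score in scores.items():
    print(f"missing rate {rate:.0%}: NRMSE = {score:.3f}")
```

Because the mask is drawn independently of the data, the hidden entries are MCAR by construction. Under this setup, a relative NRMSE reduction such as those quoted in the abstract would be computed as `(baseline - proposed) / baseline` between two imputers scored on the same mask.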


