Electronic medical records imputation by temporal Generative Adversarial Network.

IF 6.1 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Biodata Mining | Pub Date: 2024-06-26 | DOI: 10.1186/s13040-024-00372-2

Yunfei Yin, Zheng Yuan, Islam Md Tanvir, Xianjian Bao

Biodata Mining, vol. 17, no. 1, p. 19. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11202349/pdf/

Citations: 0

Abstract

Missing data in electronic medical records seriously limits the practical application of biomedical data, so filling in these missing values effectively is a meaningful research problem. Current state-of-the-art methods use Generative Adversarial Networks (GANs) to impute the missing values of electronic medical records and have achieved breakthrough progress. However, on datasets with high missing rates, the imputation accuracy of these methods decreases sharply. This motivates us to explore the uncertainty of GANs and to improve GAN-based imputation methods. In this paper, we propose the GRUD (Gate Recurrent Unit Decay) network and the UGAN (Uncertainty Generative Adversarial Network) and combine them into a single model, UGAN-GRUD. UGAN-GRUD uses the GAN to generate imputed values and then uses GRUD to compensate them. The UGAN learns the distribution pattern and uncertainty of the data by iteratively training its Generator and Discriminator. The GRUD network compensates the UGAN output using a time decay factor, which lets it learn the specific temporal relations in electronic medical records. Experiments on publicly available biomedical datasets show that UGAN-GRUD outperforms current state-of-the-art methods, with average improvements of 13% in RMSE (Root Mean Squared Error) and 24.5% in MAPE (Mean Absolute Percentage Error).
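The abstract does not give the GRUD equations, so the following is only a minimal sketch of what a time decay factor can look like. It assumes the decay form of the original GRU-D model, gamma = exp(-max(0, w * delta + b)), where delta is the time since a feature was last observed; the authors' network may differ. The function names `time_gaps` and `decay_impute` and the fixed parameters `w` and `b` are hypothetical (in a trained model they would be learned), and the GAN stage is not shown. RMSE and MAPE follow their standard definitions.

```python
import numpy as np

def time_gaps(mask, timestamps):
    """Time since each feature was last observed (delta in GRU-D)."""
    T, D = mask.shape
    delta = np.zeros((T, D))
    for t in range(1, T):
        step = timestamps[t] - timestamps[t - 1]
        # Reset the gap if the previous step was observed, else accumulate it.
        delta[t] = np.where(mask[t - 1] == 1, step, step + delta[t - 1])
    return delta

def decay_impute(x, mask, timestamps, w=1.0, b=0.0):
    """Fill missing entries by decaying from the last observation
    toward the per-feature mean. w and b are assumed, not learned."""
    delta = time_gaps(mask, timestamps)
    gamma = np.exp(-np.maximum(0.0, w * delta + b))  # decay factor in (0, 1]
    x_obs = np.where(mask == 1, x, 0.0)              # ignore NaNs at missing entries
    mean = x_obs.sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
    filled = np.empty(x.shape, dtype=float)
    last = mean.copy()  # before any observation, fall back to the mean
    for t in range(x.shape[0]):
        estimate = gamma[t] * last + (1.0 - gamma[t]) * mean
        filled[t] = np.where(mask[t] == 1, x[t], estimate)
        last = np.where(mask[t] == 1, x[t], last)
    return filled

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (y_true must be nonzero)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Toy record: one feature, four visits, the two middle values missing.
x = np.array([[1.0], [np.nan], [np.nan], [3.0]])
mask = np.array([[1], [0], [0], [1]])
timestamps = np.array([0.0, 1.0, 2.0, 3.0])
filled = decay_impute(x, mask, timestamps)
print(filled.ravel())
```

The longer a value has been missing, the smaller gamma becomes and the closer the filled value moves from the last observation toward the feature mean. Imputation quality would then be scored by applying `rmse` and `mape` only to entries that were deliberately hidden and whose true values are known.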

Source journal

Biodata Mining (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 7.90
Self-citation rate: 0.00%
Articles published:
Review time: 23 weeks

About the journal: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:

- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development, and integration of databases, software, and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.