Extracting Disease-Relevant Features with Adversarial Regularization.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine. Pub Date: 2021-12-01. DOI: 10.1109/bibm52615.2021.9669878

Junxiang Chen, Li Sun, Ke Yu, Kayhan Batmanghelich

Pages: 3464-3471. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8863436/pdf/nihms-1778852.pdf


Abstract

Extracting hidden phenotypes is essential in medical data analysis because it facilitates disease subtyping, diagnosis, and understanding of disease etiology. Because a hidden phenotype is usually a low-dimensional representation that comprehensively describes the disease, we require a dimensionality-reduction method that captures as much disease-relevant information as possible. However, most unsupervised or self-supervised methods cannot achieve this goal because they learn a holistic representation containing both disease-relevant and disease-irrelevant information. Supervised methods capture only the information that is predictive of the target clinical variable, so the learned representation usually does not generalize to the various aspects of the disease. Hence, we develop a dimensionality-reduction approach, based on information theory, to extract Disease Relevant Features (DRFs). We propose to use clinical variables that weakly define the disease as so-called anchors. We derive a formulation that makes the DRFs predictive of the anchors while forcing the remaining representation to be irrelevant to the anchors via adversarial regularization. We apply our method to a large-scale study of Chronic Obstructive Pulmonary Disease (COPD). Our experiments show that: (1) The learned DRFs are as predictive of the anchors as the original representation, although they have a significantly lower dimension. (2) Compared to a supervised representation, the learned DRFs are more predictive of other relevant disease metrics that are not used during training. (3) The learned DRFs are related to non-imaging biological measurements such as gene expression, suggesting that the DRFs include information related to the underlying biology of the disease.
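To make the idea of splitting a representation into anchor-predictive DRFs and an anchor-irrelevant remainder concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear toy model, replaces the information-theoretic terms with mean squared error, and fits both the predictor and the adversary by least squares. All names (`split_objective`, `W_drf`, `W_rest`, `lam`) and the synthetic data are hypothetical.

```python
import numpy as np


def split_objective(x, anchors, W_drf, W_rest, lam=1.0):
    """Toy objective for a linear split of x into DRFs and a remainder.

    Returns (pred_loss, adv_loss, encoder_objective). This is an
    illustrative stand-in, not the formulation derived in the paper.
    """
    z = x @ W_drf   # candidate disease-relevant features (low-dimensional)
    r = x @ W_rest  # remaining representation

    # Predictor: best linear map from the DRFs to the anchors.
    P, *_ = np.linalg.lstsq(z, anchors, rcond=None)
    pred_loss = np.mean((z @ P - anchors) ** 2)

    # Adversary: best linear map from the remainder to the anchors.
    A, *_ = np.linalg.lstsq(r, anchors, rcond=None)
    adv_loss = np.mean((r @ A - anchors) ** 2)

    # The encoder wants the DRFs to predict the anchors (low pred_loss)
    # and the adversary to fail on the remainder (high adv_loss).
    encoder_objective = pred_loss - lam * adv_loss
    return pred_loss, adv_loss, encoder_objective


# Synthetic data (assumption): only the first two of six input
# dimensions carry information about the single anchor variable.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 6))
anchors = x[:, :2] @ np.array([[1.0], [-2.0]])

eye = np.eye(6)
# Good split: DRFs take the informative dimensions.
good = split_objective(x, anchors, eye[:, [0, 1]], eye[:, [2, 3, 4, 5]])
# Bad split: the informative dimensions leak into the remainder.
bad = split_objective(x, anchors, eye[:, [2, 3]], eye[:, [0, 1, 4, 5]])

print("good split:", good)
print("bad split: ", bad)
```

In this toy setting the good split yields a lower encoder objective than the bad one, because the anchors are recoverable from the DRFs but not from the remainder. A practical version would train nonlinear encoders and an adversary jointly rather than solving each in closed form.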


