Junxiang Chen, Li Sun, Ke Yu, Kayhan Batmanghelich
Extracting Disease-Relevant Features with Adversarial Regularization
Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, pp. 3464-3471. Published 2021-12-01.
DOI: https://doi.org/10.1109/bibm52615.2021.9669878
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8863436/pdf/nihms-1778852.pdf
Citation count: 0
Abstract
Extracting hidden phenotypes is essential in medical data analysis because it facilitates disease subtyping, diagnosis, and understanding of disease etiology. Since the hidden phenotype is usually a low-dimensional representation that comprehensively describes the disease, we require a dimensionality-reduction method that captures as much disease-relevant information as possible. However, most unsupervised or self-supervised methods cannot achieve this goal because they learn a holistic representation containing both disease-relevant and disease-irrelevant information. Supervised methods capture only the information that is predictive of the target clinical variable, so the learned representation usually does not generalize to other aspects of the disease. Hence, we develop a dimensionality-reduction approach, based on information theory, to extract Disease Relevant Features (DRFs). We propose to use clinical variables that weakly define the disease as so-called anchors. We derive a formulation that makes the DRFs predictive of the anchors while forcing the remaining representation to be irrelevant to the anchors via adversarial regularization. We apply our method to a large-scale study of Chronic Obstructive Pulmonary Disease (COPD). Our experiments show that: (1) the learned DRFs predict the anchors as well as the original representation does, despite having a significantly lower dimension; (2) compared to a supervised representation, the learned DRFs are more predictive of other relevant disease metrics that are not used during training; and (3) the learned DRFs are related to non-imaging biological measurements such as gene expression, suggesting that the DRFs include information related to the underlying biology of the disease.
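The abstract does not spell out the objective, so the following is only a minimal sketch of the general idea of adversarial regularization, not the paper's information-theoretic formulation. It assumes linear maps, a squared-error loss, and a trade-off weight `lam`; every name here (`encoder_objective`, `W_drf`, `W_rest`, `P`, `A`) is hypothetical. The encoder is rewarded when the DRF part predicts the anchors and when an adversary fails to predict the anchors from the remaining part.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error averaged over samples and anchor dimensions."""
    return float(np.mean((pred - target) ** 2))


def encoder_objective(x, anchors, W_drf, W_rest, P, A, lam=1.0):
    """Hypothetical encoder loss (an assumption, not the paper's formulation).

    x       : (n, d) original representation
    anchors : (n, k) clinical variables that weakly define the disease
    W_drf   : (d, m) projection to the low-dimensional DRF part
    W_rest  : (d, r) projection to the remaining representation
    P       : (m, k) predictor of the anchors from the DRF part
    A       : (r, k) adversary predicting the anchors from the remainder
    """
    z_drf = x @ W_drf    # disease-relevant features
    z_rest = x @ W_rest  # remaining representation
    pred_loss = mse(z_drf @ P, anchors)  # DRF should predict the anchors
    adv_loss = mse(z_rest @ A, anchors)  # adversary's error on the remainder
    # The encoder minimizes pred_loss and maximizes adv_loss; in training the
    # adversary A would be updated separately to minimize adv_loss.
    total = pred_loss - lam * adv_loss
    return total, pred_loss, adv_loss


# Toy data: anchors are generated exactly from the DRF part, and the
# adversary is all zeros, so it cannot recover the anchors.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
W_drf = rng.normal(size=(5, 2))
W_rest = rng.normal(size=(5, 3))
P = rng.normal(size=(2, 1))
A = np.zeros((3, 1))
anchors = (x @ W_drf) @ P

total, pred_loss, adv_loss = encoder_objective(
    x, anchors, W_drf, W_rest, P, A, lam=0.5
)
print(total, pred_loss, adv_loss)
```

In this toy setting the prediction loss is zero by construction, and the total is negative because the adversary's error is counted as a reward for the encoder. A real implementation would alternate gradient updates between the encoder and the adversary.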