The impact of missing values imputation methods in cDNA microarrays on downstream data analysis
V. F. Ghoneim, N. Solouma, Y. Kadah
2011 28th National Radio Science Conference (NRSC), published 2011-04-26
DOI: 10.1109/NRSC.2011.5873605 (https://doi.org/10.1109/NRSC.2011.5873605)
Citations: 0
Abstract
The DNA microarray is a high-throughput gene profiling technology employed in numerous biological and medical studies. These studies require complete and accurate gene expression values, which are not always available in practice because of the so-called microarray missing value (MV) problem. Many attempts have been made to deal with this problem, and MV imputation algorithms, which estimate the missing values, have been designed as the most reliable solution. Many of the schemes introduced to evaluate these algorithms are limited to measuring the similarity between the original and imputed data. However, the imputed expression values themselves are of little interest; the major concern is their impact on downstream analysis. In this work, the success of three MV imputation methods is measured in terms of Normalized Root Mean Square Error (NRMSE) as well as classification accuracy and the detection of differentially expressed genes (biomarkers) for distinguishing different phenotypes. The classification accuracies computed on the original complete dataset and on the imputed datasets gave a practical evaluation of the three imputation methods and showed only slight variations among them. Some of the identified biomarkers were found to be annotated in Gene Ontology as coding for proteins involved in cell adhesion/motility, lipid/fatty acid transport and metabolism, immune/defence response, and electron transport.
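The similarity-based evaluation mentioned in the abstract can be illustrated with a minimal sketch of the NRMSE computation. The abstract does not state which normalization the authors used, so the definition here is an assumption: the root mean square of the imputation error divided by the root mean square of the true values, both taken only over the entries that were masked as missing. The function name `nrmse` and the list-of-rows matrix layout are hypothetical choices made for illustration.

```python
import math


def nrmse(original, imputed, missing_mask):
    """Normalized RMSE over the entries flagged as missing.

    Assumed definition (not taken from the paper):
        sqrt( sum((imputed - original)^2) / sum(original^2) )
    with both sums restricted to entries where missing_mask is True.
    All three arguments are lists of rows with identical shape
    (genes x samples).
    """
    error_sq = 0.0   # accumulated squared imputation error
    truth_sq = 0.0   # accumulated squared true values
    count = 0        # number of masked entries seen
    for row_o, row_i, row_m in zip(original, imputed, missing_mask):
        for true_val, est_val, is_missing in zip(row_o, row_i, row_m):
            if is_missing:
                error_sq += (est_val - true_val) ** 2
                truth_sq += true_val ** 2
                count += 1
    if count == 0:
        raise ValueError("no masked entries to evaluate")
    if truth_sq == 0.0:
        raise ValueError("true values are all zero; NRMSE is undefined")
    return math.sqrt(error_sq / truth_sq)


# Toy example: one entry (true value 2.0) was masked and imputed as 2.5.
original = [[1.0, 2.0], [3.0, 4.0]]
imputed = [[1.0, 2.5], [3.0, 4.0]]
mask = [[False, True], [False, False]]
print(nrmse(original, imputed, mask))
```

A value of 0 indicates perfect recovery of the masked entries. Other normalizations, such as dividing by the standard deviation of the true values, also appear in the imputation literature, so NRMSE figures are comparable only when the same definition is used.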