Microarray Missing Value Imputation by Iterated Local Least Squares
Authors: Zhipeng Cai, M. Heydari, Guohui Lin
Published in: Proceedings of the ... Asia-Pacific bioinformatics conference, volume 38 1, pages 159-168
Publication date: 2005-12-01
DOI: 10.1142/9781860947292_0019
URL: https://doi.org/10.1142/9781860947292_0019
Citation count: 7
Abstract
Microarray gene expression data often contain missing values, which arise for various reasons. However, most gene expression data analysis algorithms, such as clustering, classification and network design, require complete information, that is, data without any missing values. It is therefore very important to impute the missing values accurately before applying these algorithms. In this paper, an Iterated Local Least Squares Imputation method (ILLsimpute) is proposed to estimate the missing values. In ILLsimpute, a similarity threshold is learned from the known expression values, and at every iteration it is used to obtain a set of coherent genes for each target gene that contains missing values. The target gene is then represented as a linear combination of its coherent genes, fitted by least squares. The algorithm terminates after a certain number of iterations or when the imputation converges. Experimental results on real microarray datasets show that ILLsimpute outperforms three of the most recent methods on several commonly tested datasets.
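The loop described in the abstract (select coherent genes by a similarity threshold, fit the target gene as a linear combination of them by least squares, repeat until the estimates stop changing) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several details are assumptions that the abstract does not specify: the function name `ills_impute_sketch` is hypothetical, similarity is taken to be Euclidean distance over the target gene's observed columns, the threshold is passed in as a fixed parameter rather than learned from the known values, missing entries are initialised with row means, and convergence is checked with a simple tolerance `tol`.

```python
import numpy as np

def ills_impute_sketch(X, threshold, max_iter=10, tol=1e-6):
    """Illustrative iterated local least squares imputation (not the paper's code).

    X is a genes-by-samples array with NaN marking missing values.
    Assumes every gene has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Assumption: start from row means of the observed entries.
    filled = X.copy()
    row_means = np.nanmean(X, axis=1)
    filled[missing] = np.broadcast_to(row_means[:, None], X.shape)[missing]

    for _ in range(max_iter):
        previous = filled.copy()
        for g in np.where(missing.any(axis=1))[0]:
            miss = missing[g]
            obs = ~miss

            # Assumption: Euclidean distance on the target's observed columns,
            # using the current estimates for other genes' missing entries.
            dist = np.linalg.norm(previous[:, obs] - previous[g, obs], axis=1)
            dist[g] = np.inf  # a gene is never its own coherent gene
            coherent = np.where(dist <= threshold)[0]
            if coherent.size == 0:
                continue  # keep the current estimate for this gene

            # Least squares: target (observed part) ~ combination of coherent genes.
            A = previous[coherent][:, obs].T
            b = previous[g, obs]
            coef, *_ = np.linalg.lstsq(A, b, rcond=None)

            # Apply the same combination to the missing columns.
            filled[g, miss] = previous[coherent][:, miss].T @ coef

        # Stop once the imputed values no longer change noticeably.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

Calling `ills_impute_sketch(X, threshold=5.0)` on a small matrix with `np.nan` entries returns a completed copy in which the observed entries are left unchanged. In the method described above the threshold would be learned from the known expression values; here it is left as an input so that the selection and regression steps stay easy to follow.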