{"title":"基于聚类的LLS方法对DNA微阵列数据缺失值输入的改进","authors":"Phimmarin Keerin, W. Kurutach, Tossapon Boongoen","doi":"10.1109/ISCIT.2013.6645921","DOIUrl":null,"url":null,"abstract":"Gene expressions measured during a microarray experiment usually encounter the native problem of missing values. These are due to possible errors occurring in the primary experiments, image acquisition and interpretation processes. Leaving this unsolved may critically degrade the reliability of any consequent downstream analysis or medical application. Yet, a further study of microarray data may not be possible with many standard analysis methods that require a complete data set. This paper introduces a new method to impute missing values in microarray data. The proposed algorithm, CLLS impute, is an extension of local least squares imputation with local data clustering being incorporated for improved quality and efficiency. Gene expression data is typically represented as a matrix whose rows and columns corresponds to genes and experiments, respectively. CLLS kicks off by finding a complete dataset via the removal of rows with missing value(s). Then, gene clusters and their corresponding centroids are obtained by applying a clustering technique on the complete dataset. A set of similar genes of the target gene (with missing values) are those belonging to the cluster, whose centroid is the closest to the target. Having known this, the target gene is imputed by applying regression analysis with similar genes previously determined. Empirical evaluation with several published gene expression datasets suggest that the proposed technique performs better than the classical local least square method and recently developed techniques found in the literature.","PeriodicalId":356009,"journal":{"name":"2013 13th International Symposium on Communications and Information Technologies (ISCIT)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"An improvement of missing value imputation in DNA microarray data using cluster-based LLS method\",\"authors\":\"Phimmarin Keerin, W. Kurutach, Tossapon Boongoen\",\"doi\":\"10.1109/ISCIT.2013.6645921\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Gene expressions measured during a microarray experiment usually encounter the native problem of missing values. These are due to possible errors occurring in the primary experiments, image acquisition and interpretation processes. Leaving this unsolved may critically degrade the reliability of any consequent downstream analysis or medical application. Yet, a further study of microarray data may not be possible with many standard analysis methods that require a complete data set. This paper introduces a new method to impute missing values in microarray data. The proposed algorithm, CLLS impute, is an extension of local least squares imputation with local data clustering being incorporated for improved quality and efficiency. Gene expression data is typically represented as a matrix whose rows and columns corresponds to genes and experiments, respectively. CLLS kicks off by finding a complete dataset via the removal of rows with missing value(s). Then, gene clusters and their corresponding centroids are obtained by applying a clustering technique on the complete dataset. A set of similar genes of the target gene (with missing values) are those belonging to the cluster, whose centroid is the closest to the target. Having known this, the target gene is imputed by applying regression analysis with similar genes previously determined. Empirical evaluation with several published gene expression datasets suggest that the proposed technique performs better than the classical local least square method and recently developed techniques found in the literature.\",\"PeriodicalId\":356009,\"journal\":{\"name\":\"2013 13th International Symposium on Communications and Information Technologies (ISCIT)\",\"volume\":\"85 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 13th International Symposium on Communications and Information Technologies (ISCIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISCIT.2013.6645921\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 13th International Symposium on Communications and Information Technologies (ISCIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCIT.2013.6645921","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An improvement of missing value imputation in DNA microarray data using cluster-based LLS method
Gene expressions measured during a microarray experiment usually encounter the native problem of missing values. These are due to possible errors occurring in the primary experiments, image acquisition and interpretation processes. Leaving this unsolved may critically degrade the reliability of any consequent downstream analysis or medical application. Yet, a further study of microarray data may not be possible with many standard analysis methods that require a complete data set. This paper introduces a new method to impute missing values in microarray data. The proposed algorithm, CLLS impute, is an extension of local least squares imputation with local data clustering being incorporated for improved quality and efficiency. Gene expression data is typically represented as a matrix whose rows and columns corresponds to genes and experiments, respectively. CLLS kicks off by finding a complete dataset via the removal of rows with missing value(s). Then, gene clusters and their corresponding centroids are obtained by applying a clustering technique on the complete dataset. A set of similar genes of the target gene (with missing values) are those belonging to the cluster, whose centroid is the closest to the target. Having known this, the target gene is imputed by applying regression analysis with similar genes previously determined. Empirical evaluation with several published gene expression datasets suggest that the proposed technique performs better than the classical local least square method and recently developed techniques found in the literature.