{"title":"Random Forest with Random Projection to Impute Missing Gene Expression Data","authors":"Lovedeep Gondara","doi":"10.1109/ICMLA.2015.29","DOIUrl":null,"url":null,"abstract":"Measurement error or lack of proper experimental setup often results in invalid or missing data in gene expression studies. Small sample size and cost of re-running the experiment presents a need for an efficient missing data imputation technique. In this paper, we propose a method based on Random forest using Random projection as a data pre-processing filter. Initial results using varying missing data proportions on variety of real datasets show that the imputation process based on Random forest performs equally well or better than K-Nearest Neighbor & Support Vector Regression based methods. Using Random projection we show that dimensionality of a dataset can be reduced by 50 percent without affecting the imputation process.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"165 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2015.29","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
Measurement error or lack of proper experimental setup often results in invalid or missing data in gene expression studies. Small sample size and cost of re-running the experiment presents a need for an efficient missing data imputation technique. In this paper, we propose a method based on Random forest using Random projection as a data pre-processing filter. Initial results using varying missing data proportions on variety of real datasets show that the imputation process based on Random forest performs equally well or better than K-Nearest Neighbor & Support Vector Regression based methods. Using Random projection we show that dimensionality of a dataset can be reduced by 50 percent without affecting the imputation process.