Random Forest with Random Projection to Impute Missing Gene Expression Data

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) Pub Date: 2015-12-01 DOI: 10.1109/ICMLA.2015.29

Lovedeep Gondara

Citations: 5

Abstract

Measurement error or the lack of a proper experimental setup often results in invalid or missing data in gene expression studies. Small sample sizes and the cost of re-running experiments create a need for an efficient missing data imputation technique. In this paper, we propose a method based on Random forest that uses Random projection as a data pre-processing filter. Initial results using varying proportions of missing data on a variety of real datasets show that the imputation process based on Random forest performs as well as or better than methods based on K-Nearest Neighbor and Support Vector Regression. Using Random projection, we show that the dimensionality of a dataset can be reduced by 50 percent without affecting the imputation process.
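As a rough illustration of the Random projection pre-processing step, the sketch below halves the number of columns of a small matrix by multiplying it with a random matrix. The abstract does not say how the projection matrix is built, so the Gaussian entries, the 1/sqrt(k) scaling, the function name `random_project`, and the toy values are all assumptions made for this example. The Random forest imputation step itself is not shown.

```python
import math
import random


def random_project(rows, k, seed=0):
    """Project each row of `rows` (n x d) onto k dimensions.

    Hypothetical helper: uses a d x k matrix of N(0, 1) entries scaled by
    1/sqrt(k), which is one common choice and not necessarily the paper's.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    d = len(rows[0])
    scale = 1.0 / math.sqrt(k)
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    # Plain matrix product: (n x d) times (d x k) gives (n x k)
    return [
        [sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)]
        for row in rows
    ]


# Toy matrix with 4 rows and 6 columns (made-up values, not from the paper)
data = [
    [2.1, 0.3, 1.7, 4.0, 0.9, 3.2],
    [1.8, 0.5, 1.9, 3.6, 1.1, 2.9],
    [0.4, 2.2, 0.8, 1.0, 3.3, 0.7],
    [0.6, 2.0, 1.0, 1.2, 3.0, 0.5],
]

# Reduce the dimensionality by 50 percent: 6 columns become 3
reduced = random_project(data, k=len(data[0]) // 2)
print(len(reduced), len(reduced[0]))
```

Here the reduced matrix keeps all 4 rows but has 3 columns instead of 6. In a pipeline like the one the abstract describes, a reduced matrix of this kind would be what the imputation model is trained on.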
