Dealing with gene expression missing data.

Systems biology. Pub Date: 2006-05-01. DOI: 10.1049/ip-syb:20050056

L P Brás, J C Menezes

Volume 153, issue 3, pages 105-19. Publication type: Journal Article.

Citations: 22

Abstract

This paper presents a comparative evaluation of methods for estimating missing values in microarray data: weighted K-nearest neighbours imputation (KNNimpute); regression-based methods such as local least squares imputation (LLSimpute) and partial least squares imputation (PLSimpute); and Bayesian principal component analysis (BPCA). It examines how prediction accuracy is influenced by the methods' parameters, the type of data relationships used in the estimation process (row-wise, column-wise or both), the missing rate and pattern, and the type of experiment: time series (TS), non-time series (NTS) or mixed (MIX). Improvements are proposed based on the iterative use of data (iterative LLS and PLS imputation: ILLSimpute and IPLSimpute), on the need to perform initial imputations (modified PLS and Helland PLS imputation: MPLSimpute and HPLSimpute) and on the type of relationships employed (KNNarray, LLSarray, HPLSarray and alternating PLS, APLSimpute). Overall, data set properties (type of experiment, missing rate and pattern) are shown to affect the data similarity structure and therefore the methods' performance. LLSimpute and ILLSimpute are preferable for data with a stronger similarity structure (TS and MIX experiments), whereas the PLS-based methods (MPLSimpute, IPLSimpute and APLSimpute) are preferable when estimating missing data in NTS experiments.
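To make the row-wise idea behind weighted K-nearest neighbours imputation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function name `knn_impute`, the use of `None` for missing entries, the root-mean-square distance over jointly observed columns and the inverse-distance weights are all assumptions chosen for illustration.

```python
import math


def knn_impute(data, k=2):
    """Fill None entries with a distance-weighted mean of the k nearest rows.

    data: list of rows (e.g. genes), each a list of floats or None.
    Assumption: neighbours are weighted by inverse distance.
    """
    imputed = [row[:] for row in data]  # copy so the input is not modified
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                # A neighbour must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                # Compare only columns observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if not candidates:
                continue  # no usable neighbour: leave the entry missing
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            # Small constant avoids division by zero for identical rows.
            weights = [1.0 / (d + 1e-12) for d, _ in nearest]
            total = sum(w * v for w, (_, v) in zip(weights, nearest))
            imputed[i][j] = total / sum(weights)
    return imputed


# Toy expression matrix: rows are genes, columns are arrays.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, None],
    [5.0, 6.0, 7.0, 8.0],
    [1.5, 2.5, 3.5, 4.5],
]
filled = knn_impute(expr, k=2)
```

In this toy matrix the missing entry in the second row is estimated almost entirely from the first row, which matches it on every observed column. A column-wise variant (in the spirit of the KNNarray idea named above) could be sketched by transposing the matrix before calling the function, though the paper's exact procedure is not given in the abstract.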


