Dealing with gene expression missing data.

L P Brás, J C Menezes
Journal: Systems biology, vol. 153, no. 3, pp. 105-19
Publication date: 2006-05-01
Publication type: Journal Article
DOI: 10.1049/ip-syb:20050056 (https://doi.org/10.1049/ip-syb:20050056)
Citations: 22

Abstract

A comparative evaluation is presented of different methods for estimating missing values in microarray data: weighted K-nearest neighbours imputation (KNNimpute), regression-based methods such as local least squares imputation (LLSimpute) and partial least squares imputation (PLSimpute), and Bayesian principal component analysis (BPCA). The influence on prediction accuracy of several factors is elucidated: the methods' parameters, the type of data relationships used in the estimation process (i.e. row-wise, column-wise or both), the missing rate and pattern, and the type of experiment [time series (TS), non-time series (NTS) or mixed (MIX) experiments]. Improvements are proposed based on the iterative use of data (iterative LLS and PLS imputation: ILLSimpute and IPLSimpute), the need to perform initial imputations (modified PLS and Helland PLS imputation: MPLSimpute and HPLSimpute) and the type of relationships employed (KNNarray, LLSarray, HPLSarray and alternating PLS, i.e. APLSimpute). Overall, it is shown that data set properties (type of experiment, missing rate and pattern) affect the data similarity structure and therefore influence the methods' performance. LLSimpute and ILLSimpute are preferable for data with a stronger similarity structure (TS and MIX experiments), whereas PLS-based methods (MPLSimpute, IPLSimpute and APLSimpute) are preferable when estimating NTS missing data.
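
To make the row-wise idea behind weighted K-nearest neighbours imputation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a genes x arrays matrix with missing entries stored as NaN, Euclidean distance computed over the columns observed in the target gene, inverse-distance weights, and at least one gene with no missing values to serve as a neighbour. The function name `knn_impute` and the parameter `k` are hypothetical; the abstract does not state the settings used in the paper.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs in a genes x arrays matrix from the k most similar complete rows.

    Illustrative sketch only: the distance measure and weighting are assumptions.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Candidate neighbours: genes (rows) with no missing values.
    complete = X[~np.isnan(X).any(axis=1)]
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        miss = np.isnan(X[i])
        obs = ~miss
        # Euclidean distance using only the columns observed in the target gene.
        dist = np.sqrt(((complete[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = 1.0 / (dist[nearest] + 1e-12)
        # Weighted average of the neighbours' values in the missing columns.
        filled[i, miss] = weights @ complete[nearest][:, miss] / weights.sum()
    return filled

if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0],
            [1.0, 2.0, 5.0],
            [10.0, 10.0, 10.0],
            [1.0, 2.0, np.nan]]
    print(knn_impute(data, k=2))
```

A column-wise variant in the spirit of the KNNarray method named above could, under the same assumptions, be obtained by applying the function to the transposed matrix.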
