Improving cluster-based missing value estimation of DNA microarray data

Lígia P. Brás, José C. Menezes
Biomolecular Engineering, vol. 24, no. 2, pp. 273-282. Published 2007-06-01. Journal article.
DOI: 10.1016/j.bioeng.2007.04.003
Source: https://www.sciencedirect.com/science/article/pii/S1389034407000354
Citations: 80

Abstract


We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing value (MV) estimation in microarray data, based on the reuse of estimated data. The method is called iterative KNN imputation (IKNNimpute) because the estimation is performed iteratively using the most recently estimated values.
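
The iterative idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the row-mean initial fill, the Euclidean-type distance, the inverse-distance weights, the fixed iteration count, and the names `knn_pass` and `iknn_impute` are all assumptions made for illustration. Rows are genes, columns are experiments, and every row is assumed to have at least one observed value.

```python
import numpy as np

def knn_pass(mask, filled, k):
    """Re-estimate each originally missing entry from the k nearest rows
    of the current complete matrix (assumed inverse-distance weighting)."""
    out = filled.copy()
    for i in np.where(mask.any(axis=1))[0]:
        # Root-mean-square distance from gene i to every gene, over all columns
        d = np.sqrt(((filled - filled[i]) ** 2).mean(axis=1))
        d[i] = np.inf                      # a gene is not its own neighbour
        nn = np.argsort(d)[:k]             # indices of the k closest genes
        w = 1.0 / (d[nn] + 1e-12)          # small constant avoids division by zero
        est = (w[:, None] * filled[nn]).sum(axis=0) / w.sum()
        out[i, mask[i]] = est[mask[i]]     # only missing positions are overwritten
    return out

def iknn_impute(data, k=3, n_iter=5):
    """Iteratively refine estimates, reusing the latest imputed values."""
    mask = np.isnan(data)
    row_means = np.nanmean(data, axis=1, keepdims=True)
    filled = np.where(mask, row_means, data)   # assumed initial guess: row mean
    for _ in range(n_iter):
        filled = knn_pass(mask, filled, k)
    return filled
```

In this sketch each pass selects neighbours from the matrix completed by the previous pass, which is one way to read "using the recently estimated values"; observed entries are never modified.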

The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation.
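
The two performance measures named above can be sketched as follows. The abstract does not state how the RMSE is normalized; dividing by the standard deviation of the true values is one common choice and is assumed here. The function names `nrmse` and `pearson_r` are illustrative only.

```python
import numpy as np

def nrmse(estimated, true):
    """RMSE between estimated and true values, divided by the
    standard deviation of the true values (assumed normalization)."""
    estimated = np.asarray(estimated, dtype=float)
    true = np.asarray(true, dtype=float)
    rmse = np.sqrt(np.mean((estimated - true) ** 2))
    return rmse / np.std(true)

def pearson_r(estimated, true):
    """Pearson correlation coefficient between estimated and true values."""
    return np.corrcoef(estimated, true)[0, 1]
```

Under this normalization, replacing every missing entry with the mean of the true values gives an NRMSE of 1, so values well below 1 indicate estimates that do better than a constant guess.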

The performance measures give consistent results. They indicate that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments, and in data sets comprising both time series and non-time series data. This is because the information in the genes having MVs is used more efficiently and the iterative procedure allows the MV estimates to be refined. More importantly, IKNNimpute has a smaller detrimental effect on the detection of differentially expressed genes.
