Improving cluster-based missing value estimation of DNA microarray data

Biomolecular engineering Pub Date : 2007-06-01 DOI:10.1016/j.bioeng.2007.04.003

Lígia P. Brás, José C. Menezes

Biomolecular engineering, volume 24(2), pages 273-282. Article page: https://www.sciencedirect.com/science/article/pii/S1389034407000354

Citations: 80

Abstract

We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing values (MVs) estimation in microarray data based on the reuse of estimated data. The method was called iterative KNN imputation (IKNNimpute) as the estimation is performed iteratively using the recently estimated values.
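The idea of reusing estimated values across iterations can be illustrated with a minimal Python sketch. The abstract does not give the algorithmic details, so everything specific here is an assumption made for illustration: the function name `iknn_impute`, the row-mean initialization, the Euclidean distance over a gene's observed columns, the inverse-distance weights, and the fixed iteration count. It also assumes every gene (row) has at least one observed value. It should not be read as the authors' exact procedure.

```python
import numpy as np

def iknn_impute(X, k=10, n_iter=5, eps=1e-6):
    """Illustrative iterative weighted KNN imputation.

    Rows are genes, columns are arrays; missing entries are NaN.
    All design choices below are assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Assumed starting point: fill each gap with the gene's observed mean.
    row_means = np.nanmean(X, axis=1, keepdims=True)
    filled = np.where(missing, row_means, X)

    target_rows = np.where(missing.any(axis=1))[0]
    for _ in range(n_iter):
        updated = filled.copy()
        for i in target_rows:
            obs = ~missing[i]
            # Distance uses only the columns observed in gene i. Other genes
            # contribute their current values, including earlier estimates,
            # which is the "reuse of estimated data" idea.
            d = np.sqrt(((filled[:, obs] - filled[i, obs]) ** 2).mean(axis=1))
            d[i] = np.inf  # a gene is never its own neighbour
            nn = np.argsort(d)[:k]
            w = 1.0 / (d[nn] + eps)  # closer neighbours weigh more
            w /= w.sum()
            updated[i, missing[i]] = w @ filled[nn][:, missing[i]]
        filled = updated  # refined estimates feed the next pass
    return filled
```

Because genes with missing values stay in the neighbour pool with their current estimates, each pass can draw on more of the data than a single-pass KNN that restricts itself to complete genes.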

The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation.
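The two performance measures named above can be sketched as follows. Several NRMSE normalizations are in use and the abstract does not say which one the authors adopted; this sketch assumes normalization by the standard deviation of the true values, and the function names `nrmse` and `pearson` are hypothetical.

```python
import numpy as np

def nrmse(estimated, true):
    """RMSE of the estimates divided by the std of the true values.

    The normalization is an assumption; other definitions divide by the
    mean or the range of the true values instead.
    """
    estimated = np.asarray(estimated, dtype=float)
    true = np.asarray(true, dtype=float)
    rmse = np.sqrt(np.mean((estimated - true) ** 2))
    return rmse / np.std(true)

def pearson(estimated, true):
    """Pearson correlation between estimated and true values."""
    return np.corrcoef(estimated, true)[0, 1]
```

Under this normalization, replacing every missing entry with the mean of the true values gives an NRMSE of 1, so values well below 1 indicate that the imputation recovers real structure.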

The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining the MV estimates. More importantly, IKNN has a smaller detrimental effect on the detection of differentially expressed genes.
