An improvement of missing value imputation in DNA microarray data using cluster-based LLS method

Phimmarin Keerin, W. Kurutach, Tossapon Boongoen
Published in: 2013 13th International Symposium on Communications and Information Technologies (ISCIT)
Publication date: 2013-10-24
DOI: 10.1109/ISCIT.2013.6645921 (https://doi.org/10.1109/ISCIT.2013.6645921)
Citations: 11

Abstract

Gene expression measurements from microarray experiments commonly suffer from missing values. These arise from errors in the primary experiments and in the image acquisition and interpretation processes. Left unaddressed, missing values may critically degrade the reliability of downstream analyses and medical applications. Moreover, many standard analysis methods require a complete data set, so further study of the microarray data may not be possible with them. This paper introduces a new method to impute missing values in microarray data. The proposed algorithm, CLLS impute, extends local least squares (LLS) imputation by incorporating local data clustering to improve quality and efficiency. Gene expression data is typically represented as a matrix whose rows and columns correspond to genes and experiments, respectively. CLLS begins by forming a complete dataset through the removal of rows with missing value(s). Gene clusters and their corresponding centroids are then obtained by applying a clustering technique to the complete dataset. The genes similar to a target gene (one with missing values) are those belonging to the cluster whose centroid is closest to the target. The target gene is then imputed by regression analysis on these similar genes. Empirical evaluation on several published gene expression datasets suggests that the proposed technique performs better than the classical local least squares method and recently developed techniques found in the literature.
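
The steps described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the clustering technique, so plain k-means is assumed here, and the regression step follows the usual local least squares formulation (solve a least squares problem on the observed columns, then apply the coefficients to the missing columns). The function names `kmeans` and `clls_impute` and the parameters `k`, `n_iter`, and `seed` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def clls_impute(data, k=2, seed=0):
    """Fill NaNs in a genes-by-experiments matrix using cluster-based LLS."""
    data = np.asarray(data, dtype=float)
    result = data.copy()
    missing = np.isnan(data)

    # Step 1: complete dataset = rows (genes) without any missing value.
    complete_rows = ~missing.any(axis=1)
    complete = data[complete_rows]

    # Step 2: cluster the complete genes and keep the centroids.
    centroids, labels = kmeans(complete, k, seed=seed)
    sizes = np.bincount(labels, minlength=k)

    for i in np.flatnonzero(~complete_rows):
        miss = missing[i]
        obs = ~miss
        if not obs.any():
            continue  # no observed value to regress on; left as NaN here

        # Step 3: nearest centroid, compared on the observed columns only.
        d = ((centroids[:, obs] - data[i, obs]) ** 2).sum(axis=1)
        d[sizes == 0] = np.inf  # never pick an empty cluster
        similar = complete[labels == d.argmin()]

        # Step 4: least squares regression on the similar genes.
        A = similar[:, obs]   # similar genes, observed columns
        B = similar[:, miss]  # similar genes, columns missing in the target
        w = data[i, obs]      # observed part of the target gene
        x, *_ = np.linalg.lstsq(A.T, w, rcond=None)  # min ||A^T x - w||
        result[i, miss] = B.T @ x

    return result
```

Calling `clls_impute(matrix, k=2)` on a NumPy array with `np.nan` marking the missing entries returns a copy in which those entries are estimated and all observed entries are unchanged. In this sketch, a gene with no observed values at all is left as NaN, since there is nothing to regress on.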