Comparison of Clustering Approaches for Gene Expression Data

Anton Borg, Niklas Lavesson, V. Boeva
Scandinavian Conference on AI. DOI: 10.3233/978-1-61499-330-8-55
Citations: 8

Abstract

Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral clustering and expectation-maximization. The algorithms are benchmarked against each other. The performance of the four clustering algorithms is studied on time series expression data, using Dynamic Time Warping distance to measure similarity between gene expression profiles. Four different cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a cluster method, and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.

Keywords: gene expression data, graph-based clustering algorithm, minimum cut clustering, partitioning algorithm, dynamic time warping
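
As an illustration of two ingredients named in the abstract, the following is a minimal sketch of a Dynamic Time Warping distance between two expression profiles and of the Rand Index between two partitions. The abstract does not specify the local cost function, step pattern, or any windowing constraint, so the absolute-difference cost and unconstrained warping used here are assumptions, and the function names `dtw_distance` and `rand_index` and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with an absolute-difference local cost
    (an assumed choice; the abstract does not state the cost function)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree, i.e. both place
    the pair in the same cluster or both place it in different clusters."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical time-series profiles: g2 is g1 delayed by one time step.
g1 = [0.0, 1.0, 2.0, 1.0, 0.0]
g2 = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
g3 = [2.0, 1.0, 0.0, 1.0, 2.0]

print(dtw_distance(g1, g2))  # the delay is absorbed by warping
print(dtw_distance(g1, g3))  # opposite shapes stay far apart
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
```

Because DTW tolerates shifts along the time axis, profiles with the same shape but a different onset can be treated as similar, which is one common motivation for using it on time series expression data. The Rand Index depends only on pairwise co-membership, so it is unaffected by how the clusters are labeled.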