Comparison of Clustering Approaches for Gene Expression Data

Scandinavian Conference on AI | Pub Date: 1900-01-01 | DOI: 10.3233/978-1-61499-330-8-55

Anton Borg, Niklas Lavesson, V. Boeva


Citations: 8



Comparison of Clustering Approaches for Gene Expression Data

Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and expectation-maximization. The algorithms are benchmarked against each other. The performance of the four clustering algorithms is studied on time series expression data using Dynamic Time Warping distance in order to measure similarity between gene expression profiles. Four different cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a cluster method and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.

Keywords: gene expression data, graph-based clustering algorithm, minimum cut clustering, partitioning algorithm, dynamic time warping
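As an illustration of two ingredients named in the abstract, the sketch below computes a Dynamic Time Warping distance between two expression profiles and a Rand Index between two partitions. It is a minimal sketch and not the authors' implementation: the function names `dtw_distance` and `rand_index`, the absolute-difference local cost, the unconstrained warping window, and the toy profiles `g1` and `g2` are all assumptions made for this example.

```python
from itertools import combinations
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW (assumed local cost: absolute difference)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (needs >= 2 items)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical time-series profiles: g2 is g1 delayed by one time point.
g1 = [0.0, 1.0, 2.0, 1.0, 0.0]
g2 = [0.0, 0.0, 1.0, 2.0, 1.0]

# Point-wise comparison penalizes the shift at every time point; DTW absorbs most of it.
pointwise = sum(abs(x - y) for x, y in zip(g1, g2))
print("point-wise L1 distance:", pointwise)
print("DTW distance:", dtw_distance(g1, g2))

# Agreement between a hypothetical clustering and a hypothetical reference partition.
print("Rand Index:", rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

The comparison with the point-wise distance suggests why a warping-based measure may suit time series expression data: two genes with the same response shape but a small delay remain close under DTW.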
