Comparison of Clustering Approaches for Gene Expression Data
Anton Borg, Niklas Lavesson, V. Boeva
Scandinavian Conference on AI. DOI: 10.3233/978-1-61499-330-8-55
Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and expectation-maximization. The algorithms are benchmarked against each other. The performance of the four clustering algorithms is studied on time series expression data using Dynamic Time Warping distance in order to measure similarity between gene expression profiles. Four different cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a cluster method and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.

Keywords. gene expression data, graph-based clustering algorithm, minimum cut clustering, partitioning algorithm, dynamic time warping
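To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of a Dynamic Time Warping distance between two expression profiles and of the Rand Index between two partitions. It is an illustration under stated assumptions, not the authors' implementation: the function names `dtw_distance` and `rand_index` are hypothetical, the local cost is assumed to be the absolute difference between time points, and no warping-window constraint is applied.

```python
from itertools import combinations
from math import inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) Dynamic Time Warping distance.

    Assumption: the local cost is |a_i - b_j|; the paper may use a
    different local cost or restrict the warping path.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions place the two
    items together, or both place them apart.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

In a pipeline like the one described, `dtw_distance` would be evaluated for every pair of gene profiles to build the dissimilarity matrix that the clustering algorithms consume, and `rand_index` would compare a resulting partition with a reference labeling. Because the Rand Index only looks at pairs, it is invariant to how cluster labels are numbered.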