Cluster analysis via random partition distributions
D. B. Dahl, J. Andros, J. Carter
Statistical Analysis and Data Mining: The ASA Data Science Journal. Published 2021-06-05. https://doi.org/10.1002/sam.11602
Hierarchical and k‐medoids clustering are deterministic clustering algorithms defined on pairwise distances. We use these same pairwise distances in a novel stochastic clustering procedure based on a probability distribution. We call our proposed method CaviarPD, a portmanteau of "cluster analysis via random partition distributions." CaviarPD first samples clusterings from a distribution on partitions and then uses these samples to find the clustering estimate that minimizes an expected loss. Using eight case studies, we show that our approach produces results as close to the truth as hierarchical and k‐medoids methods, with the additional advantage of a probabilistic framework for assessing clustering uncertainty. The method provides an intuitive graphical representation of clustering uncertainty through pairwise probabilities estimated from the partition samples. A software implementation of the method is available in the CaviarPD package for R.
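
The second stage described above (turning partition samples into pairwise probabilities and a single estimate) can be illustrated with a minimal Python sketch. This is not the CaviarPD package or its API. The sampled partitions are hand-written toy data rather than draws from a distance-based partition distribution. The Binder loss is assumed here as one possible choice of loss, and the search is restricted to the sampled partitions themselves, which is a simplification of a full loss-minimization algorithm. All function names are hypothetical.

```python
from itertools import combinations


def pairwise_probabilities(samples):
    """Fraction of sampled partitions in which items i and j share a cluster."""
    n = len(samples[0])
    weight = 1.0 / len(samples)
    psm = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    psm[i][j] += weight
    return psm


def expected_binder_loss(labels, psm):
    """Monte Carlo estimate of the expected Binder loss (equal costs) of `labels`.

    For each pair, the contribution is |1(i and j together) - P(i and j together)|.
    """
    loss = 0.0
    for i, j in combinations(range(len(labels)), 2):
        together = 1.0 if labels[i] == labels[j] else 0.0
        loss += abs(together - psm[i][j])
    return loss


def best_sampled_partition(samples):
    """Return the sampled partition with the smallest estimated expected loss."""
    psm = pairwise_probabilities(samples)
    best = min(samples, key=lambda labels: expected_binder_loss(labels, psm))
    return best, psm


# Toy partition samples for four items (assumed data, for illustration only).
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
]
estimate, psm = best_sampled_partition(samples)
print(estimate)
for row in psm:
    print(row)
```

The matrix `psm` holds the pairwise probabilities that the abstract refers to; entries near 0 or 1 indicate pairs whose co-clustering is nearly certain across samples, while intermediate values flag uncertain pairs. Such a matrix is the kind of quantity that could be displayed as a heatmap to visualize clustering uncertainty.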