Federico Maria Quetti, Silvia Figini, Elena Ballante
arXiv - STAT - Machine Learning, published 2024-09-13. DOI: arxiv-2409.08954 (https://doi.org/arxiv-2409.08954)
A Bayesian Approach to Clustering via the Proper Bayesian Bootstrap: the Bayesian Bagged Clustering (BBC) algorithm
The paper presents a novel approach to unsupervised learning in the field
of clustering. A new method is proposed to enhance existing models from the
literature using the proper Bayesian bootstrap, improving results in terms of
robustness and interpretability. Our approach is organized in two steps: k-means
clustering is used for prior elicitation, then the proper Bayesian bootstrap is
applied as the resampling method in an ensemble clustering approach. Results are
analyzed by introducing measures of uncertainty based on Shannon entropy. The
proposal provides a clear indication of the optimal number of clusters, as well
as a better representation of the clustered data. Empirical results are
provided on simulated data showing the methodological and empirical advances
obtained.
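
The two-step procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not fix the details, so several choices here are assumptions: the prior is taken to be a Gaussian mixture built from the initial k-means fit; the proper Bayesian bootstrap is approximated by drawing `m` pseudo-observations from a mixture of that prior and the empirical distribution and weighting them with Dirichlet draws; each replicate is warm-started from the initial centroids so that labels stay comparable; and the names `prior_weight`, `m`, `bbc_memberships` and `elicit_prior` are hypothetical.

```python
import math
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

def weighted_kmeans(points, weights, init, iters=50):
    """Lloyd's algorithm with per-point weights, started from `init` centroids."""
    centroids = [list(c) for c in init]
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in centroids]
        mass = [0.0] * len(centroids)
        for p, w in zip(points, weights):
            j = nearest(p, centroids)
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Keep the old centroid if a cluster received no weight.
        new = [[s / mass[j] for s in sums[j]] if mass[j] > 0 else centroids[j]
               for j in range(len(centroids))]
        if new == centroids:
            break
        centroids = new
    return centroids

def elicit_prior(data, k, rng):
    """Assumed prior F0: a Gaussian mixture built from a plain k-means fit."""
    centroids = weighted_kmeans(data, [1.0] * len(data), rng.sample(data, k))
    labels = [nearest(p, centroids) for p in data]
    props, scales = [], []
    for j in range(k):
        members = [p for p, lab in zip(data, labels) if lab == j]
        props.append(len(members) / len(data))
        # Root-mean-square spread around the centroid, shared across axes.
        msd = sum(math.dist(p, centroids[j]) ** 2 for p in members) / max(len(members), 1)
        scales.append(math.sqrt(msd / len(data[0])))
    return centroids, props, scales

def draw_from_prior(prior, rng):
    """One draw from the assumed Gaussian-mixture prior."""
    centroids, props, scales = prior
    j = rng.choices(range(len(props)), weights=props)[0]
    return [rng.gauss(c, scales[j]) for c in centroids[j]]

def bbc_memberships(data, k, prior_weight=10.0, m=None, replicates=100, seed=0):
    """Per-point cluster membership frequencies over bootstrap replicates."""
    rng = random.Random(seed)
    n = len(data)
    m = m or n
    prior = elicit_prior(data, k, rng)
    reference = prior[0]
    counts = [[0] * k for _ in data]
    p_prior = prior_weight / (prior_weight + n)
    for _ in range(replicates):
        # Pseudo-sample from the mixture (prior_weight*F0 + n*F_n) / (prior_weight + n).
        sample = [draw_from_prior(prior, rng) if rng.random() < p_prior
                  else list(rng.choice(data)) for _ in range(m)]
        # Dirichlet((prior_weight + n)/m, ...) weights via normalised Gamma draws.
        g = [rng.gammavariate((prior_weight + n) / m, 1.0) for _ in range(m)]
        total = sum(g)
        weights = [x / total for x in g]
        # Warm start from the reference centroids keeps labels comparable (assumption).
        centroids = weighted_kmeans(sample, weights, reference)
        for i, p in enumerate(data):
            counts[i][nearest(p, centroids)] += 1
    return [[c / replicates for c in row] for row in counts]

def shannon_entropy(probs):
    """Shannon entropy (natural log) of a membership distribution."""
    return sum(-p * math.log(p) for p in probs if p > 0)

# Simulated example: two well-separated Gaussian blobs in the plane.
demo_rng = random.Random(1)
data = ([[demo_rng.gauss(0, 0.5), demo_rng.gauss(0, 0.5)] for _ in range(30)]
        + [[demo_rng.gauss(5, 0.5), demo_rng.gauss(5, 0.5)] for _ in range(30)])
memberships = bbc_memberships(data, k=2, replicates=50)
entropies = [shannon_entropy(row) for row in memberships]
print(max(entropies))
```

Each row of `memberships` estimates how often a point falls in each cluster across replicates, and its entropy is zero when the assignment never changes. Comparing the average entropy across candidate values of `k` is one way such a measure could inform the choice of the number of clusters, although the abstract does not specify the criterion used in the paper.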