Bayesian Nonparametric Clustering as a Community Detection Problem

ERN: Nonparametric Methods (Topic) | Pub Date: 2019-07-16 | DOI: 10.2139/ssrn.3424529

S. Tonellato


Citations: 2

Abstract

It is well known that a wide class of Bayesian nonparametric priors leads to a representation of the distribution of the observable variables as a mixture density with an infinite number of components, and that such a representation induces a clustering structure in the observations. However, identifying the clusters a posteriori is not straightforward, and some post-processing is usually required. To circumvent label switching, pairwise posterior similarity has been introduced and used either to apply classical clustering algorithms or to estimate the underlying partition by minimising a suitable loss function. This paper proposes mapping the observations onto a weighted undirected graph, where each node represents a sample item and the edge weights are given by the posterior pairwise similarities. It will be shown how, after building a particular random walk on such a graph, it is possible to apply a community detection algorithm, known as the map equation method, by optimising the description length of the partition. A relevant feature of this method is that it allows both the posterior uncertainty of the classification to be quantified and the variables used for classification to be selected.
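The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes the posterior pairwise similarity is estimated as the fraction of MCMC draws in which two items share a cluster, and it scores a partition with the standard two-level map equation for the ordinary random walk on an undirected weighted graph. The paper's "particular random walk" is not specified in the abstract, so the walk used here, the dropping of self-loops, and the function names `posterior_similarity` and `map_equation` are all assumptions made for illustration, not the author's implementation.

```python
import numpy as np


def posterior_similarity(labels):
    """Estimate pairwise posterior similarities from MCMC cluster labels.

    labels: (n_draws, n_items) integer array. Entry (i, j) of the result is
    the fraction of draws in which items i and j share a cluster, which does
    not depend on how clusters are labelled within each draw.
    """
    labels = np.asarray(labels)
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)


def _plogp_sum(x):
    """Sum of x * log2(x) over the strictly positive entries of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    x = x[x > 0]
    return float((x * np.log2(x)).sum())


def map_equation(weights, partition):
    """Two-level map equation (in bits) for an undirected weighted graph.

    Assumption: the ordinary random walk is used, whose stationary visit
    rates are proportional to node strengths; self-loops are dropped.
    """
    w = np.array(weights, dtype=float)
    np.fill_diagonal(w, 0.0)
    partition = np.asarray(partition)
    total = w.sum()
    p = w.sum(axis=1) / total  # node visit rates
    modules = np.unique(partition)
    q = np.empty(len(modules))      # module exit rates
    p_mod = np.empty(len(modules))  # total visit rate inside each module
    for k, m in enumerate(modules):
        inside = partition == m
        q[k] = w[np.ix_(inside, ~inside)].sum() / total
        p_mod[k] = p[inside].sum()
    return (_plogp_sum(q.sum()) - 2.0 * _plogp_sum(q)
            - _plogp_sum(p) + _plogp_sum(q + p_mod))


# Toy MCMC output: three draws over four items, with switched labels.
draws = [[0, 0, 1, 1],
         [1, 1, 0, 0],
         [0, 0, 1, 2]]
psm = posterior_similarity(draws)
length_two_groups = map_equation(psm, [0, 0, 1, 1])
length_one_group = map_equation(psm, [0, 0, 0, 0])
length_mixed = map_equation(psm, [0, 1, 0, 1])
```

In this toy example the partition that groups items {0, 1} and {2, 3} has the shortest description length of the three candidates. In practice the partition would be searched for with a dedicated optimiser rather than by comparing a handful of hand-picked candidates.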
