Bayesian Nonparametric Clustering as a Community Detection Problem

S. Tonellato
{"title":"Bayesian Nonparametric Clustering as a Community Detection Problem","authors":"S. Tonellato","doi":"10.2139/ssrn.3424529","DOIUrl":null,"url":null,"abstract":"It is well known that a wide class of bayesian nonparametric priors lead to the representation of the distribution of the observable variables as a mixture density with an infinite number of components, and that such a representation induces a clustering structure in the observations. However, cluster identification is not straightforward a posteriori and some post-processing is usually required. In order to circumvent label switching, pairwise posterior similarity has been introduced, and it has been used in order to either apply classical clustering algorithms or estimate the underlying partition by minimising a suitable loss function. This paper proposes to map observations on a weighted undirected graph, where each node represents a sample item and edge weights are given by the posterior pairwise similarities. It will be shown how, after building a particular random walk on such a graph, it is possible to apply a community detection algorithm, known as map equation method, by optimising the description length of the partition. A relevant feature of this method is that it allows for both the quantification of the posterior uncertainty of the classification and the selection of variables to be used for classification purposes.","PeriodicalId":11744,"journal":{"name":"ERN: Nonparametric Methods (Topic)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ERN: Nonparametric Methods (Topic)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3424529","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

It is well known that a wide class of bayesian nonparametric priors lead to the representation of the distribution of the observable variables as a mixture density with an infinite number of components, and that such a representation induces a clustering structure in the observations. However, cluster identification is not straightforward a posteriori and some post-processing is usually required. In order to circumvent label switching, pairwise posterior similarity has been introduced, and it has been used in order to either apply classical clustering algorithms or estimate the underlying partition by minimising a suitable loss function. This paper proposes to map observations on a weighted undirected graph, where each node represents a sample item and edge weights are given by the posterior pairwise similarities. It will be shown how, after building a particular random walk on such a graph, it is possible to apply a community detection algorithm, known as map equation method, by optimising the description length of the partition. A relevant feature of this method is that it allows for both the quantification of the posterior uncertainty of the classification and the selection of variables to be used for classification purposes.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
作为社区检测问题的贝叶斯非参数聚类
众所周知,广泛的贝叶斯非参数先验导致可观测变量的分布表示为具有无限数量分量的混合密度,并且这种表示在观测中引起聚类结构。然而,聚类识别并不是简单的后验,通常需要进行一些后处理。为了避免标签切换,引入了成对后验相似度,并将其用于应用经典聚类算法或通过最小化合适的损失函数来估计底层分区。本文提出在加权无向图上映射观测值,其中每个节点代表一个样本项,边的权重由后验两两相似度给出。它将展示如何在这样一个图上建立一个特定的随机漫步之后,通过优化分区的描述长度来应用社区检测算法,称为映射方程方法。这种方法的一个相关特征是,它既可以量化分类的后验不确定性,也可以选择用于分类目的的变量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Efficient Estimation of Pricing Kernels and Market-Implied Densities Futures-Trading Activity and Jump Risk: Evidence From the Bitcoin Market Partial Identification of Discrete Instrumental Variable Models using Shape Restrictions Frequency Dependent Risk Spatial Heterogeneity in the Borrowers' Mortgage Termination Decision – a Nonparametric Approach
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1