Automated detection of edge clusters via an overfitted mixture prior

Network Science (IF 1.4; Q2, Social Sciences, Interdisciplinary); Pub Date: 2024-01-19; DOI: 10.1017/nws.2023.22

Hanh T. D. Pham, Daniel K. Sewell


Citations: 0

Abstract


Most community detection methods focus on clustering actors with common features in a network. However, clustering edges offers a more intuitive way to understand the network structure in many real-life applications. Among the existing methods for network edge clustering, the majority are algorithmic, with the exception of the latent space edge clustering (LSEC) model proposed by Sewell (Journal of Computational and Graphical Statistics, 30(2), 390–405, 2021). LSEC was shown to have good performance in simulation and real-life data analysis, but fitting this model requires prior knowledge of the number of clusters and latent dimensions, which are often unknown to researchers. Within a Bayesian framework, we propose an extension to the LSEC model using a sparse finite mixture prior that supports automated selection of the number of clusters. We refer to our proposed approach as the automated LSEC or aLSEC. We develop a variational Bayes generalized expectation-maximization approach and a Hamiltonian Monte Carlo-within Gibbs algorithm for estimation. Our simulation study showed that aLSEC reduced run time by 10 to over 100 times compared to LSEC. Like LSEC, aLSEC maintains a computational cost that grows linearly with the number of actors in a network, making it scalable to large sparse networks. We developed the R package aLSEC which implements the proposed methodology.
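As a rough illustration of why an overfitted (sparse finite) mixture prior can select the number of clusters automatically, the sketch below draws mixture weights from a symmetric Dirichlet prior over deliberately too many components and counts how many components actually receive edges. It is not the aLSEC model and does not use the aLSEC R package; the function names, the choice of 20 components, 500 edges, and the two concentration values are all assumptions made for illustration only.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw weights from a symmetric Dirichlet(alpha, ..., alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def occupied_clusters(weights, n_edges, rng):
    """Assign n_edges edges to clusters according to the weights; count non-empty clusters."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n_edges)
    return len(set(labels))


def mean_occupied(alpha, k=20, n_edges=500, reps=200, seed=0):
    """Average number of non-empty clusters over repeated draws from the prior."""
    rng = random.Random(seed)
    counts = [
        occupied_clusters(sample_dirichlet(alpha, k, rng), n_edges, rng)
        for _ in range(reps)
    ]
    return sum(counts) / reps


# Hypothetical settings: 20 components is intentionally more than needed.
sparse = mean_occupied(alpha=0.05)  # small concentration: most components stay empty
dense = mean_occupied(alpha=5.0)    # large concentration: nearly all components are used
print(sparse, dense)
```

With a small concentration parameter, the prior puts most of its mass on weight vectors in which only a few components are non-negligible, so superfluous clusters tend to stay empty and the number of occupied clusters acts as the estimate. A large concentration parameter spreads the edges across nearly all components instead.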


Source journal

Network Science (Social Sciences, Interdisciplinary)

CiteScore: 3.50

Self-citation rate: 5.90%

About the journal: Network Science is an important journal for an important discipline - one using the network paradigm, focusing on actors and relational linkages, to inform research, methodology, and applications from many fields across the natural, social, engineering and informational sciences. Given growing understanding of the interconnectedness and globalization of the world, network methods are an increasingly recognized way to research aspects of modern society along with the individuals, organizations, and other actors within it. The discipline is ready for a comprehensive journal, open to papers from all relevant areas. Network Science is a defining work, shaping this discipline. The journal welcomes contributions from researchers in all areas working on network theory, methods, and data.

Latest articles in this journal

The latent cognitive structures of social networks

Algorithmic aspects of temporal betweenness

When can networks be inferred from observed groups?

Generating preferential attachment graphs via a Pólya urn with expanding colors

A generalized hypothesis test for community structure in networks