An Eigengap Ratio Test for Determining the Number of Communities in Network Data

arXiv - STAT - Methodology, Pub Date: 2024-09-09, DOI: arxiv-2409.05276

Yujia Wu, Jingfei Zhang, Wei Lan, Chih-Ling Tsai



Abstract

To characterize the community structure in network data, researchers have introduced various block-type models, including the stochastic block model, degree-corrected stochastic block model, mixed membership block model, degree-corrected mixed membership block model, and others. A critical step in applying these models effectively is determining the number of communities in the network. However, to our knowledge, existing methods for estimating the number of network communities often require model estimation or are unable to simultaneously account for network sparsity and a divergent number of communities. In this paper, we propose an eigengap-ratio-based test that addresses these challenges. The test is straightforward to compute, requires no parameter tuning, and can be applied to a wide range of block models without the need to estimate network distribution parameters. Furthermore, it is effective for both dense and sparse networks with a divergent number of communities. We show that the proposed test statistic converges to a function of the type-I Tracy-Widom distributions under the null hypothesis, and that the test is asymptotically powerful under alternatives. Simulation studies on both dense and sparse networks demonstrate the efficacy of the proposed method. Three real-world examples are presented to illustrate the usefulness of the proposed test.
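The abstract does not state the test statistic itself, so the following is only a rough sketch of the general eigengap-ratio idea, not the authors' test. It assumes an assortative stochastic block model (so that community structure shows up in the largest eigenvalues of the adjacency matrix) and picks the index at which the ratio of consecutive eigengaps is largest. The function names `simulate_sbm`, `eigengap_ratios`, and `estimate_num_communities` are hypothetical, and no Tracy-Widom calibration or p-value is computed.

```python
import numpy as np


def simulate_sbm(sizes, p_in, p_out, seed=0):
    """Sample a symmetric 0/1 adjacency matrix (no self-loops) from a simple SBM.

    Assumption for illustration: one within-block edge probability p_in and
    one between-block edge probability p_out.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    # Draw the strict upper triangle only, then mirror it to keep symmetry.
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper | upper.T).astype(float)


def eigengap_ratios(adj, k_max):
    """Return r_k = (e_k - e_{k+1}) / (e_{k+1} - e_{k+2}) for k = 1..k_max,

    where e_1 >= e_2 >= ... are the eigenvalues of the adjacency matrix.
    Requires adj to have at least k_max + 2 rows.
    """
    eig = np.sort(np.linalg.eigvalsh(adj))[::-1]   # descending order
    gaps = eig[:k_max + 1] - eig[1:k_max + 2]      # k_max + 1 consecutive gaps
    return gaps[:-1] / gaps[1:]


def estimate_num_communities(adj, k_max):
    """Heuristic estimate: the k in 1..k_max with the largest eigengap ratio."""
    return int(np.argmax(eigengap_ratios(adj, k_max))) + 1


if __name__ == "__main__":
    adj = simulate_sbm(sizes=(200, 150, 100), p_in=0.6, p_out=0.1, seed=0)
    print("eigengap ratios:", np.round(eigengap_ratios(adj, k_max=8), 2))
    print("estimated number of communities:", estimate_num_communities(adj, k_max=8))
```

With well-separated blocks such as the simulated ones here, the ratio is expected to peak at the planted number of blocks, because the gap between the last signal eigenvalue and the noise bulk is much larger than the gaps inside the bulk. Turning this heuristic into a formal test with a null distribution is what the paper addresses.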
