An Eigengap Ratio Test for Determining the Number of Communities in Network Data
Yujia Wu, Jingfei Zhang, Wei Lan, Chih-Ling Tsai
arXiv - STAT - Methodology, 2024-09-09
DOI: arxiv-2409.05276
Abstract
To characterize the community structure in network data, researchers have
introduced various block-type models, including the stochastic block model,
degree-corrected stochastic block model, mixed membership block model,
degree-corrected mixed membership block model, and others. A critical step in
applying these models effectively is determining the number of communities in
the network. However, to our knowledge, existing methods for estimating the
number of network communities often require model estimation or are unable to
simultaneously account for network sparsity and a divergent number of
communities. In this paper, we propose an eigengap-ratio-based test that
addresses these challenges. The test is straightforward to compute, requires no
parameter tuning, and can be applied to a wide range of block models without
the need to estimate network distribution parameters. Furthermore, it is
effective for both dense and sparse networks with a divergent number of
communities. We show that the proposed test statistic converges to a function
of the type-I Tracy-Widom distributions under the null hypothesis, and that the
test is asymptotically powerful under alternatives. Simulation studies on both
dense and sparse networks demonstrate the efficacy of the proposed method.
Three real-world examples are presented to illustrate the usefulness of the
proposed test.
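
To make the eigengap-ratio idea concrete, here is a minimal Python sketch. It is not the authors' test statistic, which the abstract does not spell out, and it does no Tracy-Widom calibration. It only illustrates the intuition that, in an assortative block model, the gap between the last "signal" eigenvalue of the adjacency matrix and the first "noise" eigenvalue is much larger than the gaps among the noise eigenvalues. The helper names `simulate_sbm` and `eigengap_ratios`, the block probabilities, and the argmax rule are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_sbm(block_probs, block_size, seed=0):
    """Draw a symmetric 0/1 adjacency matrix from a stochastic block model.

    block_probs: (K, K) symmetric matrix of edge probabilities (illustrative values).
    block_size:  number of nodes in each of the K communities.
    """
    rng = np.random.default_rng(seed)
    k = block_probs.shape[0]
    labels = np.repeat(np.arange(k), block_size)       # community label per node
    probs = block_probs[labels][:, labels]             # (n, n) edge probabilities
    n = labels.size
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # sample the upper triangle
    return (upper | upper.T).astype(float)             # symmetrize, no self-loops

def eigengap_ratios(adj, k_max):
    """Return r_k = (l_k - l_{k+1}) / (l_{k+1} - l_{k+2}) for k = 1..k_max,
    where l_1 >= l_2 >= ... are the eigenvalues of adj.

    Assumes the leading eigenvalues are distinct, so no gap is exactly zero.
    """
    lam = np.linalg.eigvalsh(adj)[::-1]                # eigenvalues, descending
    gaps = lam[:-1] - lam[1:]                          # consecutive eigengaps
    return gaps[:k_max] / gaps[1:k_max + 1]

# Three communities with distinct within-block densities (hypothetical values),
# chosen so that the three signal eigenvalues are well separated.
B = np.array([[0.50, 0.05, 0.05],
              [0.05, 0.35, 0.05],
              [0.05, 0.05, 0.20]])
adj = simulate_sbm(B, block_size=200)
ratios = eigengap_ratios(adj, k_max=6)
k_hat = int(np.argmax(ratios)) + 1                     # heuristic: largest ratio
print("eigengap ratios:", np.round(ratios, 2))
print("heuristic estimate of the number of communities:", k_hat)
```

The ratio is unchanged when all eigenvalues are rescaled by a constant, which is one reason a ratio-type statistic can avoid estimating network distribution parameters. A formal test along the lines described in the abstract would compare such a ratio with its null limit rather than take an argmax.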