Authors: Guillaume Braun, Hemant Tyagi
DOI: 10.1093/imaiai/iaad006 (https://doi.org/10.1093/imaiai/iaad006)
Publication date: 2022-05-24
Citation count: 5
Minimax optimal clustering of bipartite graphs with a generalized power method
Clustering bipartite graphs is a fundamental task in network analysis. In the high-dimensional regime, where the number of rows $n_{1}$ and the number of columns $n_{2}$ of the associated adjacency matrix are of different orders, existing methods derived from those used for symmetric graphs can come with sub-optimal guarantees. Given the increasing number of applications of bipartite graphs in the high-dimensional regime, it is of fundamental importance to design optimal algorithms for this setting. The recent work of Ndaoud et al. (2022, IEEE Trans. Inf. Theory, 68, 1960–1975) improves the existing upper bound on the misclustering rate in the special case where the columns (resp. rows) can be partitioned into $L = 2$ (resp. $K = 2$) communities. Unfortunately, their algorithm cannot be extended to the more general setting where $K \neq L \geq 2$. We overcome this limitation by introducing a new algorithm based on the power method. We derive conditions for exact recovery in the general setting where $K \neq L \geq 2$, and show that these conditions recover the result of Ndaoud et al. (2022, IEEE Trans. Inf. Theory, 68, 1960–1975). We also derive a minimax lower bound on the misclustering error when $K = L$ under a symmetric version of our model, which matches the corresponding upper bound up to a factor depending on $K$.
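The abstract names a power-method-based algorithm without giving its steps. The sketch below is only meant to convey the general flavour of such a refinement for clustering the rows of a bipartite adjacency matrix A. It is not the authors' algorithm, and it rests on several assumptions. It works with the Gram matrix AA^T after removing the diagonal. Each step forms B Z (Z^T Z)^{-1}, where Z is the current membership matrix, and reassigns every row to the cluster with the largest average similarity. The starting labels are assumed to come from some preliminary method, such as spectral clustering. The argmax rule also assumes an assortative model, in which rows of the same community are more similar than rows of different communities. All function names (`hollow_gram`, `power_step`, `refine_rows`, `transpose`, `misclustering_rate`) are hypothetical. `misclustering_rate` implements the usual permutation-invariant error, which is the kind of quantity the upper and lower bounds refer to.

```python
from itertools import permutations
from typing import List

Matrix = List[List[float]]


def hollow_gram(A: Matrix) -> Matrix:
    """Gram matrix A A^T of the rows of A with its diagonal set to zero.

    Zeroing the diagonal drops each row's squared norm (its degree for a 0/1
    matrix), which carries no information about pairs of rows.
    """
    n = len(A)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = float(sum(a * b for a, b in zip(A[i], A[j])))
            B[i][j] = B[j][i] = s
    return B


def power_step(B: Matrix, labels: List[int], K: int) -> List[int]:
    """One step: form B Z (Z^T Z)^{-1}, then take the argmax of each row.

    Z is the n x K membership matrix of `labels`, so entry (i, k) is the
    average similarity of row i to the current members of cluster k.
    """
    sizes = [labels.count(k) for k in range(K)]
    new_labels = []
    for row in B:
        totals = [0.0] * K
        for j, k in enumerate(labels):
            totals[k] += row[j]
        # An empty cluster gets score -inf so no row is ever assigned to it.
        scores = [totals[k] / sizes[k] if sizes[k] else float("-inf") for k in range(K)]
        new_labels.append(max(range(K), key=scores.__getitem__))
    return new_labels


def refine_rows(A: Matrix, init_labels: List[int], K: int, n_iter: int = 20) -> List[int]:
    """Iterate power_step from an initial row labelling until it stops changing."""
    B = hollow_gram(A)
    labels = list(init_labels)
    for _ in range(n_iter):
        new_labels = power_step(B, labels, K)
        if new_labels == labels:  # fixed point reached
            break
        labels = new_labels
    return labels


def transpose(A: Matrix) -> Matrix:
    """Swap rows and columns, so column clustering can reuse refine_rows."""
    return [list(col) for col in zip(*A)]


def misclustering_rate(est: List[int], truth: List[int], K: int) -> float:
    """Fraction of misassigned items, minimized over relabelings of the K clusters."""
    errors = min(
        sum(1 for e, t in zip(est, truth) if perm[e] != t)
        for perm in permutations(range(K))
    )
    return errors / len(truth)


if __name__ == "__main__":
    # Toy 8 x 12 matrix: rows 0-3 connect to columns 0-5, rows 4-7 to
    # columns 6-11, plus two noisy cross-community edges.
    A = [[1 if (i < 4) == (c < 6) else 0 for c in range(12)] for i in range(8)]
    A[0][6] = 1
    A[5][1] = 1
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    init = [0, 0, 1, 0, 1, 1, 0, 1]  # rows 2 and 6 start in the wrong cluster
    print(misclustering_rate(init, truth, K=2))  # 0.25
    est = refine_rows(A, init, K=2)
    print(est)  # [0, 0, 0, 0, 1, 1, 1, 1]
    print(misclustering_rate(est, truth, K=2))  # 0.0
```

Under these assumptions, you can cluster the columns the same way by calling `refine_rows(transpose(A), ...)` with L in place of K. The nested-list arithmetic keeps the sketch dependency-free. For realistic $n_{1}$ and $n_{2}$, a sparse linear-algebra library would be the natural choice.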