A semi-orthogonal nonnegative matrix tri-factorization algorithm for overlapping community detection
Zhaoyang Li, Yuehan Yang
Statistical Papers, published 2024-03-14
DOI: https://doi.org/10.1007/s00362-024-01537-1
Abstract
In this paper, we focus on overlapping community detection and propose an efficient semi-orthogonal nonnegative matrix tri-factorization (semi-ONMTF) algorithm. This method factorizes a matrix X into an orthogonal matrix U, a nonnegative matrix B, and the transpose \(U^{\mathrm{T}}\). We use the Cayley transformation to maintain strict orthogonality of U, so that every iterate stays on the Stiefel manifold. The algorithm is computationally efficient because the solutions for U and B reduce to matrix-wise updates. Applying this method, we detect overlapping communities through the belonging coefficient vector and analyse associations between communities through the unweighted network of communities. We conduct simulations and real-data applications to show that the proposed method has wide applicability. In a real data example, we apply the semi-ONMTF to a stock data set and construct a directed association network of companies. Based on the modularity for directed and overlapping communities, we obtain five overlapping communities, 17 overlapping nodes, and five outlier nodes in the network. We also discuss the associations between communities, providing insight into overlapping community detection on the stock market network.
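The abstract does not state the update rules, so the following is only a minimal NumPy sketch of the general idea, not the authors' algorithm. It assumes a squared Frobenius objective \(\|X - U B U^{\mathrm{T}}\|_F^2\), a fixed step size, a standard Cayley-transform step for U, and a B update that projects \(U^{\mathrm{T}} X U\) onto the nonnegative entries (the exact minimizer over B when U has orthonormal columns). The names `cayley_step`, `semi_onmtf_sketch`, `tau`, and `n_iter` are hypothetical.

```python
import numpy as np


def cayley_step(U, G, tau):
    """One Cayley-transform step that keeps U.T @ U = I (assumed update form).

    U: (n, k) matrix with orthonormal columns; G: gradient of the loss at U.
    """
    n = U.shape[0]
    A = G @ U.T - U @ G.T            # skew-symmetric by construction
    I = np.eye(n)
    # (I + tau/2 A)^{-1} (I - tau/2 A) is orthogonal because A is skew-symmetric,
    # so the product below stays on the Stiefel manifold.
    return np.linalg.solve(I + 0.5 * tau * A, (I - 0.5 * tau * A) @ U)


def semi_onmtf_sketch(X, k, tau=1e-3, n_iter=200, seed=0):
    """Alternate a Cayley step on U with a projected update of B (illustrative)."""
    rng = np.random.default_rng(seed)
    # Random starting point with orthonormal columns (reduced QR).
    U, _ = np.linalg.qr(rng.standard_normal((X.shape[0], k)))
    B = np.maximum(U.T @ X @ U, 0.0)
    for _ in range(n_iter):
        R = X - U @ B @ U.T                      # residual
        G = -2.0 * (R @ U @ B.T + R.T @ U @ B)   # gradient of ||R||_F^2 in U
        U = cayley_step(U, G, tau)
        # For orthonormal U, the best nonnegative B is the clipped U^T X U.
        B = np.maximum(U.T @ X @ U, 0.0)
    return U, B
```

Calling `semi_onmtf_sketch(X, k=3)` on a square matrix X returns a U whose columns remain orthonormal up to floating-point error and a B with nonnegative entries. The fixed step size is a simplification; a line search would be needed to guarantee that the objective decreases.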
About the journal:
The journal Statistical Papers addresses itself to all persons and organizations that have to deal with statistical methods in their own field of work. It attempts to provide a forum for the presentation and critical assessment of statistical methods, in particular for the discussion of their methodological foundations as well as their potential applications. Methods that have broad applications will be preferred. However, special attention is given to those statistical methods which are relevant to the economic and social sciences. In addition to original research papers, readers will find survey articles, short notes, reports on statistical software, a problem section, and book reviews.