{"title":"Efficient Network Representation Learning via Cluster Similarity","authors":"Yasuhiro Fujiwara, Yasutoshi Ida, Atsutoshi Kumagai, Masahiro Nakano, Akisato Kimura, Naonori Ueda","doi":"10.1007/s41019-023-00222-x","DOIUrl":null,"url":null,"abstract":"Abstract Network representation learning is a de facto tool for graph analytics. The mainstream of the previous approaches is to factorize the proximity matrix between nodes. However, if n is the number of nodes, since the size of the proximity matrix is $$n \\times n$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>n</mml:mi> <mml:mo>×</mml:mo> <mml:mi>n</mml:mi> </mml:mrow> </mml:math> , it needs $$O(n^3)$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>n</mml:mi> <mml:mn>3</mml:mn> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> time and $$O(n^2)$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>n</mml:mi> <mml:mn>2</mml:mn> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> space to perform network representation learning; they are significantly high for large-scale graphs. This paper introduces the novel idea of using similarities between clusters instead of proximities between nodes; the proposed approach computes the representations of the clusters from similarities between clusters and computes the representations of nodes by referring to them. If l is the number of clusters, since $$l \\ll n$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>l</mml:mi> <mml:mo>≪</mml:mo> <mml:mi>n</mml:mi> </mml:mrow> </mml:math> , we can efficiently obtain the representations of clusters from a small $$l \\times l$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>l</mml:mi> <mml:mo>×</mml:mo> <mml:mi>l</mml:mi> </mml:mrow> </mml:math> similarity matrix. Furthermore, since nodes in each cluster share similar structural properties, we can effectively compute the representation vectors of nodes. Experiments show that our approach can perform network representation learning more efficiently and effectively than existing approaches.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":5.1000,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Science and Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s41019-023-00222-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
Network representation learning is a de facto tool for graph analytics. The mainstream of previous approaches is to factorize the proximity matrix between nodes. However, if $$n$$ is the number of nodes, the proximity matrix has size $$n \times n$$, so these approaches need $$O(n^3)$$ time and $$O(n^2)$$ space to perform network representation learning; these costs are prohibitively high for large-scale graphs. This paper introduces the novel idea of using similarities between clusters instead of proximities between nodes: the proposed approach computes the representations of clusters from the similarities between clusters and then computes the representations of nodes by referring to them. If $$l$$ is the number of clusters, since $$l \ll n$$, we can efficiently obtain the representations of clusters from a small $$l \times l$$ similarity matrix. Furthermore, since the nodes in each cluster share similar structural properties, we can effectively compute the representation vectors of nodes. Experiments show that our approach performs network representation learning more efficiently and effectively than existing approaches.
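To make the idea concrete, below is a minimal sketch of embedding nodes through a small $$l \times l$$ cluster-similarity matrix rather than the full $$n \times n$$ proximity matrix. It illustrates the general scheme described in the abstract, not the authors' exact algorithm: the clustering step (k-means on adjacency rows), the similarity definition (average edge weight between clusters), and the factorization (truncated SVD) are all assumptions made for this sketch.

```python
# Hypothetical sketch of the cluster-similarity idea -- NOT the paper's
# exact method. Assumes a dense symmetric adjacency matrix A; the paper's
# actual clustering and similarity definitions may differ.
import numpy as np
from sklearn.cluster import KMeans

def cluster_similarity_embeddings(A, l=32, d=16, seed=0):
    """Embed n nodes via an l x l cluster-similarity matrix (l << n)."""
    n = A.shape[0]
    # 1. Partition the n nodes into l clusters (here: k-means on adjacency rows).
    labels = KMeans(n_clusters=l, n_init=10, random_state=seed).fit_predict(A)
    # 2. Membership matrix H (n x l): H[i, c] = 1 iff node i is in cluster c.
    H = np.zeros((n, l))
    H[np.arange(n), labels] = 1.0
    sizes = H.sum(axis=0)
    # 3. Cluster-similarity matrix S (l x l): average edge weight between clusters.
    S = (H.T @ A @ H) / np.outer(sizes, sizes)
    # 4. Factorize the small l x l matrix instead of the n x n proximity
    #    matrix: roughly O(l^3) time rather than O(n^3).
    U, sig, _ = np.linalg.svd(S)
    cluster_emb = U[:, :d] * np.sqrt(sig[:d])
    # 5. Each node inherits the representation of its cluster.
    return H @ cluster_emb

# Usage on a small random graph:
rng = np.random.default_rng(0)
A = (rng.random((200, 200)) < 0.05).astype(float)
A = np.maximum(A, A.T)  # symmetrize
Z = cluster_similarity_embeddings(A, l=8, d=4)
print(Z.shape)  # (200, 4)
```

The key point the sketch captures is the cost structure: the expensive factorization touches only the $$l \times l$$ matrix, and the node-level step is a cheap lookup of each node's cluster representation, exploiting the fact that nodes in the same cluster share similar structural properties.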
Journal Introduction:
The journal of Data Science and Engineering (DSE) responds to the remarkable change in the focus of information technology development from CPU-intensive computation to data-intensive computation, where the effective application of data, especially big data, becomes vital. The emerging discipline of data science and engineering, an interdisciplinary field integrating theories and methods from computer science, statistics, information science, and other fields, focuses on the foundations and engineering of efficient and effective techniques and systems for data collection and management, for data integration and correlation, for information and knowledge extraction from massive data sets, and for data use in different application domains. Focusing on the theoretical background and advanced engineering approaches, DSE aims to offer a prime forum for researchers, professionals, and industrial practitioners to share their knowledge in this rapidly growing area. It provides in-depth coverage of the latest advances in the closely related fields of data science and data engineering. More specifically, DSE covers four areas: (i) the data itself, i.e., the nature and quality of the data, especially big data; (ii) the principles of information extraction from data, especially big data; (iii) the theory behind data-intensive computing; and (iv) the techniques and systems used to analyze and manage big data. DSE welcomes papers that explore the above subjects. Specific topics include, but are not limited to:
(a) the nature and quality of data;
(b) the computational complexity of data-intensive computing;
(c) new methods for the design and analysis of algorithms for solving problems with big data input;
(d) collection and integration of data collected from the internet and sensing devices or sensor networks;
(e) representation, modeling, and visualization of big data;
(f) storage, transmission, and management of big data;
(g) methods and algorithms of data-intensive computing, such as mining big data, online analytical processing of big data, big data-based machine learning, big data-based decision-making, statistical computation of big data, graph-theoretic computation of big data, linear algebraic computation of big data, and big data-based optimization;
(h) hardware systems and software systems for data-intensive computing;
(i) data security, privacy, and trust; and
(j) novel applications of big data.