Efficient Network Representation Learning via Cluster Similarity
Pub Date : 2023-09-01 DOI: 10.1007/s41019-023-00222-x
Yasuhiro Fujiwara, Yasutoshi Ida, Atsutoshi Kumagai, Masahiro Nakano, Akisato Kimura, Naonori Ueda
Abstract Network representation learning is a de facto tool for graph analytics. The mainstream of previous approaches is to factorize the proximity matrix between nodes. However, if n is the number of nodes, the proximity matrix has size $n \times n$, so network representation learning needs $O(n^3)$ time and $O(n^2)$ space; these costs are prohibitively high for large-scale graphs. This paper introduces the novel idea of using similarities between clusters instead of proximities between nodes; the proposed approach computes the representations of clusters from the similarities between clusters and then computes the representations of nodes by referring to them. If l is the number of clusters, since $l \ll n$, we can efficiently obtain the representations of clusters from a small $l \times l$ similarity matrix. Furthermore, since nodes in each cluster share similar structural properties, we can effectively compute the representation vectors of nodes. Experiments show that our approach can perform network representation learning more efficiently and effectively than existing approaches.
{"title":"Efficient Network Representation Learning via Cluster Similarity","authors":"Yasuhiro Fujiwara, Yasutoshi Ida, Atsutoshi Kumagai, Masahiro Nakano, Akisato Kimura, Naonori Ueda","doi":"10.1007/s41019-023-00222-x","DOIUrl":"https://doi.org/10.1007/s41019-023-00222-x","url":null,"abstract":"Abstract Network representation learning is a de facto tool for graph analytics. The mainstream of the previous approaches is to factorize the proximity matrix between nodes. However, if n is the number of nodes, since the size of the proximity matrix is $$n times n$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>n</mml:mi> <mml:mo>×</mml:mo> <mml:mi>n</mml:mi> </mml:mrow> </mml:math> , it needs $$O(n^3)$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>n</mml:mi> <mml:mn>3</mml:mn> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> time and $$O(n^2)$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>n</mml:mi> <mml:mn>2</mml:mn> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> space to perform network representation learning; they are significantly high for large-scale graphs. This paper introduces the novel idea of using similarities between clusters instead of proximities between nodes; the proposed approach computes the representations of the clusters from similarities between clusters and computes the representations of nodes by referring to them. If l is the number of clusters, since $$l ll n$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>l</mml:mi> <mml:mo>≪</mml:mo> <mml:mi>n</mml:mi> </mml:mrow> </mml:math> , we can efficiently obtain the representations of clusters from a small $$l times l$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>l</mml:mi> <mml:mo>×</mml:mo> <mml:mi>l</mml:mi> </mml:mrow> </mml:math> similarity matrix. Furthermore, since nodes in each cluster share similar structural properties, we can effectively compute the representation vectors of nodes. Experiments show that our approach can perform network representation learning more efficiently and effectively than existing approaches.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135298458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning with Small Data: Subgraph Counting Queries
Pub Date : 2023-09-01 DOI: 10.1007/s41019-023-00223-w
Kangfei Zhao, Zongyan He, Jeffrey Xu Yu, Yu Rong
Abstract Deep Learning (DL) has been widely used in many applications, and its success is achieved with large training data. A key issue is how to provide a DL solution when there is no large training data to learn from initially. In this paper, we explore a meta-learning approach for a specific problem, subgraph isomorphism counting, a fundamental problem in graph analysis: given a pattern graph p and a data graph g, count the number of subgraphs of g that match p. There are various data graphs and pattern graphs, and a subgraph isomorphism counting query is specified by a pair (g, p). This problem is NP-hard and inherently needs large training data to learn by DL. We design a Gaussian Process (GP) model that combines a Graph Neural Network with Bayesian nonparametrics, and we train the GP by a meta-learning algorithm on a small set of training data. By meta-learning, we can obtain a generalized meta-model that better encodes the information of data and pattern graphs and captures the prior of small tasks. With the learned meta-model, we handle a collection of pairs (g, p) as a task, where some pairs may be associated with the ground-truth and some pairs are the queries to answer. There are two cases: either some pairs have ground-truth (few-shot), or none do (zero-shot). We provide solutions for both. In particular, for zero-shot, we propose a new data-driven approach to predict the count values. Note that zero-shot learning for our regression tasks is difficult, and there is no hands-on solution in the literature. We conducted extensive experimental studies to confirm that our approach is robust to model degeneration on small training data and that our meta-model can fast adapt to new queries by few-shot and zero-shot learning.
{"title":"Learning with Small Data: Subgraph Counting Queries","authors":"Kangfei Zhao, Zongyan He, Jeffrey Xu Yu, Yu Rong","doi":"10.1007/s41019-023-00223-w","DOIUrl":"https://doi.org/10.1007/s41019-023-00223-w","url":null,"abstract":"Abstract Deep Learning (DL) has been widely used in many applications, and its success is achieved with large training data. A key issue is how to provide a DL solution when there is no large training data to learn initially. In this paper, we explore a meta-learning approach for a specific problem, subgraph isomorphism counting, which is a fundamental problem in graph analysis to count the number of a given pattern graph, p , in a data graph, g , that matches p . There are various data graphs and pattern graphs. A subgraph isomorphism counting query is specified by a pair, ( g , p ). This problem is NP-hard and needs large training data to learn by DL in nature. We design a Gaussian Process (GP) model which combines Graph Neural Network with Bayesian nonparametric, and we train the GP by a meta-learning algorithm on a small set of training data. By meta-learning, we can obtain a generalized meta-model to better encode the information of data and pattern graphs and capture the prior of small tasks. With the meta-model learned, we handle a collection of pairs ( g , p ), as a task, where some pairs may be associated with the ground-truth, and some pairs are the queries to answer. There are two cases. One is there are some with ground-truth (few-shot), and one is there is none with ground-truth (zero-shot). We provide our solutions for both. In particular, for zero-shot, we propose a new data-driven approach to predict the count values. Note that zero-shot learning for our regression tasks is difficult, and there is no hands-on solution in the literature. We conducted extensive experimental studies to confirm that our approach is robust to model degeneration on small training data, and our meta-model can fast adapt to new queries by few-shot and zero-shot learning.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136355012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Neural Inference of User Social Interest for Item Recommendation
Pub Date : 2023-08-29 DOI: 10.1007/s41019-023-00225-8
Junyang Chen, Ziyi Chen, Mengzhu Wang, Ge Fan, Guo Zhong, Ou Liu, Wenfeng Du, Zhenghua Xu, Zhiguo Gong
{"title":"A Neural Inference of User Social Interest for Item Recommendation","authors":"Junyang Chen, Ziyi Chen, Mengzhu Wang, Ge Fan, Guo Zhong, Ou Liu, Wenfeng Du, Zhenghua Xu, Zhiguo Gong","doi":"10.1007/s41019-023-00225-8","DOIUrl":"https://doi.org/10.1007/s41019-023-00225-8","url":null,"abstract":"","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":"65 1","pages":"223 - 233"},"PeriodicalIF":4.2,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80204113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}