Learning with Small Data: Subgraph Counting Queries

IF 4.6 | Zone 2 (Computer Science) | Q1 Computer Science, Information Systems | Data Science and Engineering | Pub Date: 2023-09-01 | DOI: 10.1007/s41019-023-00223-w

Kangfei Zhao, Zongyan He, Jeffrey Xu Yu, Yu Rong



Abstract

Deep Learning (DL) has been widely used in many applications, and its success relies on large training data. A key issue is how to provide a DL solution when no large training data is initially available. In this paper, we explore a meta-learning approach for a specific problem, subgraph isomorphism counting, a fundamental problem in graph analysis that counts the number of subgraphs of a data graph, g, that match a given pattern graph, p. There are various data graphs and pattern graphs. A subgraph isomorphism counting query is specified by a pair (g, p). The problem is NP-hard, and learning it with DL inherently requires large training data. We design a Gaussian Process (GP) model that combines a Graph Neural Network with Bayesian nonparametrics, and we train the GP with a meta-learning algorithm on a small set of training data. Through meta-learning, we obtain a generalized meta-model that better encodes the information of data and pattern graphs and captures the prior of small tasks. With the learned meta-model, we handle a collection of pairs (g, p) as a task, in which some pairs may be associated with ground truth and the others are queries to answer. There are two cases: some pairs have ground truth (few-shot), or none do (zero-shot). We provide solutions for both. In particular, for the zero-shot case, we propose a new data-driven approach to predict the count values. Note that zero-shot learning for our regression tasks is difficult, and there is no ready-made solution in the literature. We conducted extensive experimental studies to confirm that our approach is robust to model degeneration on small training data and that our meta-model can quickly adapt to new queries through few-shot and zero-shot learning.
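To make the query (g, p) concrete, the following is a minimal brute-force sketch of exact subgraph isomorphism counting. It is not the paper's method, which learns to estimate the count; it only illustrates the quantity being estimated. The function name `count_subgraph_isomorphisms`, the edge-list representation, and the matching semantics (undirected, unlabeled, non-induced, one count per injective mapping) are assumptions made for illustration.

```python
from itertools import permutations


def count_subgraph_isomorphisms(data_edges, pattern_edges):
    """Count injective, edge-preserving maps from pattern graph p into data graph g.

    Assumptions (not from the paper): graphs are undirected, unlabeled, and
    given as edge lists; matches are non-induced; each mapping counts once.
    """
    # Store each undirected data edge as a frozenset for orientation-free lookup.
    g_edges = {frozenset(e) for e in data_edges}
    g_nodes = sorted({v for e in data_edges for v in e})
    p_nodes = sorted({v for e in pattern_edges for v in e})

    count = 0
    # Try every injective assignment of pattern vertices to data vertices.
    # This is exponential in the pattern size, which is why exact counting
    # does not scale and learned estimators are of interest.
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if all(frozenset((mapping[u], mapping[v])) in g_edges
               for u, v in pattern_edges):
            count += 1
    return count


# Data graph g: the 4-cycle 0-1-2-3 plus the chord 0-2 (two triangles sharing an edge).
g = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
# Pattern graph p: a triangle.
p = [("a", "b"), ("b", "c"), ("c", "a")]

mappings = count_subgraph_isomorphisms(g, p)
print(mappings)       # 12 mappings
print(mappings // 6)  # 2 distinct triangles (a triangle has 6 automorphisms)
```

Under these assumptions each distinct triangle in g is reached by 6 mappings, so dividing by the pattern's automorphism count gives the number of distinct matched subgraphs. Whether a given benchmark counts mappings or distinct subgraphs is a convention that should be checked against its ground truth.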




Source journal

Data Science and Engineering (Engineering-Computational Mechanics)

CiteScore: 10.40

Self-citation rate: 2.40%

Review time: 12 weeks

Journal description: The journal Data Science and Engineering (DSE) responds to the remarkable change in the focus of information technology development from CPU-intensive computation to data-intensive computation, where the effective application of data, especially big data, becomes vital. The emerging discipline of data science and engineering, an interdisciplinary field integrating theories and methods from computer science, statistics, information science, and other fields, focuses on the foundations and engineering of efficient and effective techniques and systems for data collection and management, for data integration and correlation, for information and knowledge extraction from massive data sets, and for data use in different application domains. Focusing on the theoretical background and advanced engineering approaches, DSE aims to offer a prime forum for researchers, professionals, and industrial practitioners to share their knowledge in this rapidly growing area. It provides in-depth coverage of the latest advances in the closely related fields of data science and data engineering. More specifically, DSE covers four areas: (i) the data itself, i.e., the nature and quality of the data, especially big data; (ii) the principles of information extraction from data, especially big data; (iii) the theory behind data-intensive computing; and (iv) the techniques and systems used to analyze and manage big data. DSE welcomes papers that explore the above subjects.
Specific topics include, but are not limited to: (a) the nature and quality of data; (b) the computational complexity of data-intensive computing; (c) new methods for the design and analysis of algorithms for solving problems with big data input; (d) collection and integration of data collected from the internet and sensing devices or sensor networks; (e) representation, modeling, and visualization of big data; (f) storage, transmission, and management of big data; (g) methods and algorithms of data-intensive computing, such as mining big data, online analytical processing of big data, big data-based machine learning, big data-based decision-making, statistical computation of big data, graph-theoretic computation of big data, linear algebraic computation of big data, and big data-based optimization; (h) hardware systems and software systems for data-intensive computing; (i) data security, privacy, and trust; and (j) novel applications of big data.