Learning to count: A deep learning framework for graphlet count estimation

IF 1.5 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Network Science Pub Date : 2020-09-11 DOI:10.1017/nws.2020.35

Xutong Liu, Y. Chen, John C.S. Lui, Konstantin Avrachenkov

Network Science, vol. 9 1, pp. S23 - S60. Publication type: Journal Article. Source: Semantic Scholar. https://doi.org/10.1017/nws.2020.35

Citations: 4

Abstract

Graphlet counting is a widely explored problem in network analysis and has been successfully applied in many domains, most notably bioinformatics, social science, and infrastructure network studies. Efficiently computing graphlet counts remains challenging because of combinatorial explosion: a naive enumeration algorithm needs O(N^k) time for k-node graphlets in a network of size N. Recently, many works have introduced carefully designed combinatorial and sampling methods with encouraging results. However, the existing methods ignore the fact that graphlet counts and the structural information of a graph are correlated. They treat every graph as a new input and repeat the tedious counting procedure each time, even if the graph is similar or exactly isomorphic to previously studied graphs. This provides an opportunity to speed up graphlet count estimation by exploiting this correlation via learning methods. In this paper, we raise a novel graphlet count learning (GCL) problem: given a set of historical graphs with known graphlet counts, how can we learn to estimate or predict graphlet counts for unseen graphs coming from the same (or a similar) underlying distribution? We develop a deep learning framework that contains two convolutional neural network models and a series of data preprocessing techniques to solve the GCL problem. Extensive experiments are conducted on three types of synthetic random graphs and three types of real-world graphs, for all 3-, 4-, and 5-node graphlets, to demonstrate the accuracy, efficiency, and generalizability of our framework. Compared with state-of-the-art exact and sampling methods, our framework shows great potential: it offers up to two orders of magnitude speedup on synthetic graphs and achieves comparable speed on real-world graphs, with competitive accuracy.
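To make the cost of naive enumeration concrete, the following is a minimal sketch of brute-force counting for the two connected 3-node graphlets (the wedge, i.e. an open 2-path, and the triangle). It is not the paper's method, only an illustration of the baseline the abstract refers to: the loop visits every 3-node subset, which is on the order of N^3 subsets for k = 3. The function name `count_3node_graphlets` and the edge-list input format are assumptions made for this example.

```python
from itertools import combinations


def count_3node_graphlets(n, edges):
    """Count induced 3-node graphlets by brute-force enumeration.

    n     : number of nodes, labelled 0..n-1 (assumed input format)
    edges : iterable of undirected (u, v) pairs

    Every 3-node subset is inspected, so the loop body runs
    C(n, 3) times, i.e. O(n^3) for k = 3.
    """
    # Build an adjacency-set representation, ignoring self-loops.
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    counts = {"wedge": 0, "triangle": 0}
    for a, b, c in combinations(range(n), 3):
        # Number of edges present among the three chosen nodes.
        m = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if m == 3:
            counts["triangle"] += 1
        elif m == 2:
            counts["wedge"] += 1  # connected but not closed
    return counts


# Example graph: the 4-cycle 0-1-2-3-0 with the chord 0-2.
example_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(count_3node_graphlets(4, example_edges))
```

The same pattern extends to 4- and 5-node graphlets by enumerating larger subsets and classifying each induced subgraph, which is why the running time grows so quickly with k.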


Source journal

Network Science (Social Sciences, Interdisciplinary)

CiteScore

3.50

Self-citation rate

5.90%

Number of articles published

About the journal: Network Science is an important journal for an important discipline - one using the network paradigm, focusing on actors and relational linkages, to inform research, methodology, and applications from many fields across the natural, social, engineering and informational sciences. Given growing understanding of the interconnectedness and globalization of the world, network methods are an increasingly recognized way to research aspects of modern society along with the individuals, organizations, and other actors within it. The discipline is ready for a comprehensive journal, open to papers from all relevant areas. Network Science is a defining work, shaping this discipline. The journal welcomes contributions from researchers in all areas working on network theory, methods, and data.

Latest articles in this journal

Accounting for edge uncertainty in stochastic actor-oriented models for dynamic network analysis.

Recommendations for sharing network data and materials.

The latent cognitive structures of social networks

Algorithmic aspects of temporal betweenness

When can networks be inferred from observed groups?