Xiaozhou Liu, Yudi Santoso, Venkatesh Srinivasan, Alex Thomo
{"title":"Distributed Enumeration of Four Node Graphlets at Quadrillion-Scale","authors":"Xiaozhou Liu, Yudi Santoso, Venkatesh Srinivasan, Alex Thomo","doi":"10.1145/3468791.3468805","DOIUrl":null,"url":null,"abstract":"Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of scalability. Distributed computing is often proposed as a solution to improve scalability. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. We study the enumeration of four-node graphlets in undirected graphs using a distributed platform. We propose an efficient distributed solution which significantly surpasses the existing solutions. With this method we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. We show the scalability of our solution through experimental results. Finally, we also extend our algorithm to enumerate graphlets in probabilistic graphs and demonstrate its suitability for this case.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"2018 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"33rd International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3468791.3468805","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Graphlet enumeration is a basic task in graph analysis with many applications. Thus it is important to be able to perform this task within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges. Known solutions are limited in terms of scalability. Distributed computing is often proposed as a solution to improve scalability. However, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed solution. We study the enumeration of four-node graphlets in undirected graphs using a distributed platform. We propose an efficient distributed solution which significantly surpasses the existing solutions. With this method we are able to process larger graphs that have never been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. We show the scalability of our solution through experimental results. Finally, we also extend our algorithm to enumerate graphlets in probabilistic graphs and demonstrate its suitability for this case.