Improving Distributed Subgraph Matching Algorithm on Timely Dataflow

Zhengmin Lai, Zhengyi Yang, Longbin Lai
Published in: 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)
Publication date: 2019-04-01
DOI: 10.1109/ICDEW.2019.000-2 (https://doi.org/10.1109/ICDEW.2019.000-2)

Abstract

The subgraph matching problem asks for all subgraphs of a data graph that are isomorphic to a given query graph. Subgraph matching plays a vital role in e-commerce, social media, and biological science. CliqueJoin is a distributed subgraph matching algorithm designed to be efficient and scalable. However, CliqueJoin was originally developed on MapReduce, so its performance can suffer from MapReduce's notorious I/O overhead when processing multi-round join tasks. Moreover, CliqueJoin does not provide a cost evaluation strategy for labelled graphs, which limits its practical use because most real-world graphs are labelled. To address these limitations, we propose CliqueJoin++, which improves CliqueJoin in two respects. First, we implement CliqueJoin++ on the Timely dataflow system instead of MapReduce to avoid considerable I/O cost. Second, we extend the cost evaluation function of CliqueJoin to compute optimal join plans for labelled graphs in the distributed context. Extensive experiments show that the proposed method is up to 10 times faster than the MapReduce version for unlabelled matching, and that it achieves good performance and scalability for labelled matching.
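
To make the problem definition concrete, the following is a minimal single-machine sketch of labelled subgraph matching by backtracking. It is not CliqueJoin or CliqueJoin++, and it involves no join planning or distribution. The names `build_graph` and `find_matches`, the adjacency-set representation, and the choice of undirected graphs with non-induced matching (every query edge must map to a data edge, while extra data edges are allowed) are assumptions made for illustration only.

```python
from typing import Dict, Hashable, Iterable, Iterator, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed representation)


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def find_matches(
    query: Graph,
    q_labels: Dict[int, Hashable],
    data: Graph,
    d_labels: Dict[int, Hashable],
) -> Iterator[Dict[int, int]]:
    """Yield every injective, label-preserving mapping from query vertices
    to data vertices that sends each query edge to a data edge."""
    order = sorted(query)  # fixed matching order; a real system would optimise this

    def extend(mapping: Dict[int, int]) -> Iterator[Dict[int, int]]:
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        q = order[len(mapping)]
        for d in data:
            if d_labels[d] != q_labels[q]:
                continue  # labels must agree
            if d in mapping.values():
                continue  # mapping must be injective
            # Every already-matched neighbour of q must be adjacent to d.
            if all(mapping[n] in data[d] for n in query[q] if n in mapping):
                mapping[q] = d
                yield from extend(mapping)
                del mapping[q]

    yield from extend({})


# Hypothetical example: a triangle query with labels A, B, C.
data = build_graph([(1, 2), (2, 3), (1, 3), (3, 4)])
d_labels = {1: "A", 2: "B", 3: "C", 4: "A"}
query = build_graph([(0, 1), (1, 2), (0, 2)])
q_labels = {0: "A", 1: "B", 2: "C"}

print(list(find_matches(query, q_labels, data, d_labels)))
# prints [{0: 1, 1: 2, 2: 3}]
```

With all vertices given the same label (the unlabelled case), the same triangle query yields six mappings for the single data triangle, one per automorphism of the query. Labels prune candidates early, which hints at why a cost model for labelled graphs would differ from one for unlabelled graphs.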