Query preserving graph compression

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI:10.1145/2213836.2213855

W. Fan, Jianzhong Li, Xin Wang, Yinghui Wu

{"title":"Query preserving graph compression","authors":"W. Fan, Jianzhong Li, Xin Wang, Yinghui Wu","doi":"10.1145/2213836.2213855","DOIUrl":null,"url":null,"abstract":"It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. These motivate us to propose query preserving graph compression, to compress graphs relative to a class Λ of queries of users' choice. We compute a small Gr from a graph G such that (a) for any query Q Ε Λ Q, Q(G) = Q'(Gr), where Q' Ε Λ can be efficiently computed from Q; and (b) any algorithm for computing Q(G) can be directly applied to evaluating Q' on Gr as is. That is, while we cannot lower the complexity of evaluating graph queries, we reduce data graphs while preserving the answers to all the queries in Λ. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while reserving query answers. (2) We provide techniques for aintaining compressed graph Gr in response to changes ΔG to the original graph G. We show that the incremental maintenance problems are unbounded for the two lasses of queries, i.e., their costs are not a function of the size of ΔG and changes in Gr. Nevertheless, we develop incremental algorithms that depend only on ΔG and Gr, independent of G, i.e., we do not have to decompress Gr to propagate the changes. (3) Using real-life data, we experimentally verify that our compression techniques could reduce graphs in average by 95% for reachability and 57% for graph pattern matching, and that our incremental maintenance algorithms are efficient.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"122 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"201","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2213836.2213855","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 201

Abstract

It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. These motivate us to propose query preserving graph compression, to compress graphs relative to a class Λ of queries of users' choice. We compute a small Gr from a graph G such that (a) for any query Q Ε Λ Q, Q(G) = Q'(Gr), where Q' Ε Λ can be efficiently computed from Q; and (b) any algorithm for computing Q(G) can be directly applied to evaluating Q' on Gr as is. That is, while we cannot lower the complexity of evaluating graph queries, we reduce data graphs while preserving the answers to all the queries in Λ. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while reserving query answers. (2) We provide techniques for aintaining compressed graph Gr in response to changes ΔG to the original graph G. We show that the incremental maintenance problems are unbounded for the two lasses of queries, i.e., their costs are not a function of the size of ΔG and changes in Gr. Nevertheless, we develop incremental algorithms that depend only on ΔG and Gr, independent of G, i.e., we do not have to decompress Gr to propagate the changes. (3) Using real-life data, we experimentally verify that our compression techniques could reduce graphs in average by 95% for reachability and 57% for graph pattern matching, and that our incremental maintenance algorithms are efficient.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

保持查询的图压缩

在社交网络中，发现具有数百万个节点和数十亿条边的图是很常见的。对此类图的查询通常非常昂贵。这促使我们提出保留查询的图压缩，将图压缩到用户选择的查询类Λ。我们从图G中计算一个小的Gr，使得(a)对于任何查询Q Ε Λ Q, Q(G) = Q'(Gr)，其中Q' Ε Λ可以有效地从Q计算;(b)任何计算Q(G)的算法都可以直接用于求Gr上的Q'。也就是说，虽然我们不能降低评估图查询的复杂性，但我们减少了数据图，同时保留了Λ中所有查询的答案。为了验证这种方法的有效性，(1)我们通过(有界)模拟开发了两类查询的压缩策略:可达性查询和图模式查询。在保留查询答案的前提下，分别通过可达性等价关系和图双模拟对图进行有效压缩。(2)我们提供了维护压缩图Gr以响应原始图G ΔG变化的技术。我们表明，增量维护问题对于这两类查询是无界的，即它们的成本不是ΔG大小和Gr变化的函数。然而，我们开发了仅依赖于ΔG和Gr的增量算法，独立于G，即我们不必解压缩Gr来传播变化。(3)使用实际数据，我们实验验证了我们的压缩技术可以将图的可达性平均降低95%，图的模式匹配平均降低57%，并且我们的增量维护算法是有效的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

自引率

0.00%

发文量