网络规模图的无损和有损摘要

The World Wide Web Conference Pub Date : 2019-05-13 DOI:10.1145/3308558.3313402

Kijung Shin, A. Ghoting, Myunghwan Kim, Hema Raghavan

{"title":"网络规模图的无损和有损摘要","authors":"Kijung Shin, A. Ghoting, Myunghwan Kim, Hema Raghavan","doi":"10.1145/3308558.3313402","DOIUrl":null,"url":null,"abstract":"Given a terabyte-scale graph distributed across multiple machines, how can we summarize it, with much fewer nodes and edges, so that we can restore the original graph exactly or within error bounds? As large-scale graphs are ubiquitous, ranging from web graphs to online social networks, compactly representing graphs becomes important to efficiently store and process them. Given a graph, graph summarization aims to find its compact representation consisting of (a) a summary graph where the nodes are disjoint sets of nodes in the input graph, and each edge indicates the edges between all pairs of nodes in the two sets; and (b) edge corrections for restoring the input graph from the summary graph exactly or within error bounds. Although graph summarization is a widely-used graph-compression technique readily combinable with other techniques, existing algorithms for graph summarization are not satisfactory in terms of speed or compactness of outputs. More importantly, they assume that the input graph is small enough to fit in main memory. In this work, we propose SWeG, a fast parallel algorithm for summarizing graphs with compact representations. SWeG is designed for not only shared-memory but also MapReduce settings to summarize graphs that are too large to fit in main memory. We demonstrate that SWeG is (a) Fast: SWeG is up to 5400 × faster than its competitors that give similarly compact representations, (b) Scalable: SWeG scales to graphs with tens of billions of edges, and (c) Compact: combined with state-of-the-art compression methods, SWeG achieves up to 3.4 × better compression than them.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"53 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":"{\"title\":\"SWeG: Lossless and Lossy Summarization of Web-Scale Graphs\",\"authors\":\"Kijung Shin, A. Ghoting, Myunghwan Kim, Hema Raghavan\",\"doi\":\"10.1145/3308558.3313402\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given a terabyte-scale graph distributed across multiple machines, how can we summarize it, with much fewer nodes and edges, so that we can restore the original graph exactly or within error bounds? As large-scale graphs are ubiquitous, ranging from web graphs to online social networks, compactly representing graphs becomes important to efficiently store and process them. Given a graph, graph summarization aims to find its compact representation consisting of (a) a summary graph where the nodes are disjoint sets of nodes in the input graph, and each edge indicates the edges between all pairs of nodes in the two sets; and (b) edge corrections for restoring the input graph from the summary graph exactly or within error bounds. Although graph summarization is a widely-used graph-compression technique readily combinable with other techniques, existing algorithms for graph summarization are not satisfactory in terms of speed or compactness of outputs. More importantly, they assume that the input graph is small enough to fit in main memory. In this work, we propose SWeG, a fast parallel algorithm for summarizing graphs with compact representations. SWeG is designed for not only shared-memory but also MapReduce settings to summarize graphs that are too large to fit in main memory. We demonstrate that SWeG is (a) Fast: SWeG is up to 5400 × faster than its competitors that give similarly compact representations, (b) Scalable: SWeG scales to graphs with tens of billions of edges, and (c) Compact: combined with state-of-the-art compression methods, SWeG achieves up to 3.4 × better compression than them.\",\"PeriodicalId\":23013,\"journal\":{\"name\":\"The World Wide Web Conference\",\"volume\":\"53 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"35\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The World Wide Web Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3308558.3313402\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The World Wide Web Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3308558.3313402","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 35

摘要

给定一个分布在多台机器上的太字节规模的图，我们如何用更少的节点和边来总结它，以便我们能够准确地或在错误范围内恢复原始图?由于大规模图无处不在，从网络图到在线社交网络，紧凑地表示图对于有效地存储和处理它们变得非常重要。给定一个图，图摘要的目的是找到它的紧凑表示，包括(a)一个摘要图，其中节点是输入图中不相交的节点集，每条边表示两个集合中所有节点对之间的边;(b)边缘修正，用于精确地或在误差范围内从汇总图恢复输入图。虽然图摘要是一种广泛使用的图压缩技术，并且可以与其他技术相结合，但现有的图摘要算法在输出的速度或紧凑性方面并不令人满意。更重要的是，它们假设输入图足够小，可以装入主存储器。在这项工作中，我们提出了一个快速并行算法SWeG，用于总结具有紧凑表示的图。SWeG不仅是为共享内存设计的，而且还为MapReduce设置设计，以总结太大而无法在主内存中容纳的图形。我们证明了SWeG是(a)快速的:SWeG比提供类似紧凑表示的竞争对手快5400倍，(b)可扩展的:SWeG缩放到具有数百亿条边的图，以及(c)紧凑的:与最先进的压缩方法相结合，SWeG实现了比它们高3.4倍的压缩。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SWeG: Lossless and Lossy Summarization of Web-Scale Graphs

Given a terabyte-scale graph distributed across multiple machines, how can we summarize it, with much fewer nodes and edges, so that we can restore the original graph exactly or within error bounds? As large-scale graphs are ubiquitous, ranging from web graphs to online social networks, compactly representing graphs becomes important to efficiently store and process them. Given a graph, graph summarization aims to find its compact representation consisting of (a) a summary graph where the nodes are disjoint sets of nodes in the input graph, and each edge indicates the edges between all pairs of nodes in the two sets; and (b) edge corrections for restoring the input graph from the summary graph exactly or within error bounds. Although graph summarization is a widely-used graph-compression technique readily combinable with other techniques, existing algorithms for graph summarization are not satisfactory in terms of speed or compactness of outputs. More importantly, they assume that the input graph is small enough to fit in main memory. In this work, we propose SWeG, a fast parallel algorithm for summarizing graphs with compact representations. SWeG is designed for not only shared-memory but also MapReduce settings to summarize graphs that are too large to fit in main memory. We demonstrate that SWeG is (a) Fast: SWeG is up to 5400 × faster than its competitors that give similarly compact representations, (b) Scalable: SWeG scales to graphs with tens of billions of edges, and (c) Compact: combined with state-of-the-art compression methods, SWeG achieves up to 3.4 × better compression than them.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助