Scube: Efficient Summarization for Skewed Graph Streams

Ming Chen, Renxiang Zhou, Hanhua Chen, Hai Jin
{"title":"Scube: Efficient Summarization for Skewed Graph Streams","authors":"Ming Chen, Renxiang Zhou, Hanhua Chen, Hai Jin","doi":"10.1109/ICDCS54860.2022.00019","DOIUrl":null,"url":null,"abstract":"Graph stream, which represents an evolving graph updating as an infinite edge stream, is a special emerging graph data model widely adopted in big data analysis applications. Entirely storing the continuously produced and tremendously large-scale datasets is impractical. Therefore, graph stream summarization structures which support approximate graph stream storage and management attract much recent attention. Existing designs commonly leverage a compressive matrix and use hash-based schemes to map each edge to a bucket of the matrix. Accordingly, they store the edges associated with the same node in the same row or column of the matrix. We show that existing designs suffer from unacceptable query latency and precision in the presence of node degree skewness in graph streams.We argue that the key to efficient graph stream summarization is to identify the high-degree nodes and leverage a differentiated strategy for the associated edges. However, it is not trivial to estimate the degree of a node in real-time graph streams due to the rigorous requirements of space and time efficiency. Moreover, the existence of duplicate edges makes high-degree nodes identification difficult. To solve the problem, we propose Scube, an efficient summarization structure for skewed graph streams. Two factors contribute to the efficiency of Scube. First, Scube proposes a space and computation efficient probabilistic counting scheme to identify high-degree nodes in a graph stream. Second, Scube differentiates the storage strategy for the edges associated with high-degree nodes by dynamically allocating multiple rows or columns. We conduct comprehensive experiments to evaluate the performance of Scube on large-scale real-world datasets. The results show that Scube significantly reduces the query latency over a graph stream by 48%-99%, as well as achieving acceptable query accuracy compared to the state-of-the-art designs.","PeriodicalId":225883,"journal":{"name":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS54860.2022.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Graph stream, which represents an evolving graph updating as an infinite edge stream, is a special emerging graph data model widely adopted in big data analysis applications. Entirely storing the continuously produced and tremendously large-scale datasets is impractical. Therefore, graph stream summarization structures which support approximate graph stream storage and management attract much recent attention. Existing designs commonly leverage a compressive matrix and use hash-based schemes to map each edge to a bucket of the matrix. Accordingly, they store the edges associated with the same node in the same row or column of the matrix. We show that existing designs suffer from unacceptable query latency and precision in the presence of node degree skewness in graph streams.We argue that the key to efficient graph stream summarization is to identify the high-degree nodes and leverage a differentiated strategy for the associated edges. However, it is not trivial to estimate the degree of a node in real-time graph streams due to the rigorous requirements of space and time efficiency. Moreover, the existence of duplicate edges makes high-degree nodes identification difficult. To solve the problem, we propose Scube, an efficient summarization structure for skewed graph streams. Two factors contribute to the efficiency of Scube. First, Scube proposes a space and computation efficient probabilistic counting scheme to identify high-degree nodes in a graph stream. Second, Scube differentiates the storage strategy for the edges associated with high-degree nodes by dynamically allocating multiple rows or columns. We conduct comprehensive experiments to evaluate the performance of Scube on large-scale real-world datasets. The results show that Scube significantly reduces the query latency over a graph stream by 48%-99%, as well as achieving acceptable query accuracy compared to the state-of-the-art designs.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
sccube:倾斜图流的高效总结
图流是大数据分析应用中广泛采用的一种特殊的新兴图数据模型,它以无限边缘流的形式表现了图的不断更新。完全存储连续产生的和超大规模的数据集是不切实际的。因此,支持近似图流存储和管理的图流摘要结构引起了人们的广泛关注。现有的设计通常利用压缩矩阵,并使用基于哈希的方案将每个边映射到矩阵的一个桶。相应地,它们将与同一节点相关联的边存储在矩阵的同一行或同列中。我们表明,在图流中存在节点度偏差的情况下,现有的设计存在不可接受的查询延迟和精度。我们认为,高效图流总结的关键是识别高节点,并对相关边利用差异化策略。然而,由于对空间和时间效率的严格要求,在实时图流中估计节点的程度并不是一件容易的事情。此外,重复边的存在使得高次节点的识别变得困难。为了解决这个问题,我们提出了一种高效的歪斜图流摘要结构Scube。有两个因素影响着Scube的效率。首先,sccube提出了一种空间和计算效率高的概率计数方案来识别图流中的高节点。其次,Scube通过动态分配多行或多列来区分与高节点相关的边的存储策略。我们进行了全面的实验来评估sccube在大规模真实数据集上的性能。结果表明,与最先进的设计相比,sccube显着将图流上的查询延迟减少了48%-99%,并且实现了可接受的查询精度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Nezha: Exploiting Concurrency for Transaction Processing in DAG-based Blockchains Toward Cleansing Backdoored Neural Networks in Federated Learning Themis: An Equal, Unpredictable, and Scalable Consensus for Consortium Blockchain IoDSCF: A Store-Carry-Forward Routing Protocol for joint Bus Networks and Internet of Drones FlowValve: Packet Scheduling Offloaded on NP-based SmartNICs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1