Buffered Streaming Graph Partitioning

Journal of Experimental Algorithmics (JCR Q2, Mathematics); Pub Date: 2021-02-18; DOI: 10.1145/3546911

Marcelo Fonseca Faraj, Christian Schulz

Journal of Experimental Algorithmics, vol. 27, pp. 1-26. https://doi.org/10.1145/3546911

Citations: 7

Abstract

Buffered Streaming Graph Partitioning

Partitioning graphs into blocks of roughly equal size is a widely used tool when processing large graphs. Currently, there is a gap in the space of available partitioning algorithms. On the one hand, there are streaming algorithms that have been adopted to partition massive graph data on small machines. In the streaming model, vertices arrive one at a time together with their neighborhoods and must be assigned to a block immediately. These algorithms can partition huge graphs quickly with little memory, but they produce partitions of low quality. On the other hand, there are offline (shared-memory) multilevel algorithms that produce high-quality partitions but need a machine with enough memory to partition huge networks. In this work, we make a first step toward closing this gap by presenting an algorithm that computes significantly improved partitions of huge graphs using a single machine with little memory in a streaming setting. First, we adopt the buffered streaming model, which is a more reasonable approach in practice. In this model, a processing element can store a buffer of nodes along with their edges before making assignment decisions. When our algorithm receives a batch of nodes, we build a model graph that represents the nodes of the batch and the already present partition structure. This model enables us to apply multilevel algorithms and, in turn, to compute much higher-quality solutions of huge graphs on cheap machines than previously possible. To partition the model graph, we develop a multilevel algorithm that optimizes an objective function that has previously been shown to be effective in the streaming setting. Surprisingly, this also removes the dependency on the number of blocks k from the running time compared to the previous state of the art. Overall, our algorithm computes, on average, 75.9% better solutions than Fennel [35] using a very small buffer size. In addition, for large values of k, our algorithm becomes faster than Fennel.
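To make the buffered model more concrete, the following is a minimal Python sketch of how a model graph for one batch might be assembled. It is an illustration under stated assumptions, not the authors' implementation: the abstract only says that the model graph represents the batch nodes and the already present partition structure. The function name `build_model_graph`, the choice of one artificial node per block, and the decision to ignore neighbors that have not yet been streamed are all assumptions made here.

```python
from collections import defaultdict


def build_model_graph(batch, block_of, k):
    """Build a weighted model graph for one batch (hypothetical sketch).

    batch:    list of (vertex, neighbors) pairs read from the stream.
    block_of: dict mapping already-assigned vertices to a block id in 0..k-1.
    k:        number of blocks.

    Returns (local_id, adj). Local ids 0..len(batch)-1 are the batch nodes;
    ids len(batch)..len(batch)+k-1 are artificial nodes, one per block
    (an assumption of this sketch).
    """
    local_id = {v: i for i, (v, _) in enumerate(batch)}
    b = len(batch)
    adj = [defaultdict(int) for _ in range(b + k)]
    for v, neighbors in batch:
        i = local_id[v]
        for u in neighbors:
            if u == v:
                continue  # skip self-loops
            if u in local_id:
                # Edge inside the batch. Assign rather than add, so an edge
                # listed by both endpoints is not counted twice.
                j = local_id[u]
                adj[i][j] = 1
                adj[j][i] = 1
            elif u in block_of:
                # Neighbor was assigned earlier: fold the edge into a
                # weighted edge to the artificial node of its block.
                j = b + block_of[u]
                adj[i][j] += 1
                adj[j][i] += 1
            # Neighbors not yet streamed are ignored in this sketch.
    return local_id, adj


# Vertices 0, 1, 2 were assigned to block 0 earlier; 3 and 4 form the batch.
local_id, adj = build_model_graph(
    batch=[(3, [2, 4, 5]), (4, [3, 5])],
    block_of={0: 0, 1: 0, 2: 0},
    k=2,
)
```

A partitioner run on this small graph would keep the artificial block nodes fixed and move only the batch nodes. The batch nodes' final blocks are then written back into `block_of` before the next batch is read.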

Source journal

Journal of Experimental Algorithmics Mathematics-Theoretical Computer Science

CiteScore

3.10

Self-citation rate

0.00%

Articles published

About the journal: The ACM JEA is a high-quality, refereed, archival journal devoted to the study of discrete algorithms and data structures through a combination of experimentation and classical analysis and design techniques. It focuses on the following areas in algorithms and data structures: combinatorial optimization, computational biology, computational geometry, graph manipulation, graphics, heuristics, network design, parallel processing, routing and scheduling, searching and sorting, and VLSI design.

Latest articles from this journal

Random projections for Linear Programming: an improved retrieval phase

SAT-Boosted Tabu Search for Coloring Massive Graphs

An Experimental Evaluation of Semidefinite Programming and Spectral Algorithms for Max Cut

A constructive heuristic for the uniform capacitated vertex k-center problem

Algorithms for Efficiently Computing Structural Anonymity in Complex Networks