Linear Time Vertex Partitioning on Massive Graphs

International journal of computer science (Rabat) Pub Date : 2016-01-01

Peter Mell, Richard Harang, Assane Gueye

Volume 5 1, pages 1-11. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4913482/pdf/nihms784027.pdf

Citations: 0



Abstract

The problem of optimally removing a set of vertices from a graph to minimize the size of the largest resulting component is known to be NP-complete. Prior work has provided near-optimal heuristics with high time complexity that work on graphs of up to hundreds of nodes, and less optimal but faster techniques that work on up to thousands of nodes. In this work, we analyze how to perform vertex partitioning on massive graphs of tens of millions of nodes. We use a previously known and very simple heuristic: iteratively removing the node of largest degree and all of its edges. This approach appears to have quadratic complexity because, after a node and its adjoining edges are removed, the node degrees must be updated before the next node is chosen. However, we describe a linear-time solution using an array whose indices map to node degree and whose values are hash tables indicating the presence or absence of a node at that degree value. Memory usage also grows linearly, which is surprising given that we lowered the time complexity from quadratic to linear. We empirically demonstrate linear scalability and linear memory usage on random graphs of up to 15,000 nodes. We then demonstrate tractability on massive graphs by running the algorithm on a graph with 34 million nodes representing Internet-wide router connectivity.
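The degree-indexed structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `remove_high_degree_nodes` and `largest_component_size`, the adjacency-dict input format, and the use of Python sets as the "hash tables" are all assumptions made for illustration. The key observation the sketch relies on is that degrees only decrease as nodes are removed, so a pointer to the highest non-empty bucket never has to move upward.

```python
from collections import deque


def remove_high_degree_nodes(adj, k):
    """Greedily remove up to k nodes, each time taking a node of maximum current degree.

    adj: dict mapping node -> set of neighbours (assumed undirected, no self-loops).
    Returns the removed nodes in removal order; adj itself is not modified.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)

    # buckets[d] is a hash set holding every remaining node whose current degree is d.
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)

    removed, removed_set = [], set()
    cur = max_deg  # highest bucket that may still be non-empty
    while len(removed) < k:
        # Degrees never increase, so this pointer only ever moves downward.
        while cur >= 0 and not buckets[cur]:
            cur -= 1
        if cur < 0:
            break  # no nodes left
        v = buckets[cur].pop()  # arbitrary node among those of maximum degree
        removed.append(v)
        removed_set.add(v)
        # Each surviving neighbour loses one edge: move it down one bucket.
        for u in adj[v]:
            if u in removed_set:
                continue
            d = degree[u]
            buckets[d].discard(u)
            degree[u] = d - 1
            buckets[d - 1].add(u)
    return removed


def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted (BFS)."""
    gone, seen, best = set(removed), set(), 0
    for start in adj:
        if start in gone or start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u not in gone and u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best
```

In this sketch every edge triggers at most one bucket move per endpoint removal and the bucket pointer descends at most `max_deg` steps in total, so the work is proportional to the number of nodes plus edges under the usual expected constant-time assumption for hash-set operations. Calling `largest_component_size(adj, remove_high_degree_nodes(adj, k))` gives the objective value the heuristic is trying to keep small.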
