基于代表性实例的不确定图处理

IF 1.7 2区计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS ACM Transactions on Database Systems Pub Date : 2015-10-23 DOI:10.1145/2818182

Panos Parchas, Francesco Gullo, D. Papadias, F. Bonchi

{"title":"基于代表性实例的不确定图处理","authors":"Panos Parchas, Francesco Gullo, D. Papadias, F. Bonchi","doi":"10.1145/2818182","DOIUrl":null,"url":null,"abstract":"Data in several applications can be represented as an uncertain graph whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential number of instantiations. Thus, typical approaches employ Monte-Carlo sampling, which (i) draws a number of possible graphs (samples), (ii) evaluates the query on each of them, and (iii) aggregates the individual answers to generate the final result. However, this approach can also be extremely time consuming for large uncertain graphs commonly found in practice. To facilitate efficiency, we study the problem of extracting a single representative instance from an uncertain graph. Conventional processing techniques can then be applied on this representative to closely approximate the result on the original graph.\n In order to maintain data utility, the representative instance should preserve structural characteristics of the uncertain graph. We start with representatives that capture the expected vertex degrees, as this is a fundamental property of the graph topology. We then generalize the notion of vertex degree to the concept of n-clique cardinality, that is, the number of cliques of size n that contain a vertex. For the first problem, we propose two methods: Average Degree Rewiring (ADR), which is based on random edge rewiring, and Approximate B-Matching (ABM), which applies graph matching techniques. For the second problem, we develop a greedy approach and a game-theoretic framework. We experimentally demonstrate, with real uncertain graphs, that indeed the representative instances can be used to answer, efficiently and accurately, queries based on several metrics such as shortest path distance, clustering coefficient, and betweenness centrality.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":"54 1","pages":"20:1-20:39"},"PeriodicalIF":1.7000,"publicationDate":"2015-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":"{\"title\":\"Uncertain Graph Processing through Representative Instances\",\"authors\":\"Panos Parchas, Francesco Gullo, D. Papadias, F. Bonchi\",\"doi\":\"10.1145/2818182\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data in several applications can be represented as an uncertain graph whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential number of instantiations. Thus, typical approaches employ Monte-Carlo sampling, which (i) draws a number of possible graphs (samples), (ii) evaluates the query on each of them, and (iii) aggregates the individual answers to generate the final result. However, this approach can also be extremely time consuming for large uncertain graphs commonly found in practice. To facilitate efficiency, we study the problem of extracting a single representative instance from an uncertain graph. Conventional processing techniques can then be applied on this representative to closely approximate the result on the original graph.\\n In order to maintain data utility, the representative instance should preserve structural characteristics of the uncertain graph. We start with representatives that capture the expected vertex degrees, as this is a fundamental property of the graph topology. We then generalize the notion of vertex degree to the concept of n-clique cardinality, that is, the number of cliques of size n that contain a vertex. For the first problem, we propose two methods: Average Degree Rewiring (ADR), which is based on random edge rewiring, and Approximate B-Matching (ABM), which applies graph matching techniques. For the second problem, we develop a greedy approach and a game-theoretic framework. We experimentally demonstrate, with real uncertain graphs, that indeed the representative instances can be used to answer, efficiently and accurately, queries based on several metrics such as shortest path distance, clustering coefficient, and betweenness centrality.\",\"PeriodicalId\":50915,\"journal\":{\"name\":\"ACM Transactions on Database Systems\",\"volume\":\"54 1\",\"pages\":\"20:1-20:39\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2015-10-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"31\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Database Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/2818182\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/2818182","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 31

摘要

在一些应用中，数据可以表示为一个不确定图，其边被标记为存在概率。对于大多数应用程序来说，不确定图上的精确查询处理是禁止的，因为它涉及到对指数数量的实例化进行评估。因此，典型的方法采用蒙特卡罗抽样，它(i)绘制许多可能的图(样本)，(ii)评估每个图上的查询，以及(iii)汇总单个答案以生成最终结果。然而，对于实践中常见的大型不确定图，这种方法也可能非常耗时。为了提高效率，我们研究了从不确定图中提取单个代表性实例的问题。然后，可以将传统的处理技术应用于该代表上，以接近原始图形上的结果。为了保持数据的实用性，代表性实例应保留不确定图的结构特征。我们从捕获预期顶点度的表示开始，因为这是图拓扑的基本属性。然后我们将顶点度的概念推广到n团基数的概念，即包含一个顶点的大小为n的团的数量。针对第一个问题，我们提出了两种方法:基于随机边重新布线的平均度重新布线(ADR)和应用图匹配技术的近似b匹配(ABM)。对于第二个问题，我们开发了一个贪婪的方法和博弈论框架。我们通过实验证明，使用真实的不确定图，确实可以使用代表性实例来有效而准确地回答基于几个指标(如最短路径距离、聚类系数和中间性中心性)的查询。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Uncertain Graph Processing through Representative Instances

Data in several applications can be represented as an uncertain graph whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential number of instantiations. Thus, typical approaches employ Monte-Carlo sampling, which (i) draws a number of possible graphs (samples), (ii) evaluates the query on each of them, and (iii) aggregates the individual answers to generate the final result. However, this approach can also be extremely time consuming for large uncertain graphs commonly found in practice. To facilitate efficiency, we study the problem of extracting a single representative instance from an uncertain graph. Conventional processing techniques can then be applied on this representative to closely approximate the result on the original graph. In order to maintain data utility, the representative instance should preserve structural characteristics of the uncertain graph. We start with representatives that capture the expected vertex degrees, as this is a fundamental property of the graph topology. We then generalize the notion of vertex degree to the concept of n-clique cardinality, that is, the number of cliques of size n that contain a vertex. For the first problem, we propose two methods: Average Degree Rewiring (ADR), which is based on random edge rewiring, and Approximate B-Matching (ABM), which applies graph matching techniques. For the second problem, we develop a greedy approach and a game-theoretic framework. We experimentally demonstrate, with real uncertain graphs, that indeed the representative instances can be used to answer, efficiently and accurately, queries based on several metrics such as shortest path distance, clustering coefficient, and betweenness centrality.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Database Systems 工程技术-计算机：软件工程

CiteScore

5.60

自引率

0.00%

发文量

审稿时长

>12 weeks

期刊介绍： Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.

期刊最新文献

Automated Category Tree Construction: Hardness Bounds and Algorithms Database Repairing with Soft Functional Dependencies Sharing Queries with Nonequivalent User-Defined Aggregate Functions A family of centrality measures for graph data based on subgraphs GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)