ACM Transactions on Database Systems (TODS): Latest Articles, Page 7

Efficient Evaluation and Static Analysis for Well-Designed Pattern Trees with Projection

ACM Transactions on Database Systems (TODS)

Pub Date : 2018-08-21 DOI: 10.1145/3233983

P. Barceló, Markus Kröll, R. Pichler, Sebastian Skritek

Conjunctive queries (CQs) fail to provide an answer when the pattern described by the query does not exactly match the data. CQs might thus be too restrictive as a querying mechanism when data is semistructured or incomplete. The semantic web therefore provides a formalism—known as (projected) well-designed pattern trees (pWDPTs)—that tackles this problem: pWDPTs allow us to formulate queries that match parts of the query over the data if available, but do not ignore answers of the remaining query otherwise. Here we abstract away the specifics of semantic web applications and study pWDPTs over arbitrary relational schemas. Since the language of pWDPTs subsumes CQs, their evaluation problem is intractable. We identify structural properties of pWDPTs that lead to (fixed-parameter) tractability of various variants of the evaluation problem. We also show that checking if a pWDPT is equivalent to one in our tractable class is in 2EXPTIME. As a corollary, we obtain fixed-parameter tractability of evaluation for pWDPTs with such good behavior. Our techniques also allow us to develop a theory of approximations for pWDPTs.
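
The "match if available" behavior described above can be illustrated with a minimal sketch. It assumes a two-node pattern tree (a mandatory root and one optional child) over relations stored as Python sets of tuples; the function names `match_atoms` and `evaluate` and the sample data are hypothetical, and the sketch ignores projection and the tractable classes studied in the article.

```python
def match_atoms(atoms, db, partial):
    """Return every extension of `partial` that maps all atoms into db.

    atoms: list of (relation_name, (variable, ...))
    db:    dict mapping relation_name -> set of tuples
    """
    results = [dict(partial)]
    for rel, variables in atoms:
        extended = []
        for mapping in results:
            for row in db.get(rel, ()):
                candidate = dict(mapping)
                # Bind each variable, rejecting rows that clash with earlier bindings.
                if all(candidate.setdefault(var, val) == val
                       for var, val in zip(variables, row)):
                    extended.append(candidate)
        results = extended
    return results


def evaluate(root_atoms, optional_atoms, db):
    """Match the root; extend with the optional child whenever that is possible."""
    answers = []
    for base in match_atoms(root_atoms, db, {}):
        extensions = match_atoms(optional_atoms, db, base)
        # Keep the root-only answer when the optional part has no match.
        answers.extend(extensions if extensions else [base])
    return answers


db = {
    "person": {("ann",), ("bob",)},
    "email": {("ann", "ann@example.org")},
}
answers = evaluate([("person", ("x",))], [("email", ("x", "e"))], db)
```

A plain conjunctive query over both atoms would return only the answer for `ann`; here `bob` is still returned, just without a binding for `e`.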


Citations: 6

Lightweight Monitoring of Distributed Streams

ACM Transactions on Database Systems (TODS)

Pub Date : 2018-07-31 DOI: 10.1145/3226113

A. Lazerson, D. Keren, A. Schuster

As data becomes dynamic, large, and distributed, there is increasing demand for what have become known as distributed stream algorithms. Since continuously collecting the data to a central server and processing it there is infeasible, a common approach is to define local conditions at the distributed nodes, such that—as long as they are maintained—some desirable global condition holds. Previous methods derived local conditions focusing on communication efficiency. While proving very useful for reducing the communication volume, these local conditions often suffer from heavy computational burden at the nodes. The computational complexity of the local conditions affects both the runtime and the energy consumption. These are especially critical for resource-limited devices like smartphones and sensor nodes. Such devices are becoming more ubiquitous due to the recent trend toward smart cities and the Internet of Things. To accommodate for high data rates and limited resources of these devices, it is crucial that the local conditions be quickly and efficiently evaluated. Here we propose a novel approach, designated CB (for Convex/Concave Bounds). CB defines local conditions using suitably chosen convex and concave functions. Lightweight and simple, these local conditions can be rapidly checked on the fly. CB’s superiority over the state-of-the-art is demonstrated in its reduced runtime and power consumption, by up to six orders of magnitude in some cases. As an added bonus, CB also reduced communication overhead in all the tested application scenarios.
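
To see why convexity yields cheap local conditions, consider a minimal sketch under a simplifying assumption: the monitored function `f` is itself convex and the global value is `f` applied to the average of the nodes' vectors. By Jensen's inequality, if every node satisfies `f(x_i) <= threshold`, the global condition holds without any communication. This is only the underlying idea; the CB method in the article bounds more general functions by suitably chosen convex and concave ones, which is not reproduced here. All names and values are illustrative.

```python
def f(v):
    """Convex function being monitored: squared Euclidean norm."""
    return sum(x * x for x in v)


def local_ok(v, threshold):
    """Local condition checked at a single node, with no communication."""
    return f(v) <= threshold


def global_average(vectors):
    """Average of the nodes' local vectors (what a coordinator would compute)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


nodes = [[1.0, 2.0], [2.0, 0.5], [0.0, 2.5]]
threshold = 7.0

all_local = all(local_ok(v, threshold) for v in nodes)
# Jensen: f(mean of x_i) <= mean of f(x_i) <= threshold when f is convex.
global_holds = f(global_average(nodes)) <= threshold
```

Each local check is a handful of arithmetic operations, which is the kind of cost profile the abstract argues for on resource-limited devices.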


Citations: 7

Adaptive Asynchronous Parallelization of Graph Algorithms

ACM Transactions on Database Systems (TODS)

Pub Date : 2018-05-27 DOI: 10.1145/3397491

W. Fan, Ping Lu, Wenyuan Yu, Jingbo Xu, Qiang Yin, Xiaojian Luo, Jingren Zhou, Ruochun Jin

This article proposes an Adaptive Asynchronous Parallel (AAP) model for graph computations. As opposed to Bulk Synchronous Parallel (BSP) and Asynchronous Parallel (AP) models, AAP reduces both stragglers and stale computations by dynamically adjusting relative progress of workers. We show that BSP, AP, and Stale Synchronous Parallel model (SSP) are special cases of AAP. Better yet, AAP optimizes parallel processing by adaptively switching among these models at different stages of a single execution. Moreover, employing the programming model of GRAPE, AAP aims to parallelize existing sequential algorithms based on simultaneous fixpoint computation with partial and incremental evaluation. Under a monotone condition, AAP guarantees to converge at correct answers if the sequential algorithms are correct. Furthermore, we show that AAP can optimally simulate MapReduce, PRAM, BSP, AP, and SSP. Using real-life and synthetic graphs, we experimentally verify that AAP outperforms BSP, AP, and SSP for a variety of graph computations.


Citations: 27

TriAL

ACM Transactions on Database Systems (TODS)

Pub Date : 2018-03-23 DOI: 10.1145/3154385

L. Libkin, Juan L. Reutter, Adrián Soto, D. Vrgoc

Navigational queries over RDF data are viewed as one of the main applications of graph query languages, and yet the standard model of graph databases—essentially labeled graphs—is different from the triples-based model of RDF. While encodings of RDF databases into graph data exist, we show that even the most natural ones are bound to lose some functionality when used in conjunction with graph query languages. The solution is to work directly with triples, but then many properties taken for granted in the graph database context (e.g., reachability) lose their natural meaning. Our goal is to introduce languages that work directly over triples and are closed, i.e., they produce sets of triples, rather than graphs. Our basic language is called TriAL, or Triple Algebra: it guarantees closure properties by replacing the product with a family of join operations. We extend TriAL with recursion and explain why such an extension is more intricate for triples than for graphs. We present a declarative language, namely a fragment of datalog, capturing the recursive algebra. For both languages, the combined complexity of query evaluation is given by low-degree polynomials. We compare our language with previously studied graph query languages such as adaptations of XPath, regular path queries, and nested regular expressions; many of these languages are subsumed by the recursive triple algebra. We also provide an implementation of recursive TriAL on top of a relational query engine, and we show its usefulness by running a wide array of navigational queries over real-world RDF data, while at the same time testing how our implementation compares to existing RDF systems.
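
The closure property (joins that return triples rather than wider tuples) can be sketched as follows. This is not TriAL's syntax: the Python callables `condition` and `output` stand in for the algebra's join conditions and output positions, and `join` and `closure` are hypothetical helper names. The recursion is a plain fixpoint iteration.

```python
def join(left, right, condition, output):
    """Join two sets of triples and emit a triple for each matching pair."""
    return {output(a, b) for a in left for b in right if condition(a, b)}


def closure(triples, condition, output):
    """Repeat the join against the base triples until no new triple appears."""
    result = set(triples)
    while True:
        new = join(result, triples, condition, output) - result
        if not new:
            return result
        result |= new


edges = {
    ("a", "part_of", "b"),
    ("b", "part_of", "c"),
    ("c", "part_of", "d"),
}

# Compose two triples when the object of the first is the subject of the
# second and both use the same predicate; keep subject, predicate, new object.
reach = closure(
    edges,
    condition=lambda s, t: s[2] == t[0] and s[1] == t[1],
    output=lambda s, t: (s[0], s[1], t[2]),
)
```

Because the result is again a set of triples, it can be fed into further joins, which a Cartesian product producing six-column tuples would not allow.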


Citations: 15

Practical Private Range Search in Depth

ACM Transactions on Database Systems (TODS)

Pub Date : 2018-03-12 DOI: 10.1145/3167971

I. Demertzis, Stavros Papadopoulos, Odysseas Papapetrou, Antonios Deligiannakis, M. Garofalakis, Charalampos Papamanthou

We consider a data owner that outsources its dataset to an untrusted server. The owner wishes to enable the server to answer range queries on a single attribute, without compromising the privacy of the data and the queries. There are several schemes on “practical” private range search (mainly in database venues) that attempt to strike a trade-off between efficiency and security. Nevertheless, these methods either lack provable security guarantees or permit unacceptable privacy leakages. In this article, we take an interdisciplinary approach, which combines the rigor of security formulations and proofs with efficient data management techniques. We construct a wide set of novel schemes with realistic security/performance trade-offs, adopting the notion of Searchable Symmetric Encryption (SSE), primarily proposed for keyword search. We reduce range search to multi-keyword search using range-covering techniques with tree-like indexes, and formalize the problem as Range Searchable Symmetric Encryption (RSSE). We demonstrate that, given any secure SSE scheme, the challenge boils down to (i) formulating leakages that arise from the index structure and (ii) minimizing false positives incurred by some schemes under heavy data skew. We also explain an important concept in the recent SSE bibliography, namely locality, and design generic and specialized ways to attribute locality to our RSSE schemes. Moreover, we are the first to devise secure schemes for answering range aggregate queries, such as range sums and range min/max. We analytically detail the superiority of our proposals over prior work and experimentally confirm their practicality.
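
The reduction from range search to multi-keyword search can be sketched with a binary tree over the attribute domain. In this sketch each value is indexed under the keywords of all tree nodes on its root path, and a range is covered by a small set of nodes; the function names and the `(level, index)` keyword encoding are assumptions for illustration. No encryption is performed here: in an RSSE scheme the keywords would be handled by an underlying SSE scheme.

```python
def range_cover(lo, hi):
    """Return dyadic nodes (level, index) that exactly cover the range [lo, hi].

    Node (level, index) covers values index * 2**level .. (index + 1) * 2**level - 1.
    """
    nodes = []
    level = 0
    while lo <= hi:
        if lo % 2 == 1:          # lo is a right child: take it, move right
            nodes.append((level, lo))
            lo += 1
        if hi % 2 == 0:          # hi is a left child: take it, move left
            nodes.append((level, hi))
            hi -= 1
        lo //= 2
        hi //= 2
        level += 1
    return nodes


def keywords_for_value(value, height):
    """Keywords under which a value is indexed: every node on its root path."""
    return {(level, value >> level) for level in range(height + 1)}


height = 4                               # domain 0..15
cover = set(range_cover(3, 12))
hits = [v for v in range(2 ** height)
        if keywords_for_value(v, height) & cover]
```

The query touches a number of keywords logarithmic in the range width, at the price of indexing each value under `height + 1` keywords; which nodes are touched is also a source of the index-structure leakage that the abstract mentions.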


Citations: 27

Computing Optimal Repairs for Functional Dependencies

ACM Transactions on Database Systems (TODS)

Pub Date : 2017-12-20 DOI: 10.1145/3360904

Ester Livshits, B. Kimelfeld, Sudeepa Roy

We investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair), which is obtained by a minimum number of tuple deletions, and an optimal update repair (optimal U-repair), which is obtained by a minimum number of value (cell) updates. For computing an optimal S-repair, we present a polynomial-time algorithm that succeeds on certain sets of FDs and fails on others. We prove the following about the algorithm. When it succeeds, it can also incorporate weighted tuples and duplicate tuples. When it fails, the problem is NP-hard and, in fact, APX-complete (hence, cannot be approximated better than some constant). Thus, we establish a dichotomy in the complexity of computing an optimal S-repair. We present general analysis techniques for the complexity of computing an optimal U-repair, some based on the dichotomy for S-repairs. We also draw a connection to a past dichotomy in the complexity of finding a “most probable database” that satisfies a set of FDs with a single attribute on the left-hand side; the case of general FDs was left open, and we show how our dichotomy provides the missing generalization and thereby settles the open problem.
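
As a concrete instance of an optimal S-repair, consider the easy special case of a single FD `lhs -> rhs`. Tuples that agree on `lhs` must agree on `rhs`, so within each `lhs` group it suffices to keep the largest set of tuples sharing one `rhs` value. The sketch below covers only this one-FD case; it is not the article's algorithm for general sets of FDs, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def optimal_s_repair_single_fd(rows, lhs, rhs):
    """Delete as few rows as possible so that the FD lhs -> rhs holds."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        groups[row[lhs]][row[rhs]].append(row)
    kept = []
    for by_rhs in groups.values():
        # Within one lhs value, keep the most frequent rhs value.
        kept.extend(max(by_rhs.values(), key=len))
    return kept


rows = [
    {"emp": "ann", "dept": "db", "floor": 3},
    {"emp": "bob", "dept": "db", "floor": 3},
    {"emp": "cat", "dept": "db", "floor": 4},
    {"emp": "dan", "dept": "ai", "floor": 1},
]
repair = optimal_s_repair_single_fd(rows, "dept", "floor")
```

Here only the row for `cat` is deleted. Replacing `len` with a sum of per-tuple weights gives a weighted variant of the same special case.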


Citations: 49

Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered Concatenation

ACM Transactions on Database Systems (TODS)

Pub Date : 2017-11-13 DOI: 10.1145/3132701

Dario Colazzo, G. Ghelli, C. Sartiani

Regular Expressions (REs) are ubiquitous in database and programming languages. While many applications make use of REs extended with interleaving (shuffle) and unordered concatenation operators, this extension badly affects the complexity of basic operations, and, especially, makes membership NP-hard, which is unacceptable in most practical scenarios. In this article, we study the problem of membership checking for a restricted class of these extended REs, called conflict-free REs, which are expressive enough to cover the vast majority of real-world applications. We present several polynomial-time algorithms for membership checking over conflict-free REs, which differ in the optimization techniques they adopt and in the operators they support. As a particular application, we generalize the approach to check membership of Extensible Markup Language (XML) trees in a class of EDTDs (Extended Document Type Definitions) that models the crucial aspects of DTDs (Document Type Definitions) and XSD (XML Schema Definition) schemas. The results of an extensive experimental analysis validate the efficiency of the presented membership checking techniques.
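
A very narrow special case shows why restricting the expressions can make membership cheap. Assume an expression that is an interleaving of distinct symbols, each with a counting bound, such as `a[1..2] & b[0..1] & c[1..1]`. Order is then irrelevant and membership reduces to counting occurrences in a single pass. The sketch below handles only this case; the `bounds` dictionary encoding is an assumption, and the article's algorithms cover a much richer class.

```python
from collections import Counter


def matches_interleaving(word, bounds):
    """Check word against an interleaving of counted, pairwise distinct symbols.

    bounds maps each symbol to an inclusive (min, max) occurrence range.
    """
    counts = Counter(word)
    if any(symbol not in bounds for symbol in counts):
        return False             # the word uses a symbol the expression lacks
    return all(low <= counts[symbol] <= high
               for symbol, (low, high) in bounds.items())


bounds = {"a": (1, 2), "b": (0, 1), "c": (1, 1)}
results = {w: matches_interleaving(w, bounds)
           for w in ["cab", "acab", "ab", "caad", "aaac"]}
```

The check is linear in the length of the word plus the number of symbols, whereas interleaving of arbitrary subexpressions is what makes the general problem NP-hard.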


Citations: 5

Blazes

ACM Transactions on Database Systems (TODS)

Pub Date : 2017-10-27 DOI: 10.1145/3110214

P. Alvaro, Neil Conway, Joseph M. Hellerstein, D. Maier

Distributed consistency is perhaps the most-discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they cause undesirable performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this article, we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system and another using the Bloom declarative language.


Citations: 6

EmptyHeaded

ACM Transactions on Database Systems (TODS)

Pub Date : 2017-10-27 DOI: 10.1145/3129246

Christopher R. Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, K. Olukotun, C. Ré

There are two types of high-performance graph processing engines: low- and high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide optimized data structures and computation models but require users to write low-level imperative code, hence ensuring that efficiency is the burden of the user. In high-level engines, users write in query languages like datalog (SociaLite) or SQL (Grail). High-level engines are easier to use but are orders of magnitude slower than the low-level graph engines. We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. At the core of EmptyHeaded’s design is a new class of join algorithms that satisfy strong theoretical guarantees, but have thus far not achieved performance comparable to that of specialized graph processing engines. To achieve high performance, EmptyHeaded introduces a new join engine architecture, including a novel query optimizer and execution engine that leverage single-instruction multiple data (SIMD) parallelism. With this architecture, EmptyHeaded outperforms high-level approaches by up to three orders of magnitude on graph pattern queries, PageRank, and Single-Source Shortest Paths (SSSP) and is an order of magnitude faster than many low-level baselines. We validate that EmptyHeaded competes with the best-of-breed low-level engine (Galois), achieving comparable performance on PageRank and at most 3× worse performance on SSSP. Finally, we show that the EmptyHeaded design can easily be extended to accommodate a standard resource description framework (RDF) workload, the LUBM benchmark. On the LUBM benchmark, we show that EmptyHeaded can compete with and sometimes outperform two high-level, but specialized RDF baselines (TripleBit and RDF-3X), while outperforming MonetDB by up to three orders of magnitude and LogicBlox by up to two orders of magnitude.
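
A graph pattern query such as the triangle query can be evaluated by intersecting neighbor sets rather than by pairwise joins, and set intersection is the kind of kernel that SIMD instructions can accelerate. The sketch below shows only that intersection-based evaluation in plain Python; it is not EmptyHeaded's engine, optimizer, or data layout, and the function name is hypothetical.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue                      # ignore self-loops
        low, high = (u, v) if u < v else (v, u)
        # Orient each edge from the smaller to the larger endpoint.
        neighbors.setdefault(low, set()).add(high)
        neighbors.setdefault(high, set())
    total = 0
    for a, outs in neighbors.items():
        for b in outs:
            # Every c adjacent to both a and b (with a < b < c) closes a triangle.
            total += len(outs & neighbors[b])
    return total


k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
tailed = [(1, 2), (2, 3), (1, 3), (3, 4)]
```

Orienting edges toward the larger endpoint ensures that each triangle is counted exactly once, so `count_triangles(k4)` is 4 and `count_triangles(tailed)` is 1.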

Citations: 37

Declarative Probabilistic Programming with Datalog

ACM Transactions on Database Systems (TODS)

Pub Date : 2017-10-27 DOI: 10.1145/3132700

V. Bárány, B. T. Cate, B. Kimelfeld, Dan Olteanu, Zografoula Vagena

Probabilistic programming languages are used for developing statistical models. They typically consist of two components: a specification of a stochastic process (the prior) and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. In this article, we establish a probabilistic-programming extension of Datalog that, on the one hand, allows for defining a rich family of statistical models, and on the other hand retains the fundamental properties of declarativity. Our proposed extension provides mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. The resulting semantics is robust under different chases and invariant to rewritings that preserve logical equivalence.
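
The two components named in the abstract, a prior and observations that condition it, can be sketched on a toy model. This is not the article's Datalog extension or its syntax: the `PRIOR` table, the `derive` rule, and the function names are hypothetical, and the probabilities are made up for illustration. The sketch enumerates every possible outcome with its probability, then restricts to the outcomes that satisfy an observation.

```python
from itertools import product

# Hypothetical prior: independent coin flips with illustrative probabilities.
PRIOR = {"burglary": 0.1, "earthquake": 0.2}


def derive(world):
    """Deterministic rule: the alarm fires if either cause occurred."""
    return {**world, "alarm": world["burglary"] or world["earthquake"]}


def possible_outcomes(prior):
    """Yield (outcome, probability) for every combination of flips."""
    names = list(prior)
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        p = 1.0
        for name, value in world.items():
            p *= prior[name] if value else 1.0 - prior[name]
        yield derive(world), p


def posterior(prior, observation, query):
    """P(query | observation), keeping only outcomes that satisfy the observation."""
    kept = [(w, p) for w, p in possible_outcomes(prior) if observation(w)]
    mass = sum(p for _, p in kept)
    if mass == 0:
        raise ValueError("observation has probability zero")
    return sum(p for w, p in kept if query(w)) / mass


# Probability of a burglary given that the alarm was observed.
p_burglary = posterior(PRIOR, lambda w: w["alarm"], lambda w: w["burglary"])
```

Here the observation plays the role of a constraint on the derived `alarm` fact: outcomes that violate it are discarded and the remaining probability mass is renormalized. Exhaustive enumeration only works for a handful of discrete flips; it is used here because it makes the distribution over outcomes explicit.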

Citations: 37