Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems: Latest Papers

The Limits of Efficiency for Open- and Closed-World Query Evaluation Under Guarded TGDs

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-12-28 DOI: 10.1145/3375395.3387653

P. Barceló, V. Dalmau, C. Feier, C. Lutz, Andreas Pieris

Ontology-mediated querying and querying in the presence of constraints are two key database problems where tuple-generating dependencies (TGDs) play a central role. In ontology-mediated querying, TGDs can formalize the ontology and thus derive additional facts from the given data, while in querying in the presence of constraints, they restrict the set of admissible databases. In this work, we study the limits of efficient query evaluation in the context of the above two problems, focusing on guarded and frontier-guarded TGDs and on UCQs as the actual queries. We show that a class of ontology-mediated queries (OMQs) based on guarded TGDs can be evaluated in FPT iff the OMQs in the class are equivalent to OMQs in which the actual query has bounded treewidth, up to some reasonable assumptions. For querying in the presence of constraints, we consider classes of constraint-query specifications (CQSs) that bundle a set of constraints with an actual query. We show a dichotomy result for CQSs based on guarded TGDs that parallels the one for OMQs except that, additionally, FPT coincides with PTime combined complexity. The proof is based on a novel connection between OMQ and CQS evaluation. Using a direct proof, we also show a similar dichotomy result, again up to some reasonable assumptions, for CQSs based on frontier-guarded TGDs with a bounded number of atoms in TGD heads. Our results on CQSs can be viewed as extensions of Grohe's well-known characterization of the tractable classes of CQs (without constraints). Like Grohe's characterization, all the above results assume that the arity of relation symbols is bounded by a constant. We also study the associated meta problems, i.e., whether a given OMQ or CQS is equivalent to one in which the actual query has bounded treewidth.
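The two syntactic classes named in the abstract can be checked mechanically. The sketch below uses the standard definitions: a TGD is guarded if some body atom contains all body variables, and frontier-guarded if some body atom contains all variables shared between body and head. The atom encoding and the function names are assumptions made for illustration, and all terms are assumed to be variables (no constants).

```python
from typing import List, Set, Tuple

# An atom is (relation name, tuple of variable names). Illustrative encoding only.
Atom = Tuple[str, Tuple[str, ...]]


def variables(atoms: List[Atom]) -> Set[str]:
    """Collect every variable that occurs in a list of atoms."""
    return {v for _, args in atoms for v in args}


def is_guarded(body: List[Atom], head: List[Atom]) -> bool:
    """Guarded: one body atom (the guard) contains all body variables."""
    body_vars = variables(body)
    return any(body_vars <= set(args) for _, args in body)


def is_frontier_guarded(body: List[Atom], head: List[Atom]) -> bool:
    """Frontier-guarded: one body atom contains all frontier variables,
    i.e. the variables occurring in both body and head."""
    frontier = variables(body) & variables(head)
    return any(frontier <= set(args) for _, args in body)


# R(x,y), S(y) -> exists z. T(y,z): R(x,y) guards the whole body.
guarded_rule = ([("R", ("x", "y")), ("S", ("y",))], [("T", ("y", "z"))])
# R(x,y), R(y,z) -> exists w. T(y,w): only the frontier {y} is guarded.
frontier_rule = ([("R", ("x", "y")), ("R", ("y", "z"))], [("T", ("y", "w"))])
# R(x,y), R(y,z) -> T(x,z): the frontier {x,z} fits in no single body atom.
unguarded_rule = ([("R", ("x", "y")), ("R", ("y", "z"))], [("T", ("x", "z"))])
```

Every guarded TGD is also frontier-guarded, which is why the second class is the more general one.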

Citations: 8

Aggregate Queries on Sparse Databases

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-12-27 DOI: 10.1145/3375395.3387660

Szymon Toruńczyk

We propose an algebraic framework for studying efficient algorithms for query evaluation, aggregation, enumeration, and maintenance under updates, on sparse databases. Our framework allows these problems to be treated in a unified way, by considering various semirings, depending on the problem at hand. As a concrete application, we propose a powerful query language extending first-order logic by aggregation in multiple semirings. We obtain an optimal algorithm for computing the answers of such queries on sparse databases. More precisely, given a database from a fixed class with bounded expansion, the algorithm computes in linear time a data structure that allows the set of answers to the query to be enumerated with constant delay between two outputs.
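To illustrate how one evaluation routine can serve several problems by swapping the semiring, here is a minimal sketch. It aggregates over all two-edge paths of a weighted edge relation with a naive nested loop; it is not the linear-time algorithm of the paper, and the `Semiring` class, the function name, and the sample data are assumptions made for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    zero: Any                       # neutral element of add
    one: Any                        # neutral element of mul
    add: Callable[[Any, Any], Any]  # combines alternative matches
    mul: Callable[[Any, Any], Any]  # combines facts within one match


BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, operator.add, operator.mul)
TROPICAL = Semiring(float("inf"), 0, min, operator.add)


def two_path_aggregate(edges: Dict[Tuple[str, str], Any], sr: Semiring) -> Any:
    """Aggregate w(x,y) * w(y,z) over all two-edge paths x -> y -> z."""
    total = sr.zero
    for (_, y), w1 in edges.items():
        for (y2, _), w2 in edges.items():
            if y == y2:
                total = sr.add(total, sr.mul(w1, w2))
    return total


costs = {("a", "b"): 2, ("b", "c"): 3, ("b", "d"): 1}
exists = two_path_aggregate({e: True for e in costs}, BOOLEAN)  # is there a 2-path?
count = two_path_aggregate({e: 1 for e in costs}, COUNTING)     # how many 2-paths?
cheapest = two_path_aggregate(costs, TROPICAL)                  # minimum total cost
```

The query logic is written once; only the annotation of the facts and the pair of operations change between existence, counting, and minimization.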

Citations: 12

Solving a Special Case of the Intensional vs Extensional Conjecture in Probabilistic Databases

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-12-26 DOI: 10.1145/3375395.3387642

Mikaël Monet

We consider the problem of exact probabilistic inference for Unions of Conjunctive Queries (UCQs) on tuple-independent databases. For this problem, two approaches currently coexist. In the extensional method, query evaluation is performed by exploiting the structure of the query, and relies heavily on the use of the inclusion-exclusion principle. In the intensional method, one first builds a representation of the lineage of the query in a tractable formalism of knowledge compilation. The chosen formalism should then ensure that the probability can be efficiently computed using simple disjointness and independence assumptions, without the need to perform inclusion-exclusion. The extensional approach has long been thought to be strictly more powerful than the intensional approach, the reason being that for some queries, the use of inclusion-exclusion seemed unavoidable. In this paper we introduce a new technique to construct lineage representations as deterministic decomposable circuits in polynomial time. We prove that this technique applies to a class of UCQs that had been conjectured to separate the complexity of the two approaches. In essence, we show that relying on the inclusion-exclusion formula can be avoided by using negation. This result renews hope of proving that the intensional approach can handle all tractable UCQs.
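The contrast between the two styles of computation can be seen on a toy lineage. The sketch below takes the formula (a AND b) OR (a AND c) over three independent tuples and computes its probability three ways: by enumerating all worlds, by inclusion-exclusion, and by rewriting it as a AND NOT(NOT b AND NOT c), which needs only independence and negation. The formula and function names are assumptions chosen for illustration; this is not the construction from the paper.

```python
from itertools import product
from typing import Dict


def brute_force(p: Dict[str, float]) -> float:
    """Sum the weights of all possible worlds that satisfy the lineage."""
    total = 0.0
    for a, b, c in product([False, True], repeat=3):
        if (a and b) or (a and c):
            weight = 1.0
            for present, name in ((a, "a"), (b, "b"), (c, "c")):
                weight *= p[name] if present else 1.0 - p[name]
            total += weight
    return total


def inclusion_exclusion(p: Dict[str, float]) -> float:
    """P(ab) + P(ac) - P(abc), using tuple independence inside each term."""
    return p["a"] * p["b"] + p["a"] * p["c"] - p["a"] * p["b"] * p["c"]


def with_negation(p: Dict[str, float]) -> float:
    """Evaluate a AND NOT(NOT b AND NOT c): products over independent
    parts and complements only, with no subtraction of overlaps."""
    return p["a"] * (1.0 - (1.0 - p["b"]) * (1.0 - p["c"]))


probs = {"a": 0.5, "b": 0.4, "c": 0.2}
```

All three functions agree on `probs` (the value is 0.26), but only the last one has the shape of a bottom-up evaluation of a circuit with independent conjunctions and negation.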

Citations: 5

Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-12-23 DOI: 10.1145/3375395.3387662

Nofar Carmeli, Shai Zeevi, Christoph Berkholz, B. Kimelfeld, Nicole Schweikardt

As data analytics becomes more crucial to digital systems, so grows the importance of characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports polylogarithmic-delay enumeration. In this paper, we seek a structure that supports the more demanding task of a "random permutation": polylogarithmic-delay enumeration in truly random order. Enumeration of this kind is required if downstream applications assume that the intermediate results are representative of the whole result set in a statistically valuable manner. An even more demanding task is that of a "random access": polylogarithmic-time retrieval of an answer whose position is given. We establish that the free-connex acyclic CQs are tractable in all three senses: enumeration, random-order enumeration, and random access; and in the absence of self-joins, it follows from past results that every other CQ is intractable by each of the three (under some fine-grained complexity assumptions). However, the three yardsticks are separated in the case of a union of CQs (UCQ): while a union of free-connex acyclic CQs has a tractable enumeration, it may (provably) admit no random access. For such UCQs we devise a random-order enumeration whose delay is logarithmic in expectation. We also identify a subclass of UCQs for which we can provide random access with polylogarithmic access time. Finally, we present an implementation and an empirical study that show a considerable practical superiority of our random-order enumeration approach over state-of-the-art alternatives.
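One standard way to turn random access into random-order enumeration is a lazily materialized Fisher-Yates shuffle over the answer positions. The sketch below assumes that the number of answers `n` and an `access(i)` routine are already available from preprocessing; both names, and the list standing in for the answer set, are hypothetical.

```python
import random
from typing import Callable, Dict, Iterator, TypeVar

T = TypeVar("T")


def random_order(n: int, access: Callable[[int], T],
                 rng: random.Random) -> Iterator[T]:
    """Yield access(0..n-1) in uniformly random order without repetition.

    Only positions that have been swapped are stored, so the extra memory
    grows with the number of answers emitted so far, not with n.
    """
    swapped: Dict[int, int] = {}  # position -> index currently stored there
    for i in range(n):
        j = rng.randrange(i, n)          # pick a not-yet-emitted position
        index_at_i = swapped.get(i, i)
        index_at_j = swapped.get(j, j)
        swapped[j] = index_at_i          # move the displaced index to j
        swapped.pop(i, None)             # position i is never read again
        yield access(index_at_j)


answers = ["t0", "t1", "t2", "t3", "t4"]  # stand-in for a query result
shuffled = list(random_order(len(answers), answers.__getitem__, random.Random(0)))
```

Each step costs one random draw, a constant number of dictionary operations, and one call to `access`, so the delay is dominated by the random-access time.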

Citations: 32

Counting Problems over Incomplete Databases

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-12-23 DOI: 10.1145/3375395.3387656

M. Arenas, Pablo Barceló, Mikaël Monet

We study the complexity of various fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query q, we consider the following two problems: given as input an incomplete database D, (a) return the number of completions of D that satisfy q; or (b) return the number of valuations of the nulls of D yielding a completion that satisfies q. We obtain dichotomies between #P-hardness and polynomial-time computability for these problems when q is a self-join-free conjunctive query, and study the impact on the complexity of the following two restrictions: (1) every null occurs at most once in D (so-called Codd tables); and (2) the domain of each null is the same. Roughly speaking, we show that counting completions is much harder than counting valuations (for instance, while the latter is always in #P, we prove that the former is not in #P under some widely believed theoretical complexity assumption). Moreover, we find that both (1) and (2) reduce the complexity of our problems. We also study the approximability of these problems and show that, while counting valuations always has a fully polynomial randomized approximation scheme, in most cases counting completions does not. Finally, we consider more expressive query languages and situate our problems with respect to known complexity classes.
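The difference between problems (a) and (b) shows up on a very small instance, because distinct valuations can produce the same completion. The brute-force sketch below enumerates all valuations of a toy incomplete database and counts both quantities; the fact encoding, the null names `n1` and `n2`, and the sample query are assumptions made for illustration, and the enumeration is exponential in the number of nulls.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Tuple

Fact = Tuple[str, Tuple[str, ...]]  # (relation name, arguments)


def count_valuations_and_completions(
    facts: List[Fact],
    domains: Dict[str, List[str]],
    query: Callable[[FrozenSet[Fact]], bool],
) -> Tuple[int, int]:
    """Return (#satisfying valuations, #satisfying completions)."""
    nulls = sorted(domains)
    satisfying_valuations = 0
    satisfying_completions = set()
    for values in product(*(domains[n] for n in nulls)):
        valuation = dict(zip(nulls, values))
        # Replace each null by its value; constants are left unchanged.
        completion = frozenset(
            (rel, tuple(valuation.get(t, t) for t in args)) for rel, args in facts
        )
        if query(completion):
            satisfying_valuations += 1
            satisfying_completions.add(completion)
    return satisfying_valuations, len(satisfying_completions)


# D = {R(n1), R(n2), S(a)}; both nulls range over {a, b}.
database = [("R", ("n1",)), ("R", ("n2",)), ("S", ("a",))]
null_domains = {"n1": ["a", "b"], "n2": ["a", "b"]}


def q(db: FrozenSet[Fact]) -> bool:
    """Boolean query: exists x with R(x) and S(x)."""
    return any(rel == "R" and ("S", args) in db for rel, args in db)


result = count_valuations_and_completions(database, null_domains, q)
```

Here three of the four valuations satisfy q, but they yield only two distinct completions, since mapping (n1, n2) to (a, b) or to (b, a) gives the same set of facts.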

Citations: 3

Conjunctive Regular Path Queries with String Variables

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-12-19 DOI: 10.1145/3375395.3387663

Markus L. Schmid

We introduce the class CXRPQ of conjunctive xregex path queries, which are obtained from conjunctive regular path queries (CRPQs) by adding string variables (also called backreferences) as found in practical implementations of regular expressions. CXRPQs can be considered user-friendly, since they combine two concepts that are well-established in practice: pattern-based graph queries and regular expressions with backreferences. Due to the string variables, CXRPQs can express inter-path dependencies, which are not expressible by CRPQs. The evaluation complexity of CXRPQs, if not further restricted, is PSpace-hard in data complexity. We identify three natural fragments with more acceptable evaluation complexity: their data complexity is in NL, while their combined complexity varies between ExpSpace, PSpace and NP. In terms of expressive power, we compare the CXRPQ-fragments with CRPQs and unions of CRPQs, and with extended conjunctive regular path queries (ECRPQs) and unions of ECRPQs.
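Backreferences of the kind mentioned above are available in Python's `re` module, which gives a quick feel for an inter-path dependency. The sketch below checks whether two path labels carry the same word from `a+|b+` by joining them with a separator and matching a pattern with a named backreference. The separator trick, the function name, and the sample words are assumptions for illustration only; this is not the formal CXRPQ semantics.

```python
import re

# (?P<x>...) binds the string variable x; (?P=x) must repeat the same string.
SAME_WORD = re.compile(r"(?P<x>a+|b+)#(?P=x)")


def paths_agree(label1: str, label2: str) -> bool:
    """True if both path labels are the same word from a+ | b+.

    Assumes '#' does not occur in either label.
    """
    return SAME_WORD.fullmatch(f"{label1}#{label2}") is not None
```

Two independent regular expressions, one per path, could only require that each label lies in `a+|b+`; they could not force the two labels to be equal, which is the kind of dependency the string variable adds.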

Citations: 4

Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-07-03 DOI: 10.1145/3375395.3387646

A. Kara, M. Nikolic, Dan Olteanu, Haozhe Zhang

We investigate trade-offs in static and dynamic evaluation of hierarchical queries with arbitrary free variables. In the static setting, the trade-off is between the time to partially compute the query result and the delay needed to enumerate its tuples. In the dynamic setting, we additionally consider the time needed to update the query result under single-tuple inserts or deletes to the database. Our approach observes the degree of values in the database and uses different computation and maintenance strategies for high-degree (heavy) and low-degree (light) values. For the latter it partially computes the result, while for the former it computes enough information to allow for on-the-fly enumeration. The main result of this work defines the preprocessing time, the update time, and the enumeration delay as functions of the light/heavy threshold. By conveniently choosing this threshold, our approach recovers a number of prior results when restricted to hierarchical queries. For a restricted class of hierarchical queries, our approach can achieve worst-case optimal update time and enumeration delay conditioned on the Online Matrix-Vector Multiplication Conjecture.
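The heavy/light split by degree can be sketched for a single binary relation as follows. The threshold is taken to be N^ε for a relation of size N, and a value counts as heavy when its degree reaches the threshold; the exact threshold, the tie-breaking rule, and all names are assumptions for illustration rather than the paper's definitions.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[int, int]


def split_heavy_light(relation: List[Row],
                      epsilon: float) -> Tuple[List[Row], List[Row]]:
    """Partition R(A, B) by the degree of the A-value.

    An A-value is heavy if it occurs in at least N**epsilon tuples, where
    N = len(relation); otherwise it is light.
    """
    threshold = len(relation) ** epsilon
    degree = Counter(a for a, _ in relation)
    heavy = [row for row in relation if degree[row[0]] >= threshold]
    light = [row for row in relation if degree[row[0]] < threshold]
    return heavy, light


# Nine tuples: value 1 has degree 7, values 2 and 3 have degree 1.
R = [(1, b) for b in range(7)] + [(2, 0), (3, 0)]
heavy_part, light_part = split_heavy_light(R, epsilon=0.5)
```

The split is useful because there can be at most N / N^ε = N^(1-ε) distinct heavy values, while every light value joins with fewer than N^ε tuples; moving ε shifts work between preprocessing, updates, and enumeration.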

Citations: 29

New Results for the Complexity of Resilience for Binary Conjunctive Queries with Self-Joins

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-07-02 DOI: 10.1145/3375395.3387647

C. Freire, Wolfgang Gatterbauer, N. Immerman, A. Meliou

The resilience of a Boolean query on a database is the minimum number of tuples that need to be deleted from the input tables in order to make the query false. A solution to this problem immediately translates into a solution for the more widely known problem of deletion propagation with source-side effects. In this paper, we give several novel results on the hardness of the resilience problem for conjunctive queries with self-joins, and, more specifically, we present a dichotomy result for the class of single-self-join binary queries with exactly two repeated relations occurring in the query. Unlike in the self-join-free case, the concept of triad is not enough to fully characterize the complexity of resilience. We identify new structural properties, namely chains, confluences and permutations, which lead to various NP-hardness results. We also give novel, involved reductions to network flow to show that certain cases are in P. Although restricted, our results provide important insights into the problem of self-joins that we hope can help solve the general case of all conjunctive queries with self-joins in the future.
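The definition of resilience translates directly into a brute-force search, shown below for the self-join query q :- R(x,y), R(y,z). The search tries deletion sets of increasing size and is exponential in the database size, so it only serves to make the definition concrete; the function names and the sample database are assumptions for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Tuple

Edge = Tuple[int, int]


def two_path(db: FrozenSet[Edge]) -> bool:
    """Boolean query q :- R(x,y), R(y,z), with a self-join on R."""
    return any(y == y2 for _, y in db for y2, _ in db)


def resilience(db: FrozenSet[Edge],
               query: Callable[[FrozenSet[Edge]], bool]) -> int:
    """Smallest number of tuples whose deletion makes the query false."""
    facts = list(db)
    for k in range(len(facts) + 1):
        for removed in combinations(facts, k):
            if not query(db - frozenset(removed)):
                return k
    raise ValueError("query holds even on the empty database")


# Witnesses: (1,2)-(2,3), (2,3)-(3,4) and (5,6)-(6,7).
db = frozenset({(1, 2), (2, 3), (3, 4), (5, 6), (6, 7)})
answer = resilience(db, two_path)
```

Deleting (2,3) breaks the first two witnesses at once, and one more deletion is needed for the third, so the resilience of this instance is 2.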

Citations: 8

The Adversarial Robustness of Sampling

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-06-26 DOI: 10.1145/3375395.3387643

Omri Ben-Eliezer, E. Yogev

Random sampling is a fundamental primitive in modern algorithms, statistics, and machine learning, used as a generic method to obtain a small yet "representative" subset of the data. In this work, we investigate the robustness of sampling against adaptive adversarial attacks in a streaming setting: An adversary sends a stream of elements from a universe U to a sampling algorithm (e.g., Bernoulli sampling or reservoir sampling), with the goal of making the sample "very unrepresentative" of the underlying data stream. The adversary is fully adaptive in the sense that it knows the exact content of the sample at any given point along the stream, and can choose which element to send next accordingly, in an online manner. Well-known results in the static setting indicate that if the full stream is chosen in advance (non-adaptively), then a random sample of size Ω(d/ε²) is an ε-approximation of the full data with good probability, where d is the VC-dimension of the underlying set system (U, R). Does this sample size suffice for robustness against an adaptive adversary? The simplistic answer is negative: We demonstrate a set system where a constant sample size (corresponding to a VC-dimension of 1) suffices in the static setting, yet an adaptive adversary can make the sample very unrepresentative, as long as the sample size is (strongly) sublinear in the stream length, using a simple and easy-to-implement attack. However, this attack is "theoretical only", requiring the set system size to (essentially) be exponential in the stream length. This is not a coincidence: We show that in order to make the sampling algorithm robust against adaptive adversaries, the modification required is solely to replace the VC-dimension term d in the sample size with the cardinality term log |R|. That is, the Bernoulli and reservoir sampling algorithms with sample size Ω(log |R|/ε²) output a representative sample of the stream with good probability, even in the presence of an adaptive adversary. This nearly matches the bound imposed by the attack.
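
To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It implements standard reservoir sampling (Algorithm R) and measures how far a sample is from being an ε-approximation, i.e. the largest gap between the density of a range in the stream and in the sample. The function names, the prefix-range set system, and the constant `c` are illustrative assumptions; the Ω(·) bound in the abstract does not specify a constant, and the stream here is static, so the sketch does not model the adaptive adversary.

```python
import math
import random


def reservoir_sample(stream, k, rng):
    """Algorithm R: maintain a uniform random sample of k stream elements."""
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)          # fill the reservoir first
        else:
            j = rng.randint(0, i)     # uniform in [0, i], both ends inclusive
            if j < k:
                sample[j] = x         # replace a slot with probability k/(i+1)
    return sample


def density(items, r):
    """Fraction of items that fall in the range r."""
    return sum(1 for x in items if x in r) / len(items)


def max_discrepancy(stream, sample, ranges):
    """Largest |density in stream - density in sample| over all ranges.

    The sample is an eps-approximation exactly when this value is <= eps.
    """
    return max(abs(density(stream, r) - density(sample, r)) for r in ranges)


def robust_sample_size(num_ranges, eps, c=1.0):
    """Sample size c * ln|R| / eps^2; the constant c is a hypothetical choice."""
    return math.ceil(c * math.log(num_ranges) / eps ** 2)


# Illustrative set system: universe {0..99}, ranges are the prefixes {0..b-1}.
rng = random.Random(0)
ranges = [range(0, b) for b in range(1, 101)]
stream = [rng.randrange(100) for _ in range(10_000)]  # static, non-adaptive stream
k = robust_sample_size(len(ranges), eps=0.1, c=2.0)
sample = reservoir_sample(stream, k, rng)
print(k, max_discrepancy(stream, sample, ranges))
```

Prefix ranges were chosen only because they form a small set system with VC-dimension 1; any finite family of subsets of the universe could be passed as `ranges`.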

Citations: 35

Bag Query Containment and Information Theory

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Pub Date : 2019-06-24 DOI: 10.1145/3375395.3387645

Mahmoud Abo Khamis, Phokion G. Kolaitis, H. Ngo, Dan Suciu

The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and conjunctive query containment under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of a generalization of information inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both these problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential time algorithm for conjunctive query containment under bag semantics, provided the containing query is chordal and admits a simple junction tree.
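
As an illustration of why bag semantics differs from set semantics, here is a small Python sketch under stated assumptions; it is not the algorithm from the paper. It assumes Boolean conjunctive queries over a set database, where the bag answer is the number of homomorphisms from the query to the database, and it counts them by brute force. The representation of queries as lists of `(relation, variables)` atoms and the names `count_homomorphisms`, `Q1`, `Q2`, and `DB` are hypothetical.

```python
from itertools import product


def count_homomorphisms(query, db):
    """Count assignments of query variables to constants that satisfy all atoms.

    query: list of atoms (relation_name, tuple_of_variables)
    db:    dict mapping relation_name -> set of tuples of constants
    """
    variables = sorted({v for _, args in query for v in args})
    domain = sorted({c for tuples in db.values() for t in tuples for c in t})
    count = 0
    for values in product(domain, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all(tuple(h[v] for v in args) in db.get(rel, set())
               for rel, args in query):
            count += 1
    return count


# Q1() :- R(x, y), R(x, z)      Q2() :- R(x, y)
Q1 = [("R", ("x", "y")), ("R", ("x", "z"))]
Q2 = [("R", ("x", "y"))]
DB = {"R": {(1, 2), (1, 3)}}

print(count_homomorphisms(Q1, DB))  # 4
print(count_homomorphisms(Q2, DB))  # 2
```

Under set semantics the two Boolean queries are equivalent, since each is true exactly when R is non-empty. Under bag semantics this database witnesses that Q1 is not contained in Q2, because 4 > 2. A single database can only refute containment in this way; establishing containment requires an argument over all databases, which is where the decidability question arises.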

Citations: 3