Mahmoud Abo Khamis, H. Ngo, R. Pichler, Dan Suciu, Y. Wang
Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this paper we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naive evaluation algorithm on any datalog program.
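As a rough illustration of the least-fixpoint semantics described above, the following sketch runs naive evaluation of a single recursive rule over the tropical semiring (min, +), where the rule computes shortest-path lengths. The rule, the function names, and the `max_steps` cutoff are assumptions chosen for illustration; this is not the paper's formalism or algorithm.

```python
import math
from itertools import product

# Tropical semiring (min, +): "addition" is min with identity inf,
# "multiplication" is + with identity 0.
ZERO = math.inf


def oplus(a, b):
    return min(a, b)


def otimes(a, b):
    return a + b


def immediate_consequence(edges, nodes, t):
    """Apply the rule T(x,y) :- E(x,y) + sum over z of T(x,z) * E(z,y) once,
    where + and * are the semiring operations oplus and otimes."""
    new_t = {}
    for x, y in product(nodes, repeat=2):
        value = edges.get((x, y), ZERO)
        for z in nodes:
            value = oplus(value, otimes(t[(x, z)], edges.get((z, y), ZERO)))
        new_t[(x, y)] = value
    return new_t


def naive_fixpoint(edges, nodes, max_steps=1000):
    """Iterate from the all-ZERO state until nothing changes.

    Returns the fixpoint and the number of rule applications that changed
    the state. Raises if no fixpoint is reached within max_steps.
    """
    t = {(x, y): ZERO for x, y in product(nodes, repeat=2)}
    for step in range(1, max_steps + 1):
        new_t = immediate_consequence(edges, nodes, t)
        if new_t == t:
            return t, step - 1
        t = new_t
    raise RuntimeError("no fixpoint reached within max_steps")


if __name__ == "__main__":
    weights = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}
    result, steps = naive_fixpoint(weights, ["a", "b", "c"])
    print(result[("a", "c")], steps)
```

With non-negative weights the iteration stabilises. On a graph with a negative-weight cycle the values keep decreasing and the loop hits `max_steps`, a small instance of the "if ever" caveat in the abstract.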
{"title":"Convergence of Datalog over (Pre-) Semirings","authors":"Mahmoud Abo Khamis, H. Ngo, R. Pichler, Dan Suciu, Y. Wang","doi":"10.1145/3517804.3524140","DOIUrl":"https://doi.org/10.1145/3517804.3524140","url":null,"abstract":"Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this paper we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naive evaluation algorithm on any datalog program.","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127074865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Releasing the result size of conjunctive queries and graph pattern queries under differential privacy (DP) has received considerable attention in the literature, but existing solutions do not offer any optimality guarantees. We provide the first DP mechanism for this problem with a fairly strong notion of optimality, which can be considered as a natural relaxation of instance-optimality to a constant.
{"title":"A Nearly Instance-optimal Differentially Private Mechanism for Conjunctive Queries","authors":"Wei Dong, K. Yi","doi":"10.1145/3517804.3524143","DOIUrl":"https://doi.org/10.1145/3517804.3524143","url":null,"abstract":"Releasing the result size of conjunctive queries and graph pattern queries under differential privacy (DP) has received considerable attention in the literature, but existing solutions do not offer any optimality guarantees. We provide the first DP mechanism for this problem with a fairly strong notion of optimality, which can be considered as a natural relaxation of instance-optimality to a constant.","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115475124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jacob Focke, L. A. Goldberg, M. Roth, Stanislav Živný
We study the complexity of approximating the number of answers to a small query φ in a large database D. We establish an exhaustive classification into tractable and intractable cases if φ is a conjunctive query possibly including disequalities and negations:
- If there is a constant bound on the arity of φ, and if the randomised Exponential Time Hypothesis (rETH) holds, then the problem has a fixed-parameter tractable approximation scheme (FPTRAS) if and only if the treewidth of φ is bounded.
- If the arity is unbounded and φ does not have negations, then the problem has an FPTRAS if and only if the adaptive width of φ (a width measure strictly more general than treewidth) is bounded; the lower bound relies on the rETH as well.
Additionally we show that our results cannot be strengthened to achieve a fully polynomial randomised approximation scheme (FPRAS): We observe that, unless NP=RP, there is no FPRAS even if the treewidth (and the adaptive width) is 1. However, if there are neither disequalities nor negations, we prove the existence of an FPRAS for queries of bounded fractional hypertreewidth, strictly generalising the recently established FPRAS for conjunctive queries with bounded hypertreewidth due to Arenas, Croquevielle, Jayaram and Riveros (STOC 2021).
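To make the counting problem concrete, here is a minimal brute-force sketch that counts the exact number of answers to one small conjunctive query with a disequality, φ(x, y) = ∃z E(x, z) ∧ E(z, y) ∧ x ≠ y. The query and the function name are assumptions picked for illustration. The code is the exhaustive baseline, not one of the approximation schemes the abstract refers to.

```python
from itertools import product


def count_answers(edges):
    """Count pairs (x, y) with x != y such that E(x, z) and E(z, y)
    hold for some z. `edges` is a set of (source, target) tuples."""
    domain = {v for e in edges for v in e}
    answers = set()
    # Try every assignment to (x, z, y); z is existentially quantified,
    # so distinct witnesses for the same (x, y) are counted once.
    for x, z, y in product(domain, repeat=3):
        if (x, z) in edges and (z, y) in edges and x != y:
            answers.add((x, y))
    return len(answers)


if __name__ == "__main__":
    print(count_answers({(1, 2), (2, 3), (2, 1)}))
```

The loop visits every assignment of the three variables, so its running time grows as the cube of the domain size. Approximation schemes for this problem aim to avoid that kind of exhaustive enumeration.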
{"title":"Approximately Counting Answers to Conjunctive Queries with Disequalities and Negations","authors":"Jacob Focke, L. A. Goldberg, M. Roth, Stanislav Živný","doi":"10.1145/3517804.3526231","DOIUrl":"https://doi.org/10.1145/3517804.3526231","url":null,"abstract":"We study the complexity of approximating the number of answers to a small query φ in a large database D. We establish an exhaustive classification into tractable and intractable cases if φ is a conjunctive query possibly including disequalities and negations: - If there is a constant bound on the arity of φ, and if the randomised Exponential Time Hypothesis (rETH) holds, then the problem has a fixed-parameter tractable approximation scheme (FPTRAS) if and only if the treewidth of φ is bounded. - If the arity is unbounded and φ does not have negations, then the problem has an FPTRAS if and only if the adaptive width of φ (a width measure strictly more general than treewidth) is bounded; the lower bound relies on the rETH as well. Additionally we show that our results cannot be strengthened to achieve a fully polynomial randomised approximation scheme (FPRAS): We observe that, unless NP=RP, there is no FPRAS even if the treewidth (and the adaptive width) is 1. 
However, if there are neither disequalities nor negations, we prove the existence of an FPRAS for queries of bounded fractional hypertreewidth, strictly generalising the recently established FPRAS for conjunctive queries with bounded hypertreewidth due to Arenas, Croquevielle, Jayaram and Riveros (STOC 2021).","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121511856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Steffen van Bergerem, Martin Grohe, Martin Ritzert
We analyse the complexity of learning first-order queries in a model-theoretic framework for supervised learning introduced by Grohe and Turán (TOCS 2004). Previous research on the complexity of learning in this framework focussed on the question of when learning is possible in time sublinear in the background structure. Here we study the parameterized complexity of the learning problem. We have two main results. The first is a hardness result, showing that learning first-order queries is at least as hard as the corresponding model-checking problem, which implies that on general structures it is hard for the parameterized complexity class AW[*]. Our second main contribution is a fixed-parameter tractable agnostic PAC learning algorithm for first-order queries over sparse relational data (more precisely, over nowhere dense background structures).
{"title":"On the Parameterized Complexity of Learning First-Order Logic","authors":"Steffen van Bergerem, Martin Grohe, Martin Ritzert","doi":"10.1145/3517804.3524151","DOIUrl":"https://doi.org/10.1145/3517804.3524151","url":null,"abstract":"We analyse the complexity of learning first-order queries in a model-theoretic framework for supervised learning introduced by (Grohe and Turán, TOCS 2004). Previous research on the complexity of learning in this framework focussed on the question of when learning is possible in time sublinear in the background structure. Here we study the parameterized complexity of the learning problem. We have two main results. The first is a hardness result, showing that learning first-order queries is at least as hard as the corresponding model-checking problem, which implies that on general structures it is hard for the parameterized complexity class AW[*]. Our second main contribution is a fixed-parameter tractable agnostic PAC learning algorithm for first-order queries over sparse relational data (more precisely, over nowhere dense background structures).","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128008434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Piotr Ostropolski-Nalewaja, J. Marcinkowski, David Carral, S. Rudolph
We consider (first-order) query rewritability in the context of theory-mediated query answering. The starting point of our journey is the FUS/FES conjecture, which states that any theory that is a finite expansion set (FES) and admits query rewriting (BDD, FUS) must be uniformly bounded. We show that this conjecture holds for a large class of BDD theories, which we call "local". Upon investigating how "non-local" BDD theories can actually get, we discover unexpected phenomena that, we think, are at odds with prevailing intuitions about BDD theories.
{"title":"A Journey to the Frontiers of Query Rewritability","authors":"Piotr Ostropolski-Nalewaja, J. Marcinkowski, David Carral, S. Rudolph","doi":"10.1145/3517804.3524163","DOIUrl":"https://doi.org/10.1145/3517804.3524163","url":null,"abstract":"We consider (first-order) query rewritability in the context of theory-mediated query answering. The starting point of our journey is the FUS/FES conjecture, which states that any theory that is a finite expansion set (FES) and admits query rewriting (BDD, FUS) must be uniformly bounded. We show that this conjecture holds for a large class of BDD theories, which we call \"local\". Upon investigating how \"non-local\" BDD theories can actually get, we discover unexpected phenomena that, we think, are at odds with prevailing intuitions about BDD theories.","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122833120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kimelfeld and Sagiv [Kimelfeld and Sagiv, PODS 2006], [Kimelfeld and Sagiv, Inf. Syst. 2008] pointed out that the problem of enumerating K-fragments is of great importance in keyword search on data graphs. In graph-theoretic terms, the problem corresponds to enumerating minimal Steiner trees in (directed) graphs. In this paper, we propose a linear-delay and polynomial-space algorithm for enumerating all minimal Steiner trees, improving on a previous result in [Kimelfeld and Sagiv, Inf. Syst. 2008]. Our enumeration algorithm can be extended to other Steiner problems, such as minimal Steiner forests, minimal terminal Steiner trees, and minimal directed Steiner trees. As another variant of the minimal Steiner tree enumeration problem, we study the problem of enumerating minimal induced Steiner subgraphs. We propose a polynomial-delay and exponential-space enumeration algorithm for minimal induced Steiner subgraphs on claw-free graphs. Contrary to these tractable results, we show that the problem of enumerating minimal group Steiner trees is at least as hard as the minimal transversal enumeration problem on hypergraphs.
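For readers unfamiliar with the objects being enumerated, the sketch below lists all minimal Steiner trees of a small undirected graph by brute force. It relies on the standard characterisation that a tree containing all terminals is minimal exactly when every leaf is a terminal. The function names are hypothetical, the input is assumed to have at least two terminals and no duplicate edges, and the exponential search over edge subsets is for illustration only; it is not the linear-delay algorithm proposed in the paper.

```python
from itertools import combinations


def is_minimal_steiner_tree(edge_subset, terminals):
    """Check whether the chosen edges form a tree that spans all terminals
    and whose leaves are all terminals."""
    # Build adjacency for the chosen edges.
    adj = {}
    for u, v in edge_subset:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not terminals.issubset(adj):
        return False
    # A tree has exactly one more vertex than edges and is connected.
    if len(adj) != len(edge_subset) + 1:
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    if len(seen) != len(adj):
        return False
    # Minimality: removing a non-terminal leaf would give a smaller tree.
    return all(v in terminals for v, nbrs in adj.items() if len(nbrs) == 1)


def minimal_steiner_trees(edges, terminals):
    """Yield every minimal Steiner tree as a tuple of edges."""
    terminals = set(terminals)
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            if is_minimal_steiner_tree(subset, terminals):
                yield subset


if __name__ == "__main__":
    triangle = [("a", "b"), ("b", "c"), ("a", "c")]
    for tree in minimal_steiner_trees(triangle, {"a", "b"}):
        print(tree)
```

On the triangle with terminals a and b there are two minimal Steiner trees: the single edge a-b, and the path from a to b through c.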
{"title":"Linear-Delay Enumeration for Minimal Steiner Problems","authors":"Yasuaki Kobayashi, Kazuhiro Kurita, Kunihiro Wasa","doi":"10.1145/3517804.3524148","DOIUrl":"https://doi.org/10.1145/3517804.3524148","url":null,"abstract":"Kimelfeld and Sagiv [Kimelfeld and Sagiv, PODS 2006], [Kimelfeld and Sagiv, Inf. Syst. 2008] pointed out that the problem of enumerating K-fragments is of great importance in a keyword search on data graphs. In a graph-theoretic term, the problem corresponds to enumerating minimal Steiner trees in (directed) graphs. In this paper, we propose a linear-delay and polynomial-space algorithm for enumerating all minimal Steiner trees, improving on a previous result in [Kimelfeld and Sagiv, Inf. Syst. 2008]. Our enumeration algorithm can be extended to other Steiner problems, such as minimal Steiner forests, minimal terminal Steiner trees, and minimal directed Steiner trees. As another variant of the minimal Steiner tree enumeration problem, we study the problem of enumerating minimal induced Steiner subgraphs. We propose a polynomial-delay and exponential-space enumeration algorithm of minimal induced Steiner subgraphs on claw-free graphs. 
Contrary to these tractable results, we show that the problem of enumerating minimal group Steiner trees is at least as hard as the minimal transversal enumeration problem on hypergraphs.","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125753450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Storing a counter incremented N times would naively consume O(log N) bits of memory. In 1978 Morris described the very first streaming algorithm: the "Morris Counter" [15]. His algorithm's space bound is a random variable, and it has been shown to be O(log log N + log(1/ε) + log(1/δ)) bits in expectation to provide a (1+ε)-approximation with probability 1-δ to the counter's value. We provide a new simple algorithm with a simple analysis showing that randomized space O(log log N + log(1/ε) + log log(1/δ)) bits suffice for the same task, i.e. an exponentially improved dependence on the inverse failure probability. We then provide a new analysis showing that the original Morris Counter itself, after a minor but necessary tweak, actually also enjoys this same improved upper bound. Lastly, we prove a new lower bound for this task showing optimality of our upper bound. We thus completely resolve the asymptotic space complexity of approximate counting. Furthermore all our constants are explicit, and our lower bound and tightest upper bound differ by a multiplicative factor of at most 3+o(1).
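The classic Morris Counter mentioned above can be sketched in a few lines. This is the textbook base-2 variant, given only as background: it is neither the new algorithm nor the tweaked counter analysed in the paper. The class name and the injectable `rand` parameter are assumptions made so that the behaviour can be checked deterministically.

```python
import random


class MorrisCounter:
    """Textbook Morris counter: stores only an exponent x, which stays
    around log2 of the number of increments."""

    def __init__(self, rand=random.random):
        self.x = 0
        self._rand = rand  # source of uniform samples in [0, 1)

    def increment(self):
        # Increase the exponent with probability 2^(-x).
        if self._rand() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        # 2^x - 1 is an unbiased estimate of the number of increments.
        return 2 ** self.x - 1


if __name__ == "__main__":
    counter = MorrisCounter()
    for _ in range(1000):
        counter.increment()
    print(counter.x, counter.estimate())
```

Because only `x` is stored, the state needs about log log N bits. A single counter has high variance, so the accuracy parameters ε and δ in the abstract are what a practical variant has to control.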
{"title":"Optimal Bounds for Approximate Counting","authors":"Jelani Nelson, Huacheng Yu","doi":"10.1145/3517804.3526225","DOIUrl":"https://doi.org/10.1145/3517804.3526225","url":null,"abstract":"Storing a counter incremented N times would naively consume O(log N) bits of memory. In 1978 Morris described the very first streaming algorithm: the \"Morris Counter\" [15]. His algorithm's space bound is a random variable, and it has been shown to be O(log log N + log(1/ε) + log(1/δ)) bits in expectation to provide a (1+ε)-approximation with probability 1-δ to the counter's value. We provide a new simple algorithm with a simple analysis showing that randomized space O(log log N + log(1/ε) + log log(1/δ)) bits suffice for the same task, i.e. an exponentially improved dependence on the inverse failure probability. We then provide a new analysis showing that the original Morris Counter itself, after a minor but necessary tweak, actually also enjoys this same improved upper bound. Lastly, we prove a new lower bound for this task showing optimality of our upper bound. We thus completely resolve the asymptotic space complexity of approximate counting. Furthermore all our constants are explicit, and our lower bound and tightest upper bound differ by a multiplicative factor of at most 3+o(1).","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116208091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","authors":"","doi":"10.1145/3517804","DOIUrl":"https://doi.org/10.1145/3517804","url":null,"abstract":"","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126223990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}