Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems: Latest Articles
Triangle and Four Cycle Counting in the Data Stream Model
A. McGregor, Sofya Vorotnikova
The problem of estimating the number of cycles in a graph is one of the most widely studied graph problems in the data stream model. Three relevant variants of the data stream model include: the arbitrary order model, in which the stream consists of the edges of the graph in arbitrary order; the random order model, in which the edges are randomly permuted; and the adjacency list order model, in which all edges incident to the same vertex appear consecutively. In this paper, we focus on the problem of triangle and four-cycle counting in these models. We improve over the state-of-the-art results as follows, where $n$ is the number of vertices, $m$ is the number of edges and $T$ is the number of triangles/four-cycles in the graph (i.e., the quantity being estimated):
Random Order Model: We present a single-pass algorithm that $(1+\varepsilon)$-approximates the number of triangles using $\tilde{O}(\varepsilon^{-2} m/\sqrt{T})$ space and prove that this is optimal in the range $T \le \sqrt{m}$. The best previous result, a $(3+\varepsilon)$-approximation using $\tilde{O}(\varepsilon^{-4.5} m/\sqrt{T})$ space, was presented by Cormode and Jowhari (Theor. Comput. Sci. 2017).
Adjacency List Model: We present an algorithm that returns a $(1+\varepsilon)$-approximation of the number of 4-cycles using two passes and $\tilde{O}(\varepsilon^{-4} m/\sqrt{T})$ space. The best previous result, a constant approximation using $\tilde{O}(m/T^{3/8})$ space, was presented by Kallaugher et al. (PODS 2019). We also show that $(1+\varepsilon)$-approximation in a single pass is possible in a) $\mathrm{polylog}(n)$ space if $T=\Omega(n^2)$ and b) $\tilde{O}(n)$ space if $T=\Omega(n)$.
Arbitrary Order Model: We present a three-pass algorithm that $(1+\varepsilon)$-approximates the number of 4-cycles using $\tilde{O}(\varepsilon^{-2} m/T^{1/4})$ space and a one-pass algorithm that uses $\tilde{O}(\varepsilon^{-2} n)$ space when $T=\Omega(n^2)$. The best existing result, a $(1+\varepsilon)$-approximation using $\tilde{O}(\varepsilon^{-2} m^2/T)$ space, was presented by Bera and Chakrabarti (STACS 2017). We also show a multi-pass lower bound and another algorithm for distinguishing graphs with no 4-cycles and graphs with many 4-cycles.
DOI: https://doi.org/10.1145/3375395.3387652
Published: 2020-05-29
Citations: 12
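To make the triangle-counting setting of the McGregor and Vorotnikova entry above concrete, here is a minimal Python sketch of a classic one-pass baseline: keep each streamed edge independently with probability p, count triangles among the kept edges, and rescale by 1/p^3. This is not the algorithm from the paper, and it does not achieve the space bounds quoted in the abstract; the function names `count_triangles` and `estimate_triangles` are hypothetical, and the graph is assumed to be simple and undirected.

```python
import random


def count_triangles(edges):
    """Exact triangle count of a simple undirected graph given as an edge list."""
    # Normalize edges so duplicates and self-loops do not distort the count.
    unique = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    adj = {}
    for u, v in unique:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once per edge, i.e. three times in total.
    return sum(len(adj[u] & adj[v]) for u, v in unique) // 3


def estimate_triangles(edge_stream, p, seed=None):
    """One-pass baseline: sample each edge with probability p, rescale by 1/p^3.

    A triangle survives only if all three of its edges are kept, which
    happens with probability p^3, so the rescaled count is unbiased.
    """
    rng = random.Random(seed)
    sample = [edge for edge in edge_stream if rng.random() < p]
    return count_triangles(sample) / p ** 3


if __name__ == "__main__":
    # Complete graph on 4 vertices: exactly 4 triangles.
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    print(count_triangles(k4))
    print(estimate_triangles(k4, p=1.0, seed=0))
```

The sample holds about pm edges in expectation, and the variance of the estimate grows as p shrinks. That tension between space and accuracy is what the bounds in the abstract quantify for far more refined algorithms.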
Fine-Grained Complexity Analysis of Queries: From Decision to Counting and Enumeration
Arnaud Durand
This paper is devoted to a complexity study of various tasks related to query answering, such as deciding whether a Boolean query is true, counting the size of the answer set, or enumerating the results. It is a survey of some of the many tools, from complexity measures through algorithmic methods to conditional lower bounds, that have been designed in this domain in recent years.
DOI: https://doi.org/10.1145/3375395.3389130
Published: 2020-05-29
Citations: 12
Three Modern Roles for Logic in AI
Adnan Darwiche
We consider three modern roles for logic in artificial intelligence, which are based on the theory of tractable Boolean circuits: (1) logic as a basis for computation, (2) logic for learning from a combination of data and knowledge, and (3) logic for reasoning about the behavior of machine learning systems.
DOI: https://doi.org/10.1145/3375395.3389131
Published: 2020-04-18
Citations: 44
A Framework for Adversarially Robust Streaming Algorithms
Omri Ben-Eliezer, Rajesh Jayaram, David P. Woodruff, E. Yogev
We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinear-space deterministic algorithms; on the other hand, classical space-efficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. In this work, we show that the answer is positive for various important streaming problems in the insertion-only model, including distinct elements and more generally $F_p$-estimation, $F_p$-heavy hitters, entropy estimation, and others. For all of these problems, we develop adversarially robust $(1+\varepsilon)$-approximation algorithms whose required space matches that of the best known non-robust algorithms up to a $\mathrm{poly}(\log n, 1/\varepsilon)$ multiplicative factor (and in some cases even up to a constant factor). Towards this end, we develop several generic tools allowing one to efficiently transform a non-robust streaming algorithm into a robust one in various scenarios.
DOI: https://doi.org/10.1145/3375395.3387658
Published: 2020-03-31
Citations: 4
How the Degeneracy Helps for Triangle Counting in Graph Streams
Suman Kalyan Bera, Seshadhri Comandur
We revisit the well-studied problem of triangle count estimation in graph streams. Given a graph represented as a stream of $m$ edges, our aim is to compute a $(1\pm\varepsilon)$-approximation to the triangle count $T$, using a small space algorithm. For arbitrary order and a constant number of passes, the space complexity is known to be essentially $\Theta(\min(m^{3/2}/T, m/\sqrt{T}))$ (McGregor et al., PODS 2016, Bera et al., STACS 2017). We give a (constant pass, arbitrary order) streaming algorithm that can circumvent this lower bound for low degeneracy graphs. The degeneracy, $K$, is a nuanced measure of density, and the class of constant degeneracy graphs is immensely rich (containing planar graphs, minor-closed families, and preferential attachment graphs). We design a streaming algorithm with space complexity $\tilde{O}(mK/T)$. For constant degeneracy graphs, this bound is $\tilde{O}(m/T)$, which is significantly smaller than both $m^{3/2}/T$ and $m/\sqrt{T}$. We complement our algorithmic result with a nearly matching lower bound of $\Omega(mK/T)$.
DOI: https://doi.org/10.1145/3375395.3387665
Published: 2020-03-29
Citations: 21
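The Bera and Comandur entry above expresses its bounds in terms of the degeneracy K. As a hedged illustration of that parameter, the Python sketch below computes it offline using the standard peeling definition: repeatedly delete a minimum-degree vertex, and K is the largest degree seen at the moment of deletion. This is an in-memory helper, not the streaming algorithm from the paper; the name `degeneracy` is hypothetical, and vertex labels are assumed to be mutually comparable (for example, integers) so they can break ties in the heap.

```python
import heapq


def degeneracy(edges):
    """Degeneracy of a simple undirected graph via min-degree peeling."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)

    removed = set()
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != degree[v]:
            continue  # stale heap entry left over from an earlier update
        removed.add(v)
        k = max(k, d)
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
    return k


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]
    clique = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    print(degeneracy(path), degeneracy(clique))
```

A path has degeneracy 1 and a 4-clique has degeneracy 3. Sparse structured graphs therefore keep K small even when m is large, which is the regime where the abstract's mK/T bound improves on m/√T.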
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data
Martin Grohe
Vector representations of graphs and relational structures, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of methods for generating such embeddings have been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively little attention from a theoretical point of view. Starting with a survey of embedding techniques that have been used in practice, in this paper we propose two theoretical approaches that we see as central for understanding the foundations of vector embeddings. We draw connections between the various approaches and suggest directions for future research.
DOI: https://doi.org/10.1145/3375395.3387641
Published: 2020-03-27
Citations: 109
On Monotonic Determinacy and Rewritability for Recursive Queries and Views
Michael Benedikt, S. Kikot, Piotr Ostropolski-Nalewaja, M. Romero
A query Q is monotonically determined over a set of views if Q can be expressed as a monotonic function of the view image. In the case of relational algebra views and queries, monotonic determinacy coincides with rewritability as a union of conjunctive queries, and it is decidable in important special cases, such as for CQ views and queries. We investigate the situation for views and queries in the recursive query language Datalog. We give both positive and negative results about the ability to decide monotonic determinacy, and also about the coincidence of monotonic determinacy with Datalog rewritability.
DOI: https://doi.org/10.1145/3375395.3387661
Published: 2020-03-12
Citations: 0
On the I/O Complexity of the k-Nearest Neighbors Problem
Mayank Goswami, R. Jacob, R. Pagh
We consider static, external memory indexes for exact and approximate versions of the k-nearest neighbor (k-NN) problem, and show new lower bounds under a standard indivisibility assumption: Polynomial space indexing schemes for high-dimensional k-NN in Hamming space cannot take advantage of block transfers: Ω(k) block reads are needed to answer a query. For the ℓ∞ metric the lower bound holds even if we allow c-approximate nearest neighbors to be returned, for c ∈ (1, 3). The restriction to c < 3 is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space O(kn), where n is the number of points, that can retrieve k 3-approximate nearest neighbors using optimal ⌈k/B⌉ I/Os, where B is the block size. For specific metrics, data structures with better approximation factors are possible. For k-NN in Hamming space and every approximation factor c > 1 there exists a polynomial space data structure that returns k c-approximate nearest neighbors in ⌈k/B⌉ I/Os. To show these lower bounds we develop two new techniques: First, to handle the fact that approximation algorithms have more freedom in deciding which result set to return, we develop a relaxed version of the λ-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in d ≥ n dimensions. To extend the lower bounds down to d = O(k log(n/k)) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.
DOI: https://doi.org/10.1145/3375395.3387649
Published: 2020-02-12
Citations: 0
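The Goswami, Jacob and Pagh entry above contrasts the ideal cost of ⌈k/B⌉ I/Os with a lower bound of Ω(k) block reads. The Python sketch below only spells out the two quantities involved: a brute-force exact k-NN query under Hamming distance, and the number of blocks needed if the k answers were stored contiguously. It is an in-memory illustration, not an external-memory index from the paper; points are assumed to be bit vectors encoded as non-negative integers, and the names `hamming`, `k_nearest` and `blocks_needed` are hypothetical.

```python
import heapq


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit vectors encoded as integers."""
    return bin(a ^ b).count("1")


def k_nearest(points, query, k):
    """Brute-force exact k-NN: the k points closest to `query`, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: hamming(p, query))


def blocks_needed(k: int, block_size: int) -> int:
    """ceil(k / B): block reads needed if all k answers sat in consecutive blocks."""
    return -(-k // block_size)


if __name__ == "__main__":
    points = [0b0000, 0b0001, 0b0011, 0b1111]
    print(k_nearest(points, query=0b0000, k=2))
    print(blocks_needed(k=10, block_size=4))
```

With k = 10 and B = 4, contiguous storage would need 3 block reads. The abstract's lower bound says a polynomial-space index for exact high-dimensional Hamming k-NN cannot get that benefit and needs on the order of k reads instead.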
Generative Datalog with Continuous Distributions
Martin Grohe, Benjamin Lucien Kaminski, J. Katoen, P. Lindner
Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more foundational approach towards defining its semantics. It is based on standard notions from probability theory known as stochastic kernels and Markov processes. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al. We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020), and we show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.
DOI: https://doi.org/10.1145/3375395.3387659
Published: 2020-01-17
Citations: 9
The Impact of Negation on the Complexity of the Shapley Value in Conjunctive Queries
A. Reshef, B. Kimelfeld, Ester Livshits
The Shapley value is a conventional and well-studied function for determining the contribution of a player to the coalition in a cooperative game. Among its applications in a plethora of domains, it has recently been proposed to use the Shapley value for quantifying the contribution of a tuple to the result of a database query. In particular, we have a thorough understanding of the tractability frontier for the class of Conjunctive Queries (CQs) and aggregate functions over CQs. It has also been established that a tractable (randomized) multiplicative approximation exists for every union of CQs. Nevertheless, all of these results are based on the monotonicity of CQs. In this work, we investigate the implication of negation on the complexity of Shapley computation, in both the exact and approximate senses. We generalize a known dichotomy to account for negated atoms. We also show that negation fundamentally changes the complexity of approximation. We do so by drawing a connection to the problem of deciding whether a tuple is "relevant" to a query, and by analyzing its complexity.
DOI: https://doi.org/10.1145/3375395.3387664
Published: 2019-12-29
Citations: 18
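To illustrate what the Shapley value of a tuple means in the Reshef, Kimelfeld and Livshits entry above, the following Python sketch computes it exactly by enumerating all subsets of facts, treating every fact as a player and a Boolean query as the 0/1 value function. This brute-force approach takes exponential time and is not a method from the paper. The encoding of facts as `(relation, value)` pairs, the two toy queries, and all function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_value(facts, target, query):
    """Exact Shapley value of `target` for a Boolean query (exponential time)."""
    others = [f for f in facts if f != target]
    n = len(others) + 1
    total = Fraction(0)
    for size in range(len(others) + 1):
        # Probability that exactly this coalition precedes `target`
        # in a uniformly random ordering of the n facts.
        weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
        for subset in combinations(others, size):
            without = frozenset(subset)
            gain = int(query(without | {target})) - int(query(without))
            total += weight * gain
    return total


def q_positive(db):
    """q() :- R(x), S(x)"""
    return any(("S", x) in db for (rel, x) in db if rel == "R")


def q_negated(db):
    """q() :- R(x), not S(x)"""
    return any(("S", x) not in db for (rel, x) in db if rel == "R")


if __name__ == "__main__":
    facts = [("R", 1), ("S", 1)]
    print(shapley_value(facts, ("R", 1), q_positive))
    print(shapley_value(facts, ("S", 1), q_negated))
```

On this two-fact database, both facts receive 1/2 under the positive query. Under the query with negation, R(1) receives 1/2 and S(1) receives -1/2, because adding S(1) can only turn the answer from true to false. Negative contributions of this kind cannot occur for monotone queries, which is consistent with the abstract's remark that earlier results rely on monotonicity.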