
Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory: latest articles

Coordination-Free Byzantine Replication with Minimal Communication Costs
Jelle Hellings, Mohammad Sadoghi
State-of-the-art fault-tolerant and federated data management systems rely on fully replicated designs in which all participants have equivalent roles. Consequently, these systems have only limited scalability and are ill-suited for high-performance data management. As an alternative, we propose a hierarchical design in which a Byzantine cluster manages data, while an arbitrary number of learners can reliably learn these updates and use the corresponding data. To realize our design, we propose the delayed-replication algorithm, an efficient solution to the Byzantine learner problem that is central to our design. The delayed-replication algorithm is coordination-free, scalable, and has minimal communication cost for all participants involved. In doing so, the delayed-replication algorithm opens the door to new high-performance fault-tolerant and federated data management systems. To illustrate this, we show that the delayed-replication algorithm is not only useful to support specialized learners, but can also be used to reduce the overall communication cost of permissioned blockchains and to improve their storage scalability.
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2020.17 | Publication date: 2020-01-01
Citations: 11
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra
P. Barceló, N. Higuera, Jorge Pérez, Bernardo Subercaseaux
We study the expressive power of the LARA language, a recently proposed unified model for expressing relational and linear algebra operations, both in terms of traditional database query languages and of some analytic tasks often performed in machine learning pipelines. We start by showing that LARA is expressively complete with respect to first-order logic with aggregation. Since LARA is parameterized by a set of user-defined functions that transform values in tables, the exact expressive power of the language depends on how these functions are defined. We distinguish two main cases depending on the level of genericity that queries are required to satisfy. Under strong genericity assumptions, the language cannot express matrix convolution, a very important operation in current machine learning. This language is also local, and thus cannot express operations such as matrix inverse that exhibit recursive behavior. To express convolution, one can relax the genericity requirement by adding an underlying linear order on the domain. This, however, destroys locality and makes the expressive power of the language much harder to understand. In particular, although under complexity assumptions the resulting language still cannot express matrix inverse, a proof of this fact without such assumptions seems challenging to obtain.
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2020.6 | Publication date: 2019-09-25
Citations: 11
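LARA's premise is that relational and linear algebra operations can share one model. As a rough illustration in plain Python (not LARA syntax), a sparse matrix can be stored as a relation of (row, col, value) tuples; matrix multiplication then becomes a join on the shared index followed by a SUM aggregation grouped by the output index, the kind of join-plus-aggregation query the abstract relates LARA to. The encoding and the function name `matmul_as_join_aggregate` are assumptions of this sketch.

```python
from collections import defaultdict

# A sparse matrix encoded as a relation: a list of (row, col, value) tuples.
Relation = list[tuple[int, int, float]]


def matmul_as_join_aggregate(a: Relation, b: Relation) -> Relation:
    """A·B as: join A(i, k, v) with B(k, j, w) on k, then SUM(v * w) GROUP BY (i, j)."""
    # Index B on its row attribute k so the join on k is a hash join.
    b_by_row: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for k, j, w in b:
        b_by_row[k].append((j, w))
    # Join on k and aggregate the products for each output key (i, j).
    sums: dict[tuple[int, int], float] = defaultdict(float)
    for i, k, v in a:
        for j, w in b_by_row.get(k, []):
            sums[(i, j)] += v * w
    # Emit the result relation in a deterministic (sorted) order.
    return sorted((i, j, s) for (i, j), s in sums.items())


if __name__ == "__main__":
    A = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]  # [[1, 2], [0, 3]]
    B = [(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)]  # [[4, 0], [5, 6]]
    print(matmul_as_join_aggregate(A, B))
    # [(0, 0, 14.0), (0, 1, 12.0), (1, 0, 15.0), (1, 1, 18.0)]
```

Zero entries are simply absent tuples, so the join only ever touches non-zero products.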
The space complexity of inner product filters
R. Pagh, J. Sivertsen
Motivated by the problem of filtering candidate pairs in inner product similarity joins, we study the following inner product estimation problem: given parameters $d \in \mathbf{N}$, $\alpha > \beta \geq 0$, and unit vectors $x, y \in \mathbf{R}^d$, consider the task of distinguishing between the cases $\langle x, y \rangle \leq \beta$ and $\langle x, y \rangle \geq \alpha$, where $\langle x, y \rangle = \sum_{i=1}^d x_i y_i$ is the inner product of vectors $x$ and $y$. The goal is to distinguish these cases based on information about each vector, encoded independently as a bit string of the shortest possible length. In contrast to much work on compressing vectors using randomized dimensionality reduction, we seek to solve the problem deterministically, with no probability of error. Inner product estimation can be solved in general via estimating $\langle x, y \rangle$ with an additive error bounded by $\varepsilon = \alpha - \beta$. We show that $d \log_2\left(\tfrac{\sqrt{1-\beta}}{\varepsilon}\right) \pm \Theta(d)$ bits of information about each vector are necessary and sufficient. Our upper bound is constructive and improves a known upper bound of $d \log_2(1/\varepsilon) + O(d)$ by up to a factor of 2 when $\beta$ is close to $1$. The lower bound holds even in a stronger model where one of the vectors is known exactly and an arbitrary estimation function is allowed.
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2020.22 | Publication date: 2019-09-01
Citations: 1
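The problem setting (independent, deterministic encodings; an error-free decision between the two cases) can be illustrated with a deliberately naive baseline that is not the paper's construction: each coordinate of a unit vector is rounded to a uniform grid of step δ = ε/(4·sqrt(d)), and the decision compares the inner product of the two rounded vectors with the midpoint (α + β)/2. The grid step, the assumption 0 ≤ β < α ≤ 1, and all function names are choices of this sketch. It spends roughly log2(8·sqrt(d)/ε) bits per coordinate, noticeably more than the d·log2(1/ε) + O(d) total cited in the abstract.

```python
import math


def grid_step(d: int, alpha: float, beta: float) -> float:
    """Grid step delta = eps / (4 * sqrt(d)), eps = alpha - beta (a choice made for this sketch).

    Rounding every coordinate of a unit vector to this grid moves it by at most
    eta = sqrt(d) * delta / 2 = eps / 8 in Euclidean norm, so the inner product of two
    rounded vectors differs from the true one by at most 2 * eta + eta**2 < eps / 2
    (using eps <= 1, which holds when 0 <= beta < alpha <= 1).
    """
    return (alpha - beta) / (4 * math.sqrt(d))


def encode(x: list[float], delta: float) -> list[int]:
    """Deterministic, independent encoding: the nearest grid index of each coordinate."""
    return [round(xi / delta) for xi in x]


def decide(qx: list[int], qy: list[int], delta: float, alpha: float, beta: float) -> bool:
    """True means <x, y> >= alpha, False means <x, y> <= beta (no error in either case)."""
    estimate = delta * delta * sum(a * b for a, b in zip(qx, qy))
    # Threshold at the midpoint: the estimation error is strictly below (alpha - beta) / 2.
    return estimate >= (alpha + beta) / 2


def bits_per_coordinate(delta: float) -> int:
    """Bits for a signed grid index in [-ceil(1/delta), ceil(1/delta)]."""
    return math.ceil(math.log2(2 * math.ceil(1 / delta) + 1))
```

Each vector is encoded without seeing the other, which matches the independence requirement; only the bit budget differs from the bounds in the abstract.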
Weight Annotation in Information Extraction
J. Doleschal, B. Kimelfeld, W. Martens, L. Peterfreund
The framework of document spanners abstracts the task of information extraction from text as a function that maps every document (a string) into a relation over the document's spans (intervals identified by their start and end indices). For instance, the regular spanners are the closure under the Relational Algebra (RA) of the regular expressions with capture variables, and the expressive power of the regular spanners is precisely captured by the class of VSet-automata, a restricted class of transducers that mark the endpoints of selected spans. In this work, we embark on the investigation of document spanners that can annotate extractions with auxiliary information such as confidence, support, and confidentiality measures. To this end, we adopt the abstraction of provenance semirings by Green et al., where tuples of a relation are annotated with the elements of a commutative semiring, and where the annotation propagates through the positive RA operators via the semiring operators. Hence, the proposed spanner extension, referred to as an annotator, maps every string into an annotated relation over the spans. As a specific instantiation, we explore weighted VSet-automata that, similarly to weighted automata and transducers, attach semiring elements to transitions. We investigate key aspects of expressiveness, such as the closure under the positive RA, and key aspects of computational complexity, such as the enumeration of annotated answers and their ranked enumeration in the case of ordered semirings. For a number of these problems, fundamental properties of the underlying semiring, such as positivity, are crucial for establishing tractability.
DOI: https://doi.org/10.46298/lmcs-18(1:21)2022 | Publication date: 2019-08-30
Citations: 17
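The provenance-semiring mechanism the annotators build on can be sketched in a few lines of Python: an annotated relation maps tuples to semiring elements, a join multiplies the annotations of the tuples it combines, and a projection adds the annotations of tuples that collapse together. The `Semiring` class, the simplified join (last attribute of r against first attribute of s), and the two example semirings are illustrative assumptions; the sketch covers plain annotated relations only, not spanners or weighted VSet-automata.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


# Counting semiring (N, +, *, 0, 1): annotations count derivations.
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
# Tropical semiring (R ∪ {inf}, min, +, inf, 0): annotations are minimal costs.
TROPICAL = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)

AnnotatedRelation = dict[tuple, Any]  # tuple -> semiring annotation


def join(k: Semiring, r: AnnotatedRelation, s: AnnotatedRelation) -> AnnotatedRelation:
    """Join on r's last attribute = s's first attribute; annotations are multiplied."""
    out: AnnotatedRelation = {}
    for rt, ra in r.items():
        for st, sa in s.items():
            if rt[-1] == st[0]:
                t = rt + st[1:]
                out[t] = k.plus(out.get(t, k.zero), k.times(ra, sa))
    return out


def project(k: Semiring, r: AnnotatedRelation, cols: list[int]) -> AnnotatedRelation:
    """Projection onto the given column positions; annotations of merged tuples are added."""
    out: AnnotatedRelation = {}
    for t, a in r.items():
        key = tuple(t[c] for c in cols)
        out[key] = k.plus(out.get(key, k.zero), a)
    return out
```

With R = {("a", "b"): 2, ("a", "c"): 1} and S = {("b", "d"): 3, ("c", "d"): 1}, projecting the join onto columns [0, 2] gives ("a", "d") annotated with 7 under the counting semiring (two derivations, 6 + 1) and 2 under the tropical semiring (the cheaper path, min(5, 2)); the same query changes meaning purely by swapping the semiring.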
The Shapley Value of Tuples in Query Answering
Ester Livshits, L. Bertossi, B. Kimelfeld, Moshe Sebag
We investigate the application of the Shapley value to quantifying the contribution of a tuple to a query answer. The Shapley value is a widely known numerical measure in cooperative game theory, and in many applications of game theory, for assessing the contribution of a player to a coalition game. It was established in the 1950s and is theoretically justified as the unique wealth-distribution measure that satisfies a set of natural axioms. While this value has been investigated in several areas, it has received little attention in data management. We study this measure in the context of conjunctive and aggregate queries by defining corresponding coalition games. We provide algorithmic and complexity-theoretic results on the computation of Shapley-based contributions to query answers, and for the hard cases we present approximation algorithms.
DOI: https://doi.org/10.46298/lmcs-17(3:22)2021 | Publication date: 2019-04-18
Citations: 39
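For intuition, the Shapley value of a tuple can be computed by brute force straight from its definition. In this sketch the players are the endogenous facts, a coalition E is worth q(exogenous ∪ E) − q(exogenous), and each fact receives its marginal contribution averaged over all orderings, written here in the equivalent subset form. This exponential-time reference computation is not one of the paper's algorithms; the game construction as stated, the Boolean query q() :- R(x), S(x, y), and all names are assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
from typing import Callable

Fact = tuple  # e.g. ("R", "a") or ("S", "a", 1)


def shapley_values(endogenous: list[Fact], exogenous: set[Fact],
                   query: Callable[[set[Fact]], int]) -> dict[Fact, Fraction]:
    """Exact Shapley value of each endogenous fact in the game
    v(E) = query(exogenous ∪ E) - query(exogenous), by enumerating all coalitions."""
    n = len(endogenous)
    base = query(set(exogenous))

    def v(coalition) -> int:
        return query(set(exogenous) | set(coalition)) - base

    result: dict[Fact, Fraction] = {}
    for t in endogenous:
        others = [f for f in endogenous if f != t]
        phi = Fraction(0)
        for size in range(len(others) + 1):
            # Probability that exactly this coalition precedes t in a random ordering.
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for coalition in combinations(others, size):
                phi += weight * (v(coalition + (t,)) - v(coalition))
        result[t] = phi
    return result


def q_exists_join(db: set[Fact]) -> int:
    """Boolean conjunctive query q() :- R(x), S(x, y): 1 if satisfied, else 0."""
    r_values = {f[1] for f in db if f[0] == "R"}
    return int(any(f[0] == "S" and f[1] in r_values for f in db))
```

For the endogenous facts R(a), S(a, 1), S(a, 2) with no exogenous facts, this yields 2/3 for R(a) and 1/6 for each S fact; the values sum to 1, the query's value on the full database, as the efficiency axiom requires.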
Infinite Probabilistic Databases
Martin Grohe, P. Lindner
Probabilistic databases (PDBs) model uncertainty in data in a quantitative way. In the established formal framework, probabilistic (relational) databases are finite probability spaces over relational database instances. This finiteness can clash with intuitive query behavior (Ceylan et al., KR 2016), and with application scenarios that are better modeled by continuous probability distributions (Dalvi et al., CACM 2009). We formally introduced infinite PDBs in (Grohe and Lindner, PODS 2019) with a primary focus on countably infinite spaces. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics. We argue that finite point processes are an appropriate model from probability theory for dealing with general probabilistic databases. This allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.
DOI: https://doi.org/10.46298/lmcs-18(1:34)2022 | Publication date: 2019-04-14
Citations: 14
Oblivious Chase Termination: The Sticky Case
M. Calautti, Andreas Pieris
The chase procedure is one of the most fundamental algorithmic tools in database theory. A key algorithmic task is uniform chase termination: given a set of tuple-generating dependencies (tgds), does the chase under this set of tgds terminate for every input database? Since this problem is undecidable no matter which version of the chase we consider, it is natural to ask whether well-behaved classes of tgds, introduced in different contexts such as ontological reasoning, make the problem decidable. In this work, we consider a prominent decidability paradigm for tgds called stickiness. We show that for sticky sets of tgds, uniform chase termination is decidable if we focus on the (semi-)oblivious chase, and we pinpoint its exact complexity: PSPACE-complete in general, and NLOGSPACE-complete for predicates of bounded arity. These complexity results are obtained via graph-based syntactic characterizations of chase termination that are of independent interest. 2012 ACM Subject Classification: Theory of computation → Database query languages (principles), database constraints theory, logic and databases.
DOI: https://doi.org/10.4230/LIPICS.ICDT.2019.17 | Publication date: 2019-03-26
Citations: 11
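The chase itself can be simulated directly, which also shows why termination is the central question. This Python sketch runs a plain oblivious chase under stated assumptions: atoms are (predicate, terms) pairs, rule variables are strings starting with "?", fresh labeled nulls are named _n1, _n2, ..., and a round bound `max_rounds` stands in for the termination test that is undecidable in general. It is not the paper's graph-based decision procedure for sticky tgds; it only runs the chase and reports whether unfired triggers remained when it stopped.

```python
from itertools import count
from typing import Iterator, Optional

Atom = tuple[str, tuple[str, ...]]   # (predicate, terms)
TGD = tuple[list[Atom], list[Atom]]  # (body, head); rule variables start with "?"


def is_var(term: str) -> bool:
    return term.startswith("?")


def homomorphisms(body: list[Atom], inst: set[Atom],
                  h: Optional[dict[str, str]] = None) -> Iterator[dict[str, str]]:
    """Yield every variable assignment that maps all body atoms into the instance."""
    h = {} if h is None else h
    if not body:
        yield dict(h)
        return
    (pred, terms), rest = body[0], body[1:]
    for ipred, iterms in inst:
        if ipred != pred or len(iterms) != len(terms):
            continue
        ext, ok = dict(h), True
        for t, val in zip(terms, iterms):
            # A variable must bind consistently; a constant must match exactly.
            if (ext.setdefault(t, val) if is_var(t) else t) != val:
                ok = False
                break
        if ok:
            yield from homomorphisms(rest, inst, ext)


def oblivious_chase(instance: set[Atom], tgds: list[TGD], max_rounds: int = 20):
    """Fire every trigger (tgd, body homomorphism) exactly once, with fresh nulls for
    existential variables. Returns (result, terminated); terminated is False when
    unfired triggers remain after max_rounds rounds."""
    inst, fired, fresh = set(instance), set(), count(1)
    for _ in range(max_rounds):
        new_atoms: set[Atom] = set()
        for idx, (body, head) in enumerate(tgds):
            for h in homomorphisms(body, inst):
                trigger = (idx, tuple(sorted(h.items())))
                if trigger in fired:
                    continue
                fired.add(trigger)
                # Existential (head-only) variables get fresh labeled nulls.
                for _pred, terms in head:
                    for t in terms:
                        if is_var(t) and t not in h:
                            h[t] = f"_n{next(fresh)}"
                new_atoms |= {(p, tuple(h[t] if is_var(t) else t for t in terms))
                              for p, terms in head}
        if not new_atoms:
            return inst, True
        inst |= new_atoms
    pending = any((idx, tuple(sorted(h.items()))) not in fired
                  for idx, (body, _) in enumerate(tgds)
                  for h in homomorphisms(body, inst))
    return inst, not pending
```

The rule R(x, y) → ∃z S(y, z) stops after one round, while R(x, y) → ∃z R(y, z) keeps inventing nulls until the bound is hit. A semi-oblivious variant would key each trigger on the images of the frontier variables (those shared by body and head) instead of the whole homomorphism.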
Expressive Power of Entity-Linking Frameworks
D. Burdick, Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, W. Tan
We develop a unifying approach to declarative entity linking by introducing the notion of an entity linking framework and an accompanying notion of the certain links in such a framework. In an entity linking framework, logic-based constraints are used to express properties of the desired link relations in terms of source relations and, possibly, in terms of other link relations. The definition of the certain links in such a framework makes use of weighted repairs and consistent answers in inconsistent databases. We demonstrate the modeling capabilities of this approach by showing that numerous concrete entity linking scenarios can be cast as such entity linking frameworks for suitable choices of constraints and weights. By using the certain links as a measure of expressive power, we investigate the relative expressive power of several entity linking frameworks and obtain sharp comparisons. In particular, we show that we gain expressive power if we allow constraints that capture non-recursive collective entity resolution, where link relations may depend on other link relations (and not just on source relations). Moreover, we show that an increase in expressive power also takes place when we allow constraints that incorporate preferences as an additional mechanism for expressing "goodness" of links.
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2017.10 | Publication date: 2019-03-01
Citations: 10
Ranked Enumeration of Conjunctive Query Results
Shaleen Deep, Paraschos Koutris
We investigate the enumeration of top-k answers for conjunctive queries against relational databases according to a given ranking function. The task is to design data structures and algorithms that allow for efficient enumeration after a preprocessing phase. Our main contribution is a novel priority-queue-based algorithm with near-optimal delay and non-trivial space guarantees that are output-sensitive and depend on the structure of the query. In particular, we exploit certain desirable properties of ranking functions that frequently occur in practice, together with degree information in the database instance, to enable efficient enumeration. We introduce the notion of decomposable and compatible ranking functions in conjunction with query decomposition, a property that allows for partial aggregation of tuple scores in order to efficiently enumerate the ranked output. We complement the algorithmic results with lower bounds justifying why certain assumptions about properties of ranking functions are necessary, and discuss popular conjectures providing evidence for the optimality of the enumeration delay guarantees. Our results extend and improve upon a long line of work that has studied ranked enumeration from both theoretical and practical perspectives.
DOI: 10.4230/LIPIcs.ICDT.2021.5 (https://doi.org/10.4230/LIPIcs.ICDT.2021.5) | Published: 2019-02-07
Citations: 25
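The core idea in Ranked Enumeration of Conjunctive Query Results (preprocess, then emit answers in rank order from a priority queue) can be shown on the smallest interesting case. This is a sketch under stated assumptions, not the authors' algorithm: a two-relation join R(a, b) ⋈ S(b, c), a sum-of-weights ranking (a simple ranking function that decomposes over the join), and lower scores ranked first. The relation contents and the name `ranked_join` are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import islice

def ranked_join(R, S):
    """Yield (score, a, b, c) for R(a, b, w1) JOIN S(b, c, w2), ascending by w1 + w2.

    Assumption: the ranking function is the sum of tuple weights.
    """
    # Preprocessing: bucket S by join key and sort each bucket by weight.
    buckets = defaultdict(list)
    for b, c, w in S:
        buckets[b].append((w, c))
    for bucket in buckets.values():
        bucket.sort()
    # Seed the heap with each R tuple paired with its cheapest S partner.
    heap = []
    for i, (a, b, w) in enumerate(R):
        bucket = buckets.get(b)  # .get() does not create empty buckets
        if bucket:
            heap.append((w + bucket[0][0], i, 0))
    heapq.heapify(heap)
    # Enumeration: pop the best candidate, then push that R tuple's next partner.
    while heap:
        score, i, j = heapq.heappop(heap)
        a, b, w = R[i]
        bucket = buckets[b]
        yield score, a, b, bucket[j][1]
        if j + 1 < len(bucket):
            heapq.heappush(heap, (w + bucket[j + 1][0], i, j + 1))

if __name__ == "__main__":
    R = [("a1", "x", 1), ("a2", "x", 4), ("a3", "y", 2)]
    S = [("x", "c1", 3), ("x", "c2", 1), ("y", "c3", 5)]
    # Top-3 answers; prints (2, 'a1', 'x', 'c2'), then (4, 'a1', 'x', 'c1'), then (5, 'a2', 'x', 'c2')
    for answer in islice(ranked_join(R, S), 3):
        print(answer)
```

Each R tuple has at most one entry in the heap at a time, so after the sorting step every answer is produced with O(log |R|) delay, and top-k is simply the first k answers of the generator. The paper targets general conjunctive queries via query decompositions, which this toy does not attempt.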
An Experimental Study of the Treewidth of Real-World Graph Data (Extended Version)
S. Maniu, P. Senellart, Suraj Jog
Treewidth is a parameter that measures how tree-like a relational instance is, and whether it can reasonably be decomposed into a tree. Many computational tasks are known to be tractable on databases of small treewidth, but computing the treewidth of a given instance is intractable. This article is the first large-scale experimental study of the treewidth and tree decompositions of real-world database instances (25 datasets from 8 different domains, with sizes ranging from a few thousand to a few million vertices). The goal is to determine which data, if any, can benefit from the wealth of algorithms for databases of small treewidth. For each dataset, we obtain upper- and lower-bound estimates of its treewidth and study the properties of its tree decompositions. We show in particular that, even when treewidth is high, partial tree decompositions can yield data structures that assist algorithms.
DOI: 10.4230/LIPICS.ICDT.2019.12 (https://doi.org/10.4230/LIPICS.ICDT.2019.12) | Published: 2019-01-21
Citations: 47
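The treewidth study above reports upper and lower bound estimates because exact treewidth is intractable to compute. A minimal sketch of two standard textbook bounds follows; they are chosen for illustration and are not necessarily the estimators used in the study. The min-degree elimination heuristic gives an upper bound (the width of the elimination ordering it builds), and degeneracy gives a lower bound, since every graph of treewidth k has a vertex of degree at most k. The function names are hypothetical, and a graph is assumed to be a list of undirected edges.

```python
def _adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def min_degree_upper_bound(edges):
    """Upper bound: width of the elimination ordering chosen by the min-degree heuristic."""
    adj = _adjacency(edges)
    width = 0
    while adj:
        v = min(adj, key=lambda x: len(adj[x]))
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= nbrs - {u}  # fill-in: turn v's neighbourhood into a clique
    return width

def degeneracy_lower_bound(edges):
    """Lower bound: largest minimum degree seen while repeatedly deleting a min-degree vertex."""
    adj = _adjacency(edges)
    bound = 0
    while adj:
        v = min(adj, key=lambda x: len(adj[x]))
        bound = max(bound, len(adj[v]))
        for u in adj.pop(v):
            adj[u].discard(v)
    return bound

if __name__ == "__main__":
    k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    # Both bounds meet at 3, so the treewidth of K4 is exactly 3.
    print(degeneracy_lower_bound(k4), min_degree_upper_bound(k4))
```

When the two bounds coincide, as for a clique or a cycle, the treewidth is determined exactly; on large real-world graphs they typically leave a gap, which is why a study of this kind reports both. The linear `min` scan keeps the sketch short but makes it quadratic, so a bucket queue would be needed for graphs with millions of vertices.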