
Database theory -- ICDT : International Conference ... proceedings. International Conference on Database Theory — Latest Publications

Counting Triangles under Updates in Worst-Case Optimal Time
A. Kara, H. Ngo, M. Nikolic, Dan Olteanu, Haozhe Zhang
We consider the problem of incrementally maintaining the triangle count query under single-tuple updates to the input relations. We introduce an approach that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size. This lowest update time is worst-case optimal conditioned on the Online Matrix-Vector Multiplication conjecture. The classical and factorized incremental view maintenance approaches are recovered as special cases of our approach within the space-time tradeoff. In particular, they require linear-time update maintenance, which is suboptimal. Our approach also recovers the worst-case optimal time complexity for computing the triangle count in the non-incremental setting.
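The classical incremental view maintenance baseline that the paper recovers as a special case can be sketched in a few lines: on a single-edge update, the change to the triangle count is exactly the number of common neighbors of the two endpoints. This toy sketch (undirected graph, names hypothetical) illustrates that baseline, whose per-update cost is linear in the worst case — precisely what the paper's space-time tradeoff improves to the square root of the database size.

```python
from collections import defaultdict

class TriangleCounter:
    """Classical IVM baseline: maintain the triangle count under
    single-edge updates by intersecting neighbor sets. Per-update cost
    is O(min degree), i.e. linear in the worst case -- the paper's
    refined structure achieves O(sqrt(N)) updates instead."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.count = 0

    def insert(self, u, v):
        # New triangles closed by (u, v) are exactly the common neighbors.
        self.count += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.count -= len(self.adj[u] & self.adj[v])
```
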
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2019.4 | Published: 2018-04-09
Citations: 30
Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics
Bas Ketsman, F. Neven, Brecht Vandevoort
Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query. This property is referred to as parallel-correctness. Another key problem is to detect whether the data reshuffle step can be avoided when evaluating subsequent queries. The latter problem is referred to as transfer of parallel-correctness. This paper extends the study of parallel-correctness and transfer of parallel-correctness of conjunctive queries to incorporate bag semantics. We provide semantical characterizations for both problems, obtain complexity bounds and discuss the relationship with their set semantics counterparts. Finally, we revisit both problems under a modified distribution model that takes advantage of a linear order on compute nodes and obtain tight complexity bounds.
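Parallel-correctness on a concrete instance can be checked by brute force: distribute each fact according to the policy, evaluate the query locally at every server, and compare the union of local answers with the global answer. A minimal set-semantics sketch (triangle query, hypothetical policies; the paper's contribution is the general bag-semantics characterization, not this instance check):

```python
def triangles(edges):
    """Global evaluation of Q(x,y,z) :- E(x,y), E(y,z), E(z,x)."""
    E = set(edges)
    return {(x, y, z)
            for (x, y) in E
            for z in {w for (v, w) in E if v == y}
            if (z, x) in E}

def parallel_correct(edges, policy, servers):
    """A policy (fact -> set of servers) is parallel-correct on this
    instance iff local, communication-free evaluation recovers the
    global answer."""
    local = set()
    for s in servers:
        local |= triangles([e for e in edges if s in policy(e)])
    return local == triangles(edges)
```

A broadcast policy is trivially parallel-correct, while hashing on the first component can separate the facts of a triangle across servers and lose answers.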
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2018.18 | Published: 2018-01-01
Citations: 4
An Update on Dynamic Complexity Theory
T. Zeume
In many modern data management scenarios, data is subject to frequent changes. In order to avoid costly re-computing query answers from scratch after each small update, one can try to use auxiliary relations that have been computed before. Of course, the auxiliary relations need to be updated dynamically whenever the data changes. Dynamic complexity theory studies which queries and auxiliary relations can be updated in a highly parallel fashion, that is, by constant-depth circuits or, equivalently, by first-order formulas or the relational algebra. After gently introducing dynamic complexity theory, I will discuss recent results of the area with a focus on the dynamic complexity of the reachability query.
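The flavor of these highly parallel updates can be seen on reachability under edge insertions, where the auxiliary relation (the transitive closure) is updatable by a single quantifier-free formula per pair. A toy sketch, assuming insertions only (maintaining reachability under deletions is the celebrated hard case of this theory):

```python
class DynReach:
    """Maintain reachability (reflexive transitive closure) under edge
    insertions via the classic first-order update rule:
        TC'(x, y) = TC(x, y) or (TC(x, u) and TC(v, y))  on insert (u, v).
    Each update is a constant-depth, per-pair formula -- the kind of
    parallel update step dynamic complexity theory studies."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.tc = {(x, x) for x in self.nodes}  # reflexive closure

    def reach(self, x, y):
        return (x, y) in self.tc

    def insert(self, u, v):
        self.tc |= {(x, y)
                    for x in self.nodes for y in self.nodes
                    if (x, u) in self.tc and (v, y) in self.tc}
```
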
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2018.3 | Published: 2018-01-01
Citations: 0
Expressivity and Complexity of MongoDB Queries
E. Botoeva, Diego Calvanese, B. Cogrel, Guohui Xiao
In this paper, we consider MongoDB, a widely adopted but not formally understood database system managing JSON documents and equipped with a powerful query mechanism, called the aggregation framework. We provide a clean formal abstraction of this query language, which we call MQuery. We study the expressivity of MQuery, showing the equivalence of its well-typed fragment with nested relational algebra. We further investigate the computational complexity of significant fragments of it, obtaining several (tight) bounds in combined complexity, which range from LogSpace to alternating exponential-time with a polynomial number of alternations.
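To make the "pipeline of stages" shape that MQuery formalizes concrete, here is a toy interpreter for a two-stage fragment of the aggregation framework over plain Python dicts. Only `$match` (field equality) and `$unwind` (array flattening) are modeled; this is an illustrative sketch, not MongoDB's implementation or the paper's formal semantics.

```python
def evaluate(pipeline, docs):
    """Apply a list of aggregation stages left to right over a list of
    documents. Toy fragment: $match with equality conditions, $unwind
    of an array-valued field."""
    for stage in pipeline:
        (op, arg), = stage.items()
        if op == "$match":
            docs = [d for d in docs
                    if all(d.get(k) == v for k, v in arg.items())]
        elif op == "$unwind":
            field = arg.lstrip("$")
            docs = [{**d, field: x} for d in docs for x in d.get(field, [])]
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs
```
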
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2018.9 | Published: 2018-01-01
Citations: 21
Rewriting Guarded Existential Rules into Small Datalog Programs
Shqiponja Ahmetaj, Magdalena Ortiz, M. Simkus
The goal of this paper is to understand the relative expressiveness of the query language in which queries are specified by a set of guarded (disjunctive) tuple-generating dependencies (TGDs) and an output (or 'answer') predicate. Our main result is to show that every such query can be translated into a polynomially-sized (disjunctive) Datalog program if the maximal number of variables in the (disjunctive) TGDs is bounded by a constant. To overcome the challenge that Datalog has no direct means to express the existential quantification present in TGDs, we define a two-player game that characterizes the satisfaction of the dependencies, and design a Datalog query that can decide the existence of a winning strategy for the game. For guarded disjunctive TGDs, we can obtain Datalog rules with disjunction in the heads. However, the use of disjunction is limited, and the resulting rules fall into a fragment that can be evaluated in deterministic single exponential time. We proceed quite differently for the case when the TGDs are not disjunctive and we show that we can obtain a plain Datalog query. Notably, unlike previous translations for related fragments, our translation requires only polynomial time if the maximal number of variables in the (disjunctive) TGDs is bounded by a constant.
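The payoff of translating into plain Datalog is that the target can be evaluated bottom-up in polynomial time. As a reminder of that semantics (not the paper's translation itself), here is a naive fixpoint evaluator for a hypothetical reachability program:

```python
def naive_fixpoint(edb):
    """Bottom-up evaluation of the plain Datalog program
        reach(x, y) :- edge(x, y).
        reach(x, y) :- reach(x, z), edge(z, y).
    Iterate the rules until no new facts are derived -- the
    polynomial-time semantics any Datalog target of such a
    rewriting enjoys."""
    edb = set(edb)
    reach = set(edb)
    while True:
        new = {(x, y) for (x, z) in reach for (z2, y) in edb if z2 == z}
        if new <= reach:
            return reach
        reach |= new
```
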
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2018.4 | Published: 2018-01-01
Citations: 18
Massively Parallel Entity Matching with Linear Classification in Low Dimensional Space
Yufei Tao
In entity matching classification, we are given two sets R and S of objects where whether r and s form a match is known for each pair (r, s) in R x S. If R and S are subsets of domains D(R) and D(S) respectively, the goal is to discover a classifier function f: D(R) x D(S) -> {0, 1} from a certain class satisfying the property that, for every (r, s) in R x S, f(r, s) = 1 if and only if r and s are a match. Past research is accustomed to running a learning algorithm directly on all the labeled (i.e., match or not) pairs in R times S. This, however, suffers from the drawback that even reading through the input incurs a quadratic cost. We pursue a direction towards removing the quadratic barrier. Denote by T the set of matching pairs in R times S. We propose to accept R, S, and T as the input, and aim to solve the problem with cost proportional to |R|+|S|+|T|, thereby achieving a large performance gain in the (typical) scenario where |T|<<|R||S|. This paper provides evidence on the feasibility of the new direction, by showing how to accomplish the aforementioned purpose for entity matching with linear classification, where a classifier is a linear multi-dimensional plane separating the matching and non-matching pairs. We actually do so in the MPC model, echoing the trend of deploying massively parallel computing systems for large-scale learning. As a side product, we obtain new MPC algorithms for three geometric problems: linear programming, batched range counting, and dominance join.
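The problem statement can be made concrete with the naive baseline the paper sets out to beat: checking a candidate linear rule against every labeled pair costs Theta(|R||S|), whereas the paper targets a learner running in time roughly |R|+|S|+|T|. A one-dimensional toy sketch (function name and data hypothetical, and this is the quadratic check, not the paper's MPC algorithm):

```python
def check_linear_classifier(R, S, T, w, b):
    """Naive Theta(|R||S|) check that the linear rule
        f(r, s) = 1  iff  w[0]*r + w[1]*s + b > 0
    classifies every pair in R x S correctly, i.e. predicts a match
    exactly for the pairs listed in T."""
    T = set(T)
    return all(((w[0] * r + w[1] * s + b > 0) == ((r, s) in T))
               for r in R for s in S)
```
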
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2018.20 | Published: 2018-01-01
Citations: 9
Fine-grained Algorithms and Complexity
V. V. Williams
A central goal of algorithmic research is to determine how fast computational problems can be solved in the worst case. Theorems from complexity theory state that there are problems that, on inputs of size n, can be solved in t(n) time but not in O(t(n)^{1-epsilon}) time for epsilon > 0. The main challenge is to determine where in this hierarchy various natural and important problems lie. Throughout the years, many ingenious algorithmic techniques have been developed and applied to obtain blazingly fast algorithms for many problems. Nevertheless, for many other central problems, the best known running times are essentially those of their classical algorithms from the 1950s and 1960s. Unconditional lower bounds seem very difficult to obtain, and so practically all known time lower bounds are conditional. For years, the main tool for proving hardness of computational problems has been NP-hardness reductions, basing hardness on P ≠ NP. However, when we care about the exact running time (as opposed to merely polynomial vs. non-polynomial), NP-hardness is not applicable, especially if the problem is already solvable in polynomial time. In recent years, a new theory has been developed, based on "fine-grained reductions" that focus on exact running times. Mimicking NP-hardness, the approach is to (1) select a key problem X that is conjectured to require essentially t(n) time for some t, and (2) reduce X in a fine-grained way to many important problems. This approach has led to the discovery of many meaningful relationships between problems, and even sometimes to equivalence classes. The main key problems used to base hardness on have been: the 3SUM problem, the CNF-SAT problem (based on the Strong Exponential Time Hypothesis (SETH)), and the All Pairs Shortest Paths problem.
Research on SETH-based lower bounds has flourished in particular in recent years showing that the classical algorithms are optimal for problems such as Approximate Diameter, Edit Distance, Frechet Distance, Longest Common Subsequence etc. In this talk I will give an overview of the current progress in this area of study, and will highlight some exciting new developments and their relationship to database theory.
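For concreteness, 3SUM — one of the key problems hardness is based on — asks whether three input numbers sum to zero; the classic sort-and-two-pointers algorithm below runs in O(n^2), and the fine-grained conjecture is that no O(n^{2-epsilon}) algorithm exists.

```python
def three_sum(nums):
    """Classic O(n^2) algorithm for 3SUM: sort, then for each anchor
    element scan the rest with two converging pointers."""
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1
            else:
                hi -= 1
    return False
```
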
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2018.1 | Published: 2018-01-01
Citations: 4
Satisfiability for SCULPT-Schemas for CSV-Like Data
J. Doleschal, W. Martens, F. Neven, Adam Witkowski
SCULPT is a simple schema language inspired by the recent working effort towards a recommendation by the World Wide Web Consortium (W3C) for tabular data and metadata on the Web. In its core, a SCULPT schema consists of a set of rules where left-hand sides select sets of regions in the tabular data and the right-hand sides describe the contents of these regions. A document (divided in cells by row- and column-delimiters) then satisfies a schema if it satisfies every rule. In this paper, we study the satisfiability problem for SCULPT schemas. As SCULPT describes grid-like structures, satisfiability obviously becomes undecidable rather quickly even for very restricted schemas. We define a schema language called L-SCULPT (Lego SCULPT) that restricts the walking power of SCULPT by selecting rectangular shaped areas and only considers tables for which selected regions do not intersect. Depending on the axes used by L-SCULPT, we show that satisfiability is PTIME-complete or undecidable. One of the tractable fragments is practically useful as it extends the structural core of the current W3C proposal for schemas over tabular data. We therefore see how the navigational power of the W3C proposal can be extended while still retaining tractable satisfiability tests.
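The rule shape — a region selector on the left, a content description on the right — can be sketched with the L-SCULPT restriction to rectangular regions. This toy model (representation and names hypothetical, not the paper's formal syntax) checks each rule's regex against every cell in its rectangle:

```python
import re

def satisfies(table, schema):
    """Toy model of the SCULPT idea: each rule selects a rectangular
    region of cells (half-open row range x column range) and gives a
    regex its contents must fully match; the table satisfies the schema
    iff it satisfies every rule. L-SCULPT's rectangular regions are
    what keeps satisfiability decidable."""
    for (r0, r1, c0, c1), pattern in schema:
        for row in table[r0:r1]:
            for cell in row[c0:c1]:
                if not re.fullmatch(pattern, cell):
                    return False
    return True
```
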
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2018.14 | Published: 2018-01-01
Citations: 1
Join Algorithms: From External Memory to the BSP
K. Yi
Database systems have been traditionally disk-based, which had motivated the extensive study on external memory (EM) algorithms. However, as RAMs continue to get larger and cheaper, modern distributed data systems are increasingly adopting a main memory based, shared-nothing architecture, exemplified by systems like Spark and Flink. These systems can be abstracted by the BSP model (with variants like the MPC model and the MapReduce model), and there has been a strong revived interest in designing BSP algorithms for handling large amounts of data. With hard disks starting to fade away from the picture, EM algorithms may now seem less relevant. However, we observe that many of the recently developed join algorithms under the BSP model have a high degree of resemblance with their counterparts in the EM model. In this talk, I will present some recent results on join algorithms in the EM and BSP model, examine their relationships, and discuss a general theoretical framework for converting EM algorithms to the BSP.
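The single-round, shared-nothing pattern the talk abstracts over can be sketched as one BSP superstep of a hash join: reshuffle both relations by hashing the join key, then join locally with no further communication (a minimal sketch; real systems like Spark and Flink add partitioners, spilling, and pipelining on top of this skeleton).

```python
from collections import defaultdict

def bsp_hash_join(R, S, p):
    """One BSP superstep: hash-partition R and S on the join key across
    p servers, then build-and-probe locally at each server."""
    servers = [([], []) for _ in range(p)]
    for (k, a) in R:                       # communication phase
        servers[hash(k) % p][0].append((k, a))
    for (k, b) in S:
        servers[hash(k) % p][1].append((k, b))
    out = []
    for local_r, local_s in servers:       # local, communication-free phase
        index = defaultdict(list)
        for (k, a) in local_r:
            index[k].append(a)
        for (k, b) in local_s:
            out.extend((k, a, b) for a in index[k])
    return out
```
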
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2018.2 | Published: 2018-01-01
Citations: 0
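The shared-nothing join algorithms the talk refers to typically run in communication supersteps: route every tuple to a worker determined by hashing its join key, then let each worker join its partition locally. The following is a minimal single-process sketch of that BSP-style hash join; the names (`bsp_hash_join`, `num_workers`) and the two-relation schema are illustrative assumptions, not taken from the talk.

```python
# Sketch of a BSP-style hash join, simulated in one process.
# Superstep 1 models the communication round (tuples routed by key hash);
# superstep 2 models each worker's local, independent join.

from collections import defaultdict

def bsp_hash_join(R, S, num_workers=4):
    """Join R(a, b) with S(b, c) on the shared attribute b."""
    # Superstep 1: route tuples to workers by hash of the join key.
    r_parts = defaultdict(list)
    s_parts = defaultdict(list)
    for a, b in R:
        r_parts[hash(b) % num_workers].append((a, b))
    for b, c in S:
        s_parts[hash(b) % num_workers].append((b, c))
    # Superstep 2: each worker builds a local index and joins its partition.
    out = []
    for w in range(num_workers):
        index = defaultdict(list)
        for a, b in r_parts[w]:
            index[b].append(a)
        for b, c in s_parts[w]:
            for a in index[b]:
                out.append((a, b, c))
    return out
```

Because matching keys always hash to the same worker, no tuple pair is missed; the EM analogue partitions to disk runs instead of workers, which is the resemblance the abstract points out.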
Evaluation and Enumeration Problems for Regular Path Queries 常规路径查询的求值和枚举问题
W. Martens, T. Trautner
Regular path queries (RPQs) are a central component of graph databases. We investigate decision- and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, and simple paths. Whereas arbitrary and shortest paths can be enumerated in polynomial delay, the situation is much more intricate for simple paths. For instance, already the question if a given graph contains a simple path of a certain length has cases with highly non-trivial solutions and cases that are long-standing open problems. We study RPQ evaluation for simple paths from a parameterized complexity perspective and define a class of simple transitive expressions that is prominent in practice and for which we can prove a dichotomy for the evaluation problem. We observe that, even though simple path semantics is intractable for RPQs in general, it is feasible for the vast majority of RPQs that are used in practice. At the heart of our study on simple paths is a result of independent interest: the two disjoint paths problem in directed graphs is W[1]-hard if parameterized by the length of one of the two paths.
{"title":"Evaluation and Enumeration Problems for Regular Path Queries","authors":"W. Martens, T. Trautner","doi":"10.4230/LIPIcs.ICDT.2018.19","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2018.19","url":null,"abstract":"Regular path queries (RPQs) are a central component of graph databases. We investigate decision- and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, and simple paths. Whereas arbitrary and shortest paths can be enumerated in polynomial delay, the situation is much more intricate for simple paths. For instance, already the question if a given graph contains a simple path of a certain length has cases with highly non-trivial solutions and cases that are long-standing open problems. We study RPQ evaluation for simple paths from a parameterized complexity perspective and define a class of simple transitive expressions that is prominent in practice and for which we can prove a dichotomy for the evaluation problem. We observe that, even though simple path semantics is intractable for RPQs in general, it is feasible for the vast majority of RPQs that are used in practice. At the heart of our study on simple paths is a result of independent interest: the two disjoint paths problem in directed graphs is W[1]-hard if parameterized by the length of one of the two paths.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75507980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
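Under the arbitrary-path semantics mentioned in the abstract, an RPQ can be evaluated in polynomial time by breadth-first search over the product of the graph and an automaton for the regular expression: a target node is an answer iff some pair (node, final state) is reachable. A minimal sketch follows; the hard-coded NFA for the RPQ a·b* and the helper name `rpq_eval` are illustrative assumptions, not the paper's construction (which concerns simple-path semantics and parameterized complexity).

```python
# Sketch of RPQ evaluation under arbitrary-path semantics:
# BFS over the product of a labeled graph and an NFA.

from collections import deque

def rpq_eval(edges, start_node, nfa, nfa_start, nfa_final):
    """Return all nodes reachable from start_node along a path whose
    label sequence is accepted by the NFA.
    edges: iterable of (u, label, v); nfa: dict (state, label) -> [states]."""
    seen = {(start_node, nfa_start)}
    queue = deque(seen)
    while queue:
        node, state = queue.popleft()
        for (u, label, v) in edges:
            if u != node:
                continue
            # Advance the automaton in lockstep with the graph edge.
            for nxt in nfa.get((state, label), ()):
                if (v, nxt) not in seen:
                    seen.add((v, nxt))
                    queue.append((v, nxt))
    return {n for (n, s) in seen if s in nfa_final}

# NFA for the RPQ a.b*: state 0 --a--> 1, state 1 --b--> 1; state 1 is final.
nfa = {(0, 'a'): [1], (1, 'b'): [1]}
edges = [(1, 'a', 2), (2, 'b', 3), (3, 'b', 4), (1, 'b', 5)]
```

Since the product has at most |nodes| x |states| pairs, this runs in polynomial time — in contrast to the simple-path semantics, whose intractability in general is the subject of the paper.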