Data Path Queries over Embedded Graph Databases
Diego Figueira, Artur Jeż, A. Lin. DOI: 10.1145/3517804.3524159

This paper initiates the study of data-path query languages (in particular, regular data path queries (RDPQ) and conjunctive RDPQ (CRDPQ)) in the classic setting of embedded finite model theory, wherein each graph is "embedded" into a background infinite structure (with a decidable FO theory or fragments thereof). Our goal is to address the current lack of support for typed attribute data (e.g., integer arithmetic) in existing data-path query languages, which is crucial in practice. We propose an extension of register automata that allows powerful constraints over the theory and the database as guards, and that has two types of registers: registers that can store values from the active domain, and read-only registers that can store arbitrary values. We prove NL data complexity for (C)RDPQ over Presburger arithmetic, the real-closed field, the existential theory of automatic structures, and word equations with regular constraints. All these results strictly extend the known NL data complexity of RDPQ with only equality comparisons and provide an answer to a recent open problem posed by Libkin et al. Among other contributions, we introduce a crucial proof technique for obtaining NL data complexity for data path queries over embedded graph databases, called "Restricted Register Collapse (RRC)", inspired by the notion of Restricted Quantifier Collapse (RQC) in embedded finite model theory.
LACE: A Logical Approach to Collective Entity Resolution
Meghyn Bienvenu, Gianluca Cima, Víctor Gutiérrez-Basulto. DOI: 10.1145/3517804.3526233

In this paper, we revisit the problem of entity resolution and propose a novel, logical framework, LACE, which mixes declarative and procedural elements to achieve a number of desirable properties. Our approach is fundamentally declarative in nature: it utilizes hard and soft rules to specify conditions under which pairs of entity references must or may be merged, together with denial constraints that enforce consistency of the resulting instance. Importantly, however, rule bodies are evaluated on the instance resulting from applying the already 'derived' merges. It is the dynamic nature of our semantics that enables us to capture collective entity resolution scenarios, where merges can trigger further merges, while at the same time ensuring that every merge can be justified. As the denial constraints restrict which merges can be performed together, we obtain a space of (maximal) solutions, from which we can naturally define notions of certain and possible merges and query answers. We explore the computational properties of our framework and determine the precise computational complexity of the relevant decision problems. Furthermore, as a first step towards implementing our approach, we demonstrate how we can encode the various reasoning tasks using answer set programming.
Query Evaluation by Circuits
Yilei Wang, K. Yi. DOI: 10.1145/3517804.3524142

In addition to its theoretical interest, computing with circuits has found applications in many other areas such as secure multi-party computation and outsourced query processing. Yet, the exact circuit complexity of query evaluation had remained an unexplored topic. In this paper, we present circuit constructions for conjunctive queries under degree constraints. These circuits have polylogarithmic depth and their sizes match the polymatroid bound up to polylogarithmic factors. We also propose a definition of output-sensitive circuit families and obtain such circuits with sizes matching their RAM counterparts.
Optimal Algorithms for Multiway Search on Partial Orders
Shangqi Lu, W. Martens, Matthias Niewerth, Yufei Tao. DOI: 10.1145/3517804.3524150

We study partial order multiway search (POMS), which is a game between an algorithm A and an oracle, played on a directed acyclic graph G known to both parties. First, the oracle picks a vertex t in G called the target. Then, A needs to figure out which vertex is t by probing reachability. Specifically, in each probe, A selects a set Q of vertices in G whose size is bounded by a (pre-agreed) limit; the oracle reveals, for each vertex q ∈ Q, whether q can reach the target in G. The objective of A is to minimize the number of probes. This problem finds use in crowdsourcing, distributed file systems, software testing, etc. We describe an algorithm to solve POMS in O(log_{1+k} n + (d/k) · log_{1+d} n) probes, where n is the number of vertices in G, k is the maximum permissible |Q|, and d is the largest out-degree of the vertices in G. We further establish the algorithm's asymptotic optimality by proving a matching lower bound. We also introduce a variant of POMS in the external memory (EM) computation model, which is the key to a black-box approach for converting a class of pointer-machine structures to their I/O-efficient counterparts. In the EM version of POMS, A is allowed to pre-compute a (disk-based) structure on G and is then required to clear its memory. The oracle (as before) picks a target t. A still needs to find t by issuing probes, except that the set Q in each probe must be read from the disk. The objective of A is now to minimize the number of I/Os. We present a structure that uses O(n/B) space and guarantees discovering the target in O(log_B n + (d/B) · log_{1+d} n) I/Os, where B is the block size, and n and d are as defined earlier. We establish the structure's asymptotic optimality by proving that any structure demands Ω(log_B n + (d/B) · log_{1+d} n) I/Os to find the target in the worst case, regardless of its space consumption.
{"title":"Optimal Algorithms for Multiway Search on Partial Orders","authors":"Shangqi Lu, W. Martens, Matthias Niewerth, Yufei Tao","doi":"10.1145/3517804.3524150","DOIUrl":"https://doi.org/10.1145/3517804.3524150","url":null,"abstract":"We study partial order multiway search (POMS), which is a game between an algorithm A and an oracle, played on a directed acyclic graph G known to both parties. First, the oracle picks a vertex t in G called the target. Then, A needs to figure out which vertex is t by probing reachability. Specifically, in each probe, A selects a set Q of vertices in G whose size is bounded by a (pre-agreed) limit; the oracle reveals, for each vertex q ∈ Q, whether q can reach the target in G. The objective of A is to minimize the number of probes. This problem finds use in crowdsourcing, distributed file systems, software testing, etc. We describe an algorithm to solve POMS in O(log1+k n + d/k log1+dn) probes, where n is the number of vertices in G, k is the maximum permissible |Q|, and d is the largest out-degree of the vertices in G. We further establish the algorithm's asymptotic optimality by proving a matching lower bound. We also introduce a variant of POMS in the external memory (EM) computation model, which is the key to a black-box approach for converting a class of pointer-machine structures to their I/O-efficient counterparts. In the EM version of POMS, A is allowed to pre-compute a (disk-based) structure on G and is then required to clear its memory. The oracle (as before) picks a target t. A still needs to find t by issuing probes, except that the set Q in each probe must be read from the disk. The objective of A is now to minimize the number of I/Os. We present a structure that uses O(n/B) space and guarantees discovering the target in O(logB n + d/B log1+dn) I/Os where B is the block size, and n and d are as defined earlier. We establish the structure's asymptotic optimality by proving that any structure demands Ω(log_B n + d/B log1+d n) I/Os to find the target in the worst case regardless of the space consumption.","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122454583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
2022 ACM PODS Alberto O. Mendelzon Test-of-Time Award
Michael Bender, Michael Benedikt, Sudeepa Roy. DOI: 10.1145/3517804.3526070

Citation. This paper took research on a fundamental problem in database research, join query processing, in a new direction. Its motivation was the bound on join query size of Atserias, Grohe, and Marx, now known as the AGM bound (FOCS 2008). This raised the question of whether a join algorithm can achieve a worst-case running time in line with this bound. This paper presents an algorithm that achieves this bound, while showing that traditional query plans cannot achieve it. In the process, the authors connect join processing questions with geometric inequalities, a connection that has proven fruitful in subsequent work. The algorithmic contribution of the paper resonated within database applications almost immediately, when it was observed that a join algorithm recently implemented in industry, Leapfrog Triejoin, achieves a similar optimality guarantee. This led to a line of papers and implementations of join algorithms building on the ideas in the paper. The contribution of the paper to the analysis of join queries has arguably been even more profound: the connections between join query processing, geometric inequalities, and worst-case size bounds have subsequently been explored in many other contexts, including in the presence of integrity constraints. This work has already been honored with a "Gems of PODS" talk at PODS 2018; the conference paper, the journal version in JACM, and the SIGMOD Record survey article discussing later developments are all highly cited. This underlines the fact that the paper represented a major departure point for research in database theory.
Robustness Against Read Committed: A Free Transactional Lunch
Brecht Vandevoort, Bas Ketsman, Christoph E. Koch, F. Neven. DOI: 10.1145/3517804.3524162

Transaction processing is a central part of most database applications. While serializability remains the gold standard for desirable transactional semantics, many database systems offer improved transaction throughput at the expense of introducing potential anomalies through the choice of a lower isolation level. Transactions are often not arbitrary but are constrained by a set of transaction programs defined at the application level (as is the case for TPC-C, for instance), implying that not every potential anomaly can effectively be realized. The question central to this paper is the following: when, within the context of specific transaction programs, do isolation levels weaker than serializability provide the same guarantees as serializability? We refer to the latter as the robustness problem. This paper surveys recent results on robustness testing against (multiversion) read committed, focusing on complete rather than merely sufficient conditions. We show how to lift robustness testing to transaction templates as well as to programs to increase practical applicability. We discuss open questions and highlight promising directions for future research.
{"title":"Robustness Against Read Committed: A Free Transactional Lunch","authors":"Brecht Vandevoort, Bas Ketsman, Christoph E. Koch, F. Neven","doi":"10.1145/3517804.3524162","DOIUrl":"https://doi.org/10.1145/3517804.3524162","url":null,"abstract":"Transaction processing is a central part of most database applications. While serializability remains the gold standard for desirable transactional semantics, many database systems offer improved transaction throughput at the expense of introducing potential anomalies through the choice of a lower isolation level. Transactions are often not arbitrary but are constrained by a set of transaction programs defined at the application level (as is the case for TPC-C for instance), implying that not every potential anomaly can effectively be realized. The question central to this paper is the following: when - within the context of specific transaction programs - do isolation levels weaker than serializability, provide the same guarantees as serializability? We refer to the latter as the robustness problem. This paper surveys recent results on robustness testing against (multiversion) read committed focusing on complete rather than sufficient conditions. We show how to lift robustness testing to transaction templates as well as to programs to increase practical applicability. We discuss open questions and highlight promising directions for future research.","PeriodicalId":230606,"journal":{"name":"Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"452 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124490247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Gibbs-Rand Model
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi. DOI: 10.1145/3517804.3526227

Due to its many applications, the clustering ensemble problem has been the subject of intense algorithmic study over the last two decades. The input to this problem is a set of clusterings; its goal is to output a clustering that minimizes the average distance to the input clusterings. In this paper, we propose, to the best of our knowledge, the first generative model for this problem. Our Gibbs-like model is parameterized by a center clustering and by a scale parameter; the probability of a particular clustering decays exponentially with its scaled Rand distance to the center clustering. For our new model, we give polynomial-time algorithms for sampling (when the center clustering has a constant number of clusters) and for reconstruction (when the scale parameter is small). En route, we establish several interesting properties of our model. Our work shows that the combinatorial structure of a Gibbs-like model for clusterings is more intricate and challenging than the corresponding and well-studied (Mallows) model for permutations.
The Complexity of Regular Trail and Simple Path Queries on Undirected Graphs
W. Martens, Tina Popp. DOI: 10.1145/3517804.3524149

We study the data complexity of regular trail and simple path queries on undirected graphs. Using techniques from structural graph theory, ranging from the graph minor theorem to group-labeled graphs, we are able to identify several tractable and intractable subclasses of the regular languages. In particular, we establish that trail evaluation for simple chain regular expressions, which are common in practice, is tractable, whereas simple path evaluation is tractable for a large subclass. The problem of fully classifying all regular languages is quite non-trivial, even on undirected graphs, since it subsumes an intriguing problem that has been open for 30 years.
Query Evaluation over SLP-Represented Document Databases with Complex Document Editing
Markus L. Schmid, Nicole Schweikardt. DOI: 10.1145/3517804.3524158

It is known that the query result of a regular spanner over a single document D can be enumerated after O(|D|) preprocessing and with constant delay in data complexity (Florenzano et al., ACM TODS 2020; Amarilli et al., ACM TODS 2021). It has been shown (Schmid and Schweikardt, PODS'21) that if the document is represented by a straight-line program (SLP) S, then enumeration is possible with a delay of O(log |D|), but with preprocessing that is linear in |S| (which, in the best case, is logarithmic in |D|). Hence, this compressed setting allows for spanner evaluation in sub-linear time, i.e., with logarithmic upper bounds for preprocessing and delay, if the document is highly compressible. In this work, we extend these results to the dynamic setting. We consider a document database DDB = D_1, D_2, ..., D_m that is represented by an SLP S_DDB and that supports regular spanners M_1, M_2, ..., M_k (meaning that we have data structures at our disposal that allow O(log |D_i|)-delay enumeration of the result of spanner M_j on document D_i). We can then perform an update by manipulating the existing documents of DDB through a sequence of text-editing operations commonly found in text editors (such as copying and pasting, deleting or copying factors, concatenating documents, etc.), and add the thus-constructed document to the database. Such an operation is called complex document editing and is given by an expression φ in a suitable algebra. Moreover, after this operation, the document database still supports all the regular spanners M_1, ..., M_k. The total time required for such an update is O(k |φ| log d), where d is the maximum length of any intermediate document constructed in the complex document editing described by φ. We stress the fact that the size |S_DDB| of the SLP (which upper-bounds the preprocessing in the static case) is potentially logarithmic in the data, but generally depends on the compressibility of the documents (in the worst case, it is even linear in the data). In contrast to that, we can guarantee that the dependency of our updates on the data is logarithmic regardless of the actual compression achieved by the SLP. In particular, any such update performed by complex document editing adds documents whose length may be exponentially larger than the time needed to perform the update. Our approach hinges on balancing properties of SLPs, and for our updates it is vital to manipulate the SLP that represents the database in such a way that these balancing properties are maintained.
When is the Evaluation of Extended CRPQ Tractable?
Diego Figueira, Varun Ramanathan. DOI: 10.1145/3517804.3524167

We investigate the complexity of the evaluation problem for ECRPQ: Conjunctive Regular Path Queries (CRPQ), extended with synchronous relations (aka regular or automatic). We give a characterization for the evaluation and parameterized evaluation problems of ECRPQ in terms of the underlying structure of queries. As we show, complexity can range between PSpace, NP, and polynomial time for the evaluation problem, and between XNL, W[1], and FPT for parameterized evaluation.