
Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory: Latest Articles

Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs
Sepehr Assadi, Nirmit Joshi, M. Prabhu, Vihan Shah
Estimating quantiles, like the median or percentiles, is a fundamental task in data mining and data science. A (streaming) quantile summary is a data structure that can process a set S of n elements in a streaming fashion and at the end, for any phi in (0,1], return a phi-quantile of S up to an eps error, i.e., return a phi'-quantile with phi' = phi ± eps. We are particularly interested in comparison-based summaries that only compare elements of the universe under a total ordering and are otherwise completely oblivious of the universe. The best known deterministic quantile summary is the 20-year-old Greenwald-Khanna (GK) summary, which uses O((1/eps) log(eps n)) space [SIGMOD'01]. This bound was recently proved to be optimal for all deterministic comparison-based summaries by Cormode and Veselý [PODS'20]. In this paper, we study weighted quantiles, a generalization of the quantiles problem, where each element arrives with a positive integer weight which denotes the number of copies of that element being inserted. The only known method of handling weighted inputs via GK summaries is the naive approach of breaking each weighted element into multiple unweighted items and feeding them one by one to the summary, which results in a prohibitively large update time (proportional to the maximum weight of input elements). We give the first non-trivial extension of GK summaries for weighted inputs and show that it takes O((1/eps) log(eps n)) space and O(log(1/eps) + log log(eps n)) update time per element to process a stream of length n (under some quite mild assumptions on the range of weights and eps). En route to this, we also simplify the original GK summaries for unweighted quantiles.
DOI: https://doi.org/10.48550/arXiv.2303.06288
Published: 2023-03-11
Citations: 0
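To make the weighted-quantile setting of the Greenwald-Khanna entry above concrete, here is a minimal offline sketch in Python. It is not the GK summary and not the authors' algorithm: it stores the whole input and computes exact quantiles, once directly from the weights and once through the naive expansion into unit copies that the abstract describes. The function names and the rank convention ceil(phi * W), where W is the total weight, are assumptions made for illustration.

```python
import math
from bisect import bisect_left
from itertools import accumulate


def weighted_quantile(items, phi):
    """Exact phi-quantile of (value, weight) pairs with positive integer weights."""
    items = sorted(items)
    cumulative = list(accumulate(w for _, w in items))
    total = cumulative[-1]
    # Assumed convention: the phi-quantile is the element of rank ceil(phi * W).
    target = max(1, math.ceil(phi * total))
    # First position whose cumulative weight reaches the target rank.
    return items[bisect_left(cumulative, target)][0]


def expanded_quantile(items, phi):
    """Same answer via the naive expansion: one copy per unit of weight."""
    expanded = sorted(v for v, w in items for _ in range(w))
    target = max(1, math.ceil(phi * len(expanded)))
    return expanded[target - 1]


stream = [(30, 6), (10, 3), (20, 1)]
print(weighted_quantile(stream, 0.5), expanded_quantile(stream, 0.5))
```

Both functions agree, but the expanded version does work proportional to the total weight, which is the cost the paper sets out to avoid in the streaming setting.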
A Simple Algorithm for Consistent Query Answering under Primary Keys
Diego Figueira, A. Padmanabha, L. Segoufin, C. Sirangelo
We consider the dichotomy conjecture for consistent query answering under primary key constraints, which states that for every fixed Boolean conjunctive query $q$, testing whether it is certain over all repairs of a given inconsistent database is either polynomial time or coNP-complete. This conjecture has been verified for self-join-free and path queries. We propose a simple inflationary fixpoint algorithm for consistent query answering which, for a given database, naively computes a set $\Delta$ of subsets of database repairs with at most $k$ facts, where $k$ is the size of the query $q$. The algorithm runs in polynomial time and can be formally defined as: 1. Initialize $\Delta$ with all sets $S$ of at most $k$ facts such that $S$ satisfies $q$. 2. Add any set $S$ of at most $k$ facts to $\Delta$ if there exists a block $B$ (i.e., a maximal set of facts sharing the same key) such that for every fact $a$ of $B$ there is a set $S' \in \Delta$ contained in $S \cup \{a\}$. The algorithm answers "$q$ is certain" iff $\Delta$ eventually contains the empty set. The algorithm correctly computes certain answers when the query $q$ falls in the polynomial time cases for self-join-free queries and path queries. For arbitrary queries, the algorithm is an under-approximation: the query is guaranteed to be certain if the algorithm claims so. However, there are polynomial time certain queries (with self-joins) which are not identified as such by the algorithm.
DOI: https://doi.org/10.48550/arXiv.2301.08482
Published: 2023-01-20
Citations: 3
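The two-step fixpoint described in the consistent query answering entry above can be sketched directly in Python. This is only an illustrative reading of the abstract, not the authors' implementation: the fact encoding `(relation, key, value)`, the caller-supplied `satisfies` predicate, and the example query are all assumptions. Candidate sets are restricted to at most one fact per block, which is how "subsets of database repairs" is interpreted here.

```python
from itertools import combinations


def block_of(fact):
    """Assumed encoding: fact = (relation, key, value); a block shares relation and key."""
    relation, key, _ = fact
    return (relation, key)


def certain_by_fixpoint(facts, k, satisfies):
    facts = list(facts)
    blocks = {}
    for fact in facts:
        blocks.setdefault(block_of(fact), []).append(fact)

    # Candidate sets: at most k facts, at most one fact per block
    # (that is, subsets of some repair).
    candidates = [
        frozenset(combo)
        for size in range(k + 1)
        for combo in combinations(facts, size)
        if len({block_of(f) for f in combo}) == size
    ]

    # Step 1: start from the candidate sets that already satisfy q.
    delta = {s for s in candidates if satisfies(s)}

    # Step 2: inflationary iteration until nothing new can be added.
    changed = True
    while changed:
        changed = False
        for s in candidates:
            if s in delta:
                continue
            for block in blocks.values():
                if all(any(t <= s | {a} for t in delta) for a in block):
                    delta.add(s)
                    changed = True
                    break
    return frozenset() in delta


def q(facts):
    """Hypothetical query q = R(x, y) AND S(y, z), so k = 2."""
    return any(
        r[0] == "R" and s[0] == "S" and r[2] == s[1]
        for r in facts for s in facts
    )


db_certain = [("R", 1, "a"), ("R", 1, "b"), ("S", "a", 0), ("S", "b", 0)]
db_uncertain = [("R", 1, "a"), ("R", 1, "b"), ("S", "a", 0)]
print(certain_by_fixpoint(db_certain, 2, q))
print(certain_by_fixpoint(db_uncertain, 2, q))
```

In the first database every repair keeps a matching S-fact for whichever R-fact is chosen, so the empty set is eventually derived. In the second, the repair that keeps R(1, b) falsifies q, and the fixpoint stops without reaching the empty set.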
Compact Data Structures Meet Databases (Invited Talk)
Gonzalo Navarro
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2023.2
Published: 2023-01-01
Citations: 0
Some Vignettes on Subgraph Counting Using Graph Orientations (Invited Talk)
C. Seshadhri, Floris Geerts, Brecht Vandevoort
Subgraph counting is a fundamental problem that spans many areas in computer science: database theory, logic, network science, data mining, and complexity theory. Given a large input graph G and a small pattern graph H, we wish to count the number of occurrences of H in G. In recent times, there has been a resurgence in using an old (maybe overlooked?) technique of orienting the edges of G and H, and then using a combination of brute-force enumeration and indexing. These orientation techniques appear to give the best of both worlds. There is a rigorous theoretical explanation behind these techniques, and they also have excellent empirical behavior (on large real-world graphs). Time and again, graph orientations help solve subgraph counting problems in various computational models, be it sampling, streaming, distributed, etc. In this paper, we give some short vignettes on how the orientation technique solves a variety of algorithmic problems.
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2023.3
Published: 2023-01-01
Citations: 1
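As a small illustration of the orientation idea in the subgraph counting entry above, the sketch below counts triangles (the pattern H is a triangle) by orienting each edge of G from its lower-ranked to its higher-ranked endpoint, where rank is degree with ties broken by vertex id. This is the classic textbook variant, chosen here as an assumption; the talk itself may present different orientations and patterns. Vertex ids are assumed to be mutually comparable.

```python
def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def rank(v):
        # Degree ordering, ties broken by vertex id.
        return (len(adj[v]), v)

    # Orient every edge from the lower-ranked to the higher-ranked endpoint.
    out = {v: {w for w in nbrs if rank(v) < rank(w)} for v, nbrs in adj.items()}

    # A triangle a < b < c (in rank order) is found exactly once:
    # at the oriented edge a -> b, with c in both out-neighborhoods.
    return sum(len(out[u] & out[v]) for u in out for v in out[u])


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_triangles(k4))
```

The orientation bounds the out-degree of high-degree vertices, which is what keeps the brute-force intersection step cheap on skewed real-world graphs.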
A Researcher's Digest of GQL (Invited Talk)
Nadime Francis, Amélie Gheerbrant, P. Guagliardo, L. Libkin, Victor Marsault, W. Martens, Filip Murlak, L. Peterfreund, Alexandra Rogova, D. Vrgoc
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2023.1
Published: 2023-01-01
Citations: 6
Enumerating Subgraphs of Constant Sizes in External Memory
Shiyuan Deng, Francesco Silvestri, Yufei Tao
We present an indivisible I/O-efficient algorithm for subgraph enumeration, where the objective is to list all the subgraphs of a massive graph $G := (V, E)$ that are isomorphic to a pattern graph $Q$ having $k = O(1)$ vertices. Our algorithm performs $O\left(\frac{|E|^{k/2}}{M^{k/2-1} B} \log_{M/B} \frac{|E|}{B} + \frac{|E|^{\rho}}{M^{\rho-1} B}\right)$ I/Os with high probability, where $\rho$ is the fractional edge covering number of $Q$ (it always holds that $\rho \ge k/2$, regardless of $Q$), $M$ is the number of words in (internal) memory, and $B$ is the number of words in a disk block. Our solution is optimal in the class of indivisible algorithms for all pattern graphs with $\rho > k/2$. When $\rho = k/2$, our algorithm is still optimal as long as $M/B \ge (|E|/B)^{\epsilon}$ for any constant $\epsilon > 0$. 2012 ACM Subject Classification: Theory of computation → Graph algorithms analysis; Information systems → Join algorithms
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2023.4
Published: 2023-01-01
Citations: 1
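As a worked instance of the I/O bound in the subgraph enumeration entry above, take the pattern $Q$ to be a triangle (an example chosen here for illustration, not one singled out by the abstract). Then $k = 3$, and assigning weight $1/2$ to each of the three edges gives a fractional edge cover of total weight $3/2$, so $\rho = 3/2 = k/2$. Substituting into the stated bound:

$$
O\left(\frac{|E|^{3/2}}{M^{1/2} B} \log_{M/B} \frac{|E|}{B} + \frac{|E|^{3/2}}{M^{1/2} B}\right)
= O\left(\frac{|E|^{3/2}}{\sqrt{M}\, B} \log_{M/B} \frac{|E|}{B}\right),
$$

where the simplification assumes the logarithmic factor is at least 1. Because $\rho = k/2$ here, the triangle falls into the case where the abstract's optimality claim requires $M/B \ge (|E|/B)^{\epsilon}$ for some constant $\epsilon > 0$.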
Size Bounds and Algorithms for Conjunctive Regular Path Queries
Tamara Cucumides, Juan L. Reutter, D. Vrgoc
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2023.13
Published: 2023-01-01
Citations: 0
An Optimal Algorithm for Sliding Window Order Statistics
Pavel Raykov
Assume there is a data stream of elements and a window of size m. Sliding window algorithms compute various statistic functions over the last m elements of the data stream seen so far. The time complexity of a sliding window algorithm is measured as the time required to output an updated statistic function value every time a new element is read. For example, it is well known that computing the sliding window maximum/minimum has time complexity O(1) while computing the sliding window median has time complexity O(log m). In this paper we close the gap between these two cases by (1) presenting an algorithm for computing the sliding window k-th smallest element in O(log k) time and (2) proving that this time complexity is optimal.
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2023.5
Published: 2023-01-01
Citations: 0
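The O(1) sliding window maximum that the entry above cites as well known is usually implemented with a monotonic deque. The sketch below shows that baseline only, with O(1) amortized time per element; it is not the paper's O(log k) algorithm for the k-th smallest element, and the function name and list-based input are assumptions.

```python
from collections import deque


def sliding_max(stream, m):
    """Maximum of each full window of size m over a list, amortized O(1) per element."""
    window = deque()  # indices into stream; their values decrease front to back
    result = []
    for i, x in enumerate(stream):
        # Drop elements that can never be a maximum again.
        while window and stream[window[-1]] <= x:
            window.pop()
        window.append(i)
        # Drop the front index once it slides out of the window.
        if window[0] <= i - m:
            window.popleft()
        if i >= m - 1:
            result.append(stream[window[0]])
    return result


print(sliding_max([1, 3, -1, -3, 5, 3, 6, 7], 3))
```

Each index is appended once and removed at most once, which is where the amortized constant cost per update comes from.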
The Consistency of Probabilistic Databases with Independent Cells
Amir Gilad, Aviram Imber, B. Kimelfeld
A probabilistic database with attribute-level uncertainty consists of relations where cells of some attributes may hold probability distributions rather than deterministic content. Such databases arise, implicitly or explicitly, in the context of noisy operations such as missing data imputation, where we automatically fill in missing values; column prediction, where we predict unknown attributes; and database cleaning (and repairing), where we replace the original values due to detected errors or violation of integrity constraints. We study the computational complexity of problems that concern the selection of cell values in the presence of integrity constraints. More precisely, we focus on functional dependencies and study three problems: (1) deciding whether the constraints can be satisfied by any choice of values, (2) finding a most probable such choice, and (3) calculating the probability of satisfying the constraints. The data complexity of these problems is determined by the combination of the set of functional dependencies and the collection of uncertain attributes. We give full classifications into tractable and intractable complexities for several classes of constraints, including a single dependency, matching constraints, and unary functional dependencies.
DOI: https://doi.org/10.48550/arXiv.2212.12104
Published: 2022-12-23
Citations: 1
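To illustrate problem (3) from the probabilistic database entry above, the following brute-force sketch enumerates every possible world of a tiny relation with independent cells and sums the probability of the worlds that satisfy a single functional dependency. It runs in exponential time and is not one of the paper's algorithms; the row encoding (each cell a dict from value to probability) and the restriction to one attribute on each side of the dependency are assumptions made for illustration.

```python
from itertools import product


def fd_satisfaction_probability(rows, lhs, rhs):
    """Brute-force probability that the FD lhs -> rhs holds.

    rows: list of dicts mapping attribute -> {value: probability};
    a deterministic cell is a distribution with one value of probability 1.
    """
    cells = [(i, attr, list(dist.items()))
             for i, row in enumerate(rows)
             for attr, dist in row.items()]
    total = 0.0
    for choice in product(*(options for _, _, options in cells)):
        world = [{} for _ in rows]
        probability = 1.0
        for (i, attr, _), (value, p) in zip(cells, choice):
            world[i][attr] = value
            probability *= p  # cells are independent
        # The FD holds if equal lhs values always come with equal rhs values.
        seen = {}
        if all(seen.setdefault(t[lhs], t[rhs]) == t[rhs] for t in world):
            total += probability
    return total


rows = [
    {"A": {1: 1.0}, "B": {"x": 0.5, "y": 0.5}},
    {"A": {1: 1.0}, "B": {"x": 0.5, "y": 0.5}},
]
print(fd_satisfaction_probability(rows, "A", "B"))
```

Under this encoding, problem (1) asks whether the returned probability is positive (assuming every listed value has positive probability), and problem (2) asks for a satisfying world of maximum probability.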
Approximation and Semantic Tree-width of Conjunctive Regular Path Queries
Diego Figueira, Rémi Morvan
We show that the problem of whether a query is equivalent to a query of tree-width $k$ is decidable, for the class of Unions of Conjunctive Regular Path Queries with two-way navigation (UC2RPQs). A previous result by Barceló, Romero, and Vardi has shown decidability for the case $k=1$, and here we show that decidability in fact holds for any arbitrary $k>1$. The algorithm is in 2ExpSpace, but for the restricted but practically relevant case where all regular expressions of the query are of the form $a^*$ or $(a_1 + \dotsb + a_n)$ we show that the complexity of the problem drops to $\Pi_2^p$. We also investigate the related problem of approximating a UC2RPQ by queries of small tree-width. We exhibit an algorithm which, for any fixed number $k$, builds the maximal under-approximation of tree-width $k$ of a UC2RPQ. The maximal under-approximation of tree-width $k$ of a query $q$ is a query $q'$ of tree-width $k$ which is contained in $q$ in a maximal and unique way, that is, such that for every query $q''$ of tree-width $k$, if $q''$ is contained in $q$ then $q''$ is also contained in $q'$.
DOI: https://doi.org/10.48550/arXiv.2212.01679
Published: 2022-12-03
Citations: 2