Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs
Pub Date: 2023-03-11. DOI: 10.48550/arXiv.2303.06288. Pages: 19:1-19:19
Sepehr Assadi, Nirmit Joshi, M. Prabhu, Vihan Shah
Estimating quantiles, like the median or percentiles, is a fundamental task in data mining and data science. A (streaming) quantile summary is a data structure that can process a set $S$ of $n$ elements in a streaming fashion and, at the end, for any $\phi \in (0,1]$, return a $\phi$-quantile of $S$ up to an $\varepsilon$ error, i.e., return a $\phi'$-quantile with $\phi' = \phi \pm \varepsilon$. We are particularly interested in comparison-based summaries that only compare elements of the universe under a total ordering and are otherwise completely oblivious of the universe. The best known deterministic quantile summary is the 20-year-old Greenwald-Khanna (GK) summary, which uses $O((1/\varepsilon)\log(\varepsilon n))$ space [SIGMOD'01]. This bound was recently proved optimal for all deterministic comparison-based summaries by Cormode and Veselý [PODS'20]. In this paper, we study weighted quantiles, a generalization of the quantiles problem where each element arrives with a positive integer weight denoting the number of copies of that element being inserted. The only known way to handle weighted inputs via GK summaries is the naive approach of breaking each weighted element into multiple unweighted items and feeding them one by one to the summary, which results in a prohibitively large update time (proportional to the maximum weight of input elements). We give the first non-trivial extension of GK summaries to weighted inputs and show that it takes $O((1/\varepsilon)\log(\varepsilon n))$ space and $O(\log(1/\varepsilon) + \log\log(\varepsilon n))$ update time per element to process a stream of length $n$ (under some quite mild assumptions on the range of weights and $\varepsilon$). En route to this, we also simplify the original GK summaries for unweighted quantiles.
{"title":"Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs","authors":"Sepehr Assadi, Nirmit Joshi, M. Prabhu, Vihan Shah","doi":"10.48550/arXiv.2303.06288","DOIUrl":"https://doi.org/10.48550/arXiv.2303.06288","url":null,"abstract":"Estimating quantiles, like the median or percentiles, is a fundamental task in data mining and data science. A (streaming) quantile summary is a data structure that can process a set S of n elements in a streaming fashion and at the end, for any phi in (0,1], return a phi-quantile of S up to an eps error, i.e., return a phi'-quantile with phi'=phi +- eps. We are particularly interested in comparison-based summaries that only compare elements of the universe under a total ordering and are otherwise completely oblivious of the universe. The best known deterministic quantile summary is the 20-year old Greenwald-Khanna (GK) summary that uses O((1/eps) log(eps n)) space [SIGMOD'01]. This bound was recently proved to be optimal for all deterministic comparison-based summaries by Cormode and Vesle'y [PODS'20]. In this paper, we study weighted quantiles, a generalization of the quantiles problem, where each element arrives with a positive integer weight which denotes the number of copies of that element being inserted. The only known method of handling weighted inputs via GK summaries is the naive approach of breaking each weighted element into multiple unweighted items and feeding them one by one to the summary, which results in a prohibitively large update time (proportional to the maximum weight of input elements). We give the first non-trivial extension of GK summaries for weighted inputs and show that it takes O((1/eps) log(eps n)) space and O(log(1/eps)+ log log(eps n)) update time per element to process a stream of length n (under some quite mild assumptions on the range of weights and eps). En route to this, we also simplify the original GK summaries for unweighted quantiles.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":"37 1","pages":"19:1-19:19"},"PeriodicalIF":0.0,"publicationDate":"2023-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87002261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Simple Algorithm for Consistent Query Answering under Primary Keys
Pub Date: 2023-01-20. DOI: 10.48550/arXiv.2301.08482. Pages: 24:1-24:18
Diego Figueira, A. Padmanabha, L. Segoufin, C. Sirangelo
We consider the dichotomy conjecture for consistent query answering under primary key constraints, which states that for every fixed Boolean conjunctive query $q$, testing whether $q$ is certain over all repairs of a given inconsistent database is either polynomial time or coNP-complete. This conjecture has been verified for self-join-free and path queries. We propose a simple inflationary fixpoint algorithm for consistent query answering which, for a given database, naively computes a set $\Delta$ of subsets of database repairs with at most $k$ facts, where $k$ is the size of the query $q$. The algorithm runs in polynomial time and can be formally defined as:
1. Initialize $\Delta$ with all sets $S$ of at most $k$ facts such that $S$ satisfies $q$.
2. Add any set $S$ of at most $k$ facts to $\Delta$ if there exists a block $B$ (i.e., a maximal set of facts sharing the same key) such that for every fact $a$ of $B$ there is a set $S' \in \Delta$ contained in $S \cup \{a\}$.
The algorithm answers "$q$ is certain" iff $\Delta$ eventually contains the empty set. The algorithm correctly computes certain answers when the query $q$ falls in the polynomial-time cases for self-join-free queries and path queries. For arbitrary queries, the algorithm is an under-approximation: the query is guaranteed to be certain if the algorithm claims so. However, there are polynomial-time certain queries (with self-joins) which are not identified as such by the algorithm.
{"title":"A Simple Algorithm for Consistent Query Answering under Primary Keys","authors":"Diego Figueira, A. Padmanabha, L. Segoufin, C. Sirangelo","doi":"10.48550/arXiv.2301.08482","DOIUrl":"https://doi.org/10.48550/arXiv.2301.08482","url":null,"abstract":"We consider the dichotomy conjecture for consistent query answering under primary key constraints stating that for every fixed Boolean conjunctive query q, testing whether it is certain over all repairs of a given inconsistent database is either polynomial time or coNP-complete. This conjecture has been verified for self-join-free and path queries. We propose a simple inflationary fixpoint algorithm for consistent query answering which, for a given database, naively computes a set $Delta$ of subsets of database repairs with at most $k$ facts, where $k$ is the size of the query $q$. The algorithm runs in polynomial time and can be formally defined as: 1. Initialize $Delta$ with all sets $S$ of at most $k$ facts such that $S$ satisfies $q$. 2. Add any set $S$ of at most $k$ facts to $Delta$ if there exists a block $B$ (ie, a maximal set of facts sharing the same key) such that for every fact $a$ of $B$ there is a set $S' in Delta$ contained in $(S cup {a})$. The algorithm answers\"$q$ is certain\"iff $Delta$ eventually contains the empty set. The algorithm correctly computes certain answers when the query $q$ falls in the polynomial time cases for self-join-free queries and path queries. For arbitrary queries, the algorithm is an under-approximation: The query is guaranteed to be certain if the algorithm claims so. However, there are polynomial time certain queries (with self-joins) which are not identified as such by the algorithm.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":"112 1","pages":"24:1-24:18"},"PeriodicalIF":0.0,"publicationDate":"2023-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79633726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Some Vignettes on Subgraph Counting Using Graph Orientations (Invited Talk)
Pub Date: 2023-01-01. DOI: 10.4230/LIPIcs.ICDT.2023.3. Pages: 3:1-3:10
C. Seshadhri, Floris Geerts, Brecht Vandevoort
Subgraph counting is a fundamental problem that spans many areas of computer science: database theory, logic, network science, data mining, and complexity theory. Given a large input graph $G$ and a small pattern graph $H$, we wish to count the number of occurrences of $H$ in $G$. In recent times, there has been a resurgence in using an old (maybe overlooked?) technique of orienting the edges of $G$ and $H$, and then using a combination of brute-force enumeration and indexing. These orientation techniques appear to give the best of both worlds: there is a rigorous theoretical explanation behind them, and they also exhibit excellent empirical behavior (on large real-world graphs). Time and again, graph orientations help solve subgraph counting problems in various computational models, be it sampling, streaming, distributed, etc. In this paper, we give some short vignettes on how the orientation technique solves a variety of algorithmic problems.
{"title":"Some Vignettes on Subgraph Counting Using Graph Orientations (Invited Talk)","authors":"C. Seshadhri, Floris Geerts, Brecht Vandevoort","doi":"10.4230/LIPIcs.ICDT.2023.3","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2023.3","url":null,"abstract":"Subgraph counting is a fundamental problem that spans many areas in computer science: database theory, logic, network science, data mining, and complexity theory. Given a large input graph G and a small pattern graph H , we wish to count the number of occurrences of H in G . In recent times, there has been a resurgence on using an old (maybe overlooked?) technique of orienting the edges of G and H , and then using a combination of brute-force enumeration and indexing. These orientation techniques appear to give the best of both worlds. There is a rigorous theoretical explanation behind these techniques, and they also have excellent empirical behavior (on large real-world graphs). Time and again, graph orientations help solve subgraph counting problems in various computational models, be it sampling, streaming, distributed, etc. In this paper, we give some short vignettes on how the orientation technique solves a variety of algorithmic problems.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":"77 1","pages":"3:1-3:10"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90606643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Researcher's Digest of GQL (Invited Talk)
Pub Date: 2023-01-01. DOI: 10.4230/LIPIcs.ICDT.2023.1. Pages: 1:1-1:22
Nadime Francis, Amélie Gheerbrant, P. Guagliardo, L. Libkin, Victor Marsault, W. Martens, Filip Murlak, L. Peterfreund, Alexandra Rogova, D. Vrgoc
{"title":"A Researcher's Digest of GQL (Invited Talk)","authors":"Nadime Francis, Amélie Gheerbrant, P. Guagliardo, L. Libkin, Victor Marsault, W. Martens, Filip Murlak, L. Peterfreund, Alexandra Rogova, D. Vrgoc","doi":"10.4230/LIPIcs.ICDT.2023.1","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2023.1","url":null,"abstract":"","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":"6 1","pages":"1:1-1:22"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89748128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enumerating Subgraphs of Constant Sizes in External Memory
Pub Date: 2023-01-01. DOI: 10.4230/LIPIcs.ICDT.2023.4. Pages: 4:1-4:20
Shiyuan Deng, Francesco Silvestri, Yufei Tao
We present an indivisible I/O-efficient algorithm for subgraph enumeration, where the objective is to list all the subgraphs of a massive graph $G := (V, E)$ that are isomorphic to a pattern graph $Q$ having $k = O(1)$ vertices. Our algorithm performs $O\left(\frac{|E|^{k/2}}{M^{k/2-1}B}\log_{M/B}\frac{|E|}{B} + \frac{|E|^{\rho}}{M^{\rho-1}B}\right)$ I/Os with high probability, where $\rho$ is the fractional edge covering number of $Q$ (it always holds that $\rho \ge k/2$, regardless of $Q$), $M$ is the number of words in (internal) memory, and $B$ is the number of words in a disk block. Our solution is optimal in the class of indivisible algorithms for all pattern graphs with $\rho > k/2$. When $\rho = k/2$, our algorithm is still optimal as long as $M/B \ge (|E|/B)^{\epsilon}$ for any constant $\epsilon > 0$.
{"title":"Enumerating Subgraphs of Constant Sizes in External Memory","authors":"Shiyuan Deng, Francesco Silvestri, Yufei Tao","doi":"10.4230/LIPIcs.ICDT.2023.4","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2023.4","url":null,"abstract":"We present an indivisible I/O-efficient algorithm for subgraph enumeration , where the objective is to list all the subgraphs of a massive graph G := ( V, E ) that are isomorphic to a pattern graph Q having k = O (1) vertices. Our algorithm performs O ( | E | k/ 2 M k/ 2 − 1 B log M/B | E | B + | E | ρ M ρ − 1 B ) I/Os with high probability, where ρ is the fractional edge covering number of Q (it always holds ρ ≥ k/ 2, regardless of Q ), M is the number of words in (internal) memory, and B is the number of words in a disk block. Our solution is optimal in the class of indivisible algorithms for all pattern graphs with ρ > k/ 2. When ρ = k/ 2, our algorithm is still optimal as long as M/B ≥ ( | E | /B ) ϵ for any constant ϵ > 0. 2012 ACM Subject Classification Theory of computation → Graph algorithms analysis; Information systems → Join algorithms","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":"24 1","pages":"4:1-4:20"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80111240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Size Bounds and Algorithms for Conjunctive Regular Path Queries
Pub Date: 2023-01-01. DOI: 10.4230/LIPIcs.ICDT.2023.13. Pages: 13:1-13:17
Tamara Cucumides, Juan L. Reutter, D. Vrgoc
{"title":"Size Bounds and Algorithms for Conjunctive Regular Path Queries","authors":"Tamara Cucumides, Juan L. Reutter, D. Vrgoc","doi":"10.4230/LIPIcs.ICDT.2023.13","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2023.13","url":null,"abstract":"","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":"47 1","pages":"13:1-13:17"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74027432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Optimal Algorithm for Sliding Window Order Statistics
Pub Date: 2023-01-01. DOI: 10.4230/LIPIcs.ICDT.2023.5. Pages: 5:1-5:13
Pavel Raykov
Assume there is a data stream of elements and a window of size $m$. Sliding window algorithms compute various statistics over the last $m$ elements of the data stream seen so far. The time complexity of a sliding window algorithm is measured as the time required to output an updated statistic value every time a new element is read. For example, it is well known that computing the sliding window maximum/minimum has time complexity $O(1)$, while computing the sliding window median has time complexity $O(\log m)$. In this paper we close the gap between these two cases by (1) presenting an algorithm for computing the sliding window $k$-th smallest element in $O(\log k)$ time and (2) proving that this time complexity is optimal.
{"title":"An Optimal Algorithm for Sliding Window Order Statistics","authors":"Pavel Raykov","doi":"10.4230/LIPIcs.ICDT.2023.5","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2023.5","url":null,"abstract":"Assume there is a data stream of elements and a window of size m . Sliding window algorithms compute various statistic functions over the last m elements of the data stream seen so far. The time complexity of a sliding window algorithm is measured as the time required to output an updated statistic function value every time a new element is read. For example, it is well known that computing the sliding window maximum/minimum has time complexity O (1) while computing the sliding window median has time complexity O (log m ). In this paper we close the gap between these two cases by (1) presenting an algorithm for computing the sliding window k -th smallest element in O (log k ) time and (2) prove that this time complexity is optimal.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":"16 1","pages":"5:1-5:13"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85997470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Consistency of Probabilistic Databases with Independent Cells
Pub Date: 2022-12-23. DOI: 10.48550/arXiv.2212.12104. Pages: 22:1-22:19
Amir Gilad, Aviram Imber, B. Kimelfeld
A probabilistic database with attribute-level uncertainty consists of relations where cells of some attributes may hold probability distributions rather than deterministic content. Such databases arise, implicitly or explicitly, in the context of noisy operations such as missing data imputation, where we automatically fill in missing values, column prediction, where we predict unknown attributes, and database cleaning (and repairing), where we replace the original values due to detected errors or violation of integrity constraints. We study the computational complexity of problems that regard the selection of cell values in the presence of integrity constraints. More precisely, we focus on functional dependencies and study three problems: (1) deciding whether the constraints can be satisfied by any choice of values, (2) finding a most probable such choice, and (3) calculating the probability of satisfying the constraints. The data complexity of these problems is determined by the combination of the set of functional dependencies and the collection of uncertain attributes. We give full classifications into tractable and intractable complexities for several classes of constraints, including a single dependency, matching constraints, and unary functional dependencies.
{"title":"The Consistency of Probabilistic Databases with Independent Cells","authors":"Amir Gilad, Aviram Imber, B. Kimelfeld","doi":"10.48550/arXiv.2212.12104","DOIUrl":"https://doi.org/10.48550/arXiv.2212.12104","url":null,"abstract":"A probabilistic database with attribute-level uncertainty consists of relations where cells of some attributes may hold probability distributions rather than deterministic content. Such databases arise, implicitly or explicitly, in the context of noisy operations such as missing data imputation, where we automatically fill in missing values, column prediction, where we predict unknown attributes, and database cleaning (and repairing), where we replace the original values due to detected errors or violation of integrity constraints. We study the computational complexity of problems that regard the selection of cell values in the presence of integrity constraints. More precisely, we focus on functional dependencies and study three problems: (1) deciding whether the constraints can be satisfied by any choice of values, (2) finding a most probable such choice, and (3) calculating the probability of satisfying the constraints. The data complexity of these problems is determined by the combination of the set of functional dependencies and the collection of uncertain attributes. We give full classifications into tractable and intractable complexities for several classes of constraints, including a single dependency, matching constraints, and unary functional dependencies.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":"12 1","pages":"22:1-22:19"},"PeriodicalIF":0.0,"publicationDate":"2022-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87755463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Approximation and Semantic Tree-width of Conjunctive Regular Path Queries
Pub Date: 2022-12-03. DOI: 10.48550/arXiv.2212.01679. Pages: 15:1-15:19
Diego Figueira, Rémi Morvan
We show that the problem of whether a query is equivalent to a query of tree-width $k$ is decidable, for the class of Unions of Conjunctive Regular Path Queries with two-way navigation (UC2RPQs). A previous result by Barceló, Romero, and Vardi has shown decidability for the case $k=1$; here we show that decidability in fact holds for any arbitrary $k>1$. The algorithm is in 2ExpSpace, but for the restricted yet practically relevant case where all regular expressions of the query are of the form $a^*$ or $(a_1 + \dotsb + a_n)$, we show that the complexity of the problem drops to $\Pi_2^p$. We also investigate the related problem of approximating a UC2RPQ by queries of small tree-width. We exhibit an algorithm which, for any fixed number $k$, builds the maximal under-approximation of tree-width $k$ of a UC2RPQ. The maximal under-approximation of tree-width $k$ of a query $q$ is a query $q'$ of tree-width $k$ which is contained in $q$ in a maximal and unique way, that is, such that for every query $q''$ of tree-width $k$, if $q''$ is contained in $q$ then $q''$ is also contained in $q'$.
{"title":"Approximation and Semantic Tree-width of Conjunctive Regular Path Queries","authors":"Diego Figueira, Rémi Morvan","doi":"10.48550/arXiv.2212.01679","DOIUrl":"https://doi.org/10.48550/arXiv.2212.01679","url":null,"abstract":"We show that the problem of whether a query is equivalent to a query of tree-width $k$ is decidable, for the class of Unions of Conjunctive Regular Path Queries with two-way navigation (UC2RPQs). A previous result by Barcel'o, Romero, and Vardi has shown decidability for the case $k=1$, and here we show that decidability in fact holds for any arbitrary $k>1$. The algorithm is in 2ExpSpace, but for the restricted but practically relevant case where all regular expressions of the query are of the form $a^*$ or $(a_1 + dotsb + a_n)$ we show that the complexity of the problem drops to $Pi_2^p$. We also investigate the related problem of approximating a UC2RPQ by queries of small tree-width. We exhibit an algorithm which, for any fixed number $k$, builds the maximal under-approximation of tree-width $k$ of a UC2RPQ. The maximal under-approximation of tree-width $k$ of a query $q$ is a query $q'$ of tree-width $k$ which is contained in $q$ in a maximal and unique way, that is, such that for every query $q''$ of tree-width $k$, if $q''$ is contained in $q$ then $q''$ is also contained in $q'$.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":"848 1","pages":"15:1-15:19"},"PeriodicalIF":0.0,"publicationDate":"2022-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85337927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}