Data Structures for Categorical Path Counting Queries

Theor. Comput. Sci. | Publication date: 2022-10-01 | DOI: 10.4230/LIPIcs.CPM.2021.15

Meng He, Serikzhan Kazi

Citations: 5

Abstract

Consider an ordinal tree $T$ on $n$ nodes, each of which is assigned a category from an alphabet $[\sigma] = \{1, 2, \ldots, \sigma\}$. We preprocess $T$ to support categorical path counting queries, which ask for the number of distinct categories occurring on the path in $T$ between two query nodes $x$ and $y$. For this problem, we propose a linear-space data structure with $O(\sqrt{n}\,\lg\lg\sigma/\lg w)$ query time, where $w = \Omega(\lg n)$ is the word size in the word-RAM model. As shown in our proof, under the assumption that matrix multiplication cannot be solved faster than cubic time using only combinatorial methods, this result is optimal up to polylogarithmic speed-ups. For a trade-off parameter $1 \le t \le n$, we propose a data structure that uses $O(n + n^2/t^2)$ words and answers queries in $O(t\,\lg\lg\sigma/\lg w)$ time. We also consider $c$-approximate categorical path counting queries, which must return an approximation to the number of distinct categories occurring on the query path by counting each such category at least once and at most $c$ times. We describe a linear-space data structure that supports 2-approximate categorical path counting queries in $O(\lg n/\lg\lg n)$ time.

Next, we generalize categorical path counting queries to weighted trees. Here, a query specifies two nodes $x$, $y$ and an orthogonal range $Q$, and the answer to the resulting categorical path range counting query is the number of distinct categories occurring on the path from $x$ to $y$ when only the nodes whose weights fall inside $Q$ are considered. We propose an $O(n\lg\lg n + (n/t)^4)$-word data structure with $O(t\lg\lg n)$ query time, or an $O(n + (n/t)^4)$-word data structure with $O(t\lg n)$ query time. For an appropriate choice of the trade-off parameter $t$, this implies a linear-space data structure with $O(n^{3/4}\lg n)$ query time. We then extend the approach to trees weighted with vectors from $[n]^d$, where $d$ is a constant integer greater than or equal to 2. We present a data structure with $O(n\lg^{d-1+\varepsilon} n + (n/t)^{2d+2})$ words of space and $O(t\lg^{d-1} n\,(\lg\lg n)^{d-2})$ query time. For an $O(n \cdot \operatorname{polylog} n)$-space solution, one thus obtains $O(n^{\frac{2d+1}{2d+2}} \cdot \operatorname{polylog} n)$ query time.

The inherent difficulty revealed by the lower bound we proved motivated us to consider data structures based on sketching. In unweighted trees, we propose a sketching data structure for the approximate categorical path counting problem, which asks for a $(1 \pm \varepsilon)$-approximation (i.e., within a factor of $1 \pm \varepsilon$ of the true answer) of the number of distinct categories on the given path, with probability $1 - \delta$, where $0 < \varepsilon, \delta < 1$ are constants. The data structure occupies $O(n + (n/t)\lg n)$ words of space and has $O(t\lg n)$ query time. For trees weighted with $d$-dimensional weight vectors ($d \ge 1$), we propose a data structure with $O((n + (n/t)\lg n)\lg n)$ words of space and $O(t\lg^{d+1} n)$ query time. All these problems generalize the corresponding categorical range counting problems in Euclidean space $\mathbb{R}^{d+1}$, for the respective $d$, by replacing one of the dimensions with a tree topology.

2012 ACM Subject Classification: Theory of computation → Data structures design and analysis
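To make the query definitions in the abstract concrete, here is a brute-force reference sketch in Python. It is not the paper's data structure: it simply walks the path from x to y in time proportional to the path length and counts distinct categories. The parent-array encoding of the tree, the function names, and the scalar interval [lo, hi] standing in for the range Q (the case d = 1) are illustrative assumptions. The last function splits the path at the lowest common ancestor and adds the distinct counts of the two vertical halves; every category on the path lands in at least one half and at most two, which is exactly the "at least once, at most c times" condition for c = 2. The abstract does not say whether the authors' 2-approximate structure works this way.

```python
from typing import List, Sequence


def depths(parent: Sequence[int]) -> List[int]:
    """Depth of every node; the root is encoded as parent[root] == -1 (assumption)."""
    depth = [-1] * len(parent)
    for v in range(len(parent)):
        chain = []
        u = v
        # Climb until we pass the root (-1) or reach a node of known depth.
        while u != -1 and depth[u] == -1:
            chain.append(u)
            u = parent[u]
        d = -1 if u == -1 else depth[u]
        for w in reversed(chain):  # assign depths top-down
            d += 1
            depth[w] = d
    return depth


def path_nodes(parent: Sequence[int], depth: Sequence[int], x: int, y: int) -> List[int]:
    """Nodes on the path from x to y, in order, including both endpoints."""
    left, right = [], []
    while depth[x] > depth[y]:
        left.append(x)
        x = parent[x]
    while depth[y] > depth[x]:
        right.append(y)
        y = parent[y]
    while x != y:  # climb in lockstep until the lowest common ancestor
        left.append(x)
        x = parent[x]
        right.append(y)
        y = parent[y]
    return left + [x] + right[::-1]


def categorical_path_count(parent, category, x, y) -> int:
    """Exact answer: number of distinct categories on the x-y path."""
    path = path_nodes(parent, depths(parent), x, y)
    return len({category[v] for v in path})


def categorical_path_range_count(parent, category, weight, x, y, lo, hi) -> int:
    """Same, but only nodes whose scalar weight lies in [lo, hi] are considered."""
    path = path_nodes(parent, depths(parent), x, y)
    return len({category[v] for v in path if lo <= weight[v] <= hi})


def two_approx_count(parent, category, x, y) -> int:
    """Distinct count on x..lca plus distinct count on lca..y (each category 1 or 2 times)."""
    depth = depths(parent)
    path = path_nodes(parent, depth, x, y)
    i = min(range(len(path)), key=lambda j: depth[path[j]])  # position of the LCA
    first, second = path[: i + 1], path[i:]
    return len({category[v] for v in first}) + len({category[v] for v in second})


# Hypothetical example tree: 0 -> {1, 2}, 1 -> {3, 4}, 2 -> {5}, 5 -> {6}
parent = [-1, 0, 0, 1, 1, 2, 5]
category = [1, 2, 1, 3, 2, 3, 2]
weight = [5, 1, 7, 2, 9, 4, 8]

print(categorical_path_count(parent, category, 3, 6))  # path 3,1,0,2,5,6 -> 3
print(categorical_path_range_count(parent, category, weight, 3, 6, 2, 4))  # -> 1
print(two_approx_count(parent, category, 3, 6))  # 3 + 3 -> 6, within [3, 6]
```

A reference like this is mainly useful as a test oracle when implementing any of the faster structures described in the abstract.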
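The trade-off claims in the abstract follow from a short calculation: pick the smallest $t$ that keeps the $t$-dependent space term within the target (linear, or $O(n \cdot \operatorname{polylog} n)$ when $d \ge 2$) and substitute it into the query bound. The derivation below is our own arithmetic check on the stated bounds, not text from the paper.

```latex
\begin{aligned}
\text{exact, unweighted:}\quad & \frac{n^2}{t^2} \le n \iff t \ge \sqrt{n}
  &&\Rightarrow\; t = \sqrt{n}:\ O\!\left(\sqrt{n}\,\frac{\lg\lg\sigma}{\lg w}\right)\ \text{query time, } O(n)\ \text{words},\\
\text{weighted, } d = 1:\quad & \left(\frac{n}{t}\right)^{4} \le n \iff t \ge n^{3/4}
  &&\Rightarrow\; t = n^{3/4}:\ O(n^{3/4}\lg n)\ \text{query time, } O(n)\ \text{words},\\
\text{weighted, } [n]^d,\ d \ge 2:\quad & \left(\frac{n}{t}\right)^{2d+2} \le n \iff t \ge n^{\frac{2d+1}{2d+2}}
  &&\Rightarrow\; O\!\left(n^{\frac{2d+1}{2d+2}} \cdot \operatorname{polylog} n\right)\ \text{query time, } O(n \cdot \operatorname{polylog} n)\ \text{words}.
\end{aligned}
```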
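The abstract does not say which sketch the authors use. As a hedged illustration of the general idea behind sketch-based distinct counting, here is a k-minimum-values (KMV) sketch in Python, a standard technique swapped in for illustration. It is mergeable, so sketches kept for pieces of a path could in principle be combined at query time without the underlying category sets; with k on the order of 1/ε² its estimate is within a factor of 1 ± ε of the true count with constant probability, which repetition can boost to 1 − δ. The class name, the BLAKE2b-based hash, and the "path pieces" in the usage lines are assumptions, not the paper's construction.

```python
import hashlib
from typing import Iterable, List


def unit_hash(item: int) -> float:
    """Deterministically map an item to a pseudo-random value in (0, 1]."""
    digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
    return (int.from_bytes(digest, "big") + 1) / 2**64


class KMVSketch:
    """k-minimum-values sketch of a set: stores the k smallest distinct hash values."""

    def __init__(self, k: int, hashes: Iterable[float] = ()):
        self.k = k
        self.mins: List[float] = sorted(set(hashes))[:k]

    @classmethod
    def of(cls, k: int, items: Iterable[int]) -> "KMVSketch":
        return cls(k, (unit_hash(x) for x in items))

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        if self.k != other.k:
            raise ValueError("sketches must use the same k")
        # The k smallest hashes of A ∪ B are among the k smallest of A and of B,
        # so the union's sketch is computable from the two sketches alone.
        return KMVSketch(self.k, self.mins + other.mins)

    def estimate(self) -> float:
        if len(self.mins) < self.k:  # fewer than k distinct items: exact count
            return float(len(self.mins))
        return (self.k - 1) / self.mins[-1]  # standard KMV estimator


# Hypothetical: categories seen on two overlapping pieces of a query path.
piece_a = range(0, 6000)
piece_b = range(4000, 10000)  # 10000 distinct categories in total
sketch = KMVSketch.of(1024, piece_a).merge(KMVSketch.of(1024, piece_b))
print(round(sketch.estimate()))  # close to 10000; the exact value depends on the hash
```

Duplicates shared by both pieces are handled automatically because equal categories hash to equal values, which is what makes this family of sketches a natural fit for distinct-category counting.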
