Pub Date : 2023-01-01DOI: 10.4230/LIPIcs.SEA.2023.4
Jarno N. Alanko, E. Biagi, S. Puglisi, Jaakko Vuohtoniemi
Given an alphabet Σ of σ = | Σ | symbols, a degenerate (or indeterminate) string X is a sequence X = X [0] , X [1] . . . , X [ n − 1] of n subsets of Σ. Since their introduction in the mid 70s, degenerate strings have been widely studied, with applications driven by their being a natural model for sequences in which there is a degree of uncertainty about the precise symbol at a given position, such as those arising in genomics and proteomics. In this paper we introduce a new data structural tool for degenerate strings, called the subset wavelet tree (SubsetWT). A SubsetWT supports two basic operations on degenerate strings: subset-rank( i, c ), which returns the number of subsets up to the i -th subset in the degenerate string that contain the symbol c ; and subset-select( i, c ), which returns the index in the degenerate string of the i -th subset that contains symbol c . These queries are analogs of rank and select queries that have been widely studied for ordinary strings. Via experiments in a real genomics application in which degenerate strings are fundamental, we show that subset wavelet trees are practical data structures, and in particular offer an attractive space-time tradeoff. Along the way we investigate data structures for supporting (normal) rank queries on base-4 and base-3 sequences, which may be of independent interest. Our C++ implementations of the data structures are available at https://github.com/jnalanko/SubsetWT .
{"title":"Subset Wavelet Trees","authors":"Jarno N. Alanko, E. Biagi, S. Puglisi, Jaakko Vuohtoniemi","doi":"10.4230/LIPIcs.SEA.2023.4","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2023.4","url":null,"abstract":"Given an alphabet Σ of σ = | Σ | symbols, a degenerate (or indeterminate) string X is a sequence X = X [0] , X [1] . . . , X [ n − 1] of n subsets of Σ. Since their introduction in the mid 70s, degenerate strings have been widely studied, with applications driven by their being a natural model for sequences in which there is a degree of uncertainty about the precise symbol at a given position, such as those arising in genomics and proteomics. In this paper we introduce a new data structural tool for degenerate strings, called the subset wavelet tree (SubsetWT). A SubsetWT supports two basic operations on degenerate strings: subset-rank( i, c ), which returns the number of subsets up to the i -th subset in the degenerate string that contain the symbol c ; and subset-select( i, c ), which returns the index in the degenerate string of the i -th subset that contains symbol c . These queries are analogs of rank and select queries that have been widely studied for ordinary strings. Via experiments in a real genomics application in which degenerate strings are fundamental, we show that subset wavelet trees are practical data structures, and in particular offer an attractive space-time tradeoff. Along the way we investigate data structures for supporting (normal) rank queries on base-4 and base-3 sequences, which may be of independent interest. Our C++ implementations of the data structures are available at https://github.com/jnalanko/SubsetWT .","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"23 1","pages":"4:1-4:14"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86347335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01DOI: 10.4230/LIPIcs.SEA.2023.7
Diego Díaz-Domínguez, Saska Dönges, S. Puglisi, Leena Salmela
Given a string X of length n on alphabet σ , the FM-index data structure allows counting all occurrences of a pattern P of length m in O ( m ) time via an algorithm called backward search . An important difficulty when searching with an FM-index is to support queries on L , the Burrows-Wheeler transform of X , while L is in compressed form. This problem has been the subject of intense research for 25 years now. Run-length encoding of L is an effective way to reduce index size, in particular when the data being indexed is highly-repetitive, which is the case in many types of modern data, including those arising from versioned document collections and in pangenomics. This paper takes a back-to-basics look at supporting backward search in FM-indexes, exploring and engineering two simple designs. The first divides the BWT string into blocks containing b symbols each and then run-length compresses each block separately, possibly introducing new runs (compared to applying run-length encoding once, to the whole string). Each block stores counts of each symbol that occurs before the block. This method supports the operation rank c ( L, i ) (i.e., count the number of times c occurs in the prefix L [1 ..i ]) by first determining the block i/b in which i falls and scanning the block to the appropriate position counting occurrences of c along the way. This partial answer to rank c ( L, i ) is then added to the stored count of c symbols before the block to determine the final answer. Our second design has a similar structure, but instead divides the run-length-encoded version of L into blocks containing an equal number of runs. The trick then is to determine the block in which a query falls, which is achieved via a predecessor query over the block starting positions. We show via extensive experiments on a wide range of repetitive text collections that these FM-indexes are not only easy to implement, but also fast and space efficient in practice.
{"title":"Simple Runs-Bounded FM-Index Designs Are Fast","authors":"Diego Díaz-Domínguez, Saska Dönges, S. Puglisi, Leena Salmela","doi":"10.4230/LIPIcs.SEA.2023.7","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2023.7","url":null,"abstract":"Given a string X of length n on alphabet σ , the FM-index data structure allows counting all occurrences of a pattern P of length m in O ( m ) time via an algorithm called backward search . An important difficulty when searching with an FM-index is to support queries on L , the Burrows-Wheeler transform of X , while L is in compressed form. This problem has been the subject of intense research for 25 years now. Run-length encoding of L is an effective way to reduce index size, in particular when the data being indexed is highly-repetitive, which is the case in many types of modern data, including those arising from versioned document collections and in pangenomics. This paper takes a back-to-basics look at supporting backward search in FM-indexes, exploring and engineering two simple designs. The first divides the BWT string into blocks containing b symbols each and then run-length compresses each block separately, possibly introducing new runs (compared to applying run-length encoding once, to the whole string). Each block stores counts of each symbol that occurs before the block. This method supports the operation rank c ( L, i ) (i.e., count the number of times c occurs in the prefix L [1 ..i ]) by first determining the block i/b in which i falls and scanning the block to the appropriate position counting occurrences of c along the way. This partial answer to rank c ( L, i ) is then added to the stored count of c symbols before the block to determine the final answer. Our second design has a similar structure, but instead divides the run-length-encoded version of L into blocks containing an equal number of runs. The trick then is to determine the block in which a query falls, which is achieved via a predecessor query over the block starting positions. We show via extensive experiments on a wide range of repetitive text collections that these FM-indexes are not only easy to implement, but also fast and space efficient in practice.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"1 1","pages":"7:1-7:16"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88177891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01DOI: 10.4230/LIPIcs.SEA.2023.12
M. Fischer, A. Gupte
We present multilinear and mixed-integer multilinear programs to find a Nash equilibrium in multi-player noncooperative games. We compare the formulations to common algorithms in Gambit, and conclude that a multilinear feasibility program finds a Nash equilibrium faster than any of the methods we compare it to, including the quantal response equilibrium method, which is recommended for large games. Hence, the multilinear feasibility program is an alternative method to find a Nash equilibrium in multi-player games, and outperforms many common algorithms. The mixed-integer formulations are generalisations of known mixed-integer programs for two-player games, however unlike two-player games, these mixed-integer programs do not give better performance than existing algorithms.
{"title":"Multilinear Formulations for Computing a Nash Equilibrium of Multi-Player Games","authors":"M. Fischer, A. Gupte","doi":"10.4230/LIPIcs.SEA.2023.12","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2023.12","url":null,"abstract":"We present multilinear and mixed-integer multilinear programs to find a Nash equilibrium in multi-player noncooperative games. We compare the formulations to common algorithms in Gambit, and conclude that a multilinear feasibility program finds a Nash equilibrium faster than any of the methods we compare it to, including the quantal response equilibrium method, which is recommended for large games. Hence, the multilinear feasibility program is an alternative method to find a Nash equilibrium in multi-player games, and outperforms many common algorithms. The mixed-integer formulations are generalisations of known mixed-integer programs for two-player games, however unlike two-player games, these mixed-integer programs do not give better performance than existing algorithms.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"189 1","pages":"12:1-12:14"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79465444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01DOI: 10.4230/LIPIcs.SEA.2023.9
Satya Tamby, D. Vanderpooten
Optimizing over the efficient set of a discrete multi-objective problem is a challenging issue. The main reason is that, unlike when optimizing over the feasible set, the efficient set is implicitly characterized. Therefore, methods designed for this purpose iteratively generate efficient solutions by solving appropriate single-objective problems. However, the number of efficient solutions can be quite large and the problems to be solved can be difficult practically. Thus, the challenge is both to minimize the number of iterations and to reduce the difficulty of the problems to be solved at each iteration. In this paper, a new enumeration scheme is proposed. By introducing some constraints and optimizing over projections of the search region, potentially large parts of the search space can be discarded, drastically reducing the number of iterations. Moreover, the single-objective programs to be solved can be guaranteed to be feasible, and a starting solution can be provided allowing warm start resolutions. This results in a fast algorithm that is simple to implement. Experimental computations on two standard multi-objective instance families show that our approach seems to perform significantly faster than the state of the art algorithm.
{"title":"Optimizing over the Efficient Set of a Multi-Objective Discrete Optimization Problem","authors":"Satya Tamby, D. Vanderpooten","doi":"10.4230/LIPIcs.SEA.2023.9","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2023.9","url":null,"abstract":"Optimizing over the efficient set of a discrete multi-objective problem is a challenging issue. The main reason is that, unlike when optimizing over the feasible set, the efficient set is implicitly characterized. Therefore, methods designed for this purpose iteratively generate efficient solutions by solving appropriate single-objective problems. However, the number of efficient solutions can be quite large and the problems to be solved can be difficult practically. Thus, the challenge is both to minimize the number of iterations and to reduce the difficulty of the problems to be solved at each iteration. In this paper, a new enumeration scheme is proposed. By introducing some constraints and optimizing over projections of the search region, potentially large parts of the search space can be discarded, drastically reducing the number of iterations. Moreover, the single-objective programs to be solved can be guaranteed to be feasible, and a starting solution can be provided allowing warm start resolutions. This results in a fast algorithm that is simple to implement. Experimental computations on two standard multi-objective instance families show that our approach seems to perform significantly faster than the state of the art algorithm.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"26 1","pages":"9:1-9:13"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74671234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01DOI: 10.4230/LIPIcs.SEA.2023.13
Ömer Burak Onar, T. Ekim, Z. Taşkın
This paper considers the Maximum Selective Tree Problem (MSelTP) as a generalization of the Maximum Induced Tree problem. Given an undirected graph with a partition of its vertex set into clusters, MSelTP aims to choose the maximum number of vertices such that at most one vertex per cluster is selected and the graph induced by the selected vertices is a tree. To the best of our knowledge, MSelTP has not been studied before although several related optimization problems have been investigated in the literature. We propose two mixed integer programming formulations for MSelTP; one based on connectivity constraints, the other based on cycle elimination constraints. In addition, we develop two exact cutting plane procedures to solve the problem to optimality. On graphs with up to 25 clusters, up to 250 vertices, and varying densities, we conduct computational experiments to compare the results of two solution procedures with solving a compact integer programming formulation of MSelTP. Our experiments indicate that the algorithm CPAXnY outperforms the other procedures overall except for graphs with low density and large cluster size, and that the algorithm CPAX yields better results in terms of the average time of instances optimally solved and the overall average time.
{"title":"Integer Programming Formulations and Cutting Plane Algorithms for the Maximum Selective Tree Problem","authors":"Ömer Burak Onar, T. Ekim, Z. Taşkın","doi":"10.4230/LIPIcs.SEA.2023.13","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2023.13","url":null,"abstract":"This paper considers the Maximum Selective Tree Problem (MSelTP) as a generalization of the Maximum Induced Tree problem. Given an undirected graph with a partition of its vertex set into clusters, MSelTP aims to choose the maximum number of vertices such that at most one vertex per cluster is selected and the graph induced by the selected vertices is a tree. To the best of our knowledge, MSelTP has not been studied before although several related optimization problems have been investigated in the literature. We propose two mixed integer programming formulations for MSelTP; one based on connectivity constraints, the other based on cycle elimination constraints. In addition, we develop two exact cutting plane procedures to solve the problem to optimality. On graphs with up to 25 clusters, up to 250 vertices, and varying densities, we conduct computational experiments to compare the results of two solution procedures with solving a compact integer programming formulation of MSelTP. Our experiments indicate that the algorithm CPAXnY outperforms the other procedures overall except for graphs with low density and large cluster size, and that the algorithm CPAX yields better results in terms of the average time of instances optimally solved and the overall average time.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"56 1","pages":"13:1-13:18"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88555069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01DOI: 10.4230/LIPIcs.SEA.2023.19
Meng He, Zhen Liu
We conduct an experimental study on the range mode problem. In the exact version of the problem, we preprocess an array A , such that given a query range [ a, b ], the most frequent element in A [ a, b ] can be found efficiently. For this problem, our most important finding is that the strategy of using succinct data structures to encode more precomputed information not only helped Chan et al. (Linear-space data structures for range mode query in arrays, Theory of Computing Systems, 2013) improve previous results in theory but also helps us achieve the best time/space tradeoff in practice; we even go a step further to replace more components in their solution with succinct data structures and improve the performance further. In the approximate version of this problem, a (1 + ε )-approximate range mode query looks for an element whose occurrences in A [ a, b ] is at least F a,b / (1 + ε ), where F a,b is the frequency of the mode in A [ a, b ]. We implement all previous solutions to this problems and find that, even when ε = 1 2 , the average approximation ratio of these solutions is close to 1 in practice, and they provide much faster query time than the best exact solution. These solutions achieve different useful time-space tradeoffs, and among them, El-Zein et al. (On Approximate Range Mode and Range Selection, 30th International Symposium on Algorithms and Computation, 2019) provide us with one solution whose space usage is only 35 . 6% to 93 . 8% of the cost of storing the input array of 32-bit integers (in most cases, the space cost is closer to the lower end, and the average space cost is 20.2 bits per symbol among all datasets). Its non-succinct version also stands out with query support at least several times faster than other O ( nε )-word structures while using only slightly more space in practice.
我们对距离模式问题进行了实验研究。在这个问题的确切版本中,我们预处理一个数组A,这样给定一个查询范围[A, b],可以有效地找到A [A, b]中最频繁的元素。对于这个问题,我们最重要的发现是,使用简洁的数据结构来编码更多预先计算的信息的策略不仅有助于Chan等人(数组中范围模式查询的线性空间数据结构,Theory of Computing Systems, 2013)在理论上改善了以前的结果,而且还帮助我们在实践中实现了最佳的时间/空间权衡;我们甚至更进一步,用简洁的数据结构替换他们解决方案中的更多组件,并进一步提高性能。在这个问题的近似版本中,a (1 + ε)-近似范围模式查询查找在a [a,b]中出现的元素至少是F a,b / (1 + ε),其中F a,b是a [a,b]中模式的频率。我们实现了该问题之前的所有解,发现即使当ε = 1 2时,这些解的平均近似比在实践中也接近于1,并且它们提供的查询时间比最佳精确解快得多。这些解决方案实现了不同的有用的时空权衡,其中El-Zein等人(On Approximate Range Mode and Range Selection,第30届国际算法与计算研讨会,2019)为我们提供了一个空间利用率仅为35的解决方案。6%到93。存储32位整数输入数组的成本的8%(在大多数情况下,空间成本更接近下限,在所有数据集中,平均空间成本为每个符号20.2位)。它的非简洁版本也很突出,它的查询支持速度至少比其他0 (nε)字结构快几倍,而在实践中只使用了稍微多一点的空间。
{"title":"Exact and Approximate Range Mode Query Data Structures in Practice","authors":"Meng He, Zhen Liu","doi":"10.4230/LIPIcs.SEA.2023.19","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2023.19","url":null,"abstract":"We conduct an experimental study on the range mode problem. In the exact version of the problem, we preprocess an array A , such that given a query range [ a, b ], the most frequent element in A [ a, b ] can be found efficiently. For this problem, our most important finding is that the strategy of using succinct data structures to encode more precomputed information not only helped Chan et al. (Linear-space data structures for range mode query in arrays, Theory of Computing Systems, 2013) improve previous results in theory but also helps us achieve the best time/space tradeoff in practice; we even go a step further to replace more components in their solution with succinct data structures and improve the performance further. In the approximate version of this problem, a (1 + ε )-approximate range mode query looks for an element whose occurrences in A [ a, b ] is at least F a,b / (1 + ε ), where F a,b is the frequency of the mode in A [ a, b ]. We implement all previous solutions to this problems and find that, even when ε = 1 2 , the average approximation ratio of these solutions is close to 1 in practice, and they provide much faster query time than the best exact solution. These solutions achieve different useful time-space tradeoffs, and among them, El-Zein et al. (On Approximate Range Mode and Range Selection, 30th International Symposium on Algorithms and Computation, 2019) provide us with one solution whose space usage is only 35 . 6% to 93 . 8% of the cost of storing the input array of 32-bit integers (in most cases, the space cost is closer to the lower end, and the average space cost is 20.2 bits per symbol among all datasets). Its non-succinct version also stands out with query support at least several times faster than other O ( nε )-word structures while using only slightly more space in practice.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"165 1","pages":"19:1-19:22"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74320651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01DOI: 10.4230/LIPIcs.SEA.2023.10
Sebastian Angrick, Ben Bals, Katrin Casel, S. Cohen, T. Friedrich, Niko Hastrich, Theresa Hradilak, Davis Issac, Otto Kißig, Jonas Schmidt, Leo Wendt
In the Directed Feedback Vertex Set (DFVS) problem, one is given a directed graph G = ( V, E ) and wants to find a minimum cardinality set S ⊆ V such that G − S is acyclic. DFVS is a fundamental problem in computer science and finds applications in areas such as deadlock detection. The problem was the subject of the 2022 PACE coding challenge. We develop a novel exact algorithm for the problem that is tailored to perform well on instances that are mostly bi-directed. For such instances, we adapt techniques from the well-researched vertex cover problem. Our core idea is an iterative reduction to vertex cover. To this end, we also develop a new reduction rule that reduces the number of not bi-directed edges. With the resulting algorithm, we were able to win third place in the exact track of the PACE challenge. We perform computational experiments and compare the running time to other exact algorithms, in particular to the winning algorithm in PACE. Our experiments show that we outpace the other algorithms on instances that have a low density of uni-directed edges.
{"title":"Solving Directed Feedback Vertex Set by Iterative Reduction to Vertex Cover","authors":"Sebastian Angrick, Ben Bals, Katrin Casel, S. Cohen, T. Friedrich, Niko Hastrich, Theresa Hradilak, Davis Issac, Otto Kißig, Jonas Schmidt, Leo Wendt","doi":"10.4230/LIPIcs.SEA.2023.10","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2023.10","url":null,"abstract":"In the Directed Feedback Vertex Set (DFVS) problem, one is given a directed graph G = ( V, E ) and wants to find a minimum cardinality set S ⊆ V such that G − S is acyclic. DFVS is a fundamental problem in computer science and finds applications in areas such as deadlock detection. The problem was the subject of the 2022 PACE coding challenge. We develop a novel exact algorithm for the problem that is tailored to perform well on instances that are mostly bi-directed. For such instances, we adapt techniques from the well-researched vertex cover problem. Our core idea is an iterative reduction to vertex cover. To this end, we also develop a new reduction rule that reduces the number of not bi-directed edges. With the resulting algorithm, we were able to win third place in the exact track of the PACE challenge. We perform computational experiments and compare the running time to other exact algorithms, in particular to the winning algorithm in PACE. Our experiments show that we outpace the other algorithms on instances that have a low density of uni-directed edges.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"266 1","pages":"10:1-10:14"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89062088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-01-01DOI: 10.4230/LIPIcs.SEA.2023.2
G. Kritikakis, I. Tollis
We present a fast and practical algorithm to compute the transitive closure (TC) of a directed graph. It is based on computing a reachability indexing scheme of a directed acyclic graph (DAG), G = ( V, E ). Given any path/chain decomposition of G we show how to compute in parameterized linear time such a reachability scheme that can answer reachability queries in constant time. The experimental results reveal that our method is significantly faster in practice than the theoretical bounds imply, indicating that path/chain decomposition algorithms can be applied to obtain fast and practical solutions to the transitive closure (TC) problem. Furthermore, we show that the number of non-transitive edges of a DAG G is ≤ width ∗ | V | and that we can find a substantially large subset of the transitive edges of G in linear time using a path/chain decomposition. Our extensive experimental results show the interplay between these concepts in various models of DAGs. 2012 ACM Subject Classification Theory of computation → Theory and algorithms for application domains; Theory of computation → Design and analysis of algorithms
给出了一种计算有向图传递闭包的快速实用算法。它基于计算有向无环图(DAG)的可达性索引方案,G = (V, E)。给定G的任意路径/链分解,我们展示了如何在参数化线性时间内计算这样一个可达性方案,该方案可以在常数时间内回答可达性查询。实验结果表明,我们的方法在实践中比理论界限所暗示的要快得多,这表明路径/链分解算法可以应用于传递闭包(TC)问题的快速实用解。进一步,我们证明了DAG G的非传递边的个数≤width * | V |,并且我们可以使用路径/链分解在线性时间内找到G的一个相当大的传递边子集。我们广泛的实验结果显示了这些概念在各种dag模型中的相互作用。2012 ACM学科分类:计算理论→应用领域的理论与算法;计算理论→算法设计与分析
{"title":"Fast Reachability Using DAG Decomposition","authors":"G. Kritikakis, I. Tollis","doi":"10.4230/LIPIcs.SEA.2023.2","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2023.2","url":null,"abstract":"We present a fast and practical algorithm to compute the transitive closure (TC) of a directed graph. It is based on computing a reachability indexing scheme of a directed acyclic graph (DAG), G = ( V, E ). Given any path/chain decomposition of G we show how to compute in parameterized linear time such a reachability scheme that can answer reachability queries in constant time. The experimental results reveal that our method is significantly faster in practice than the theoretical bounds imply, indicating that path/chain decomposition algorithms can be applied to obtain fast and practical solutions to the transitive closure (TC) problem. Furthermore, we show that the number of non-transitive edges of a DAG G is ≤ width ∗ | V | and that we can find a substantially large subset of the transitive edges of G in linear time using a path/chain decomposition. Our extensive experimental results show the interplay between these concepts in various models of DAGs. 2012 ACM Subject Classification Theory of computation → Theory and algorithms for application domains; Theory of computation → Design and analysis of algorithms","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"182 1","pages":"2:1-2:17"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80684702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-09-10DOI: 10.4230/LIPIcs.SEA.2021.10
Seungbum Jo, Wooyoung Park, S. Rao
We design a practical variant of an encoding for range Top-2 query (RT2Q) and evaluate its performance. Given an array $A[1,n]$ of $n$ elements from a total order, the range Top-2 encoding problem is to construct a data structure that answers ${textsf{RT2Q}}{}$, which returns the positions of the first and second largest elements within a given range of $A$, without accessing the array $A$ at query time. We design the following two implementations: (i) an implementation based on an alternative representation of Davoodi et al.’s [Phil. Trans. Royal Soc. A, 2016] data structure, which supports queries efficiently. Experimental results show that our implementation is efficient in practice and gives improved time-space trade-offs compared with the indexing data structures (which keep the original array $A$ as part of the data structure) for range maximum queries. (ii) Another implementation based on Jo et al.’s ${textsf{RT2Q}}{}$ encoding on $2 times n$ array [CPM, 2016], which can be constructed in $O(n)$ time. We compare our encoding with Gawrychowski and Nicholson’s optimal encoding [ICALP, 2015] and show that in most cases, our encoding shows faster construction time while using a competitive space in practice.
我们为范围Top-2查询(RT2Q)设计了一种实用的编码变体,并评估了其性能。给定一个数组$A[1,n]$,包含总顺序为$n的$n个元素,范围Top-2编码问题是构造一个回答${textsf{RT2Q}}{}$的数据结构,它返回$A$给定范围内第一大和第二大元素的位置,而不需要在查询时访问数组$A$。我们设计了以下两种实现:(i)基于Davoodi等人的替代表示的实现。反式。皇家Soc。[A, 2016]数据结构,有效支持查询。实验结果表明,我们的实现在实践中是有效的,并且与索引数据结构(保留原始数组$A$作为数据结构的一部分)相比,在范围最大查询中提供了更好的时间-空间权衡。(ii)基于Jo等人在$2 times n$数组上的${textsf{RT2Q}}{}$编码的另一种实现[CPM, 2016],可以在$O(n)$ time内构造。我们将我们的编码与Gawrychowski和Nicholson的最优编码[ICALP, 2015]进行了比较,并表明在大多数情况下,我们的编码在实践中使用竞争空间时显示出更快的构建时间。
{"title":"Practical Implementation of Encoding Range Top-2 Queries","authors":"Seungbum Jo, Wooyoung Park, S. Rao","doi":"10.4230/LIPIcs.SEA.2021.10","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2021.10","url":null,"abstract":"\u0000 We design a practical variant of an encoding for range Top-2 query (RT2Q) and evaluate its performance. Given an array $A[1,n]$ of $n$ elements from a total order, the range Top-2 encoding problem is to construct a data structure that answers ${textsf{RT2Q}}{}$, which returns the positions of the first and second largest elements within a given range of $A$, without accessing the array $A$ at query time. We design the following two implementations: (i) an implementation based on an alternative representation of Davoodi et al.’s [Phil. Trans. Royal Soc. A, 2016] data structure, which supports queries efficiently. Experimental results show that our implementation is efficient in practice and gives improved time-space trade-offs compared with the indexing data structures (which keep the original array $A$ as part of the data structure) for range maximum queries. (ii) Another implementation based on Jo et al.’s ${textsf{RT2Q}}{}$ encoding on $2 times n$ array [CPM, 2016], which can be constructed in $O(n)$ time. We compare our encoding with Gawrychowski and Nicholson’s optimal encoding [ICALP, 2015] and show that in most cases, our encoding shows faster construction time while using a competitive space in practice.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"340 1","pages":"10:1-10:13"},"PeriodicalIF":0.0,"publicationDate":"2022-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78049011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-24DOI: 10.48550/arXiv.2208.11371
P. Bille, I. L. Gørtz, S. Puglisi, Simon R. Tarnow
Relative Lempel-Ziv (RLZ) parsing is a dictionary compression method in which a string $S$ is compressed relative to a second string $R$ (called the reference) by parsing $S$ into a sequence of substrings that occur in $R$. RLZ is particularly effective at compressing sets of strings that have a high degree of similarity to the reference string, such as a set of genomes of individuals from the same species. With the now cheap cost of DNA sequencing, such data sets have become extremely abundant and are rapidly growing. In this paper, instead of using a single reference string for the entire collection, we investigate the use of different reference strings for subsets of the collection, with the aim of improving compression. In particular, we form a rooted tree (or hierarchy) on the strings and then compressed each string using RLZ with parent as reference, storing only the root of the tree in plain text. To decompress, we traverse the tree in BFS order starting at the root, decompressing children with respect to their parent. We show that this approach leads to a twofold improvement in compression on bacterial genome data sets, with negligible effect on decompression time compared to the standard single reference approach. We show that an effective hierarchy for a given set of strings can be constructed by computing the optimal arborescence of a completed weighted digraph of the strings, with weights as the number of phrases in the RLZ parsing of the source and destination vertices. We further show that instead of computing the complete graph, a sparse graph derived using locality sensitive hashing can significantly reduce the cost of computing a good hierarchy, without adversely effecting compression performance.
{"title":"Hierarchical Relative Lempel-Ziv Compression","authors":"P. Bille, I. L. Gørtz, S. Puglisi, Simon R. Tarnow","doi":"10.48550/arXiv.2208.11371","DOIUrl":"https://doi.org/10.48550/arXiv.2208.11371","url":null,"abstract":"Relative Lempel-Ziv (RLZ) parsing is a dictionary compression method in which a string $S$ is compressed relative to a second string $R$ (called the reference) by parsing $S$ into a sequence of substrings that occur in $R$. RLZ is particularly effective at compressing sets of strings that have a high degree of similarity to the reference string, such as a set of genomes of individuals from the same species. With the now cheap cost of DNA sequencing, such data sets have become extremely abundant and are rapidly growing. In this paper, instead of using a single reference string for the entire collection, we investigate the use of different reference strings for subsets of the collection, with the aim of improving compression. In particular, we form a rooted tree (or hierarchy) on the strings and then compressed each string using RLZ with parent as reference, storing only the root of the tree in plain text. To decompress, we traverse the tree in BFS order starting at the root, decompressing children with respect to their parent. We show that this approach leads to a twofold improvement in compression on bacterial genome data sets, with negligible effect on decompression time compared to the standard single reference approach. We show that an effective hierarchy for a given set of strings can be constructed by computing the optimal arborescence of a completed weighted digraph of the strings, with weights as the number of phrases in the RLZ parsing of the source and destination vertices. We further show that instead of computing the complete graph, a sparse graph derived using locality sensitive hashing can significantly reduce the cost of computing a good hierarchy, without adversely effecting compression performance.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"24 1","pages":"18:1-18:16"},"PeriodicalIF":0.0,"publicationDate":"2022-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90970214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}