Bulletin of the Society of Sea Water Science, Japan: Latest Articles (Page 2)

Subset Wavelet Trees

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2023-01-01 DOI: 10.4230/LIPIcs.SEA.2023.4

Jarno N. Alanko, E. Biagi, S. Puglisi, Jaakko Vuohtoniemi

Given an alphabet Σ of σ = |Σ| symbols, a degenerate (or indeterminate) string X is a sequence X = X[0], X[1], ..., X[n − 1] of n subsets of Σ. Since their introduction in the mid-1970s, degenerate strings have been widely studied, with applications driven by their being a natural model for sequences in which there is some uncertainty about the precise symbol at a given position, such as those arising in genomics and proteomics. In this paper we introduce a new data structure for degenerate strings, called the subset wavelet tree (SubsetWT). A SubsetWT supports two basic operations on degenerate strings: subset-rank(i, c), which returns the number of subsets up to the i-th subset in the degenerate string that contain the symbol c; and subset-select(i, c), which returns the index in the degenerate string of the i-th subset that contains symbol c. These queries are analogs of the rank and select queries that have been widely studied for ordinary strings. Via experiments in a real genomics application in which degenerate strings are fundamental, we show that subset wavelet trees are practical data structures and, in particular, offer an attractive space-time tradeoff. Along the way we investigate data structures for supporting (normal) rank queries on base-4 and base-3 sequences, which may be of independent interest. Our C++ implementations of the data structures are available at https://github.com/jnalanko/SubsetWT.
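To make the two query definitions concrete, here is a naive Python reference that scans the degenerate string directly. It is only a sketch of the query semantics, not the SubsetWT itself. The function names are hypothetical, and two conventions are assumptions: `subset_rank` counts over the first `i` subsets X[0..i−1], and `subset_select` treats `i` as 1-based and returns -1 when fewer than `i` subsets contain the symbol.

```python
def subset_rank(X, i, c):
    """Number of subsets among X[0], ..., X[i-1] that contain symbol c.

    Assumption: the prefix is exclusive of position i; the abstract does
    not fix this convention.
    """
    return sum(1 for subset in X[:i] if c in subset)


def subset_select(X, i, c):
    """Index of the i-th (1-based) subset containing c, or -1 if none."""
    seen = 0
    for position, subset in enumerate(X):
        if c in subset:
            seen += 1
            if seen == i:
                return position
    return -1


# A small degenerate string over the DNA alphabet; the empty set is allowed.
X = [{"A"}, {"A", "C"}, set(), {"C", "G", "T"}, {"A"}]

print(subset_rank(X, 3, "A"))    # subsets 0 and 1 contain A
print(subset_select(X, 2, "C"))  # the second subset containing C
```

Both functions take time linear in the length of X, so they are useful mainly as a correctness oracle when testing a real index.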


Citations: 3

Simple Runs-Bounded FM-Index Designs Are Fast

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2023-01-01 DOI: 10.4230/LIPIcs.SEA.2023.7

Diego Díaz-Domínguez, Saska Dönges, S. Puglisi, Leena Salmela

Given a string X of length n over an alphabet of size σ, the FM-index data structure allows counting all occurrences of a pattern P of length m in O(m) time via an algorithm called backward search. An important difficulty when searching with an FM-index is supporting queries on L, the Burrows-Wheeler transform of X, while L is in compressed form. This problem has been the subject of intense research for 25 years now. Run-length encoding of L is an effective way to reduce index size, in particular when the data being indexed is highly repetitive, which is the case in many types of modern data, including those arising from versioned document collections and in pangenomics. This paper takes a back-to-basics look at supporting backward search in FM-indexes, exploring and engineering two simple designs. The first divides the BWT string into blocks containing b symbols each and then run-length compresses each block separately, possibly introducing new runs (compared to applying run-length encoding once to the whole string). Each block stores, for each symbol, the number of occurrences before the block. This method supports the operation rank_c(L, i) (i.e., count the number of times c occurs in the prefix L[1..i]) by first determining the block i/b in which i falls and scanning the block to the appropriate position, counting occurrences of c along the way. This partial answer to rank_c(L, i) is then added to the stored count of c symbols before the block to determine the final answer. Our second design has a similar structure, but instead divides the run-length-encoded version of L into blocks containing an equal number of runs. The trick then is to determine the block in which a query falls, which is achieved via a predecessor query over the block starting positions. We show via extensive experiments on a wide range of repetitive text collections that these FM-indexes are not only easy to implement, but also fast and space efficient in practice.
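The two block layouts described in the abstract can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the authors' engineered implementation: the class and helper names are hypothetical, counts are kept in plain dictionaries rather than packed arrays, and `rank(c, i)` is taken to mean the number of occurrences of c among the first i symbols of L. The predecessor query of the second design is done here with a binary search from the standard `bisect` module.

```python
from bisect import bisect_right
from collections import Counter
from itertools import groupby


def run_length_encode(s):
    """Return s as a list of (symbol, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def count_in_runs(runs, c, length):
    """Count occurrences of c among the first `length` symbols of the runs."""
    total = 0
    for ch, run in runs:
        if length <= 0:
            break
        take = min(run, length)
        if ch == c:
            total += take
        length -= take
    return total


class FixedBlockRank:
    """Design 1: blocks of b symbols, each run-length encoded separately."""

    def __init__(self, L, b):
        self.b, self.n = b, len(L)
        self.blocks, self.before = [], []
        running = Counter()
        for start in range(0, len(L), b):
            chunk = L[start:start + b]
            self.before.append(dict(running))  # symbol counts before this block
            self.blocks.append(run_length_encode(chunk))
            running.update(chunk)

    def rank(self, c, i):
        """Occurrences of c among the first i symbols of L."""
        i = max(0, min(i, self.n))
        if not self.blocks:
            return 0
        k = min(i // self.b, len(self.blocks) - 1)  # block in which i falls
        partial = count_in_runs(self.blocks[k], c, i - k * self.b)
        return self.before[k].get(c, 0) + partial


class RunsBlockRank:
    """Design 2: the run-length encoding of L is cut into blocks of r runs."""

    def __init__(self, L, r):
        runs = run_length_encode(L)
        self.starts, self.blocks, self.before = [], [], []
        running, pos = Counter(), 0
        for j in range(0, len(runs), r):
            block = runs[j:j + r]
            self.starts.append(pos)            # text position where the block begins
            self.before.append(dict(running))  # symbol counts before this block
            self.blocks.append(block)
            for ch, run in block:
                running[ch] += run
                pos += run
        self.n = pos

    def rank(self, c, i):
        """Occurrences of c among the first i symbols of L."""
        i = max(0, min(i, self.n))
        if not self.blocks:
            return 0
        k = bisect_right(self.starts, i) - 1   # predecessor query over block starts
        partial = count_in_runs(self.blocks[k], c, i - self.starts[k])
        return self.before[k].get(c, 0) + partial


L = "aaaabbbaacccccab"
fixed = FixedBlockRank(L, b=4)
by_runs = RunsBlockRank(L, r=2)
print(fixed.rank("a", 9), by_runs.rank("a", 9), L[:9].count("a"))
```

In the first layout the block is found by one division but a long run may be split across blocks; in the second, no run is ever split, at the price of a search over `starts`.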


Citations: 1

Multilinear Formulations for Computing a Nash Equilibrium of Multi-Player Games

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2023-01-01 DOI: 10.4230/LIPIcs.SEA.2023.12

M. Fischer, A. Gupte

We present multilinear and mixed-integer multilinear programs to find a Nash equilibrium in multi-player noncooperative games. We compare the formulations to common algorithms in Gambit, and conclude that a multilinear feasibility program finds a Nash equilibrium faster than any of the methods we compare it to, including the quantal response equilibrium method, which is recommended for large games. Hence, the multilinear feasibility program is an alternative method to find a Nash equilibrium in multi-player games, and outperforms many common algorithms. The mixed-integer formulations are generalisations of known mixed-integer programs for two-player games; however, unlike in the two-player case, these mixed-integer programs do not give better performance than existing algorithms.
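The abstract does not spell out the formulations, so the following Python sketch only illustrates why multilinearity arises: a player's expected payoff is a sum over pure profiles of a payoff times a product of one probability per player, and the standard Nash conditions compare that quantity with the payoff of each pure deviation. The function names, the payoff-table layout, and the example game are all assumptions made for illustration; this is a feasibility check for a given profile, not a solver.

```python
from itertools import product


def expected_payoff(payoff, strategies, player):
    """Expected payoff of `player`; multilinear in the mixed strategies."""
    total = 0.0
    for profile in product(*(range(len(x)) for x in strategies)):
        probability = 1.0
        for j, action in enumerate(profile):
            probability *= strategies[j][action]
        total += probability * payoff[profile][player]
    return total


def is_nash(payoff, strategies, tol=1e-9):
    """True if no player gains more than tol by a pure-strategy deviation."""
    for i, x in enumerate(strategies):
        current = expected_payoff(payoff, strategies, i)
        for s in range(len(x)):
            pure = [1.0 if a == s else 0.0 for a in range(len(x))]
            deviated = strategies[:i] + [pure] + strategies[i + 1:]
            if expected_payoff(payoff, deviated, i) > current + tol:
                return False
    return True


# Hypothetical three-player coordination game: everyone receives 1 when all
# three players choose the same action, and 0 otherwise.
payoff = {
    p: ((1, 1, 1) if len(set(p)) == 1 else (0, 0, 0))
    for p in product(range(2), repeat=3)
}

print(is_nash(payoff, [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]))
print(is_nash(payoff, [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

Checking only pure deviations suffices because a mixed deviation is a convex combination of pure ones and cannot do better than the best of them.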


Citations: 0

Optimizing over the Efficient Set of a Multi-Objective Discrete Optimization Problem

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2023-01-01 DOI: 10.4230/LIPIcs.SEA.2023.9

Satya Tamby, D. Vanderpooten

Optimizing over the efficient set of a discrete multi-objective problem is a challenging issue. The main reason is that, unlike the feasible set, the efficient set is only implicitly characterized. Therefore, methods designed for this purpose iteratively generate efficient solutions by solving appropriate single-objective problems. However, the number of efficient solutions can be quite large and the problems to be solved can be difficult in practice. Thus, the challenge is both to minimize the number of iterations and to reduce the difficulty of the problems to be solved at each iteration. In this paper, a new enumeration scheme is proposed. By introducing some constraints and optimizing over projections of the search region, potentially large parts of the search space can be discarded, drastically reducing the number of iterations. Moreover, the single-objective programs to be solved can be guaranteed to be feasible, and a starting solution can be provided, allowing warm starts. This results in a fast algorithm that is simple to implement. Experimental computations on two standard multi-objective instance families show that our approach seems to perform significantly faster than the state-of-the-art algorithm.
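For a tiny instance the problem statement itself can be written out by brute force, which helps to see why the efficient set is only implicit: it has to be computed by pairwise dominance tests before anything can be optimized over it. The sketch below assumes all objectives are minimized and that the feasible set is small enough to list; the function names and the example data are hypothetical, and the enumeration scheme proposed in the paper is precisely what avoids this exhaustive approach.

```python
def dominates(u, v):
    """True if objective vector u dominates v (minimization)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def efficient_set(solutions, objectives):
    """Solutions whose objective vectors are not dominated by any other."""
    values = {s: objectives(s) for s in solutions}
    return [
        s for s in solutions
        if not any(dominates(values[t], values[s]) for t in solutions)
    ]


def optimize_over_efficient_set(solutions, objectives, f):
    """Minimize f over the efficient set only (brute force)."""
    return min(efficient_set(solutions, objectives), key=f)


# Toy instance: each solution is identified with its own objective vector.
solutions = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]


def identity(s):
    return s


def negative_sum(s):
    # Minimizing this function maximizes the sum of the two objectives.
    return -(s[0] + s[1])


print(efficient_set(solutions, identity))
print(optimize_over_efficient_set(solutions, identity, negative_sum))
print(min(solutions, key=negative_sum))  # optimum over the whole feasible set
```

The last two lines differ on this instance, which shows that optimizing over the efficient set is not the same as optimizing over the feasible set.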


Citations: 0

Integer Programming Formulations and Cutting Plane Algorithms for the Maximum Selective Tree Problem

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2023-01-01 DOI: 10.4230/LIPIcs.SEA.2023.13

Ömer Burak Onar, T. Ekim, Z. Taşkın

This paper considers the Maximum Selective Tree Problem (MSelTP) as a generalization of the Maximum Induced Tree problem. Given an undirected graph with a partition of its vertex set into clusters, MSelTP aims to choose the maximum number of vertices such that at most one vertex per cluster is selected and the graph induced by the selected vertices is a tree. To the best of our knowledge, MSelTP has not been studied before, although several related optimization problems have been investigated in the literature. We propose two mixed integer programming formulations for MSelTP: one based on connectivity constraints, the other based on cycle elimination constraints. In addition, we develop two exact cutting plane procedures to solve the problem to optimality. On graphs with up to 25 clusters, up to 250 vertices, and varying densities, we conduct computational experiments to compare the results of the two solution procedures with solving a compact integer programming formulation of MSelTP. Our experiments indicate that the algorithm CPAXnY outperforms the other procedures overall except for graphs with low density and large cluster size, and that the algorithm CPAX yields better results in terms of the average time of instances optimally solved and the overall average time.
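The problem definition can be checked on small graphs with an exhaustive search, sketched below in Python. This is only a brute-force oracle written from the definition in the abstract (at most one vertex per cluster, induced subgraph must be a tree); it is not one of the paper's integer programming formulations or cutting plane procedures, and the function names and example graph are hypothetical.

```python
from itertools import product


def is_tree(vertices, edges):
    """True if the subgraph induced by `vertices` is a tree."""
    chosen = set(vertices)
    if not chosen:
        return False
    induced = [(u, v) for u, v in edges if u in chosen and v in chosen]
    if len(induced) != len(chosen) - 1:
        return False
    adjacency = {v: [] for v in chosen}
    for u, v in induced:
        adjacency[u].append(v)
        adjacency[v].append(u)
    start = next(iter(chosen))
    seen, stack = {start}, [start]
    while stack:  # depth-first search to test connectivity
        x = stack.pop()
        for y in adjacency[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == chosen


def max_selective_tree(clusters, edges):
    """Largest vertex set with at most one vertex per cluster inducing a tree."""
    best = ()
    # For every cluster, either skip it (None) or pick one of its vertices.
    for choice in product(*[[None] + list(cluster) for cluster in clusters]):
        picked = tuple(v for v in choice if v is not None)
        if len(picked) > len(best) and is_tree(picked, edges):
            best = picked
    return best


clusters = [[0, 1], [2, 3], [4], [5]]
edges = [(0, 2), (2, 4), (0, 4), (1, 3), (3, 4), (4, 5)]
best = max_selective_tree(clusters, edges)
print(best, len(best))
```

The search space has one factor of (cluster size + 1) per cluster, so this is practical only for a handful of clusters.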


Citations: 0

Exact and Approximate Range Mode Query Data Structures in Practice

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2023-01-01 DOI: 10.4230/LIPIcs.SEA.2023.19

Meng He, Zhen Liu

We conduct an experimental study on the range mode problem. In the exact version of the problem, we preprocess an array A such that, given a query range [a, b], the most frequent element in A[a, b] can be found efficiently. For this problem, our most important finding is that the strategy of using succinct data structures to encode more precomputed information not only helped Chan et al. (Linear-space data structures for range mode query in arrays, Theory of Computing Systems, 2013) improve previous results in theory but also helps us achieve the best time/space tradeoff in practice; we even go a step further to replace more components in their solution with succinct data structures and improve the performance further. In the approximate version of this problem, a (1 + ε)-approximate range mode query looks for an element whose number of occurrences in A[a, b] is at least F_{a,b} / (1 + ε), where F_{a,b} is the frequency of the mode in A[a, b]. We implement all previous solutions to this problem and find that, even when ε = 1/2, the average approximation ratio of these solutions is close to 1 in practice, and they provide much faster query time than the best exact solution. These solutions achieve different useful time-space tradeoffs, and among them, El-Zein et al. (On Approximate Range Mode and Range Selection, 30th International Symposium on Algorithms and Computation, 2019) provide us with one solution whose space usage is only 35.6% to 93.8% of the cost of storing the input array of 32-bit integers (in most cases, the space cost is closer to the lower end, and the average space cost is 20.2 bits per symbol among all datasets). Its non-succinct version also stands out with query support at least several times faster than other O(nε)-word structures while using only slightly more space in practice.
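The exact and approximate query definitions can be stated as a short Python reference. The sketch below simply counts inside the query range, so it takes linear time per query and is useful only as a baseline or test oracle, not as one of the data structures studied. The function names are hypothetical, and the range [a, b] is assumed to be 0-based and inclusive.

```python
from collections import Counter


def range_mode(A, a, b):
    """Return (element, frequency) for a most frequent element of A[a..b]."""
    counts = Counter(A[a:b + 1])
    return max(counts.items(), key=lambda item: item[1])


def is_approximate_mode(A, a, b, x, eps):
    """True if x occurs at least F_{a,b} / (1 + eps) times in A[a..b]."""
    _, mode_frequency = range_mode(A, a, b)
    occurrences = A[a:b + 1].count(x)
    # Multiply instead of dividing to avoid a fractional threshold.
    return occurrences * (1 + eps) >= mode_frequency


A = [3, 1, 3, 2, 2, 3, 1, 2, 2]
print(range_mode(A, 0, 5))
print(is_approximate_mode(A, 0, 5, 2, 0.5))
print(is_approximate_mode(A, 0, 5, 1, 0.5))
```

With ε = 1/2, any element whose frequency is at least two thirds of the mode's frequency is an acceptable answer.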


Citations: 1

Solving Directed Feedback Vertex Set by Iterative Reduction to Vertex Cover

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2023-01-01 DOI: 10.4230/LIPIcs.SEA.2023.10

Sebastian Angrick, Ben Bals, Katrin Casel, S. Cohen, T. Friedrich, Niko Hastrich, Theresa Hradilak, Davis Issac, Otto Kißig, Jonas Schmidt, Leo Wendt

In the Directed Feedback Vertex Set (DFVS) problem, one is given a directed graph G = (V, E) and wants to find a minimum cardinality set S ⊆ V such that G − S is acyclic. DFVS is a fundamental problem in computer science and finds applications in areas such as deadlock detection. The problem was the subject of the 2022 PACE coding challenge. We develop a novel exact algorithm for the problem that is tailored to perform well on instances that are mostly bi-directed. For such instances, we adapt techniques from the well-researched vertex cover problem. Our core idea is an iterative reduction to vertex cover. To this end, we also develop a new reduction rule that reduces the number of edges that are not bi-directed. With the resulting algorithm, we were able to win third place in the exact track of the PACE challenge. We perform computational experiments and compare the running time to other exact algorithms, in particular to the winning algorithm in PACE. Our experiments show that we outpace the other algorithms on instances that have a low density of uni-directed edges.
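The link to vertex cover is easiest to see in the extreme case where every edge is bi-directed: each pair u→v, v→u is a cycle of length two, so a feedback vertex set must contain u or v, and removing a vertex cover leaves no edges at all. The Python sketch below checks this equivalence by brute force on a small example. It illustrates only that special case; the iterative reduction and the new reduction rule from the paper are not reproduced, and all names and the example graph are hypothetical.

```python
from itertools import combinations


def is_acyclic(vertices, arcs):
    """Kahn's algorithm on the subgraph induced by `vertices`."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in arcs:
        if u in indegree and v in indegree:
            successors[u].append(v)
            indegree[v] += 1
    ready = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return removed == len(vertices)


def min_dfvs(vertices, arcs):
    """Smallest vertex set whose removal makes the digraph acyclic."""
    for k in range(len(vertices) + 1):
        for candidate in combinations(vertices, k):
            rest = [v for v in vertices if v not in candidate]
            if is_acyclic(rest, arcs):
                return set(candidate)


def min_vertex_cover(vertices, edges):
    """Smallest vertex set touching every undirected edge."""
    for k in range(len(vertices) + 1):
        for candidate in combinations(vertices, k):
            if all(u in candidate or v in candidate for u, v in edges):
                return set(candidate)


vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
# Fully bi-directed digraph: both orientations of every undirected edge.
arcs = edges + [(v, u) for u, v in edges]

print(len(min_dfvs(vertices, arcs)), len(min_vertex_cover(vertices, edges)))
```

Both searches are exponential in the number of vertices and are meant only to confirm the equivalence on toy inputs.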


Citations: 1

Fast Reachability Using DAG Decomposition

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2023-01-01 DOI: 10.4230/LIPIcs.SEA.2023.2

G. Kritikakis, I. Tollis

We present a fast and practical algorithm to compute the transitive closure (TC) of a directed graph. It is based on computing a reachability indexing scheme of a directed acyclic graph (DAG), G = (V, E). Given any path/chain decomposition of G, we show how to compute in parameterized linear time such a reachability scheme that can answer reachability queries in constant time. The experimental results reveal that our method is significantly faster in practice than the theoretical bounds imply, indicating that path/chain decomposition algorithms can be applied to obtain fast and practical solutions to the transitive closure problem. Furthermore, we show that the number of non-transitive edges of a DAG G is ≤ width · |V| and that we can find a substantially large subset of the transitive edges of G in linear time using a path/chain decomposition. Our extensive experimental results show the interplay between these concepts in various models of DAGs.
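A common way to build a chain-based reachability index, sketched below in Python, is to store for every vertex the earliest position it can reach on each chain; a query then needs a single comparison. This is a textbook-style illustration under assumptions, not the authors' algorithm: the decomposition is supplied by hand, every edge is scanned against every chain (the paper's handling of transitive edges is not reproduced), and all names and the example DAG are hypothetical.

```python
INF = float("inf")


def topological_order(n, adj):
    """Kahn's algorithm; `adj[u]` lists the successors of u."""
    indegree = [0] * n
    for u in range(n):
        for v in adj[u]:
            indegree[v] += 1
    order = [v for v in range(n) if indegree[v] == 0]
    for u in order:  # the list grows while it is being scanned
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                order.append(v)
    return order


def build_index(n, adj, chains):
    """first[u][c] = smallest position on chain c reachable from u."""
    chain_of, pos_of = [0] * n, [0] * n
    for c, chain in enumerate(chains):
        for p, v in enumerate(chain):
            chain_of[v], pos_of[v] = c, p
    first = [[INF] * len(chains) for _ in range(n)]
    for u in reversed(topological_order(n, adj)):
        first[u][chain_of[u]] = pos_of[u]  # a vertex reaches itself
        for w in adj[u]:                   # successors are already final
            for c in range(len(chains)):
                if first[w][c] < first[u][c]:
                    first[u][c] = first[w][c]
    return chain_of, pos_of, first


def reachable(index, u, v):
    """Constant-time query: can u reach v (every vertex reaches itself)?"""
    chain_of, pos_of, first = index
    return first[u][chain_of[v]] <= pos_of[v]


# Example DAG with 6 vertices and a hand-made decomposition into 3 chains.
adj = [[1, 3], [2], [], [4, 5], [2], []]
chains = [[0, 1, 2], [3, 4], [5]]
index = build_index(len(adj), adj, chains)

print(reachable(index, 0, 5), reachable(index, 1, 4), reachable(index, 3, 2))
```

The index holds one integer per vertex and chain, so its size is the number of vertices times the number of chains, which is why decompositions with few chains matter.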


Citations: 2

Practical Implementation of Encoding Range Top-2 Queries

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2022-09-10 DOI: 10.4230/LIPIcs.SEA.2021.10

Seungbum Jo, Wooyoung Park, S. Rao

We design a practical variant of an encoding for range Top-2 query (RT2Q) and evaluate its performance. Given an array $A[1,n]$ of $n$ elements from a total order, the range Top-2 encoding problem is to construct a data structure that answers $\textsf{RT2Q}$, which returns the positions of the first and second largest elements within a given range of $A$, without accessing the array $A$ at query time. We design the following two implementations: (i) an implementation based on an alternative representation of Davoodi et al.’s [Phil. Trans. Royal Soc. A, 2016] data structure, which supports queries efficiently. Experimental results show that our implementation is efficient in practice and gives improved time-space trade-offs compared with the indexing data structures (which keep the original array $A$ as part of the data structure) for range maximum queries. (ii) Another implementation based on Jo et al.’s $\textsf{RT2Q}$ encoding on a $2 \times n$ array [CPM, 2016], which can be constructed in $O(n)$ time. We compare our encoding with Gawrychowski and Nicholson’s optimal encoding [ICALP, 2015] and show that in most cases, our encoding shows faster construction time while using competitive space in practice.
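For reference, the query itself can be written as a naive Python function that reads the array directly. An encoding data structure answers the same question without accessing A at query time, so this sketch is only a baseline for checking answers, not an encoding. The function name is hypothetical, positions are 0-based and inclusive here (the abstract uses $A[1,n]$), and the range is assumed to contain at least two elements.

```python
def range_top2(A, i, j):
    """Positions of the largest and second largest elements of A[i..j].

    Assumes 0 <= i < j < len(A). Ties are broken toward the leftmost
    position because Python's sort is stable.
    """
    by_value = sorted(range(i, j + 1), key=lambda p: A[p], reverse=True)
    return by_value[0], by_value[1]


A = [4, 9, 2, 7, 5, 8]
print(range_top2(A, 2, 5))
print(range_top2(A, 0, 3))
```

Sorting the range costs more than necessary; a single pass that tracks the two largest values would do, but the sorted form keeps the definition easy to read.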

我们为范围Top-2查询(RT2Q)设计了一种实用的编码变体，并评估了其性能。给定一个数组$A[1,n]$，包含总顺序为$n的$n个元素，范围Top-2编码问题是构造一个回答${textsf{RT2Q}}{}$的数据结构，它返回$A$给定范围内第一大和第二大元素的位置，而不需要在查询时访问数组$A$。我们设计了以下两种实现:(i)基于Davoodi等人的替代表示的实现。反式。皇家Soc。[A, 2016]数据结构，有效支持查询。实验结果表明，我们的实现在实践中是有效的，并且与索引数据结构(保留原始数组$A$作为数据结构的一部分)相比，在范围最大查询中提供了更好的时间-空间权衡。(ii)基于Jo等人在$2 times n$数组上的${textsf{RT2Q}}{}$编码的另一种实现[CPM, 2016]，可以在$O(n)$ time内构造。我们将我们的编码与Gawrychowski和Nicholson的最优编码[ICALP, 2015]进行了比较，并表明在大多数情况下，我们的编码在实践中使用竞争空间时显示出更快的构建时间。


Citations: 0

Hierarchical Relative Lempel-Ziv Compression

Bulletin of the Society of Sea Water Science, Japan

Pub Date : 2022-08-24 DOI: 10.48550/arXiv.2208.11371

P. Bille, I. L. Gørtz, S. Puglisi, Simon R. Tarnow

Relative Lempel-Ziv (RLZ) parsing is a dictionary compression method in which a string $S$ is compressed relative to a second string $R$ (called the reference) by parsing $S$ into a sequence of substrings that occur in $R$. RLZ is particularly effective at compressing sets of strings that have a high degree of similarity to the reference string, such as a set of genomes of individuals from the same species. Now that DNA sequencing is cheap, such data sets have become extremely abundant and are growing rapidly. In this paper, instead of using a single reference string for the entire collection, we investigate the use of different reference strings for subsets of the collection, with the aim of improving compression. In particular, we form a rooted tree (or hierarchy) on the strings and then compress each string using RLZ with its parent as the reference, storing only the root of the tree in plain text. To decompress, we traverse the tree in BFS order starting at the root, decompressing children with respect to their parent. We show that this approach leads to a twofold improvement in compression on bacterial genome data sets, with negligible effect on decompression time compared to the standard single-reference approach. We show that an effective hierarchy for a given set of strings can be constructed by computing the optimal arborescence of a complete weighted digraph of the strings, where the weight of an edge is the number of phrases in the RLZ parsing for its source and destination vertices. We further show that instead of computing the complete graph, a sparse graph derived using locality-sensitive hashing can significantly reduce the cost of computing a good hierarchy, without adversely affecting compression performance.
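
The parsing and the BFS decompression described above can be sketched in Python. This is a deliberately naive illustration, not the authors' implementation: it finds each longest match by repeated `str.find` calls, whereas a practical implementation would use an index over the reference. The names `rlz_parse`, `rlz_decode`, and `decompress_hierarchy`, the `(position, length)` phrase format, and the `(character, 0)` encoding for symbols absent from the reference are all assumptions made for this example.

```python
from collections import deque

def rlz_parse(s, ref):
    """Greedily parse s into phrases relative to ref.

    A phrase is (pos, length) meaning ref[pos:pos+length], or
    (char, 0) for a literal character that does not occur in ref.
    """
    phrases = []
    i = 0
    while i < len(s):
        pos, length = -1, 0
        # Extend the current match one character at a time.
        while i + length < len(s):
            found = ref.find(s[i:i + length + 1])
            if found == -1:
                break
            pos, length = found, length + 1
        if length == 0:
            phrases.append((s[i], 0))  # literal fallback
            i += 1
        else:
            phrases.append((pos, length))
            i += length
    return phrases

def rlz_decode(phrases, ref):
    """Rebuild the original string from its phrases and the reference."""
    return "".join(a if length == 0 else ref[a:a + length]
                   for a, length in phrases)

def decompress_hierarchy(root, root_text, children, parsed):
    """Decompress a tree of strings in BFS order.

    children -- dict: name -> list of child names
    parsed   -- dict: name -> phrases relative to the parent's text
    Only the root is stored in plain text.
    """
    texts = {root: root_text}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, []):
            texts[child] = rlz_decode(parsed[child], texts[parent])
            queue.append(child)
    return texts
```

Under these assumptions, `len(rlz_parse(s, r))` gives the phrase count for compressing `s` against `r`, which is the kind of quantity the abstract uses as an edge weight when choosing the hierarchy.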



Citations: 1