Shoko Naruke, K. Fujiwara, T. Sugo, S. Kawai-Noma, D. Umeno, Kyoichi Saito
{"title":"Reduction of Supercooling of Heavy Water with Silver Iodide","authors":"Shoko Naruke, K. Fujiwara, T. Sugo, S. Kawai-Noma, D. Umeno, Kyoichi Saito","doi":"10.11457/SWSJ.72.1_41","DOIUrl":"https://doi.org/10.11457/SWSJ.72.1_41","url":null,"abstract":"","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"60 1","pages":"41-42"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81955058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-01-01 DOI: 10.4230/LIPIcs.SEA.2018.6
N. Sedaghat, Tamon Stephen, L. Chindelevitch
The problem of computing the dual of a monotone Boolean function f is a fundamental problem in theoretical computer science with numerous applications. The related problem of duality testing (given two monotone Boolean functions f and g, declare that they are dual or provide a certificate that shows they are not) has a complexity that is not yet known. However, two quasi-polynomial time algorithms for it, often referred to as FK-A and FK-B, were proposed by Fredman and Khachiyan in 1996, with the latter having a better complexity guarantee. These can be naturally used as a subroutine in computing the dual of f. In this paper, we investigate this use of the FK-B algorithm for the computation of the dual of a monotone Boolean function, and present practical improvements to its performance. First, we show how FK-B can be modified to produce multiple certificates (Boolean vectors on which the functions defined by the original f and the current dual g do not provide outputs consistent with duality). Second, we show how the number of redundancy tests, one of the more costly and time-consuming steps of FK-B, can be substantially reduced in this context. Lastly, we describe a simple memoization technique that avoids the solution of multiple identical subproblems. We test our approach on a number of inputs coming from computational biology as well as combinatorics. These modifications provide a substantial speed-up, as much as an order of magnitude, for FK-B dualization relative to a naive implementation. Although other methods may end up being faster in practice, our work paves the way for a principled optimization process for the generation of monotone Boolean functions and their duals from an oracle.
2012 ACM Subject Classification: Computing methodologies → Boolean algebra algorithms
{"title":"Speeding up Dualization in the Fredman-Khachiyan Algorithm B","authors":"N. Sedaghat, Tamon Stephen, L. Chindelevitch","doi":"10.4230/LIPIcs.SEA.2018.6","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2018.6","url":null,"abstract":"The problem of computing the dual of a monotone Boolean function f is a fundamental problem in theoretical computer science with numerous applications. The related problem of duality testing (given two monotone Boolean functions f and g, declare that they are dual or provide a certificate that shows they are not) has a complexity that is not yet known. However, two quasi-polynomial time algorithms for it, often referred to as FK-A and FK-B, were proposed by Fredman and Khachiyan in 1996, with the latter having a better complexity guarantee. These can be naturally used as a subroutine in computing the dual of f . In this paper, we investigate this use of the FK-B algorithm for the computation of the dual of a monotone Boolean function, and present practical improvements to its performance. First, we show how FK-B can be modified to produce multiple certificates (Boolean vectors on which the functions defined by the original f and the current dual g do not provide outputs consistent with duality). Second, we show how the number of redundancy tests one of the more costly and time-consuming steps of FK-B can be substantially reduced in this context. Lastly, we describe a simple memoization technique that avoids the solution of multiple identical subproblems. We test our approach on a number of inputs coming from computational biology as well as combinatorics. These modifications provide a substantial speed-up, as much as an order of magnitude, for FK-B dualization relative to a naive implementation. Although other methods may end up being faster in practice, our work paves the way for a principled optimization process for the generation of monotone Boolean functions and their duals from an oracle. 
2012 ACM Subject Classification Computing methodologies → Boolean algebra algorithms","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"42 1","pages":"6:1-6:13"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87563201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
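The duality condition described in the abstract above (g is the dual of f when g(x) = ¬f(¬x) for every Boolean vector x) can be checked directly on tiny inputs. The following is a minimal brute-force sketch, not the FK-B algorithm. It assumes monotone functions are given as DNFs (lists of sets of variable indices), and the names `eval_dnf` and `find_certificate` are hypothetical. It enumerates all 2^n vectors and returns one on which the two functions are inconsistent with duality (a certificate), or None if they are dual.

```python
from itertools import product

def eval_dnf(terms, x):
    """Evaluate a monotone DNF: true if some term has all its variables set in x."""
    return any(all(x[i] for i in term) for term in terms)

def find_certificate(f_terms, g_terms, n):
    """Return a vector x with g(x) != not f(not x), or None if f and g are dual.

    Exhaustive check over all 2^n vectors; only meant for very small n.
    """
    for x in product((0, 1), repeat=n):
        complement = tuple(1 - b for b in x)
        if eval_dnf(g_terms, x) != (not eval_dnf(f_terms, complement)):
            return x
    return None

f = [{0, 1}]            # f = x0 AND x1
g = [{0}, {1}]          # g = x0 OR x1, the dual of f
print(find_certificate(f, g, 2))        # dual pair, so no certificate exists
print(find_certificate(f, [{0}], 2))    # g is missing the term {1}
```

The exhaustive loop takes time exponential in n, which is exactly the cost that quasi-polynomial duality tests such as FK-A and FK-B are designed to avoid.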
{"title":"Monovalent Selective Cation-exchange Membranes Prepared from PVA-based Block Copolymers","authors":"Saeko Harada, Yuriko Kakihana, Mitsuru Higa","doi":"10.11457/SWSJ.72.6_338","DOIUrl":"https://doi.org/10.11457/SWSJ.72.6_338","url":null,"abstract":"","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"10 1","pages":"338-339"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87385548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Nagashima, H. Minamisawa, T. Nakagama, K. Saitoh, Hiromichi Asamoto
{"title":"High-performance Liquid Chromatographic Evaluation of Effect of Metal Ion on Amyloid Fibril Formation","authors":"K. Nagashima, H. Minamisawa, T. Nakagama, K. Saitoh, Hiromichi Asamoto","doi":"10.11457/SWSJ.72.6_336","DOIUrl":"https://doi.org/10.11457/SWSJ.72.6_336","url":null,"abstract":"","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"40 1","pages":"336-337"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84112231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-01-01 DOI: 10.4230/LIPIcs.SEA.2018.27
V. Buchhold, P. Sanders, D. Wagner
Given an urban road network and a set of origin-destination (OD) pairs, the traffic assignment problem asks for the traffic flow on each road segment. A common solution employs a feasible-direction method, where the direction-finding step requires many shortest-path computations. In this paper, we significantly accelerate the computation of flow patterns, enabling interactive transportation and urban planning applications. We achieve this by revisiting and carefully engineering known speedup techniques for shortest paths, and combining them with customizable contraction hierarchies. In particular, our accelerated elimination tree search is more than an order of magnitude faster for local queries than the original algorithm, and our centralized search speeds up batched point-to-point shortest paths by a factor of up to 6. These optimizations are independent of traffic assignment and can be generally used for (batched) point-to-point queries. In contrast to prior work, our evaluation uses real-world data for all parts of the problem. On a metropolitan area encompassing more than 2.7 million inhabitants, we reduce the flow-pattern computation for a typical two-hour morning peak from 76.5 to 10.5 seconds on one core, and 4.3 seconds on four cores. This represents a speedup of 18 over the state of the art, and three orders of magnitude over the Dijkstra-based baseline.
{"title":"Real-Time Traffic Assignment Using Fast Queries in Customizable Contraction Hierarchies","authors":"V. Buchhold, P. Sanders, D. Wagner","doi":"10.4230/LIPIcs.SEA.2018.27","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2018.27","url":null,"abstract":"Given an urban road network and a set of origin-destination (OD) pairs, the traffic assignment problem asks for the traffic flow on each road segment. A common solution employs a feasible-direction method, where the direction-finding step requires many shortest-path computations. In this paper, we significantly accelerate the computation of flow patterns, enabling interactive transportation and urban planning applications. We achieve this by revisiting and carefully engineering known speedup techniques for shortest paths, and combining them with customizable contraction hierarchies. In particular, our accelerated elimination tree search is more than an order of magnitude faster for local queries than the original algorithm, and our centralized search speeds up batched point-to-point shortest paths by a factor of up to 6. These optimizations are independent of traffic assignment and can be generally used for (batched) point-to-point queries. In contrast to prior work, our evaluation uses real-world data for all parts of the problem. On a metropolitan area encompassing more than 2.7 million inhabitants, we reduce the flow-pattern computation for a typical two-hour morning peak from 76.5 to 10.5 seconds on one core, and 4.3 seconds on four cores. 
This represents a speedup of 18 over the state of the art, and three orders of magnitude over the Dijkstra-based baseline.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"19 4","pages":"27:1-27:15"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91456563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-01-01 DOI: 10.4230/LIPIcs.SEA.2018.28
P. Kaski, Juho Lauri, Suhas Thejaswi
Given a vertex-colored graph H and a multiset M of colors as input, the graph motif problem asks us to decide whether H has a connected induced subgraph whose multiset of colors agrees with M. The graph motif problem is NP-complete but known to admit randomized algorithms based on constrained multilinear sieving over GF(2^b) that run in time O(2^k k^2 m · M(2^b)) and with a false-negative probability of at most k/2^{b-1} for a connected m-edge input and a motif of size k. On modern CPU microarchitectures such algorithms have practical edge-linear scalability to inputs with billions of edges for small motif sizes, as demonstrated by Björklund, Kaski, Kowalik, and Lauri [ALENEX'15]. This scalability to large graphs prompts the dual question whether it is possible to scale to large motif sizes. We present a vertex-localized variant of the constrained multilinear sieve that enables us to obtain, in time O(2^k k^2 m · M(2^b)) and for every vertex simultaneously, whether the vertex participates in at least one match with the motif, with a per-vertex probability of at most k/2^{b-1} for a false negative. Furthermore, the algorithm is easily vector-parallelizable for up to 2^k threads, and parallelizable for up to 2^k n threads, where n is the number of vertices in H. Here M(2^b) is the time complexity to multiply in GF(2^b). We demonstrate with an open-source implementation that our variant of constrained multilinear sieving can be engineered for vector-parallel microarchitectures to yield hardware utilization that is bound by the available memory bandwidth. Our main engineering contributions are (a) a version of the recurrence for tightly labeled arborescences that can be executed as a sequence of memory-and-arithmetic coalescent parallel workloads on multiple GPUs, and (b) a bit-sliced low-level implementation for arithmetic in characteristic 2 to support (a).
{"title":"Engineering Motif Search for Large Motifs","authors":"P. Kaski, Juho Lauri, Suhas Thejaswi","doi":"10.4230/LIPIcs.SEA.2018.28","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2018.28","url":null,"abstract":"Given a vertex-colored graph H and a multiset M of colors as input, the graph motif problem asks us to decide whether H has a connected induced subgraph whose multiset of colors agrees with M. The graph motif problem is NP-complete but known to admit randomized algorithms based on constrained multilinear sieving over GF(2^b) that run in time O(2^kk^2m {M({2^b})}) and with a false-negative probability of at most k/2^{b-1} for a connected m-edge input and a motif of size k. On modern CPU microarchitectures such algorithms have practical edge-linear scalability to inputs with billions of edges for small motif sizes, as demonstrated by Bjorklund, Kaski, Kowalik, and Lauri [ALENEX'15]. This scalability to large graphs prompts the dual question whether it is possible to scale to large motif sizes. We present a vertex-localized variant of the constrained multilinear sieve that enables us to obtain, in time O(2^kk^2m{M({2^b})}) and for every vertex simultaneously, whether the vertex participates in at least one match with the motif, with a per-vertex probability of at most k/2^{b-1} for a false negative. Furthermore, the algorithm is easily vector-parallelizable for up to 2^k threads, and parallelizable for up to 2^kn threads, where n is the number of vertices in H. Here {M({2^b})} is the time complexity to multiply in GF(2^b). We demonstrate with an open-source implementation that our variant of constrained multilinear sieving can be engineered for vector-parallel microarchitectures to yield hardware utilization that is bound by the available memory bandwidth. 
Our main engineering contributions are (a) a version of the recurrence for tightly labeled arborescences that can be executed as a sequence of memory-and-arithmetic coalescent parallel workloads on multiple GPUs, and (b) a bit-sliced low-level implementation for arithmetic in characteristic 2 to support (a).","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"70 1","pages":"28:1-28:19"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79171735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-01-01 DOI: 10.4230/LIPIcs.SEA.2018.11
Michael Bastubbe, M. Lübbecke, Jonas T. Witt
In a Dantzig-Wolfe reformulation of an integer program one convexifies a subset of the constraints, leading to potentially stronger dual bounds from the respective linear programming relaxation. As the subset can be chosen arbitrarily, this includes the trivial cases of convexifying no and all constraints, resulting in a weakest and strongest reformulation, respectively. Our computational study aims at a better understanding of what happens in between these extremes. For a collection of integer programs with few constraints we compute, optimally solve, and evaluate the relaxations of all possible (exponentially many) Dantzig-Wolfe reformulations (with mild extensions to larger models from the MIPLIBs). We observe that only a tiny number of different dual bounds actually occur and that only a few inclusion-wise minimal representatives exist for each. This aligns with considerably different impacts of individual constraints on strengthening the relaxation, some of which have almost no influence. In contrast, types of constraints that are convexified in textbook reformulations have a larger effect. We relate our experiments to what could be called a hierarchy of Dantzig-Wolfe reformulations.
{"title":"A Computational Investigation on the Strength of Dantzig-Wolfe Reformulations","authors":"Michael Bastubbe, M. Lübbecke, Jonas T. Witt","doi":"10.4230/LIPIcs.SEA.2018.11","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2018.11","url":null,"abstract":"In Dantzig-Wolfe reformulation of an integer program one convexifies a subset of the constraints, leading to potentially stronger dual bounds from the respective linear programming relaxation. As the subset can be chosen arbitrarily, this includes the trivial cases of convexifying no and all constraints, resulting in a weakest and strongest reformulation, respectively. Our computational study aims at better understanding of what happens in between these extremes. For a collection of integer programs with few constraints we compute, optimally solve, and evaluate the relaxations of all possible (exponentially many) Dantzig-Wolfe reformulations (with mild extensions to larger models from the MIPLIBs). We observe that only a tiny number of different dual bounds actually occur and that only a few inclusion-wise minimal representatives exist for each. This aligns with considerably different impacts of individual constraints on the strengthening the relaxation, some of which have almost no influence. In contrast, types of constraints that are convexified in textbook reformulations have a larger effect. 
We relate our experiments to what could be called a hierarchy of Dantzig-Wolfe reformulations.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"4 1","pages":"11:1-11:12"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89303212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-01-01 DOI: 10.4230/LIPIcs.SEA.2018.30
A. Piperno
Colour refinement is at the heart of all the most efficient graph isomorphism software packages. In this paper we present a method for extending the applicability of refinement algorithms to directed graphs with weighted edges. We use Traces as reference software, but the proposed solution is easily transferable to any other refinement-based graph isomorphism tool in the literature. We substantiate the claim that the performance of the original algorithm remains substantially unchanged by showing experiments for some classes of benchmark graphs.
{"title":"Isomorphism Test for Digraphs with Weighted Edges","authors":"A. Piperno","doi":"10.4230/LIPIcs.SEA.2018.30","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2018.30","url":null,"abstract":"Colour refinement is at the heart of all the most efficient graph isomorphism software packages. In this paper we present a method for extending the applicability of refinement algorithms to directed graphs with weighted edges. We use {Traces} as a reference software, but the proposed solution is easily transferrable to any other refinement-based graph isomorphism tool in the literature. We substantiate the claim that the performances of the original algorithm remain substantially unchanged by showing experiments for some classes of benchmark graphs.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"1 1","pages":"30:1-30:13"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79909986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
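To make the notion of colour refinement on weighted digraphs concrete, here is a minimal textbook-style sketch. It is not the method of the paper and not how Traces works internally. It assumes a graph given as a vertex count and a list of `(u, v, weight)` arcs with comparable weights, and the function name `colour_refinement` is hypothetical. Each round recolours a vertex by its old colour together with the sorted multisets of (weight, neighbour colour) pairs over its outgoing and incoming arcs, and stops when no colour class splits.

```python
def colour_refinement(n, arcs):
    """Return a stable vertex colouring (list of ints) for a weighted digraph.

    n    -- number of vertices, labelled 0..n-1
    arcs -- iterable of (u, v, w): an arc from u to v with weight w
    """
    out_nb = [[] for _ in range(n)]
    in_nb = [[] for _ in range(n)]
    for u, v, w in arcs:
        out_nb[u].append((v, w))
        in_nb[v].append((u, w))

    colours = [0] * n  # start with a single colour class
    while True:
        signatures = []
        for v in range(n):
            # Direction is kept by using separate out- and in-signatures;
            # weights are kept by pairing each neighbour colour with the arc weight.
            out_sig = tuple(sorted((w, colours[x]) for x, w in out_nb[v]))
            in_sig = tuple(sorted((w, colours[x]) for x, w in in_nb[v]))
            signatures.append((colours[v], out_sig, in_sig))
        palette = {s: i for i, s in enumerate(sorted(set(signatures)))}
        refined = [palette[s] for s in signatures]
        # Signatures include the old colour, so classes can only split.
        if len(set(refined)) == len(set(colours)):
            return refined
        colours = refined

uniform = colour_refinement(3, [(0, 1, 1), (1, 2, 1), (2, 0, 1)])
skewed = colour_refinement(3, [(0, 1, 1), (1, 2, 1), (2, 0, 2)])
print(len(set(uniform)), len(set(skewed)))
```

In the usage lines, a directed 3-cycle with equal weights keeps one colour class, while changing a single arc weight separates all three vertices.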
Pub Date: 2018-01-01 DOI: 10.4230/LIPIcs.SEA.2018.24
L. C. Aleardi, Gaspard Denis, Éric Fusy
We consider the problem of computing a spherical crossing-free geodesic drawing of a planar graph: this problem, as well as the closely related spherical parameterization problem, has attracted a lot of attention in the last two decades both in theory and in practice, motivated by a number of applications ranging from texture mapping to mesh remeshing and morphing. Our main concern is to design and implement a linear time algorithm for the computation of spherical drawings provided with theoretical guarantees. While not being aesthetically pleasing, our method is extremely fast and can be used as an initial placer for spherical iterative methods and spring embedders. We provide an experimental comparison with initial placers based on planar Tutte parameterization. Finally we explore the use of spherical drawings as initial layouts for (Euclidean) spring embedders: experimental evidence shows that this greatly helps to untangle the layout and to reach better local minima.
{"title":"Fast Spherical Drawing of Triangulations: An Experimental Study of Graph Drawing Tools","authors":"L. C. Aleardi, Gaspard Denis, Éric Fusy","doi":"10.4230/LIPIcs.SEA.2018.24","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2018.24","url":null,"abstract":"We consider the problem of computing a spherical crossing-free geodesic drawing of a planar graph: this problem, as well as the closely related spherical parameterization problem, has attracted a lot of attention in the last two decades both in theory and in practice, motivated by a number of applications ranging from texture mapping to mesh remeshing and morphing. Our main concern is to design and implement a linear time algorithm for the computation of spherical drawings provided with theoretical guarantees. While not being aesthetically pleasing, our method is extremely fast and can be used as initial placer for spherical iterative methods and spring embedders. We provide experimental comparison with initial placers based on planar Tutte parameterization. Finally we explore the use of spherical drawings as initial layouts for (Euclidean) spring embedders: experimental evidence shows that this greatly helps to untangle the layout and to reach better local minima.","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"49 1","pages":"24:1-24:14"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74990724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-01-01 DOI: 10.4230/LIPIcs.SEA.2018.16
S. Pissis, Ahmad Retha
An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent multiple sequence alignments of closely-related sequences in a compact form. For a standard pattern of length m, pattern matching in an elastic-degenerate text can be solved on-line in time O(nm^2 + N) with pre-processing time and space O(m) (Grossi et al., CPM 2017). A fast bit-vector algorithm requiring time O(N · ⌈m/w⌉) with pre-processing time and space O(m · ⌈m/w⌉), where w is the size of the computer word, was also presented. In this paper we consider the same problem for a set of patterns of total length M. A straightforward generalization of the existing bit-vector algorithm would require time O(N · ⌈M/w⌉) with pre-processing time and space O(M · ⌈M/w⌉), which is prohibitive in practice. We present a new on-line O(N · ⌈M/w⌉)-time algorithm with pre-processing time and space O(M). We present experimental results using both synthetic and real data demonstrating the performance of the algorithm. We further demonstrate a real application of our algorithm in a pipeline for discovery and verification of minimal absent words (MAWs) in the human genome showing that a significant number of previously discovered MAWs are in fact false-positives when a population's variants are considered.
2012 ACM Subject Classification: Theory of computation → Pattern matching
{"title":"Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line","authors":"S. Pissis, Ahmad Retha","doi":"10.4230/LIPIcs.SEA.2018.16","DOIUrl":"https://doi.org/10.4230/LIPIcs.SEA.2018.16","url":null,"abstract":"An elastic-degenerate string is a sequence of n sets of strings of total length N . It has been introduced to represent multiple sequence alignments of closely-related sequences in a compact form. For a standard pattern of length m, pattern matching in an elastic-degenerate text can be solved on-line in time O(nm2 +N) with pre-processing time and space O(m) (Grossi et al., CPM 2017). A fast bit-vector algorithm requiring time O(N · dmw e) with pre-processing time and space O(m·dmw e), where w is the size of the computer word, was also presented. In this paper we consider the same problem for a set of patterns of total length M . A straightforward generalization of the existing bit-vector algorithm would require time O(N · dMw e) with pre-processing time and space O(M · dMw e), which is prohibitive in practice. We present a new on-line O(N · d M w e)-time algorithm with pre-processing time and space O(M). We present experimental results using both synthetic and real data demonstrating the performance of the algorithm. We further demonstrate a real application of our algorithm in a pipeline for discovery and verification of minimal absent words (MAWs) in the human genome showing that a significant number of previously discovered MAWs are in fact false-positives when a population’s variants are considered. 
2012 ACM Subject Classification Theory of computation → Pattern matching","PeriodicalId":9448,"journal":{"name":"Bulletin of the Society of Sea Water Science, Japan","volume":"25 1","pages":"16:1-16:14"},"PeriodicalIF":0.0,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78245247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
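As an illustration of what matching in an elastic-degenerate text means, the following is a naive single-pattern sketch. It is not the bit-vector algorithm or the dictionary-matching algorithm of the paper. It assumes the text is a list of sets of strings (the empty string stands for a deletion), that the pattern is non-empty, and the name `ed_contains` is hypothetical. It processes the segments on-line and keeps the set of pattern prefix lengths that end exactly at the current segment boundary.

```python
def ed_contains(ed_text, pattern):
    """Return True if pattern occurs in some string spelled by the ED text.

    ed_text -- list of sets of strings; one string is chosen from each set
    pattern -- non-empty string
    """
    m = len(pattern)
    active = set()  # p in active: pattern[:p] ends exactly at the last boundary
    for segment in ed_text:
        following = set()
        for s in segment:
            # Case 1: the whole pattern lies inside one string of this segment.
            if pattern in s:
                return True
            # Case 2: extend a partial match carried over from earlier segments.
            for p in active:
                rest = pattern[p:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        return True
                elif rest.startswith(s):
                    following.add(p + len(s))  # s is consumed entirely
            # Case 3: a suffix of s starts a new partial match.
            for k in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:k]):
                    following.add(k)
        active = following
    return False

text = [{"A"}, {"C", ""}, {"T"}]  # spells "ACT" or "AT"
print(ed_contains(text, "AT"), ed_contains(text, "AG"))
```

Tracking prefix lengths as a set is the idea that bit-vector methods pack into machine words; this sketch trades that efficiency for readability.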