An Invertible Transform for Efficient String Matching in Labeled Digraphs
Abhinav Nellore, Austin Nguyen, Reid F. Thompson
Annual Symposium on Combinatorial Pattern Matching. DOI: 10.4230/LIPIcs.CPM.2021.20
Publication date: 2019-05-09
Citations: 3
Abstract
Let $G = (V, E)$ be a digraph where each vertex is unlabeled, each edge is labeled by a character in some alphabet $\Omega$, and any two edges with both the same head and the same tail have different labels. The powerset construction gives a transform of $G$ into a weakly connected digraph $G' = (V', E')$ that enables solving the decision problem of whether there is a walk in $G$ matching an arbitrarily long query string $q$ in time linear in $|q|$ and independent of $|E|$ and $|V|$. We show $G$ is uniquely determined by $G'$ when for every $v_\ell \in V$, there is some distinct string $s_\ell$ on $\Omega$ such that $v_\ell$ is the origin of a closed walk in $G$ matching $s_\ell$, and no other walk in $G$ matches $s_\ell$ unless it starts and ends at $v_\ell$. We then exploit this invertibility condition to strategically alter any $G$ so its transform $G'$ enables retrieval of all $t$ terminal vertices of walks in the unaltered $G$ matching $q$ in $O(|q| + t \log |V|)$ time. We conclude by proposing two defining properties of a class of transforms that includes the Burrows-Wheeler transform and the transform presented here.
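The following is a minimal Python sketch, not the authors' implementation, of two ideas from the abstract. The first is the powerset (subset) construction of $G'$. It assumes the construction starts from the set of all vertices of $G$, because a matching walk may begin anywhere. The second is a checker for the invertibility condition when candidate strings $s_\ell$ are supplied. The edge-list representation, the `witnesses` dictionary, and all function names are hypothetical choices made for illustration.

```python
from collections import deque


def adjacency(edges):
    """Map (tail, label) -> set of heads; edges are hypothetical (tail, label, head) triples."""
    out = {}
    for tail, label, head in edges:
        out.setdefault((tail, label), set()).add(head)
    return out


def powerset_transform(vertices, edges):
    """Subset construction of G' from G, started at the set of all vertices.

    Returns (delta, states): delta maps (state id, character) -> state id and
    plays the role of E'; states[i] is the subset of V that state i stands for.
    Only states reachable from state 0 are built, so G' is weakly connected,
    and the empty subset is never created.
    """
    out = adjacency(edges)
    alphabet = sorted({label for _, label, _ in edges})
    start = frozenset(vertices)
    ids = {start: 0}   # subset of V -> integer vertex id in G'
    states = [start]   # integer vertex id in G' -> subset of V
    delta = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for c in alphabet:
            # Heads of all c-labeled edges whose tail lies in the current subset.
            nxt = frozenset(h for v in state for h in out.get((v, c), ()))
            if not nxt:
                continue  # no walk can be extended by c from this subset
            if nxt not in ids:
                ids[nxt] = len(states)
                states.append(nxt)
                queue.append(nxt)
            delta[(ids[state], c)] = ids[nxt]
    return delta, states


def has_matching_walk(delta, q):
    """Decide whether some walk in G matches q: one dict lookup per character of q."""
    state = 0
    for c in q:
        state = delta.get((state, c))
        if state is None:
            return False
    return True


def walk_ends(out, u, s):
    """Terminal vertices of walks in G that start at u and match s."""
    current = {u}
    for c in s:
        current = {h for v in current for h in out.get((v, c), ())}
    return current


def satisfies_invertibility_condition(vertices, edges, witnesses):
    """Check given candidates s_l = witnesses[v_l] for every vertex (no search).

    Requires the strings to be pairwise distinct and every walk matching s_l
    to start and end at v_l, with at least one such closed walk.
    """
    out = adjacency(edges)
    if len({witnesses[v] for v in vertices}) != len(vertices):
        return False
    for v in vertices:
        for u in vertices:
            expected = {v} if u == v else set()
            if walk_ends(out, u, witnesses[v]) != expected:
                return False
    return True
```

For instance, take vertices `[0, 1]` and edges `(0, "a", 0)`, `(1, "b", 1)`, `(0, "c", 1)`. The strings `"a"` and `"b"` satisfy the condition, and the transform has three states. The query `"aac"` has a matching walk, but `"ca"` does not. Integer ids keep each query step a single hashed lookup, so the query cost does not depend on $|V|$ or $|E|$. Storing every subset explicitly is only for readability, and in the worst case the subset construction can create exponentially many states in $|V|$.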