Pub Date: 2023-04-24
DOI: 10.48550/arXiv.2304.11932
P. Schnoebelen, Julien Veron
Using arch-jumping functions and properties of the arch factorization of words, we propose a new algorithm for computing the subword circular universality index of words. We also introduce the subword universality signature for words, which leads to simple algorithms for the universality indexes of SLP-compressed words.
Title: On arch factorization and subword universality for words and compressed words
Journal: Beyond Words, vol. 29 1, pp. 274-287
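As a rough illustration of the notions named in this abstract, the sketch below computes an arch factorization greedily (each arch is the shortest factor that contains every letter of the alphabet) and takes the number of arches as the universality index. The circular index is taken here, as an assumption, to be the maximum index over all rotations of the word, and it is computed by brute force; this is not the algorithm proposed in the paper. All function names are hypothetical.

```python
def arch_factorization(w, alphabet):
    """Return (arches, rest); assumes every letter of w belongs to alphabet."""
    letters = set(alphabet)
    arches, seen, start = [], set(), 0
    for i, c in enumerate(w):
        seen.add(c)
        if seen == letters:
            # The current factor now contains every letter: close the arch.
            arches.append(w[start:i + 1])
            start, seen = i + 1, set()
    return arches, w[start:]


def universality_index(w, alphabet):
    """Number of arches in the arch factorization of w."""
    return len(arch_factorization(w, alphabet)[0])


def circular_universality_index(w, alphabet):
    """Brute force: best universality index over all rotations of w."""
    return max(
        (universality_index(w[i:] + w[:i], alphabet) for i in range(len(w))),
        default=0,
    )
```

For example, over the alphabet "abc" the word "aabcbc" has a single arch ("aabc"), while its rotation "abcbca" has two, so the two indexes differ.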
Pub Date: 2023-04-11
DOI: 10.48550/arXiv.2304.05270
Duncan Adamson, Maria Kosche, Tore Koss, F. Manea, Stefan Siemer
We consider the longest common subsequence problem in the context of subsequences with gap constraints. In particular, following Day et al. (2022), we consider the setting where the distance (i.e., the gap) between two consecutive symbols of the subsequence has to lie between a lower and an upper bound, which may depend on the position of those symbols in the subsequence or on the symbols bordering the gap. We also consider the case where the entire subsequence is found in a bounded range (defined by a single upper bound), studied by Kosche et al. (2022). In all these cases, we present efficient algorithms for determining the length of the longest common constrained subsequence between two given strings.
Title: Longest Common Subsequence with Gap Constraints
Journal: Beyond Words, vol. 70 1, pp. 60-76
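To make the simplest variant of the problem concrete, here is a naive dynamic-programming sketch. It assumes one uniform pair of bounds `lo` and `hi` on the number of characters skipped between consecutive chosen symbols, applied in both strings. The function name and the parameters are hypothetical, and this quadratic-window DP is only an illustration, not one of the efficient algorithms from the paper.

```python
def gap_lcs_length(a, b, lo, hi):
    """Length of the longest common subsequence whose consecutive symbols
    are separated by between lo and hi skipped characters in both a and b."""
    n, m = len(a), len(b)
    # dp[i][j]: best length of a constrained common subsequence
    # whose last symbol is matched at a[i] and b[j] (0 if a[i] != b[j]).
    dp = [[0] * m for _ in range(n)]
    best = 0
    for i in range(n):
        for j in range(m):
            if a[i] != b[j]:
                continue
            cur = 1
            # Previous match positions pi, pj with lo <= gap <= hi,
            # where gap = i - pi - 1 (and likewise for j).
            for pi in range(max(0, i - hi - 1), i - lo):
                for pj in range(max(0, j - hi - 1), j - lo):
                    cur = max(cur, dp[pi][pj] + 1)
            dp[i][j] = cur
            best = max(best, cur)
    return best
```

With `lo = 0` and `hi` at least as large as both string lengths, the constraint is vacuous and the function returns the classical LCS length.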
Pub Date: 2023-04-10
DOI: 10.48550/arXiv.2304.04583
Duncan Adamson
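A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \dots w[i_{|u|}]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq |w|$. A word $w$ is $k$-subsequence universal over an alphabet $\Sigma$ if every word in $\Sigma^k$ appears in $w$ as a subsequence. In this paper, we provide new algorithms for $k$-subsequence universal words of fixed length $n$ over the alphabet $\Sigma = \{1,2,\dots,\sigma\}$. Letting $\mathcal{U}(n,k,\sigma)$ denote the set of $n$-length $k$-subsequence universal words over $\Sigma$, we provide:
* an $O(n k \sigma)$ time algorithm for counting the size of $\mathcal{U}(n,k,\sigma)$;
* an $O(n k \sigma)$ time algorithm for ranking words in the set $\mathcal{U}(n,k,\sigma)$;
* an $O(n k \sigma)$ time algorithm for unranking words from the set $\mathcal{U}(n,k,\sigma)$;
* an algorithm for enumerating the set $\mathcal{U}(n,k,\sigma)$ with $O(n \sigma)$ delay after $O(n k \sigma)$ preprocessing.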
Title: Ranking and Unranking k-subsequence universal words
Journal: Beyond Words, vol. 114 1, pp. 47-59
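The definitions above can be checked directly on tiny instances. The following brute-force sketch tests $k$-subsequence universality straight from the definition and counts $|\mathcal{U}(n,k,\sigma)|$ by exhaustive enumeration. It runs in exponential time and is meant only as a reference for small parameters, not as the $O(n k \sigma)$ counting algorithm of the paper; the function names are hypothetical.

```python
from itertools import product


def is_subsequence(u, w):
    """True if u can be obtained from w by deleting symbols."""
    it = iter(w)
    # 'c in it' consumes the iterator up to the first match, preserving order.
    return all(c in it for c in u)


def is_k_universal(w, k, sigma):
    """True if every word of length k over {1, ..., sigma} is a subsequence of w."""
    alphabet = range(1, sigma + 1)
    return all(is_subsequence(u, w) for u in product(alphabet, repeat=k))


def count_universal(n, k, sigma):
    """Brute-force size of U(n, k, sigma); feasible only for tiny n and sigma."""
    alphabet = range(1, sigma + 1)
    return sum(is_k_universal(w, k, sigma) for w in product(alphabet, repeat=n))
```

For instance, a binary word of length 4 is 2-subsequence universal exactly when it splits into two blocks that are each "12" or "21", which gives four words.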
Pub Date: 2023-03-03
DOI: 10.48550/arXiv.2303.01726
Hiroto Fujimaru, Yuto Nakashima, Shunsuke Inenaga
Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string $T$ is obtained by merging isomorphic subtrees of the suffix tree [Weiner 1973] of the same string $T$, thus CDAWGs are a compact indexing structure. In this paper, we investigate the sensitivity of CDAWGs when a single character edit operation (insertion, deletion, or substitution) is performed at the left-end of the input string $T$, namely, we are interested in the worst-case increase in the size of the CDAWG after a left-end edit operation. We prove that if $e$ is the number of edges of the CDAWG for string $T$, then the number of new edges added to the CDAWG after a left-end edit operation on $T$ is less than $e$. Further, we present almost matching lower bounds on the sensitivity of CDAWGs for all cases of insertion, deletion, and substitution.
Title: On Sensitivity of Compact Directed Acyclic Word Graphs
Journal: Beyond Words, vol. 64 1 1, pp. 168-180
Pub Date: 2023-02-27
DOI: 10.48550/arXiv.2302.13647
France Gheeraert, Giuseppe Romana, Manon Stipulanti
First studied by Kempa and Prezza in 2018 as the cement of text compression algorithms, string attractors have become a compelling object of theoretical research within the community of combinatorics on words. In this context, they have been studied for several families of finite and infinite words. In this paper, we obtain string attractors of prefixes of particular infinite words that generalize k-bonacci words (including the famous Fibonacci word) and are obtained as fixed points of k-bonacci-like morphisms. In fact, our description involves the numeration systems classically derived from the considered morphisms.
Title: String attractors of fixed points of k-bonacci-like morphisms
Journal: Beyond Words, vol. 71 1, pp. 192-205
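For readers unfamiliar with the notion, a string attractor of a word is usually defined as a set of positions such that every factor of the word has at least one occurrence crossing one of those positions. The sketch below checks this definition by brute force, using 0-based positions; the function name is hypothetical and the code says nothing about the constructions obtained in the paper.

```python
def is_string_attractor(w, positions):
    """Brute-force check that every factor of w has an occurrence
    crossing at least one of the given 0-based positions."""
    pos = set(positions)
    n = len(w)
    for length in range(1, n + 1):
        all_factors = set()
        covered = set()
        for i in range(n - length + 1):
            factor = w[i:i + length]
            all_factors.add(factor)
            # The occurrence w[i:i+length] crosses p when i <= p < i + length.
            if any(i <= p < i + length for p in pos):
                covered.add(factor)
        if all_factors != covered:
            return False
    return True
```

On the Fibonacci prefix "abaab", the positions {1, 2} pass this check, whereas {0} alone fails because the letter "b" never occurs at position 0.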
Pub Date: 2023-02-25
DOI: 10.48550/arXiv.2302.13147
D. Gabric, J. Shallit
A \emph{palindrome} is a word that reads the same forwards and backwards. A \emph{block palindrome factorization} (or \emph{BP-factorization}) is a factorization of a word into blocks that becomes a palindrome if each identical block is replaced by a distinct symbol. We call the number of blocks in a BP-factorization the \emph{width} of the BP-factorization. The \emph{largest BP-factorization} of a word $w$ is the BP-factorization of $w$ with the maximum width. We study words with certain BP-factorizations. First, we give a recurrence for the number of length-$n$ words with largest BP-factorization of width $t$. Second, we show that the expected width of the largest BP-factorization of a word tends to a constant. Third, we give some results on another extremal variation of BP-factorization, the \emph{smallest BP-factorization}. A \emph{border} of a word $w$ is a non-empty word that is both a proper prefix and suffix of $w$. Finally, we conclude by showing a connection between words with a unique border and words whose smallest and largest BP-factorizations coincide.
Title: Smallest and Largest Block Palindrome Factorizations
Journal: Beyond Words, vol. 278 1, pp. 181-191
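A small sketch may help with the definition of the largest BP-factorization. It assumes the greedy strategy of repeatedly peeling off the shortest border from both ends of the word, which yields a factorization of maximum width because any longer border can itself be split further using the shortest one. The function name is hypothetical and the code is illustrative only.

```python
def largest_bp_factorization(w):
    """Greedy sketch: peel the shortest border from both ends until none is left."""
    left, right = [], []
    while w:
        for length in range(1, len(w) // 2 + 1):
            if w[:length] == w[-length:]:
                # Shortest non-overlapping border found: it becomes a matching
                # pair of blocks, one on each side.
                left.append(w[:length])
                right.append(w[-length:])
                w = w[length:-length]
                break
        else:
            # No border: the remainder is the single middle block.
            left.append(w)
            w = ""
    return left + right[::-1]
```

For "abracadabra" this produces the blocks a, br, a, cad, a, br, a, a BP-factorization of width 7.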
Pub Date: 2023-02-01
DOI: 10.48550/arXiv.2302.00405
N. Rampersad, J. Shallit
We show how to obtain, via a unified framework provided by logic and automata theory, many classical results of Brillhart and Morton on Rudin-Shapiro sums. The techniques also facilitate easy proofs for new results.
Title: Rudin-Shapiro Sums Via Automata Theory and Logic
Journal: Beyond Words, vol. 10 1, pp. 233-246
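As background for the sums mentioned in this abstract, the sketch below assumes the usual definition of the Rudin-Shapiro sequence, $r(n) = (-1)^{e(n)}$ where $e(n)$ counts the (possibly overlapping) occurrences of 11 in the binary expansion of $n$, and the partial sums $s(n) = \sum_{0 \le i \le n} r(i)$. It then checks numerically, on a small range only, the bound $\sqrt{3n/5} \le s(n) \le \sqrt{6n}$ for $n \ge 1$ that is commonly attributed to Brillhart and Morton. The function names are hypothetical, and a finite check is of course not a proof.

```python
from itertools import accumulate


def rudin_shapiro(n):
    """r(n) = (-1)^(number of possibly overlapping '11' in the binary form of n)."""
    # Bit i of n & (n >> 1) is set exactly when bits i and i+1 of n are both 1.
    return -1 if bin(n & (n >> 1)).count("1") % 2 else 1


def rudin_shapiro_sums(limit):
    """Return the list [s(0), s(1), ..., s(limit)] of partial sums of r."""
    return list(accumulate(rudin_shapiro(i) for i in range(limit + 1)))


def bound_holds(limit):
    """Check 3n/5 <= s(n)^2 <= 6n for 1 <= n <= limit using integer arithmetic."""
    sums = rudin_shapiro_sums(limit)
    return all(
        sums[n] > 0 and 3 * n <= 5 * sums[n] ** 2 and sums[n] ** 2 <= 6 * n
        for n in range(1, limit + 1)
    )
```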
Pub Date: 2023-01-15
DOI: 10.48550/arXiv.2301.06145
Lucas Mol, N. Rampersad, J. Shallit
We study various aspects of Dyck words appearing in binary sequences, where $0$ is treated as a left parenthesis and $1$ as a right parenthesis. We show that binary words that are $7/3$-power-free have bounded nesting level, but this no longer holds for larger repetition exponents. We give an explicit characterization of the factors of the Thue-Morse word that are Dyck, and show how to count them. We also prove tight upper and lower bounds on $f(n)$, the number of Dyck factors of Thue-Morse of length $2n$.
Title: Dyck Words, Pattern Avoidance, and Automatic Sequences
Journal: Beyond Words, vol. 74 1, pp. 220-232
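The objects in this abstract are easy to experiment with. The sketch below treats 0 as a left parenthesis and 1 as a right parenthesis, tests whether a binary word is Dyck, computes its nesting level, and collects the distinct Dyck factors of length $2n$ found in a finite prefix of the Thue-Morse word. The function names are hypothetical, and scanning a finite prefix only gives a lower bound on $f(n)$ unless the prefix is long enough to contain every factor of that length.

```python
def thue_morse_prefix(length):
    """First `length` symbols of the Thue-Morse word (parity of the binary digit sum)."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def is_dyck(word):
    """True if word is balanced, reading '0' as '(' and '1' as ')'."""
    balance = 0
    for c in word:
        balance += 1 if c == "0" else -1
        if balance < 0:
            return False
    return balance == 0


def nesting_level(word):
    """Maximum depth of open parentheses reached while reading a Dyck word."""
    balance = depth = 0
    for c in word:
        balance += 1 if c == "0" else -1
        depth = max(depth, balance)
    return depth


def dyck_factors(text, n):
    """Distinct Dyck factors of length 2n occurring in text."""
    size = 2 * n
    return {
        text[i:i + size]
        for i in range(len(text) - size + 1)
        if is_dyck(text[i:i + size])
    }
```

For example, `dyck_factors(thue_morse_prefix(64), 2)` contains the two Dyck words of length 4, 0011 and 0101, both of which already occur among the first 16 symbols.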
Pub Date: 2023-01-01
DOI: 10.1007/978-3-031-33180-0_22
Pierre-Adrien Tahay
Title: Characteristic Sequences of the Sets of Sums of Squares as Columns of Cellular Automata
Journal: Beyond Words, vol. 42 1, pp. 288-300