The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time

Shay Golan, T. Kociumaka, T. Kopelowitz, E. Porat
Annual Symposium on Combinatorial Pattern Matching, 2020-04-27. DOI: https://doi.org/10.4230/LIPIcs.CPM.2020.15

Abstract

We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space.

We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac{nk^2}{m},\frac{nk}{\sqrt s},\frac{\sigma nm}{s}\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.
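The abstract does not restate the problem itself. Under the standard definition, for every position of the text where a full copy of the pattern ends, the task is to report the Hamming distance between the pattern and the aligned text window if that distance is at most $k$, and otherwise to report only that it exceeds $k$. The sketch below is a brute-force offline baseline for that definition, taking $O(nm)$ time in the worst case. It is not the streaming algorithm of the paper; the function name `naive_k_mismatch` and the use of `None` for "more than $k$ mismatches" are illustrative assumptions.

```python
from typing import Iterator, Optional, Sequence, Tuple


def naive_k_mismatch(
    pattern: Sequence, text: Sequence, k: int
) -> Iterator[Tuple[int, Optional[int]]]:
    """Yield (end_index, distance) for every full alignment of pattern in text.

    distance is the Hamming distance when it is at most k, and None otherwise.
    Hypothetical helper for illustration only: a brute-force baseline,
    not the streaming algorithm described in the abstract.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    for end in range(m - 1, len(text)):
        start = end - m + 1
        mismatches = 0
        for j in range(m):
            if pattern[j] != text[start + j]:
                mismatches += 1
                if mismatches > k:
                    break  # stop early: this alignment already exceeds k
        yield end, (mismatches if mismatches <= k else None)


# Pattern of length m = 3, text of length n = 9, threshold k = 1.
results = list(naive_k_mismatch("abc", "abcabdxbc", k=1))
for end, dist in results:
    print(end, dist)
```

A streaming algorithm must produce the same answers while reading the text one character at a time and without storing the whole pattern or text, which is where the space parameter $s$ in the abstract comes in.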