Approximate string matching in sublinear expected time
W. I. Chang, E. Lawler
Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science, published 1990-10-22
DOI: 10.1109/FSCS.1990.89530 (https://doi.org/10.1109/FSCS.1990.89530)
Citations: 111
Abstract
The k-differences approximate string matching problem specifies a text string of length n, a pattern string of length m, and the number k of differences (insertions, deletions, substitutions) allowed in a match. It asks for every location in the text where a match occurs. Previous algorithms required at least O(nk) time. When k is as large as a constant fraction of m, no substantial progress has been made over O(nm) dynamic programming. The authors investigate much faster algorithms for restricted cases of the problem, such as when the text string is random and errors are not too frequent. They devise an algorithm that, for k < m/log n + O(1), runs in O((n/m) k log n) expected time. In the worst case the algorithm is O(nk), but it is still an improvement in that it is very practical and uses only O(m) space, compared with O(n) or O(m^2) for earlier methods. The authors also define an approximate substring matching problem and give efficient algorithms for it based on their techniques. Special cases include several applications to genetics and molecular biology.
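For reference, the O(nm) dynamic-programming baseline that the abstract measures against can be sketched as below. This is a minimal illustration of the standard Sellers-style edit-distance recurrence, not Chang and Lawler's sublinear algorithm. The function name `k_differences_ends` and the convention of reporting 1-based end positions are assumptions made for this sketch.

```python
from typing import List


def k_differences_ends(text: str, pattern: str, k: int) -> List[int]:
    """Return every end position j (1-based: the match ends with text[j - 1])
    such that some substring of text ending there is within edit distance k
    of pattern. O(nm) time, one column of m + 1 cells (O(m) extra space)."""
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and any substring of
    # text ending at the current position. Column j = 0: pattern[:i] vs "" costs i.
    col = list(range(m + 1))
    ends: List[int] = []
    if col[m] <= k:  # only when m <= k: the empty substring already matches
        ends.append(0)
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # D[i-1][j-1] for i = 1
        col[0] = 0          # a match may start anywhere: D[0][j] = 0
        for i in range(1, m + 1):
            cur = min(
                col[i] + 1,                         # extra text character
                col[i - 1] + 1,                     # pattern character skipped
                prev_diag + (pattern[i - 1] != c),  # match or substitution
            )
            prev_diag = col[i]  # old D[i][j-1] is the diagonal for row i + 1
            col[i] = cur
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, `k_differences_ends("abcdefg", "cdx", 1)` returns `[4, 5]`: "cd" (one deletion) ends at position 4 and "cde" (one substitution) ends at position 5. Every text character is compared with every pattern character, which is the O(nm) cost; an expected bound of O((n/m) k log n) means the authors' method must avoid filling in most of this table on random text.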