Fast pattern matching for entropy bounded text

Proceedings DCC '95 Data Compression Conference. Pub Date: 1995-03-28. DOI: 10.1109/DCC.1995.515518

Shenfeng Chen, J. Reif


Citations: 2

Abstract


We present the first known one-dimensional and two-dimensional string matching algorithms for text with bounded entropy. Let $n$ be the length of the text and $m$ the length of the pattern. We show that the expected complexity of the algorithms is related to the entropy of the text under various assumptions about the distribution of the pattern. For uniformly distributed patterns, our one-dimensional matching algorithm runs in $O\left(\frac{n \log m}{pm}\right)$ expected time, where $H$ is the entropy of the text and $p = 1 - (1 - H^2)^{H/(1+H)}$. The worst-case running time $T$ can also be bounded by $\frac{n \log m}{p(m + \sqrt{V})} \le T \le \frac{n \log m}{p(m - \sqrt{V})}$, where $V$ is the variance of the source from which the pattern is generated. Our algorithm uses data structures and probabilistic analysis techniques found in certain lossless data compression schemes.
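To get a feel for how the quantities in the abstract interact, the following is a minimal numerical sketch that evaluates p = 1 - (1 - H^2)^(H/(1+H)) and the expression n log m / (p m) for given inputs. It is not the authors' algorithm. The function names are hypothetical, the logarithm base (2) is an assumption, the constant hidden by the O-notation is dropped, and H is assumed to lie in (0, 1] so that the power is real-valued. The abstract does not state what p represents, so the code treats it purely as a function of H.

```python
import math


def p_of_entropy(h: float) -> float:
    """Evaluate p = 1 - (1 - H^2)^(H / (1 + H)) from the abstract.

    Assumption: 0 < H <= 1, so that 1 - H^2 is non-negative.
    """
    if not 0.0 < h <= 1.0:
        raise ValueError("this sketch assumes 0 < H <= 1")
    return 1.0 - (1.0 - h * h) ** (h / (1.0 + h))


def expected_cost(n: int, m: int, h: float) -> float:
    """Expression inside O(.): n log m / (p m), constant factors dropped.

    Assumption: log is taken base 2; the abstract does not name a base.
    """
    return n * math.log2(m) / (p_of_entropy(h) * m)


def cost_bounds(n: int, m: int, h: float, v: float) -> tuple[float, float]:
    """Lower and upper expressions bounding T, given source variance V.

    Returns (n log m / (p (m + sqrt(V))), n log m / (p (m - sqrt(V)))).
    Assumption: sqrt(V) < m, otherwise the upper expression is undefined.
    """
    root = math.sqrt(v)
    if root >= m:
        raise ValueError("this sketch assumes sqrt(V) < m")
    p = p_of_entropy(h)
    work = n * math.log2(m)
    return work / (p * (m + root)), work / (p * (m - root))


if __name__ == "__main__":
    n, m, h, v = 1_000_000, 64, 0.8, 16.0
    low, high = cost_bounds(n, m, h, v)
    print(f"p(H)           = {p_of_entropy(h):.4f}")
    print(f"n log m / (pm) = {expected_cost(n, m, h):.1f}")
    print(f"bounds on T    = [{low:.1f}, {high:.1f}]")
```

Under these assumptions p grows with H on (0, 1], so the expression n log m / (p m) shrinks as the entropy increases, and the two bounds tighten around it as V becomes small relative to m^2.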
