String matching with stopper compression

J. Rautio, Jani Tanninen, J. Tarhio
Published in: Proceedings DCC 2002. Data Compression Conference
Publication date: 2002-04-02
DOI: 10.1109/DCC.2002.1000012 (https://doi.org/10.1109/DCC.2002.1000012)
Citation count: 4

Abstract

Summary form only given. We consider string searching in compressed texts. We utilize a compression method related to static Huffman compression. Characters are encoded as variable length sequences of base symbols, which consist of a fixed number of bits. Because the length of a code as base symbols varies, we divide base symbols into stoppers and continuers in order to be able to recognize where a new code starts. Stoppers can only be used as the last base symbol of a code. All other base symbols are continuers which can be used anywhere but as the last base symbol of a code. Our searching algorithm is a variation of the Boyer-Moore-Horspool algorithm. The shift function is based on several base symbols in order to achieve longer jumps than the ordinary occurrence heuristic. If four bits are used for base symbols, we apply bytes of eight bits for shift calculation.
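
The stopper/continuer idea can be illustrated with a minimal sketch. It is not the authors' implementation: the split of the 16 four-bit base symbols into 8 stoppers and 8 continuers, the frequency-rank code assignment, and the names build_codes, rank_to_code, encode, and search are all assumptions made for illustration. For simplicity the shift table here uses a single base symbol (plain Boyer-Moore-Horspool), whereas the abstract describes a shift function based on several base symbols.

```python
from collections import Counter

BASE = 16      # 4-bit base symbols (as in the abstract's example)
STOPPERS = 8   # assumed split: symbols 0..7 are stoppers, 8..15 continuers


def rank_to_code(rank):
    """Map a frequency rank to a code: zero or more continuers, then one stopper."""
    continuers = BASE - STOPPERS
    length, block = 1, STOPPERS          # block = number of codes of this length
    while rank >= block:
        rank -= block
        length += 1
        block *= continuers
    last = rank % STOPPERS               # the final base symbol is a stopper
    rank //= STOPPERS
    prefix = []
    for _ in range(length - 1):          # all earlier base symbols are continuers
        prefix.append(STOPPERS + rank % continuers)
        rank //= continuers
    return tuple(reversed(prefix)) + (last,)


def build_codes(text):
    """Give shorter codes to more frequent characters (static model)."""
    ranked = [ch for ch, _ in Counter(text).most_common()]
    return {ch: rank_to_code(rank) for rank, ch in enumerate(ranked)}


def encode(text, codes):
    """Concatenate the codes of all characters into one list of base symbols."""
    out = []
    for ch in text:
        out.extend(codes[ch])
    return out


def search(ctext, cpat):
    """Horspool search over base symbols; returns offsets in base symbols."""
    m, n = len(cpat), len(ctext)
    if m == 0 or m > n:
        return []
    # Occurrence shift keyed on the base symbol under the window's last position.
    shift = {sym: m - 1 - i for i, sym in enumerate(cpat[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        # A match is real only if it starts at a code boundary, i.e. the
        # preceding base symbol is a stopper (or the match is at offset 0).
        if ctext[pos:pos + m] == cpat and (pos == 0 or ctext[pos - 1] < STOPPERS):
            hits.append(pos)
        pos += shift.get(ctext[pos + m - 1], m)
    return hits


text = "the quick brown fox jumps over the lazy dog near the end"
codes = build_codes(text)
print(search(encode(text, codes), encode("the", codes)))
```

Because every code ends in a stopper and contains no stopper earlier, checking the single base symbol before a candidate match is enough to reject matches that begin in the middle of a code. The pattern must consist of characters that occur in the text used to build the code table, since this sketch has no escape mechanism for unseen characters.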