Sorted sliding window compression
U. Graf
Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096), 1999-03-29
DOI: 10.1109/DCC.1999.785684 (https://doi.org/10.1109/DCC.1999.785684)
Citations: 3
Abstract
Sorted Sliding Window Compression (SSWC) uses a new model, the Sorted Sliding Window Model (SSWM), to efficiently encode strings that reappear while encoding a symbol sequence. The SSWM holds statistics of all strings up to a certain length k in a sliding window of size n (the sliding window is defined as in LZ77). The compression program can use the SSWM to determine whether the string formed by the next symbols is already contained in the sliding window; the SSWM returns the length of the match. The SSWM directly provides statistics (the borders of a subinterval within an interval) for use in entropy coding methods such as Arithmetic Coding or Dense Coding [Gra97]. Given a number in an interval and the string length, the SSWM returns the corresponding string, which is used in decompression. After each encoding (decoding) step, the model is updated with the characters just encoded (decoded). The model sorts all string starting points in the sliding window lexicographically. A simple way to implement the SSWM is by exhaustive search in the sliding window. The implementation used here is based on a B-tree together with special binary searches. SSWC is a simple compression scheme that uses this new model in order to evaluate its properties. It looks at the next characters to encode and determines the longest match with the SSWM. If the match is shorter than 2, the character is encoded. Otherwise, the length and the subinterval of the string are encoded. The length values are encoded together with the single characters using the same adaptive frequency model. Additionally, some rules are used to reduce the match length if the code length would get worse. Frequencies and intervals are encoded with Dense Coding. SSWC is on average better than gzip [Gai93] on the Calgary corpus: 0.2 to 0.5 bits per byte better on most files and at most 0.03 bits per byte worse (progc and progl). This demonstrates the quality of the SSWM and gives confidence in its usability as a new building block in models for compression.
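The model interface described above can be illustrated with a minimal Python sketch of the exhaustive-search variant that the abstract mentions as the simple way to implement the SSWM. It is not the B-tree implementation of the paper. All function names (`prefixes`, `longest_match`, `string_at`, `tokenize`, `detokenize`) are hypothetical, the window is re-sorted on every query instead of being updated incrementally, matches are assumed not to extend into the lookahead, and the entropy coder (Dense Coding), the adaptive frequency model, and the match-length reduction rules are left out.

```python
from bisect import bisect_left, bisect_right


def prefixes(window: str, length: int) -> list[str]:
    """Strings of the given length at every window starting point, sorted."""
    return sorted(window[i:i + length] for i in range(len(window) - length + 1))


def longest_match(window: str, lookahead: str, k: int) -> tuple[int, int, int, int]:
    """Return (length, low, high, total) for the longest match of at most k symbols.

    [low, high) is the subinterval occupied by the matched string among the
    `total` sorted starting points; (0, 0, 0, 0) means no match at all.
    """
    for length in range(min(k, len(lookahead)), 0, -1):
        keys = prefixes(window, length)
        target = lookahead[:length]
        low, high = bisect_left(keys, target), bisect_right(keys, target)
        if high > low:
            return length, low, high, len(keys)
    return 0, 0, 0, 0


def string_at(window: str, length: int, number: int) -> str:
    """Decoder side: map a number in the interval back to its string."""
    return prefixes(window, length)[number]


def tokenize(data: str, n: int, k: int) -> list[tuple]:
    """Split data into literals (match shorter than 2) and match tokens."""
    tokens, pos = [], 0
    while pos < len(data):
        window = data[max(0, pos - n):pos]  # sliding window of size n
        length, low, high, total = longest_match(window, data[pos:], k)
        if length < 2:
            tokens.append(("literal", data[pos]))
            pos += 1
        else:
            tokens.append(("match", length, low, high, total))
            pos += length
    return tokens


def detokenize(tokens: list[tuple], n: int) -> str:
    """Rebuild the data; any number in [low, high) identifies the string."""
    out = ""
    for token in tokens:
        if token[0] == "literal":
            out += token[1]
        else:
            _, length, low, _high, _total = token
            out += string_at(out[-n:] if n else "", length, low)
    return out
```

For the window `abracadabra`, the lookahead `abrax`, and k = 4, `longest_match` returns `(4, 0, 2, 8)`: the string `abra` occupies the first two of the eight sorted length-4 starting points, so an entropy coder could in principle code it with probability 2/8 once the length is known.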
The SSWM has O(log k) computational complexity for all operations and needs O(n) space. It can be used to implement PPM or Markov models in environments with limited space because it holds all the necessary information.
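One possible reading of the PPM and Markov model remark is that the occurrences of a context form a contiguous run among the sorted starting points, so the frequencies of the symbol following that context can be read from the run. The Python sketch below illustrates this under that assumption. The function names `next_symbol_counts` and `longest_context_counts` are hypothetical, the window is re-sorted by brute force on every call, and the fallback to shorter contexts is a simplified stand-in for PPM escape handling, not the method of the paper.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def next_symbol_counts(window: str, context: str) -> Counter:
    """Count the symbols that follow `context` inside the window."""
    m = len(context) + 1
    # Sorted strings of length len(context) + 1 at every starting point.
    keys = sorted(window[i:i + m] for i in range(len(window) - m + 1))
    # Keys that begin with the context form one contiguous run [low, high).
    low = bisect_left(keys, context)
    high = bisect_right(keys, context + "\U0010ffff")
    return Counter(key[-1] for key in keys[low:high])


def longest_context_counts(window: str, history: str, k: int) -> tuple[int, Counter]:
    """Use the longest context (at most k - 1 symbols) that has any statistics."""
    for order in range(min(k - 1, len(history)), -1, -1):
        context = history[len(history) - order:]
        counts = next_symbol_counts(window, context)
        if counts:
            return order, counts
    return 0, Counter()
```

With the window `abracadabra`, the context `a` is followed by `b` twice and by `c` and `d` once each, which is the kind of frequency table an order-1 Markov model would need.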