Title: Some entropic bounds for Lempel-Ziv algorithms
Authors: S. Rao Kosaraju, G. Manzini
Published in: Proceedings DCC '97. Data Compression Conference
Publication date: 1997-03-25
DOI: 10.1109/DCC.1997.582106 (https://doi.org/10.1109/DCC.1997.582106)
Citations: 14
Abstract
Summary form only given, as follows. We initiate a study of parsing-based compression algorithms such as LZ77 and LZ78 by considering the empirical entropy of the input string. For any string s, we define the k-th order entropy H_k(s) by counting the occurrences of each symbol following each length-k substring of s. The value H_k(s) is a lower bound on the compression ratio of any statistical modeling algorithm that predicts the probability of the next symbol from the k most recently seen characters. Our analysis therefore provides a means of comparing Lempel-Ziv methods with the more powerful, but slower, PPM algorithms. Our main contribution is a comparison of the compression ratio of Lempel-Ziv algorithms with the zeroth-order entropy H_0. First, we show that for low-entropy strings the compression ratio of LZ78 can be much higher than H_0. Then we present a modified algorithm that combines LZ78 with run-length encoding and also compresses low-entropy strings efficiently.
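The two quantities compared in the abstract can be illustrated with a minimal Python sketch. It assumes the standard formulation of empirical entropy, in which H_k(s) is the length-weighted average of the zeroth-order entropies of the symbols following each length-k context, divided by |s|; the abstract does not spell out the formula, so this is an assumption. The function names `h0`, `hk`, and `lz78_phrase_count` are hypothetical, and the LZ78 routine only counts phrases in the textbook parsing rather than producing an encoded output.

```python
import math
from collections import Counter, defaultdict


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    # Sum over distinct symbols of (c/n) * log2(n/c), where c is the symbol count.
    return sum(c / n * math.log2(n / c) for c in Counter(s).values())


def hk(s, k):
    """k-th order empirical entropy of s, in bits per symbol (assumed standard form)."""
    if k == 0:
        return h0(s)
    n = len(s)
    if n == 0:
        return 0.0
    # Group every symbol by the k characters that immediately precede it.
    followers = defaultdict(list)
    for i in range(k, n):
        followers[s[i - k:i]].append(s[i])
    # Weight each context's zeroth-order entropy by the number of symbols it predicts.
    return sum(len(f) * h0(f) for f in followers.values()) / n


def lz78_phrase_count(s):
    """Number of phrases in the textbook LZ78 parsing of s."""
    seen = set()
    count = 0
    phrase = ""
    for ch in s:
        phrase += ch
        if phrase not in seen:
            # Shortest prefix not seen before: close the phrase here.
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:
        count += 1  # trailing phrase that repeats an earlier one
    return count


if __name__ == "__main__":
    low = "a" * 10_000
    print("H_0 of a^n:", h0(low))
    print("LZ78 phrases for a^n:", lz78_phrase_count(low))
    print("H_0 and H_1 of 'abab':", h0("abab"), hk("abab", 1))
```

For a string consisting of a single repeated symbol, H_0 is zero, yet the LZ78 parsing still produces a number of phrases that grows roughly with the square root of the length, so the encoded output cannot shrink in proportion to the entropy. This is the kind of low-entropy input on which the abstract says the LZ78 compression ratio can be much higher than H_0.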