High Performance Word-Codeword Mapping Algorithm on PPM
J. Adiego, Miguel A. Martínez-Prieto, P. Fuente
2009 Data Compression Conference, published 2009-03-16
DOI: 10.1109/DCC.2009.40 (https://doi.org/10.1109/DCC.2009.40)
Citations: 9
Abstract
The word-codeword mapping technique allows words to be managed in PPM modelling when a natural language text file is being compressed. The main idea is to assign codes to words in order to improve compression. Previous work focused on proposing and evaluating several adaptive mapping algorithms. In this paper, we propose a semi-static word-codeword mapping method that takes advantage of prior knowledge of some statistical data about the vocabulary. We test our idea by implementing a basic prototype, dubbed mppm2, which also retains all the desirable features of a word-codeword mapping technique. A comparison with other techniques and compressors shows that our proposal is a very competitive choice for compressing natural language texts: empirical results show that our prototype achieves very good compression for this type of document.
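To make the general idea of a semi-static mapping concrete, here is a minimal Python sketch. It is not the mppm2 algorithm, whose details the abstract does not give. It assumes a two-pass scheme: a first pass gathers token frequencies, and a second pass replaces each token with a variable-length byte codeword derived from its frequency rank, so that frequent tokens get shorter codewords. The tokenizer, the 7-bit codeword layout, and all function names are illustrative assumptions. In a real pipeline the resulting byte stream, together with the vocabulary, would then be passed to a PPM compressor; that step is omitted here.

```python
import re
from collections import Counter

# Assumption: a token is a maximal run of word or non-word characters,
# so concatenating the tokens reproduces the text exactly.
TOKEN_RE = re.compile(r"\w+|\W+")


def build_vocabulary(text):
    """First pass: list distinct tokens from most to least frequent."""
    counts = Counter(TOKEN_RE.findall(text))
    # most_common() keeps first-seen order among tokens with equal counts.
    return [token for token, _ in counts.most_common()]


def rank_to_codeword(rank):
    """Encode a rank as bytes, 7 bits per byte, low bits first.

    The high bit is set only on the final byte of a codeword.
    """
    out = bytearray()
    while True:
        digit = rank & 0x7F
        rank >>= 7
        if rank == 0:
            out.append(digit | 0x80)
            return bytes(out)
        out.append(digit)


def encode(text):
    """Second pass: map every token to the codeword of its rank."""
    vocab = build_vocabulary(text)
    rank_of = {token: rank for rank, token in enumerate(vocab)}
    stream = b"".join(
        rank_to_codeword(rank_of[token]) for token in TOKEN_RE.findall(text)
    )
    return vocab, stream


def decode(vocab, stream):
    """Invert encode(): rebuild the text from the vocabulary and codewords."""
    tokens = []
    rank, shift = 0, 0
    for byte in stream:
        rank |= (byte & 0x7F) << shift
        if byte & 0x80:  # final byte of this codeword
            tokens.append(vocab[rank])
            rank, shift = 0, 0
        else:
            shift += 7
    return "".join(tokens)
```

Calling `encode(text)` returns the ranked vocabulary and the codeword stream, and `decode(vocab, stream)` restores the original text. Because the mapping is fixed after the first pass, the decoder needs the vocabulary up front, which is the usual cost of a semi-static scheme compared with an adaptive one.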