{"title":"构建基于单词的文本压缩算法","authors":"R. Horspool, G. Cormack","doi":"10.1109/DCC.1992.227475","DOIUrl":null,"url":null,"abstract":"Text compression algorithms are normally defined in terms of a source alphabet Sigma of 8-bit ASCII codes. The authors consider choosing Sigma to be an alphabet whose symbols are the words of English or, in general, alternate maximal strings of alphanumeric characters and nonalphanumeric characters. The compression algorithm would be able to take advantage of longer-range correlations between words and thus achieve better compression. The large size of Sigma leads to some implementation problems, but these are overcome to construct word-based LZW, word-based adaptive Huffman, and word-based context modelling compression algorithms.<<ETX>>","PeriodicalId":170269,"journal":{"name":"Data Compression Conference, 1992.","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1992-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"96","resultStr":"{\"title\":\"Constructing word-based text compression algorithms\",\"authors\":\"R. Horspool, G. Cormack\",\"doi\":\"10.1109/DCC.1992.227475\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text compression algorithms are normally defined in terms of a source alphabet Sigma of 8-bit ASCII codes. The authors consider choosing Sigma to be an alphabet whose symbols are the words of English or, in general, alternate maximal strings of alphanumeric characters and nonalphanumeric characters. The compression algorithm would be able to take advantage of longer-range correlations between words and thus achieve better compression. The large size of Sigma leads to some implementation problems, but these are overcome to construct word-based LZW, word-based adaptive Huffman, and word-based context modelling compression algorithms.<<ETX>>\",\"PeriodicalId\":170269,\"journal\":{\"name\":\"Data Compression Conference, 1992.\",\"volume\":\"97 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1992-03-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"96\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Compression Conference, 1992.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DCC.1992.227475\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Compression Conference, 1992.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1992.227475","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Constructing word-based text compression algorithms
Text compression algorithms are normally defined in terms of a source alphabet Sigma of 8-bit ASCII codes. The authors consider choosing Sigma to be an alphabet whose symbols are the words of English or, in general, alternate maximal strings of alphanumeric characters and nonalphanumeric characters. The compression algorithm would be able to take advantage of longer-range correlations between words and thus achieve better compression. The large size of Sigma leads to some implementation problems, but these are overcome to construct word-based LZW, word-based adaptive Huffman, and word-based context modelling compression algorithms.<>