{"title":"使用文本加密进行数据压缩","authors":"H. Kruse, A. Mukherjee","doi":"10.1109/DCC.1997.582107","DOIUrl":null,"url":null,"abstract":"Summary form only given. We discuss the use of a new algorithm to preprocess text in order to improve the compression ratio of textual documents, in particular online documents such as web pages on the World Wide Web. The algorithm was first introduced in an earlier paper, and in this paper we discuss the applicability of our algorithm in Internet and Intranet environments, and present additional performance measurements regarding compression ratios, memory requirements and run time. Our results show that our preprocessing algorithm usually leads to a significantly improved compression ratio. Our algorithm requires a static dictionary shared by the compressor and the decompressor. The basic idea of the algorithm is to define a unique encryption or signature for each word in the dictionary, and to replace each word in the input text by its signature. Each signature consists mostly of the special character '*' plus as many alphabetic characters as necessary to make the signature unique among all words of the same length in the dictionary. In the resulting cryptic text the most frequently used character is typically the '*' character, and standard compression algorithms like LZW applied to the cryptic text can exploit this redundancy in order to achieve better compression ratios. We compared the performance of our algorithm to other text compression algorithms, including standard compression algorithms such as gzip, Unix 'compress' and PPM, and to one text compression algorithm which uses a static dictionary.","PeriodicalId":403990,"journal":{"name":"Proceedings DCC '97. 
Data Compression Conference","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":"{\"title\":\"Data compression using text encryption\",\"authors\":\"H. Kruse, A. Mukherjee\",\"doi\":\"10.1109/DCC.1997.582107\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summary form only given. We discuss the use of a new algorithm to preprocess text in order to improve the compression ratio of textual documents, in particular online documents such as web pages on the World Wide Web. The algorithm was first introduced in an earlier paper, and in this paper we discuss the applicability of our algorithm in Internet and Intranet environments, and present additional performance measurements regarding compression ratios, memory requirements and run time. Our results show that our preprocessing algorithm usually leads to a significantly improved compression ratio. Our algorithm requires a static dictionary shared by the compressor and the decompressor. The basic idea of the algorithm is to define a unique encryption or signature for each word in the dictionary, and to replace each word in the input text by its signature. Each signature consists mostly of the special character '*' plus as many alphabetic characters as necessary to make the signature unique among all words of the same length in the dictionary. In the resulting cryptic text the most frequently used character is typically the '*' character, and standard compression algorithms like LZW applied to the cryptic text can exploit this redundancy in order to achieve better compression ratios. 
We compared the performance of our algorithm to other text compression algorithms, including standard compression algorithms such as gzip, Unix 'compress' and PPM, and to one text compression algorithm which uses a static dictionary.\",\"PeriodicalId\":403990,\"journal\":{\"name\":\"Proceedings DCC '97. Data Compression Conference\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1997-03-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"30\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings DCC '97. Data Compression Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DCC.1997.582107\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings DCC '97. Data Compression Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1997.582107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Summary form only given. We discuss the use of a new algorithm to preprocess text to improve the compression ratio of textual documents, in particular online documents such as web pages on the World Wide Web. The algorithm was first introduced in an earlier paper; here we discuss its applicability in Internet and intranet environments and present additional performance measurements regarding compression ratios, memory requirements, and runtime. Our results show that the preprocessing algorithm usually yields a significantly improved compression ratio. The algorithm requires a static dictionary shared by the compressor and the decompressor. Its basic idea is to define a unique encryption, or signature, for each word in the dictionary, and to replace each word in the input text with its signature. Each signature consists mostly of the special character '*', plus as many alphabetic characters as necessary to make it unique among all words of the same length in the dictionary. In the resulting cryptic text the most frequent character is typically '*', and standard compression algorithms such as LZW can exploit this redundancy to achieve better compression ratios. We compared the performance of our algorithm against other text compression algorithms, including standard compressors such as gzip, Unix 'compress', and PPM, and against one text compression algorithm that uses a static dictionary.
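The star-signature idea described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact procedure: the function names (`signatures`, `star_encode`) are invented here, and the greedy left-to-right choice of which letters to reveal is an assumption — the paper may select positions differently. The key property, that each signature preserves word length and is unique among same-length dictionary words (so the shared static dictionary can invert it), is preserved.

```python
from collections import defaultdict

def signatures(dictionary):
    """Assign each dictionary word a 'star signature': '*' in most
    positions, plus just enough letters to make it unique among all
    words of the same length (greedy left-to-right sketch)."""
    by_len = defaultdict(list)
    for w in dictionary:
        by_len[len(w)].append(w)
    sigs = {}
    for w in dictionary:
        peers = [p for p in by_len[len(w)] if p != w]
        sig = ['*'] * len(w)
        for i in range(len(w)):
            # stop revealing letters once no other same-length
            # word still fits the partially revealed pattern
            ambiguous = any(
                all(c == '*' or c == p[j] for j, c in enumerate(sig))
                for p in peers
            )
            if not ambiguous:
                break
            sig[i] = w[i]
        sigs[w] = ''.join(sig)
    return sigs

def star_encode(text, sigs):
    """Replace each known word with its signature; a downstream
    compressor (e.g. LZW or gzip) then exploits the many '*'s."""
    return ' '.join(sigs.get(word, word) for word in text.split())
```

Because signatures keep their word's length and are unique within that length, the decompressor can recover the original text from the same static dictionary by inverting the signature-to-word mapping. For example, with the toy dictionary `["the", "this", "that", "with"]`, "the" is the only 3-letter word and becomes `***`, while "this" and "that" must reveal three letters (`thi*`, `tha*`) to be distinguished from each other.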