
Latest publications from Proceedings DCC '97. Data Compression Conference

Error resiliency issues in wavelet compression
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.582150
A. Youssef
Summary form only given. Error resiliency is the ability to tolerate uncorrectable errors with graceful quality degradation. It differs from traditional uses of error-correcting coding (ECC) in two major respects: (1) it assigns differentiated (rather than uniform) error protection to different segments of the data, and (2) if errors cannot be corrected in some data segments, a good- (albeit degraded-) quality reconstruction of the data is still possible. The error resiliency approach is suitable for lossy compression, particularly DCT-based and wavelet-based compression. Under those compression schemes, the data is separated into different frequencies or frequency bands. Since the human visual and auditory systems are more sensitive to lower-frequency data than to higher-frequency data, it is better to protect the lower-frequency data more strongly than the higher-frequency data, given a constant ECC bit rate. With this differentiated error protection, the probability of recovering from errors in the lower-frequency data is higher, and thus the probability of reconstructing good-quality data (e.g., image, video or sound) is higher. For effective and efficient error resiliency, many issues need careful study and some are addressed in this paper. We investigate our error resiliency approaches applied to wavelet compression of images.
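As a concrete illustration of differentiated error protection, the sketch below splits a fixed ECC parity budget across wavelet subbands in proportion to their perceptual importance, so the low-frequency LL band receives most of the redundancy. The subband names, weights and budget are hypothetical; this is not the authors' scheme.

    # Hypothetical sketch of unequal error protection across wavelet subbands:
    # a fixed parity budget is divided in proportion to each subband's weight,
    # so errors in the perceptually important low-frequency band are the most
    # likely to be correctable. For illustration only.
    def allocate_parity(importance, total_parity_bits):
        total_weight = sum(importance.values())
        return {band: round(total_parity_bits * weight / total_weight)
                for band, weight in importance.items()}

    # One decomposition level: LL (the coarse approximation) gets the largest share.
    print(allocate_parity({"LL": 8, "LH": 2, "HL": 2, "HH": 1}, 2600))
    # {'LL': 1600, 'LH': 400, 'HL': 400, 'HH': 200}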
Citations: 0
Image compression in medical image databases using set redundancy
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.582104
K. Karadimitriou, M. Fenstermacher
Summary form only given. Image compression is achieved by eliminating various types of redundancy that exist in the pixel values. Individual gray-scale images contain interpixel, psychovisual, and coding redundancy. However, sets of similar images contain an additional type of redundancy: the set redundancy. Set redundancy is the inter-image redundancy that results from the common information found in more than one image in the set. Set redundancy can be used to improve compression. Medical imaging is one of the best application areas for the enhanced compression model (ECM) and the set redundancy compression (SRC) methods. Medical images classified by modality and type of exam are very similar to one another, because of the standard procedures used in radiology. Therefore, medical image databases contain large amounts of set redundancy, which the ECM can efficiently reduce. Tests performed on a test database of CT brain scans showed significant compression improvement when the images were pre-processed with SRC methods to reduce set redundancy. The images were obtained from a random population of patients, and the tests were performed with the standard compression techniques used in radiology: Huffman encoding, arithmetic coding, and Lempel-Ziv compression. The best improvement resulted from combining the min-max predictive method with Huffman compression. In our tests we used genetic algorithms to identify the sets of similar images in the image database.
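To make the set-redundancy idea concrete, here is a minimal sketch of one plausible min-max style preprocessing step, assuming the per-pixel minimum and maximum images over the set serve as references; the exact predictive rule used by the authors may differ.

    # Rough sketch of set-redundancy reduction (the precise Min-Max Predictive
    # method may differ): take the per-pixel minimum and maximum over a set of
    # similar images and store each image as its difference from whichever
    # bound is closer. For similar images the residuals are small and highly
    # repetitive, so a back-end coder such as Huffman compresses them better.
    import numpy as np

    def set_redundancy_residuals(images):
        stack = np.stack(images).astype(np.int32)      # shape (n, H, W)
        lo = stack.min(axis=0)                         # per-pixel minimum image
        hi = stack.max(axis=0)                         # per-pixel maximum image
        use_lo = (stack - lo) <= (hi - stack)          # which reference is closer
        residuals = np.where(use_lo, stack - lo, stack - hi)
        return lo, hi, residuals                       # residuals go to the entropy coder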
Citations: 4
Linear-time, incremental hierarchy inference for compression
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.581951
C. Nevill-Manning, I. Witten
Data compression and learning are, in some sense, two sides of the same coin. If we paraphrase Occam's razor by saying that a small theory is better than a larger theory with the same explanatory power, we can characterize data compression as a preoccupation with small, and learning as a preoccupation with better. Nevill-Manning et al. (see Proc. Data Compression Conference, Los Alamitos, CA, p.244-253, 1994) presented an algorithm, since dubbed SEQUITUR, that presents both faces of the compression/learning coin. Its performance as a data compression scheme outstrips other dictionary schemes, and the structures that it learns from sequences as diverse as DNA and music are intuitively compelling. We present three new results that characterize SEQUITUR's computational and compression performance. First, we prove that SEQUITUR operates in time linear in n, the length of the input sequence, despite its ability to build a hierarchy as deep as log(n). Second, we show that a sequence can be compressed incrementally, improving on the non-incremental algorithm that was described by Nevill-Manning et al., and making on-line compression feasible. Third, we present an intriguing result that emerged during benchmarking; whereas PPMC outperforms SEQUITUR on most files in the Calgary corpus, SEQUITUR regains the lead when tested on multimegabyte sequences. We make some tentative conclusions about the underlying reasons for this phenomenon, and about the nature of current compression benchmarking.
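The grammar-inference idea can be illustrated with a much simpler offline sketch: repeatedly replace the most frequent adjacent pair of symbols with a new rule. This conveys how a hierarchy of rules emerges from repeated structure, but unlike SEQUITUR it is neither incremental nor linear-time, and the function and rule names are illustrative only.

    # Simplified, offline illustration of grammar-based hierarchy inference:
    # repeatedly replace the most frequent pair of adjacent symbols by a new
    # rule. It shows how a rule hierarchy captures repeats, but it is NOT
    # SEQUITUR, which builds the hierarchy incrementally in linear time by
    # maintaining digram uniqueness as each input symbol arrives.
    from collections import Counter

    def infer_hierarchy(sequence):
        seq, rules, next_id = list(sequence), {}, 0
        while True:
            pair_counts = Counter(zip(seq, seq[1:]))
            if not pair_counts:
                break
            pair, count = pair_counts.most_common(1)[0]
            if count < 2:
                break
            name = "R%d" % next_id
            next_id += 1
            rules[name] = pair
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                    out.append(name)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            seq = out
        return seq, rules

    print(infer_hierarchy("abcabcabc"))
    # (['R2', 'R1'], {'R0': ('a', 'b'), 'R1': ('R0', 'c'), 'R2': ('R1', 'R1')})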
Citations: 99
Temporally scalable video coding using nonlinear deinterlacing
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.582078
S. Bayrakeri, R. Mersereau
Summary form only given. Although a simple solution to multi-resolution video coding is the simulcast technique, in which each resolution of the multi-resolution video is coded independently, a more efficient method is scalable coding. In scalable video coding, the lower-resolution reproduction of the video is used in coding the higher-resolution video. Temporal scalability is a tool intended for use in a range of applications, such as broadcasting of interlaced TV and progressive HDTV, for which migration to higher temporal resolution is necessary. Based on simulation results with an MPEG-2 video encoder, it is observed that scalable coding with nonlinear interpolation achieves a 2-3 dB PSNR improvement over simulcast coding at the same total bit rate. However, for the progressive-input, interlace-to-interlace type of temporal scalability, the scalable coding performance is lower than that of single-layer progressive coding. This is expected, as coding performance decreases in interlaced coding due to the difficulty of motion estimation in interlaced sequences.
Citations: 2
Compressing address trace data for cache simulations
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.582096
A. Fox, T. Grun
Summary form only given. Our new approach of storing address traces, the RPS format (recovered program structure), is based on two ideas: first, the structure of the underlying program is reconstructed from the address trace, and second, the output is decomposed into multiple files such that gzip can take advantage of repeated input patterns. In the first step, the control flow of the program is determined by identifying the basic blocks, i.e., the straight-line segments of code with no jumps in or out. Then, the invocation sequence of basic blocks is written to a file, which can be compressed by a factor of more than 35, since gzip can easily detect patterns in it. The basic block data contains information on the length of a basic block and on the position of load and store instructions among all instructions. Their addresses are stored in separate files. In the second step, the load and store references are partitioned into global, local and unassigned variable classes. Global variables have the same value for all invocations of a basic block; local variables can be represented as base + offset, where offset is a constant and base only changes between invocations of a basic block. All other variables are "unassigned" and their addresses are stored in separate files as a difference to the previous value.
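A minimal sketch of the stream-splitting idea follows, under the assumption that each load or store instruction inside a basic block is identified by a slot id. It only distinguishes constant-valued ("global") slots from everything else, which is delta-coded, and it is not the actual RPS implementation, which also handles the local base + offset class.

    # Illustrative sketch of splitting a reference trace into separate streams so
    # that a general-purpose compressor such as gzip sees repetitive data. The
    # slot-id labeling and the two-way global/other split are assumptions.
    def split_trace(references):
        """references: iterable of (slot_id, address) pairs."""
        last, constant = {}, {}
        for slot, addr in references:
            if slot in last and addr != last[slot]:
                constant[slot] = False
            else:
                constant.setdefault(slot, True)
            last[slot] = addr

        global_stream, delta_stream, prev = {}, [], {}
        for slot, addr in references:
            if constant[slot]:
                global_stream[slot] = addr                                # one entry per constant slot
            else:
                delta_stream.append((slot, addr - prev.get(slot, addr)))  # small, repetitive deltas
                prev[slot] = addr
        return global_stream, delta_stream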
Citations: 3
Entropy-constrained successively refinable scalar quantization
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.582057
H. Jafarkhani, H. Brunk, N. Farvardin
We study the design of entropy-constrained successively refinable scalar quantizers. We propose two algorithms to minimize the average distortion and design such a quantizer. We consider two sets of constraints on the entropy: (i) constraint on the average rate and (ii) constraint on aggregate rates. Both algorithms can be easily extended to design vector quantizers.
Citations: 25
Data compression using text encryption
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.582107
H. Kruse, A. Mukherjee
Summary form only given. We discuss the use of a new algorithm to preprocess text in order to improve the compression ratio of textual documents, in particular online documents such as web pages on the World Wide Web. The algorithm was first introduced in an earlier paper, and in this paper we discuss the applicability of our algorithm in Internet and Intranet environments, and present additional performance measurements regarding compression ratios, memory requirements and run time. Our results show that our preprocessing algorithm usually leads to a significantly improved compression ratio. Our algorithm requires a static dictionary shared by the compressor and the decompressor. The basic idea of the algorithm is to define a unique encryption or signature for each word in the dictionary, and to replace each word in the input text by its signature. Each signature consists mostly of the special character '*' plus as many alphabetic characters as necessary to make the signature unique among all words of the same length in the dictionary. In the resulting cryptic text the most frequently used character is typically the '*' character, and standard compression algorithms like LZW applied to the cryptic text can exploit this redundancy in order to achieve better compression ratios. We compared the performance of our algorithm to other text compression algorithms, including standard compression algorithms such as gzip, Unix 'compress' and PPM, and to one text compression algorithm which uses a static dictionary.
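A small sketch of the signature idea is given below. It assumes the distinguishing letters are kept as a trailing suffix, which is one possible placement rule and not necessarily the one used by the authors.

    # Hedged sketch of the dictionary signatures described above: each word is
    # mapped to a string of the same length made mostly of '*', keeping just
    # enough trailing letters to make it unique among dictionary words of the
    # same length. The choice of a *trailing* suffix is an assumption.
    from collections import defaultdict

    def build_signatures(dictionary):
        by_length = defaultdict(set)
        for word in dictionary:
            by_length[len(word)].add(word)

        def signature(word):
            peers = by_length[len(word)]
            for keep in range(1, len(word) + 1):
                suffix = word[len(word) - keep:]
                if not any(other != word and other.endswith(suffix) for other in peers):
                    return "*" * (len(word) - keep) + suffix
            return word                      # cannot be shortened; keep it verbatim

        return {word: signature(word) for word in dictionary}

    print(build_signatures({"the", "there", "these", "those"}))
    # e.g. {'the': '**e', 'there': '***re', 'these': '**ese', 'those': '**ose'}

Because every replaced word contributes mostly '*' characters, the transformed text is dominated by a single symbol, which is the redundancy a back-end compressor such as LZW or gzip then exploits.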
Citations: 30
The ELS-coder: a rapid entropy coder
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.582144
D. Withers
Summary form only given. The ELS-coder is a new entropy-coding algorithm combining rapid encoding and decoding with near-optimum compression ratios. It can be combined with data-modeling methods to produce data-compression applications for text, images, or any type of digital data. Previous algorithms for entropy coding include Huffman coding, arithmetic coding, and the Q- and QM-coders, but all show limitations of speed or compression performance, so that new algorithms continue to be of interest. The ELS-coder, which uses no multiplication or division operations, operates more rapidly than traditional arithmetic coding. It compresses more effectively than Huffman coding (especially for a binary alphabet) and more effectively than the Q- or QM-coder except for symbol probabilities very close to zero or one.
Citations: 3
A context-tree weighting method for text generating sources
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.582140
T. Tjalkens, P. Volf, F. Willems
Summary form only given. The authors discuss context-tree weighting (Willems et al. 1995). This was originally introduced as a sequential universal source coding method for the class of binary tree sources. The paper discusses the application of the method to the compaction of ASCII sequences. The estimation of redundancy and model redundancy is also considered.
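For reference, the core recursion of the binary context-tree weighting method of Willems et al., on which the paper builds, mixes the Krichevsky-Trofimov estimate P_e^s at each context node s with the product of the weighted probabilities of its children:

    P_w^s =
    \begin{cases}
      \frac{1}{2} P_e^s + \frac{1}{2} P_w^{0s} P_w^{1s}, & \text{depth}(s) < D,\\
      P_e^s, & \text{depth}(s) = D,
    \end{cases}

where D is the maximum context depth. Applying this binary recursion to ASCII text requires handling a non-binary alphabet, typically by decomposing each character into a sequence of binary decisions.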
Citations: 22
Generalization and improvement to PPM's "blending"
Pub Date : 1997-03-25 DOI: 10.1109/DCC.1997.582082
S. Bunton
Summary form only given. The best-performing method in the data compression literature for computing probability estimates of sequences on-line using a suffix-tree model is the blending technique used by PPM. Blending can be viewed as a bottom-up recursive procedure for computing a mixture, barring one missing term for each level of the recursion, where a mixture is basically a weighted average of several probability estimates. The author shows the relative effectiveness of most combinations of mixture weighting functions and inheritance evaluation times. The results of a study on the value of using update exclusion, especially in models using state selection, are also shown.
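In the mixture view referred to above, the blended estimate for the next symbol x can be written as a weighted average over the model orders (a standard formulation, included here for reference rather than taken from the paper):

    P(x) = \sum_{o=-1}^{m} w_o \, P_o(x), \qquad \sum_{o=-1}^{m} w_o = 1,

where P_o(x) is the estimate from the order-o context and w_o is its weight. In PPM the weights are set implicitly: the contribution of a shorter context is discounted by the escape probabilities of all longer matching contexts.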
Citations: 7