
Latest publications from Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096)

Utilizing soft information in decoding of variable length codes
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.755662
Jiangtao Wen, J. Villasenor
We present a method for utilizing soft information in decoding of variable length codes (VLCs). When compared with traditional VLC decoding, which is performed using "hard" input bits and a state machine, soft-input VLC decoding offers improved performance in terms of packet and symbol error rates. Soft-input VLC decoding is free from the risk, encountered in hard decision VLC decoders in noisy environments, of terminating the decoding in an unsynchronized state, and it offers the possibility to exploit a priori knowledge, if available, of the number of symbols contained in the packet.
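To make the idea concrete, here is a minimal sketch of soft-input VLC decoding, assuming the channel supplies a per-bit probability of each received bit being 1 and that the number of symbols in the packet is known a priori. It is an illustrative dynamic-programming search, not the authors' algorithm, and the codebook and soft values are invented for the example.

```python
import math

def soft_vlc_decode(codebook, bit_probs, num_symbols):
    """Illustrative soft-input VLC decoding: choose the symbol sequence whose
    concatenated codeword bits best explain the per-bit soft information.

    codebook    : dict mapping symbol -> prefix-free bit string
    bit_probs   : list of P(bit_i == 1) supplied by the channel
    num_symbols : a priori known number of symbols in the packet
    """
    n = len(bit_probs)

    def loglik(bits, start):
        # log-likelihood of codeword `bits` occupying positions start..start+len(bits)-1
        ll = 0.0
        for k, b in enumerate(bits):
            p = bit_probs[start + k] if b == "1" else 1.0 - bit_probs[start + k]
            ll += math.log(max(p, 1e-12))
        return ll

    # states[pos] = (best log-likelihood, best sequence) using exactly `pos` bits
    states = {0: (0.0, [])}
    for _ in range(num_symbols):
        nxt = {}
        for pos, (ll, seq) in states.items():
            for sym, bits in codebook.items():
                end = pos + len(bits)
                if end > n:
                    continue
                cand = (ll + loglik(bits, pos), seq + [sym])
                if end not in nxt or cand[0] > nxt[end][0]:
                    nxt[end] = cand
        states = nxt
    best = states.get(n)           # all bits consumed, all symbols decoded
    return best[1] if best else None

# Hypothetical codebook and soft values for the packet "a b c" (bits 0 10 11):
codebook = {"a": "0", "b": "10", "c": "11"}
bit_probs = [0.2, 0.9, 0.3, 0.8, 0.7]
print(soft_vlc_decode(codebook, bit_probs, 3))   # -> ['a', 'b', 'c']
```

A hard-decision decoder would first threshold bit_probs to bits and then walk the code tree; keeping the soft values lets the search recover packets that thresholding alone would leave desynchronized.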
Citations: 46
Rate-distortion optimized spatial scalability for DCT-based video coding
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.785682
M. Gallant, F. Kossentini
Summary form only given. We present our work on rate-distortion (RD) optimized spatial scalability for MC-DCT based video coding. Extending our work on RD optimized coding from the single layered to the multi-layered framework, we incorporate the additional inter-layer coding dependencies present in a multilayered framework into the set of permissible coding parameters. We employ the Lagrangian rate-distortion functional as it provides an elegant framework for determining the optimal choice of motion vectors, coding modes, and quantized coefficient levels by weighting a distortion term against a resulting rate term. We obtain a simple relationship between the Lagrangian parameter /spl lambda/, that controls rate-distortion tradeoffs, and the reference and enhancement layer quantization parameters QP, to allow the RD optimized framework to work easily in conjunction with rate control techniques that control the average bit rate by adjusting the quantization parameters. We then incorporate these relationships into our coder and generate two-layer bit streams with both the non-RD optimized coder and the RD optimized coder. We also generate RD optimized single-layer bit streams with the same resolution as the second layer of the two-layer bit streams. For the two-layer bit streams, we obtain a 0.6 to 1.4 dB improvement in PSNR by using RD optimization in both the base and enhancement layers. Compared to the single-layer bit stream, RD optimization in both the base and enhancement layers causes the decrease in PSNR to be reduced from 1.1 to 1.7 dB, to 0.3 to 0.5 dB.
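As a worked illustration of the Lagrangian functional described above, the sketch below scores per-macroblock coding options by J = D + λR and picks the minimiser. The option names, distortion values, rates, and λ are made up for the example and are not taken from the paper.

```python
def lagrangian_mode_decision(candidates, lmbda):
    """Return the coding option minimising the Lagrangian cost J = D + lambda * R.

    candidates : iterable of (name, distortion, rate_in_bits) tuples
    lmbda      : Lagrangian multiplier weighting rate against distortion
    """
    return min(candidates, key=lambda c: c[1] + lmbda * c[2])

# Hypothetical options for one macroblock: (mode, SSD distortion, bits spent)
options = [("intra", 1200.0, 96), ("inter_16x16", 800.0, 160), ("skip", 2500.0, 2)]
print(lagrangian_mode_decision(options, lmbda=5.0))   # -> ('inter_16x16', 800.0, 160)
```

In the paper, λ is tied to the reference- and enhancement-layer quantization parameters QP, so a rate-control loop that adjusts QP implicitly selects λ as well.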
Citations: 0
Memory-efficient scalable line-based image coding
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.755671
E. Ordentlich, D. Taubman, M. Weinberger, G. Seroussi, M. Marcellin
We study the problem of memory-efficient scalable image compression and investigate some tradeoffs in the complexity versus coding efficiency space. The focus is on a low-complexity algorithm centered around the use of sub-bit-planes, scan-causal modeling, and a simplified arithmetic coder. This algorithm approaches the lowest possible memory usage for scalable wavelet-based image compression and demonstrates that the generation of a scalable bit-stream is not incompatible with a low-memory architecture.
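The paper gives only a summary of the algorithm, so the sketch below shows merely the flavour of plane-by-plane coding of a single line of quantized coefficients, with the sign sent at first significance. The sub-bit-plane ordering, scan-causal context modelling, and simplified arithmetic coder of the actual algorithm are not reproduced here.

```python
def bitplane_passes(coeffs, num_planes):
    """Emit magnitude bits plane by plane (MSB first) for one line of quantized
    coefficients; the sign bit is sent when a coefficient first becomes
    significant. Working line by line keeps the memory footprint small."""
    out = []
    significant = [False] * len(coeffs)
    for plane in range(num_planes - 1, -1, -1):
        mask = 1 << plane
        for i, c in enumerate(coeffs):
            bit = 1 if abs(c) & mask else 0
            out.append(bit)
            if bit and not significant[i]:
                significant[i] = True
                out.append(0 if c >= 0 else 1)   # sign bit on first significance
    return out

print(bitplane_passes([3, -5, 0, 2], num_planes=3))
# -> [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
```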
Citations: 17
Random access decompression using binary arithmetic coding
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.755680
H. Lekatsas, W. Wolf
We present an algorithm based on arithmetic coding that allows decompression to start at any point in the compressed file. This random access requirement poses some restrictions on the implementation of arithmetic coding and on the model used. Our main application area is executable code compression for computer systems where machine instructions are decompressed on-the-fly before execution. We focus on the decompression side of arithmetic coding and we propose a fast decoding scheme based on finite state machines. Furthermore, we present a method to decode multiple bits per cycle, while keeping the size of the decoder small.
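The sketch below only illustrates the table-driven structure implied by the abstract: decoding advances one input byte per lookup, and each lookup can emit several symbols at once. The transition table here is a toy placeholder; deriving a real table from the binary arithmetic coder's state transitions is the substance of the paper and is not attempted.

```python
def fsm_decode(table, start_state, data):
    """Table-driven decoding: each step consumes one input byte and looks up
    (state, byte) -> (decoded symbols, next state), so several coded bits are
    resolved per lookup."""
    state, out = start_state, []
    for byte in data:
        symbols, state = table[(state, byte)]
        out.extend(symbols)
    return out

# Toy transition table (illustrative only, not derived from a real arithmetic coder):
toy_table = {
    ("S0", 0x00): (["a", "a"], "S0"),
    ("S0", 0x01): (["b"], "S1"),
    ("S1", 0x00): (["a"], "S0"),
    ("S1", 0x01): (["b", "b"], "S1"),
}
print(fsm_decode(toy_table, "S0", [0x01, 0x00, 0x00]))   # -> ['b', 'a', 'a', 'a']
```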
Citations: 33
Application of a word-based text compression method to Japanese and Chinese texts
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.785718
S. Yoshida, T. Morihara, H. Yahagi, Noriko Itani
Summary form only given. 16-bit Asian language texts are difficult to compress using conventional 8-bit sampling text compression schemes. Recently the word-based text compression method has been studied with the intention of compressing Japanese and Chinese texts individually. In order to compress a large number of small-sized Japanese documents, such as groupware and E-mail, we applied a semi-adaptive word-based method to Japanese at DCC'98. To further enable multilingual text compression, we also applied a static word-based method to both the Japanese and Chinese texts and evaluated compression characteristics and performance using a computer simulation.
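As a hedged illustration of the static word-based idea (not the authors' codes or dictionaries): the text is segmented into words, each word is replaced by an index into a dictionary built beforehand from training text, and the index stream is what the entropy coder then sees, so 16-bit characters never pass through an 8-bit byte model.

```python
def build_static_dictionary(training_texts):
    """Collect the word vocabulary of a training corpus once, up front."""
    vocab = sorted({w for text in training_texts for w in text.split()})
    return {w: i for i, w in enumerate(vocab)}

def encode_words(text, dictionary):
    """Map each word to its dictionary index; unknown words fall back to a
    reserved escape index followed by the literal word."""
    escape = len(dictionary)
    out = []
    for w in text.split():
        if w in dictionary:
            out.append(dictionary[w])
        else:
            out.append(escape)
            out.append(w)               # literal fallback for unseen words
    return out

# Real Japanese text has no spaces, so an actual implementation needs a word
# segmenter; spaces are used here only to keep the sketch short.
dictionary = build_static_dictionary(["圧縮 方式 の 評価", "方式 の 比較"])
print(encode_words("圧縮 方式 の 比較", dictionary))   # -> [1, 2, 0, 3]
```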
Citations: 4
Compression of SAR and ultrasound imagery using texture models
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.785704
J. Rosiles, Mark J. T. Smith
Summary form only given. This paper addresses an approach for handling SAR and US images with different statistical properties. The approach is based on an image-structure/speckle-texture decomposition. The image model in this case views an image X(i,j) as the combination of two components: an image structure S(i,j) and a speckle texture T(i,j). An octave-band subband decomposition is performed on the data and the structure is separated from the speckle by applying soft-thresholding to the high frequency subband coefficients. The coefficients remaining after the operation are used to synthesize S(i,j) while the complement set of coefficients is a representation of T(i,j). Once the two components are obtained, they are coded separately. S(i,j) has a low frequency characteristic similar to natural images and is suitable for conventional compression techniques. In the proposed algorithm we use a quadtree coder for S(i,j). The speckle component is parametrized using a texture model. Two texture models have been tested: a 2D-AR model and the pyramid-based algorithm proposed by Heeger and Bergen. For the latter, a compact parametrization of the texture is achieved by modeling the histograms of T(i,j) and its pyramid subbands as generalized Gaussians. The synthesized speckle is visually similar to the original for both models. The image is reconstructed by adding together the decoded structure and the synthesized speckle. The subjective quality gains obtained from the proposed approach are evident. We performed a subjective test, which followed the CCIR recommendation 500-4 for image quality assessment. Several codecs were included in the tests.
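The structure/texture split can be illustrated with the soft-thresholding step alone. The sketch below shrinks the coefficients of one high-frequency subband, keeps the shrunk values for the structure component, and treats the residual as the speckle texture; the threshold and sample data are arbitrary, and the subband transform, texture models, and coders of the paper are not shown.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding: shrink each coefficient toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def split_structure_texture(subband, t):
    """Split one high-frequency subband so that subband = structure + texture:
    the thresholded coefficients go to the structure component, the remainder
    to the speckle-texture component."""
    structure = soft_threshold(subband, t)
    texture = subband - structure
    return structure, texture

band = np.array([-0.2, 3.0, 0.5, -4.0, 0.1])
struct_part, texture_part = split_structure_texture(band, t=1.0)
print(struct_part)    # keeps only the large-magnitude coefficients, shrunk by 1.0
print(texture_part)   # holds the small residual detail treated as speckle
```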
Citations: 5
A blending model for efficient compression of smooth images
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.755672
J. Mayer
The proposed novel lossy image compression approach represents an image as segments comprised of variable-sized right-angled triangles. The recursive triangular partitioning proposed is shown to be more efficient than square partitioning. A novel and economic blending model (similar to Bezier polynomials) is applied to represent each triangular surface. A framework to design blending surfaces for triangular regions is presented. This economic model allows coefficient (control point) sharing among neighbor triangles. Sharing results in blockiness reduction as compared to block-based techniques. The technique is especially appealing to images with smooth transitions. Compression and visual quality results compare favorably against a wavelet codec using decomposition into seven bands. As an alternative, a greedy algorithm based on priority queues is proposed to further reduce the entropy of the control point bitstream. This optimization step achieves better performance in a rate-distortion (R-D) sense when compared to uniform quantization of the control points.
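For intuition, the sketch below evaluates a linear blending surface over one right-angled triangle using barycentric weights of three vertex control values. The paper's blending model is Bezier-like and richer than this, and the triangle and control values here are invented, but the key point carries over: vertex control points are shared with neighbouring triangles, which is what suppresses blockiness.

```python
def barycentric_blend(p, tri, control_vals):
    """Evaluate a linear blending surface over a triangle: the value at point p
    is the barycentric mix of the control values attached to the three vertices
    (control points are shared between neighbouring triangles)."""
    (x, y), ((x1, y1), (x2, y2), (x3, y3)) = p, tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    v1, v2, v3 = control_vals
    return w1 * v1 + w2 * v2 + w3 * v3

tri = ((0, 0), (4, 0), (0, 4))                               # a right-angled triangle
print(barycentric_blend((1, 1), tri, (10.0, 50.0, 90.0)))    # -> 40.0
```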
Citations: 2
Word-based compression methods for large text documents
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.785680
J. Dvorský, J. Pokorný, V. Snás̃el
Summary form only given. We present a new compression method, called WLZW, which is a word-based modification of classic LZW. The algorithm is two-phase; it uses a single table for words and non-words (so-called tokens), and a single data structure for the lexicon that is also usable as a text index. The length of words and non-words is restricted, which improves the compression ratio achieved. Tokens of unlimited length would alternate naturally as they are read from the input stream, but the restricted token length can break this alternation, because some tokens are divided into several parts of the same type. To preserve the alternation, two special tokens are created: the empty word and the empty non-word, which contain no characters. An empty word is inserted between two non-words, and an empty non-word between two words, so the alternation of tokens is preserved for all token sequences. This alternation is an important piece of information: with it, the kind of the next token can be predicted. One selected (so-called victim) non-word can be deleted from the input stream, and an algorithm to find the victim is also presented. In the decompression phase, a deleted victim is recognized as a break in the alternation of words and non-words in the sequence. The algorithm was tested on many texts in different formats (ASCII, RTF). The Canterbury corpus, a large set, was used as a standard for publishing results. The compression ratio achieved is fairly good, on average 25%-22%. Decompression is very fast. Moreover, the algorithm enables evaluation of database queries on the given text, which supports the idea of leaving data in the compressed state as long as possible and decompressing it only when necessary.
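The alternation bookkeeping is the part that lends itself to a short sketch. Assuming, purely for illustration and not as the paper's exact token definition, that words are \w+ runs and non-words are everything else, the function below restricts token length and inserts the empty word / empty non-word tokens wherever two tokens of the same kind would otherwise become adjacent, so that words and non-words strictly alternate as described above.

```python
import re

def tokenize_alternating(text, max_len=32):
    """Split text into a strictly alternating sequence of word / non-word tokens.
    Tokens longer than max_len are cut, and an empty token of the opposite kind
    is inserted wherever two tokens of the same kind would become adjacent."""
    raw = re.findall(r"\w+|\W+", text)
    pieces = []
    for tok in raw:                      # enforce the maximum token length
        for i in range(0, len(tok), max_len):
            pieces.append(tok[i:i + max_len])
    out, expect_word = [], True          # assume the stream starts with a word
    for tok in pieces:
        is_word = bool(re.match(r"\w", tok))
        if is_word != expect_word:
            out.append("")               # empty word or empty non-word
        out.append(tok)
        expect_word = not is_word        # after a word, expect a non-word
    return out

print(tokenize_alternating("Hello,   world!!"))        # ['Hello', ',   ', 'world', '!!']
print(tokenize_alternating("abcdef. ", max_len=3))     # ['abc', '', 'def', '. ']
```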
Citations: 15
An open interface for probabilistic models of text
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.785679
J. Cleary, W. Teahan
Summary form only given. An application program interface (API) for modelling sequential text is described. The API is intended to shield the user from details of the modelling and probability estimation process. This should enable different implementations of models to be replaced transparently in application programs. The motivation for this API is work on the use of textual models for applications in addition to strict data compression. The API is probabilistic, that is, it supplies the probability of the next symbol in the sequence. It is general enough to deal accurately with models that include escapes for probabilities. The concepts abstracted by the API are explained together with details of the API calls. Such predictive models can be used for a number of applications other than compression. Users of the models do not want to be concerned about the details either of the implementation of the models or how they were trained and the sources of the training text. The problem considered is how to permit code for different models and actual trained models themselves to be interchanged easily between users. The fundamental idea is that it should be possible to write application programs independent of the details of particular modelling code, that it should be possible to implement different modelling code independent of the various applications, and that it should be possible to easily exchange different pre-trained models between users. It is hoped that this independence will foster the exchange and use of high-performance modelling code, the construction of sophisticated adaptive systems based on the best available models, the proliferation and provision of high-quality models of standard text types such as English or other natural languages, and easy comparison of different modelling techniques.
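A minimal sketch of what such an interface could look like is given below. The class and method names (predict, update) are invented for illustration and are not the API calls defined in the paper; the scoring function just shows the "application other than compression" idea by measuring the average code length a model assigns to a text.

```python
import math
from abc import ABC, abstractmethod

class SymbolModel(ABC):
    """Hypothetical probabilistic text-model interface: the application asks
    for a next-symbol distribution, then reports which symbol occurred."""

    @abstractmethod
    def predict(self, context: bytes) -> dict:
        """Return a mapping symbol -> probability for the next symbol; an
        escape symbol may carry the mass reserved for unseen symbols."""

    @abstractmethod
    def update(self, context: bytes, symbol: int) -> None:
        """Tell the model which symbol actually followed the context."""

def bits_per_symbol(model: SymbolModel, data: bytes, order: int = 3) -> float:
    """Score a text by the average code length the model assigns, in bits
    per symbol, using only the two interface calls above."""
    total = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        p = model.predict(ctx).get(sym, 1e-12)
        total += -math.log2(max(p, 1e-12))
        model.update(ctx, sym)
    return total / max(len(data), 1)
```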
Citations: 6
A streaming piecewise-constant model
Pub Date : 1999-03-29 DOI: 10.1109/DCC.1999.755670
Paul J. Ausbeck
The piecewise-constant image model (PWC) is remarkably effective for compressing palette images. This paper discloses a new streaming version of PWC that retains the excellent compression efficiency of the original algorithm while dramatically enhancing compression performance. Further, compression throughput is made more constant, making it possible to code sparse images very quickly.
Citations: 19