Practical Parallel Algorithms for Dictionary Data Compression

2009 Data Compression Conference. Pub Date: 2009-03-16. DOI: 10.1109/DCC.2009.84

L. Cinque, S. Agostino, L. Lombardi


Citations: 1

Abstract

PRAM CREW parallel algorithms requiring logarithmic time and a linear number of processors exist for sliding (LZ1) and static dictionary compression. On the other hand, LZ2 compression seems hard to parallelize. Both adaptive methods work with prefix dictionaries, that is, all prefixes of a dictionary element are dictionary elements. Therefore, it is reasonable to use prefix dictionaries for the static method as well. A left-to-right semi-greedy approach computes an optimal parsing of a string with a prefix static dictionary. The left-to-right greedy approach suffices to achieve optimal compression with a sliding dictionary, since such a dictionary is both prefix and suffix. We assume the window is bounded by a constant. Under the practical assumption that dictionary elements have constant length, we present PRAM EREW algorithms for sliding and static dictionary compression that still require logarithmic time and a linear number of processors. A PRAM EREW decoder for static dictionary compression can easily be designed with a linear number of processors and logarithmic time. A work-optimal logarithmic-time PRAM EREW decoder exists for sliding dictionary compression when the window has constant length. The simplest model for parallel computation is an array of processors with distributed memory and no interconnections, and therefore no communication cost. An approximation scheme for optimal compression with prefix static dictionaries was designed that runs on this model with the same complexity as the previous algorithms. It was presented for a massively parallel architecture but, by virtue of its scalability, it can be implemented on a small-scale system as well. We describe this approach and extend it to the sliding dictionary method. The approximation scheme for sliding dictionaries is suitable for small-scale systems but, due to its adaptiveness, it is practical for a large-scale system when the file size is large.
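
To make the difference between greedy and semi-greedy parsing concrete, here is a minimal sequential sketch in Python. It is an illustration under stated assumptions, not the parallel algorithm of the paper: the toy dictionary, the function names, and the tie-breaking rule are all hypothetical. The semi-greedy rule used here is one common reading of the idea: among all matches at the current position, pick the one whose end position plus the longest match that follows it reaches furthest.

```python
# Hypothetical toy dictionary; it is prefix-closed (every prefix of an
# element is also an element), as the abstract requires.
DICTIONARY = {"a", "ab", "b", "bc", "bcd", "bcde", "c", "d", "e"}
MAX_LEN = 4  # constant bound on the length of dictionary elements


def longest_match(text, i, dictionary, max_len):
    """Length of the longest dictionary element starting at text[i] (0 if none)."""
    best = 0
    for k in range(1, min(max_len, len(text) - i) + 1):
        if text[i:i + k] in dictionary:
            best = k
        else:
            break  # prefix property: no longer element can match here
    return best


def greedy_parse(text, dictionary, max_len):
    """Always take the longest match at the current position."""
    phrases, i = [], 0
    while i < len(text):
        k = longest_match(text, i, dictionary, max_len)
        if k == 0:
            raise ValueError(f"no dictionary element matches at position {i}")
        phrases.append(text[i:i + k])
        i += k
    return phrases


def semi_greedy_parse(text, dictionary, max_len):
    """Take the match whose end, plus the longest match after it, reaches furthest."""
    phrases, i = [], 0
    while i < len(text):
        longest = longest_match(text, i, dictionary, max_len)
        if longest == 0:
            raise ValueError(f"no dictionary element matches at position {i}")
        best_k, best_reach = longest, -1
        # Scan from the longest candidate down, so ties favour the longer phrase.
        for k in range(longest, 0, -1):
            reach = i + k + longest_match(text, i + k, dictionary, max_len)
            if reach > best_reach:
                best_k, best_reach = k, reach
        phrases.append(text[i:i + best_k])
        i += best_k
    return phrases


greedy_result = greedy_parse("abcde", DICTIONARY, MAX_LEN)            # ['ab', 'c', 'd', 'e']
semi_greedy_result = semi_greedy_parse("abcde", DICTIONARY, MAX_LEN)  # ['a', 'bcde']
```

On this input the greedy parser commits to "ab" and is then forced into three single-character phrases, while the one-step lookahead accepts the shorter "a" so that "bcde" can cover the rest.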
A two-dimensional extension of the sliding dictionary method to lossless compression of bi-level images, called BLOCK MATCHING, is also discussed. We designed a parallel implementation of this heuristic on a constant-size array of processors and tested it with up to 32 processors of a machine with 256 Intel Xeon 3.06 GHz processors (avogadro.cilea.it) on a test set of large topographic images. We achieved the expected speed-up, obtaining parallel compression and decompression about twenty-five times faster than the sequential ones.
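
The processor-array model with no interconnections can be illustrated for one-dimensional data with a short Python sketch. It rests on several assumptions that are not from the paper: worker threads stand in for the processors, `zlib` (DEFLATE, an LZ77-style sliding-window compressor) stands in for the sliding dictionary coder, and the function names and equal-size block split are hypothetical. It is not the BLOCK MATCHING heuristic. Each worker compresses its own block without exchanging any data with the others.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data, n_blocks):
    """Split data into at most n_blocks contiguous blocks of near-equal size."""
    size = max(1, -(-len(data) // n_blocks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_blocks(data, n_workers=4):
    """Compress each block independently; workers never communicate."""
    blocks = split_blocks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(zlib.compress, blocks))


def decompress_blocks(chunks, n_workers=4):
    """Decompress each chunk independently and concatenate the results in order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))


sample = b"abracadabra" * 1000
chunks = compress_blocks(sample, n_workers=4)
restored = decompress_blocks(chunks, n_workers=4)
```

Because every block starts with an empty window, the result only approximates what a single sequential pass would achieve. The loss shrinks as blocks grow, which is consistent with the abstract's remark that the sliding dictionary scheme suits large-scale systems when the file is large.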
