Evolutionary synthesis of lossless compression algorithms with GP-zip3

A. Kattan, R. Poli
DOI: 10.1109/CEC.2010.5585956 (https://doi.org/10.1109/CEC.2010.5585956)
Venue: IEEE Congress on Evolutionary Computation, pp. 1-8
Published: 2010-07-18
Citations: 12

Abstract

Here we propose GP-zip3, a system which uses Genetic Programming to find optimal ways to combine standard compression algorithms for the purpose of compressing files and archives. GP-zip3 evolves programs with multiple components. One component analyses statistical features extracted from the raw data to be compressed (seen as a sequence of 8-bit integers) to divide the data into blocks. These blocks are then projected onto a two-dimensional Euclidean space via two further (evolved) program components. K-means clustering is applied to group similar data blocks. Each cluster is then labelled with the optimal compression algorithm for its member blocks. Once a program that achieves good compression is evolved, it can be used on unseen data without the requirement for any further evolution. GP-zip3 is similar to its predecessor, GP-zip2. Both systems outperform a variety of standard compression algorithms and are faster than other evolutionary compression techniques. However, GP-zip2 was still substantially slower than off-the-shelf algorithms. GP-zip3 alleviates this problem by using a novel fitness evaluation strategy. More specifically, GP-zip3 evolves and then uses decision trees to predict the performance of GP individuals without requiring them to be used to compress the training data. As shown in a variety of experiments, this speeds up evolution in GP-zip3 considerably over GP-zip2 while achieving similar compression results, thereby significantly broadening the scope of application of the approach.
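
The block-wise pipeline described in the abstract (split, project to two dimensions, cluster, label each cluster with a compressor) can be illustrated with a minimal Python sketch. It rests on several assumptions and is not the authors' implementation. Fixed-size splitting and two hand-picked features (byte entropy and mean byte value) stand in for the evolved program components. zlib, bz2 and lzma stand in for the "standard compression algorithms", which the abstract does not name. All function names are hypothetical. Clusters are labelled here by brute-force trial compression, and the decision-tree fitness prediction is not modelled at all.

```python
import bz2
import lzma
import math
import random
import zlib
from collections import Counter

# Assumed stand-ins for the standard compressors; the abstract names none.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def split_blocks(data: bytes, block_size: int = 4096) -> list[bytes]:
    """Fixed-size split (assumption: GP-zip3 evolves this step instead)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def project(block: bytes) -> tuple[float, float]:
    """Map a block to 2-D: (byte entropy in bits, normalised mean byte value).

    Hand-written features replacing the two evolved projection components.
    """
    n = len(block)
    entropy = -sum(c / n * math.log2(c / n) for c in Counter(block).values())
    return entropy, sum(block) / (255 * n)


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster index per point."""
    k = min(k, len(points))
    centroids = random.Random(seed).sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return assign


def label_clusters(blocks, assign):
    """Label each cluster with the codec giving the smallest total output."""
    labels = {}
    for cluster in set(assign):
        members = [b for b, a in zip(blocks, assign) if a == cluster]
        labels[cluster] = min(
            CODECS,
            key=lambda name: sum(len(CODECS[name][0](b)) for b in members),
        )
    return labels


def compress(data: bytes, k: int = 3):
    """Return a list of (codec name, compressed block) pairs."""
    blocks = split_blocks(data)
    if not blocks:
        return []
    assign = kmeans([project(b) for b in blocks], k)
    labels = label_clusters(blocks, assign)
    return [(labels[a], CODECS[labels[a]][0](b))
            for b, a in zip(blocks, assign)]


def decompress(packed) -> bytes:
    """Invert compress() by decoding each block with its recorded codec."""
    return b"".join(CODECS[name][1](payload) for name, payload in packed)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = b"abcabcabc " * 3000 + bytes(rng.randrange(256) for _ in range(20000))
    packed = compress(sample)
    print(Counter(name for name, _ in packed))
    print(decompress(packed) == sample)
```

In this sketch every block of a cluster shares one codec, so the per-block choice costs only a feature computation and a nearest-centroid lookup once the clusters are labelled. That is one way to read the abstract's claim that an evolved program can be applied to unseen data without further evolution.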