TreeZip: A New Algorithm for Compressing Large Collections of Evolutionary Trees

Suzanne J. Matthews, Seung-Jin Sul, T. Williams
Published in: 2010 Data Compression Conference, 2010-03-24. DOI: 10.1109/DCC.2010.64 (https://doi.org/10.1109/DCC.2010.64)
Citations: 3

Abstract

Evolutionary trees are family trees that represent the relationships among a group of organisms. Phylogenetic analyses often produce thousands of hypothetical trees, each of which could represent the true phylogeny. These large collections of trees are costly to store. We introduce TreeZip, a novel algorithm designed to losslessly compress phylogenetic trees. The advantage of TreeZip is that it stores the information shared among trees only once and compresses the relationships effectively. We evaluate the performance of our approach on fourteen tree collections ranging from 2,505 to 150,000 trees, corresponding to 0.6 MB to 434 MB of storage. Our results demonstrate that TreeZip effectively compresses phylogenetic trees, typically reducing a file to 2% or less of its original size. When coupled with 7zip, TreeZip can compress a file to less than 1% of its original size. On our largest dataset, TreeZip+7zip compressed the input file to 0.008% of its original size. Our results strongly suggest that TreeZip is an ideal approach for compressing phylogenetic trees.
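
The abstract does not spell out how shared information is stored, so the following is only a minimal sketch of the general idea, not TreeZip's actual format. It assumes that "shared information" can be modeled as clades (the set of taxa under each internal node), and it keeps each distinct clade once together with the indices of the trees that contain it. The nested-tuple tree representation and the names `clades` and `shared_clade_table` are hypothetical, chosen for illustration. The sketch ignores branch lengths and child order, so on its own it is not a lossless encoding.

```python
from collections import defaultdict


def clades(tree):
    """Return (leaf set, list of clades) for a nested-tuple tree.

    A leaf is a taxon name (str); an internal node is a tuple of children.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade rooted at this internal node
    return leaves, found


def shared_clade_table(trees):
    """Map each distinct clade to the indices of the trees containing it."""
    table = defaultdict(list)
    for idx, tree in enumerate(trees):
        _, found = clades(tree)
        for clade in found:
            table[clade].append(idx)
    return dict(table)


# Three small hypothetical trees over the taxa A..E; trees 0 and 2 are identical.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    ((("A", "B"), "C"), ("D", "E")),
]
table = shared_clade_table(trees)
total_occurrences = sum(len(ids) for ids in table.values())

for clade, ids in table.items():
    print(sorted(clade), "appears in trees", ids)
print(len(table), "distinct clades stored for", total_occurrences, "occurrences")
```

In this toy input the clade {A, B} occurs in all three trees but is stored a single time, which is the kind of redundancy a collection of similar trees tends to contain.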