TreeZip: A New Algorithm for Compressing Large Collections of Evolutionary Trees
Suzanne J. Matthews, Seung-Jin Sul, T. Williams
2010 Data Compression Conference, published 2010-03-24
DOI: 10.1109/DCC.2010.64 (https://doi.org/10.1109/DCC.2010.64)
Citations: 3
Abstract
Evolutionary trees are family trees that represent the relationships among a group of organisms. Phylogenetic analyses often produce thousands of hypothetical trees that can represent the true phylogeny. These large collections of trees are costly to store. We introduce TreeZip, a novel algorithm designed to losslessly compress phylogenetic trees. The advantage of TreeZip is its ability to uniquely store the shared information among trees and compress the relationships effectively. We evaluate the performance of our approach over fourteen tree collections ranging from 2,505 to 150,000 trees, corresponding to 0.6 MB to 434 MB in storage. Our results demonstrate that TreeZip effectively compresses phylogenetic trees, typically compressing a file to 2% or less of its original size. When coupled with 7zip, TreeZip can compress a file to less than 1% of its original size. On our largest dataset, TreeZip+7zip compressed the input file to 0.008% of its original size. Our results strongly suggest that TreeZip is an ideal approach for compressing phylogenetic trees.
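
The idea of storing information shared among trees only once can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not describe TreeZip's data structures, so the sketch assumes trees given as simple Newick strings (leaf names only, no branch lengths), treats each internal node's leaf set (a clade) as the unit of shared information, and uses hypothetical helper names (`parse_newick`, `clades`, `index_shared_clades`). A lossless scheme would also have to record details this sketch ignores.

```python
from collections import defaultdict


def parse_newick(text):
    """Parse a simple Newick string (leaf names only, no branch lengths,
    terminated by ';') into nested tuples of leaf-name strings."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    return node()


def clades(tree):
    """Return (leaf set of this subtree, leaf sets of all its internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def index_shared_clades(newick_trees):
    """Map each distinct clade to the list of tree indices that contain it,
    so a clade shared by many trees is stored as a single key."""
    table = defaultdict(list)
    for tree_id, text in enumerate(newick_trees):
        _, found = clades(parse_newick(text))
        for clade in found:
            table[clade].append(tree_id)
    return dict(table)


if __name__ == "__main__":
    # Illustrative input only; not data from the paper.
    trees = ["((A,B),(C,D));", "((A,B),(D,C));", "((A,C),(B,D));"]
    table = index_shared_clades(trees)
    for clade, tree_ids in table.items():
        print(sorted(clade), tree_ids)
```

In this toy input the three trees contain nine clade occurrences in total, but the table holds only five distinct keys, because clades that recur across trees are stored once together with the indices of the trees that contain them. Collections in which many trees share most of their relationships would benefit most from this kind of representation.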