A Randomized Algorithm for Comparing Sets of Phylogenetic Trees

Seung-Jin Sul, T. Williams
{"title":"A Randomized Algorithm for Comparing Sets of Phylogenetic Trees","authors":"Seung-Jin Sul, T. Williams","doi":"10.1142/9781860947995_0015","DOIUrl":null,"url":null,"abstract":"Phylogenetic analysis often produce a large number of candidate evolutionary trees, each a hypothesis of the ”true” tree. Post-processing techniques such as stri ct consensus trees are widely used to summarize the evolutionary relationships into a single tree. H owever, valuable information is lost during the summarization process. A more elementary step is to produce estimates of the topological differences that exist among all pairs of trees. We design a new randomized algorithm, called Hash-RF, that computes the all-to-all Robinson-Foulds (RF) distance—the most common distance metric for comparing two phylogenetic trees. Our approach uses a hash table to organize the bipartitions of a tree, and a universal hashing function makes our algorithm randomized. We compare the performance of our Hash-RF algorithm to PAUP*’s implementation of computing the all-to-all RF distance matrix. Our experiments focus on the algorithmic performance of comparing sets of biological trees, where the size of each tree ranged from 500 to 2,000 taxa and the collection of trees varied from 200 to 1,000 trees. Our experimental results clearly show that our Hash-RF algorithm is up to 500 times faster than PAUP*’s approach. Thus, Hash-RF provides an efficient alter native to a single tree summary of a collection of trees and potentially gives researchers the abil ity to explore their data in new and interesting ways.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"65 1","pages":"121-130"},"PeriodicalIF":0.0000,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... Asia-Pacific bioinformatics conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9781860947995_0015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

Abstract

Phylogenetic analysis often produce a large number of candidate evolutionary trees, each a hypothesis of the ”true” tree. Post-processing techniques such as stri ct consensus trees are widely used to summarize the evolutionary relationships into a single tree. H owever, valuable information is lost during the summarization process. A more elementary step is to produce estimates of the topological differences that exist among all pairs of trees. We design a new randomized algorithm, called Hash-RF, that computes the all-to-all Robinson-Foulds (RF) distance—the most common distance metric for comparing two phylogenetic trees. Our approach uses a hash table to organize the bipartitions of a tree, and a universal hashing function makes our algorithm randomized. We compare the performance of our Hash-RF algorithm to PAUP*’s implementation of computing the all-to-all RF distance matrix. Our experiments focus on the algorithmic performance of comparing sets of biological trees, where the size of each tree ranged from 500 to 2,000 taxa and the collection of trees varied from 200 to 1,000 trees. Our experimental results clearly show that our Hash-RF algorithm is up to 500 times faster than PAUP*’s approach. Thus, Hash-RF provides an efficient alter native to a single tree summary of a collection of trees and potentially gives researchers the abil ity to explore their data in new and interesting ways.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一种比较系统发育树集的随机算法
系统发育分析通常会产生大量的候选进化树,每一个都是对“真正的”进化树的假设。严格共识树等后处理技术被广泛用于将进化关系归纳为一棵树。然而,在总结的过程中,有价值的信息丢失了。一个更基本的步骤是对所有树对之间存在的拓扑差异进行估计。我们设计了一种新的随机算法,称为哈希-RF,它计算所有到所有的罗宾逊-福尔兹(RF)距离,这是比较两个系统发育树最常见的距离度量。我们的方法使用哈希表来组织树的二分区,而通用哈希函数使我们的算法随机化。我们将我们的Hash-RF算法的性能与PAUP*计算全对全RF距离矩阵的实现进行了比较。我们的实验重点是比较生物树集的算法性能,其中每棵树的大小从500到2000个分类群不等,树木的集合从200到1000棵不等。我们的实验结果清楚地表明,我们的Hash-RF算法比PAUP*的方法快500倍。因此,Hash-RF提供了一种有效的替代树集合的单一树摘要,并可能使研究人员能够以新的和有趣的方式探索他们的数据。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Tuning Privacy-Utility Tradeoff in Genomic Studies Using Selective SNP Hiding. The Future of Bioinformatics CHEMICAL COMPOUND CLASSIFICATION WITH AUTOMATICALLY MINED STRUCTURE PATTERNS. Predicting Nucleolar Proteins Using Support-Vector Machines Proceedings of the 6th Asia-Pacific Bioinformatics Conference, APBC 2008, 14-17 January 2008, Kyoto, Japan
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1