Compression-based distance methods as an alternative to statistical methods for constructing phylogenetic trees

Mohamed El-Dirany, Forrest Wang, J. Furst, J. Rogers, D. Raicu
Published in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Publication date: 2016-12-01
DOI: 10.1109/BIBM.2016.7822676 (https://doi.org/10.1109/BIBM.2016.7822676)
Citations: 1

Abstract

Distance-based methods for constructing phylogenetic trees have long been considered inconsistent and inferior to the more dominant statistical methods. However, compression methods specific to DNA could improve the effectiveness of distance-based methods. To demonstrate the validity of distance-based methods that use current DNA compression algorithms, such as MFCompress, we applied such a method to datasets of closely related fish species from the suborder Labroidei and to strains of Ebola. In both cases, we produced trees that are either very similar or identical to published trees produced using statistical methods. This suggests that distance-based methods can perform comparably to statistical methods while requiring less pre-processing of the original DNA sequences and fewer system resources. The results also stress the importance of calculating species distance accurately: one DNA-specific compression algorithm, MFCompress, consistently and convincingly outperformed other popular general-purpose compression algorithms.
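
The abstract does not state which distance formula was used, so the following is only a minimal sketch of the general idea, assuming the widely used normalized compression distance (NCD). Python's general-purpose `zlib` stands in for a DNA-specific compressor such as MFCompress, which the abstract reports as performing better, so this sketch should not be expected to reproduce the paper's trees. The helper names (`compressed_size`, `ncd`, `distance_matrix`, `toy_sequences`) and the synthetic sequences are hypothetical and exist purely for illustration.

```python
from __future__ import annotations

import random
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: dict[str, bytes]) -> dict[tuple[str, str], float]:
    """Pairwise NCD for every unordered pair of named sequences."""
    return {
        (a, b): ncd(seqs[a], seqs[b])
        for a, b in combinations(sorted(seqs), 2)
    }


def toy_sequences(seed: int = 0) -> dict[str, bytes]:
    """Synthetic data: a base sequence, a lightly mutated copy, and an unrelated one."""
    rng = random.Random(seed)
    base = rng.choices("ACGT", k=2000)
    mutant = list(base)
    for pos in rng.sample(range(len(mutant)), 20):
        # Substitute a different nucleotide at each chosen position.
        mutant[pos] = rng.choice([n for n in "ACGT" if n != mutant[pos]])
    unrelated = rng.choices("ACGT", k=2000)
    return {
        "base": "".join(base).encode(),
        "mutant": "".join(mutant).encode(),
        "unrelated": "".join(unrelated).encode(),
    }


seqs = toy_sequences()
matrix = distance_matrix(seqs)
for pair, dist in matrix.items():
    print(pair, round(dist, 3))
```

The resulting pairwise distances could then be passed to any standard distance-based tree builder, such as neighbor joining or UPGMA. The abstract does not say which one the authors used.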