Machine learning can be as good as maximum likelihood when reconstructing phylogenetic trees and determining the best evolutionary model on four taxon alignments

IF 3.6 | Zone 1 (Biology) | Q2 Biochemistry & Molecular Biology | Molecular Phylogenetics and Evolution | Pub Date: 2024-08-30 | DOI: 10.1016/j.ympev.2024.108181
Nikita Kulikov, Fatemeh Derakhshandeh, Christoph Mayer
Molecular Phylogenetics and Evolution, volume 200, Article 108181. Journal Article. Not open access. Source: https://www.sciencedirect.com/science/article/pii/S1055790324001738
Citations: 0

Abstract

Phylogenetic tree reconstruction with molecular data is important in many fields of life science research. The gold standard in this discipline is phylogenetic tree reconstruction based on the Maximum Likelihood method. In this study, we present neural networks that predict the best model of sequence evolution and the correct topology for four-sequence alignments of nucleotide or amino acid sequence data. We trained neural networks with different architectures using simulated alignments for a wide range of evolutionary models, model parameters and branch lengths. By comparing the accuracy of model and topology prediction of the trained neural networks with that of the Maximum Likelihood and Neighbour Joining methods, we show that for quartet trees, the neural network classifier outperforms the Neighbour Joining method and is in most cases as good as the Maximum Likelihood method at inferring the best model of sequence evolution and the best tree topology. These results are consistent for nucleotide and amino acid sequence data. We also show that our method is superior for model selection to previously published methods based on convolutional networks. Furthermore, we found that neural network classifiers are much faster than the IQ-TREE implementation of the Maximum Likelihood method. Our results show that neural networks could become a true competitor to the Maximum Likelihood method in phylogenetic reconstructions.
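
To make the quartet setting concrete: four taxa admit exactly three unrooted topologies, so topology prediction is a three-way classification. The sketch below is not the authors' code. It illustrates a distance-based baseline of the kind the abstract compares against. For four taxa, the Neighbour Joining selection criterion reduces to the four-point condition: pick the pairing whose two within-pair distances have the smallest sum. The Jukes-Cantor (JC69) distance, the function names and the toy alignment are assumptions chosen for illustration, and the abstract does not say which distance model the study used.

```python
import math
from itertools import combinations


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Assumption for illustration: JC69 is used only because it is the
    simplest correction; sites with gaps or ambiguity codes are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def quartet_topology(alignment):
    """Pick one of the three quartet topologies via the four-point condition.

    `alignment` maps exactly four taxon names to aligned sequences.
    Returns the split ((w, x), (y, z)) minimising d(w, x) + d(y, z).
    """
    names = sorted(alignment)
    if len(names) != 4:
        raise ValueError("expected exactly four taxa")
    dist = {
        frozenset(pair): jc69_distance(alignment[pair[0]], alignment[pair[1]])
        for pair in combinations(names, 2)
    }
    a, b, c, d = names
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(splits, key=lambda s: dist[frozenset(s[0])] + dist[frozenset(s[1])])


# Hypothetical toy alignment: A and B are near-identical, as are C and D.
toy = {
    "A": "AAAAAAAAAACCCCCCCCCC",
    "B": "AAAAAAAAAACCCCCCCCCG",
    "C": "AAAAAGGGGGCCCCCCCCCC",
    "D": "AAAAAGGGGGCCCCCCCCCT",
}
print(quartet_topology(toy))
```

A neural network classifier as described in the abstract would replace the hand-written distance rule with a learned mapping from the alignment to one of the same three topology classes.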

Source journal
Molecular Phylogenetics and Evolution (Biology: Evolutionary Biology)
CiteScore: 7.50
Self-citation rate: 7.30%
Articles published: 249
Review time: 7.5 months
About the journal: Molecular Phylogenetics and Evolution is dedicated to bringing Darwin's dream within grasp - to "have fairly true genealogical trees of each great kingdom of Nature." The journal provides a forum for molecular studies that advance our understanding of phylogeny and evolution, further the development of phylogenetically more accurate taxonomic classifications, and ultimately bring a unified classification for all the ramifying lines of life. Phylogeographic studies will be considered for publication if they offer EXCEPTIONAL theoretical or empirical advances.
Latest articles from this journal
Retraction notice to "Phylogenomic data exploration with increased sampling provides new insights into the higher-level relationships of butterflies and moths (Lepidoptera)" [Mol. Phylogenet. Evol. 197 (2024) 108113].
Species delimitation and historical biogeography of Sturisoma Swainson, 1838 (Loricariidae: Loricariinae): Hidden diversity along the Amazon River.
Gondwanan relic or recent arrival? The biogeographic origins and systematics of Australian tarantulas.
Forget-me-not phylogenomics: Improving the resolution and taxonomy of a rapid island and mountain radiation in Aotearoa New Zealand (Myosotis; Boraginaceae).
Taken to extremes: Loss of plastid rpl32 in Streptophyta and Cuscuta's unconventional solution for its replacement.