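Machine learning can be as good as maximum likelihood when reconstructing phylogenetic trees and determining the best evolutionary model on four taxon alignments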

Molecular Phylogenetics and Evolution (IF 3.6; Region 1, Biology; JCR Q2, Biochemistry & Molecular Biology) | Pub Date: 2024-08-30 | DOI: 10.1016/j.ympev.2024.108181

Nikita Kulikov, Fatemeh Derakhshandeh, Christoph Mayer

Molecular Phylogenetics and Evolution, volume 200, Article 108181. Full text: https://www.sciencedirect.com/science/article/pii/S1055790324001738

Citations: 0

Abstract

Phylogenetic tree reconstruction with molecular data is important in many fields of life science research. The gold standard in this discipline is tree reconstruction based on the Maximum Likelihood method. In this study, we present neural networks that predict the best model of sequence evolution and the correct topology for four-sequence alignments of nucleotide or amino acid data. We trained neural networks with different architectures on simulated alignments covering a wide range of evolutionary models, model parameters and branch lengths. By comparing the accuracy of model and topology prediction of the trained neural networks with that of the Maximum Likelihood and Neighbour Joining methods, we show that, for quartet trees, the neural network classifier outperforms the Neighbour Joining method and is in most cases as good as the Maximum Likelihood method at inferring the best model of sequence evolution and the best tree topology. These results are consistent for nucleotide and amino acid sequence data. We also show that our method is superior in model selection to previously published methods based on convolutional networks. Furthermore, we found that neural network classifiers are much faster than the IQ-TREE implementation of the Maximum Likelihood method. Our results show that neural networks could become a true competitor to the Maximum Likelihood method in phylogenetic reconstruction.
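To make the quartet setting concrete, here is a minimal Python sketch of two ingredients such a comparison could involve. The abstract does not say how alignments are fed to the networks, so the site-pattern frequency encoding below is an assumption, not the authors' documented input format. The distance rule is a simplified stand-in for a Neighbour Joining baseline: on exactly four taxa, Neighbour Joining reduces to choosing the pairing with the smallest sum of within-pair distances, and uncorrected p-distances are used here purely for illustration. All function names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 4**4 = 256 possible columns of a gap-free four-sequence nucleotide alignment.
PATTERNS = ["".join(p) for p in product(NUCLEOTIDES, repeat=4)]
PATTERN_INDEX = {p: i for i, p in enumerate(PATTERNS)}


def site_pattern_frequencies(alignment):
    """Return a 256-element list of relative site-pattern frequencies.

    Assumed feature encoding (not taken from the paper). Columns that
    contain anything other than A, C, G or T are skipped.
    """
    if len(alignment) != 4 or len({len(s) for s in alignment}) != 1:
        raise ValueError("expected four aligned sequences of equal length")
    counts = [0] * len(PATTERNS)
    used = 0
    for column in zip(*alignment):
        idx = PATTERN_INDEX.get("".join(column).upper())
        if idx is not None:
            counts[idx] += 1
            used += 1
    return [c / used for c in counts] if used else counts


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def four_point_topology(alignment):
    """Pick the quartet split with the smallest sum of within-pair distances."""
    s0, s1, s2, s3 = alignment
    scores = {
        "01|23": p_distance(s0, s1) + p_distance(s2, s3),
        "02|13": p_distance(s0, s2) + p_distance(s1, s3),
        "03|12": p_distance(s0, s3) + p_distance(s1, s2),
    }
    return min(scores, key=scores.get)


# Toy alignment (made up for illustration): sequences 0 and 1 are near-identical,
# as are sequences 2 and 3.
toy = ["ACGTACGTAA", "ACGTACGTAC", "ACTTAGGTCA", "ACTTAGGTCC"]
features = site_pattern_frequencies(toy)
topology = four_point_topology(toy)
```

A classifier in this setting would take a vector like `features` as input and output one of the three possible unrooted quartet topologies, which is the same label space that `four_point_topology` returns.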

Source journal

Molecular Phylogenetics and Evolution (Biology: Evolutionary Biology)

CiteScore: 7.50
Self-citation rate: 7.30%
Articles published: 249
Review time: 7.5 months

Journal description: Molecular Phylogenetics and Evolution is dedicated to bringing Darwin's dream within grasp - to "have fairly true genealogical trees of each great kingdom of Nature." The journal provides a forum for molecular studies that advance our understanding of phylogeny and evolution, further the development of phylogenetically more accurate taxonomic classifications, and ultimately bring a unified classification for all the ramifying lines of life. Phylogeographic studies will be considered for publication if they offer EXCEPTIONAL theoretical or empirical advances.