Nucleotide Substitution Model Selection Is Not Necessary for Bayesian Inference of Phylogeny With Well-Behaved Priors.

IF 6.1 1区 生物学 Q1 EVOLUTIONARY BIOLOGY Systematic Biology Pub Date : 2023-12-30 DOI:10.1093/sysbio/syad041
Luiza Guimarães Fabreti, Sebastian Höhna
{"title":"Nucleotide Substitution Model Selection Is Not Necessary for Bayesian Inference of Phylogeny With Well-Behaved Priors.","authors":"Luiza Guimarães Fabreti, Sebastian Höhna","doi":"10.1093/sysbio/syad041","DOIUrl":null,"url":null,"abstract":"<p><p>Model selection aims to choose the most adequate model for the statistical analysis at hand. The model must be complex enough to capture the complexity of the data but should be simple enough not to overfit. In phylogenetics, the most common model selection scenario concerns selecting an adequate substitution and partition model for sequence evolution to infer a phylogenetic tree. Previously, several studies showed that substitution model under-parameterization can bias phylogenetic studies. Here, we explored the impact of substitution model over-parameterization in a Bayesian statistical framework. We performed simulations under the simplest substitution model, the Jukes-Cantor model, and compare posterior estimates of phylogenetic tree topologies and tree length under the true model to the most complex model, the $\\text{GTR}+\\Gamma+\\text{I}$ substitution model, including over-splitting the data into additional subsets (i.e., applying partitioned models). We explored 4 choices of prior distributions: the default substitution model priors of MrBayes, BEAST2, and RevBayes and a newly devised prior choice (Tame). Our results show that Bayesian inference of phylogeny is robust to substitution model over-parameterization and over-partitioning but only under our new prior settings. All 3 current default priors introduced biases for the estimated tree length. We conclude that substitution and partition model selection are superfluous steps in Bayesian phylogenetic inference pipelines if well-behaved prior distributions are applied and more effort should focus on more complex and biologically realistic substitution models.</p>","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"1418-1432"},"PeriodicalIF":6.1000,"publicationDate":"2023-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Systematic Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/sysbio/syad041","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EVOLUTIONARY BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Model selection aims to choose the most adequate model for the statistical analysis at hand. The model must be complex enough to capture the complexity of the data but should be simple enough not to overfit. In phylogenetics, the most common model selection scenario concerns selecting an adequate substitution and partition model for sequence evolution to infer a phylogenetic tree. Previously, several studies showed that substitution model under-parameterization can bias phylogenetic studies. Here, we explored the impact of substitution model over-parameterization in a Bayesian statistical framework. We performed simulations under the simplest substitution model, the Jukes-Cantor model, and compare posterior estimates of phylogenetic tree topologies and tree length under the true model to the most complex model, the $\text{GTR}+\Gamma+\text{I}$ substitution model, including over-splitting the data into additional subsets (i.e., applying partitioned models). We explored 4 choices of prior distributions: the default substitution model priors of MrBayes, BEAST2, and RevBayes and a newly devised prior choice (Tame). Our results show that Bayesian inference of phylogeny is robust to substitution model over-parameterization and over-partitioning but only under our new prior settings. All 3 current default priors introduced biases for the estimated tree length. We conclude that substitution and partition model selection are superfluous steps in Bayesian phylogenetic inference pipelines if well-behaved prior distributions are applied and more effort should focus on more complex and biologically realistic substitution models.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
核苷酸替换模型选择对具有良好先验的系统发育贝叶斯推断并非必要。
模型选择的目的是为当前的统计分析选择最合适的模型。模型必须足够复杂,以捕捉数据的复杂性,但也应足够简单,不至于过度拟合。在系统发育学中,最常见的模型选择情况是为序列进化选择一个合适的替换和分割模型,以推断系统发生树。之前的一些研究表明,替换模型参数化不足会使系统发育研究出现偏差。在此,我们在贝叶斯统计框架下探讨了替换模型参数过高的影响。我们在最简单的替换模型--Jukes-Cantor模型下进行了模拟,并比较了真实模型与最复杂模型--$\text{GTR}+\Gamma+\text{I}$替换模型下的系统发生树拓扑和树长的后验估计值,包括将数据过度分割成额外的子集(即应用分区模型)。我们探索了 4 种先验分布选择:MrBayes、BEAST2 和 RevBayes 的默认替换模型先验,以及一种新设计的先验选择(Tame)。我们的研究结果表明,贝叶斯系统发育推断对替代模型过度参数化和过度分区具有鲁棒性,但只有在我们新的先验设置下才具有这种鲁棒性。目前所有 3 个默认先验都会对估计的树长产生偏差。我们的结论是,如果应用良好的先验分布,替代和分区模型选择是贝叶斯系统发育推断流水线中多余的步骤,更多的精力应集中在更复杂和更符合生物现实的替代模型上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Systematic Biology
Systematic Biology 生物-进化生物学
CiteScore
13.00
自引率
7.70%
发文量
70
审稿时长
6-12 weeks
期刊介绍: Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.
期刊最新文献
A Double-edged Sword: Evolutionary Novelty along Deep-time Diversity Oscillation in An Iconic Group of Predatory Insects (Neuroptera: Mantispoidea) Are Modern Cryptic Species Detectable in the Fossil Record? A Case Study on Agamid Lizards. Bayesian Selection of Relaxed-clock Models: Distinguishing Between Independent and Autocorrelated Rates. Testing relationships between multiple regional features and biogeographic processes of speciation, extinction, and dispersal Robustness of Divergence Time Estimation Despite Gene Tree Estimation Error: A Case Study of Fireflies (Coleoptera: Lampyridae)
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1