Assessing the Adequacy of Morphological Models using Posterior Predictive Simulations

IF 6.1 1区 生物学 Q1 EVOLUTIONARY BIOLOGY Systematic Biology Pub Date : 2024-10-07 DOI:10.1093/sysbio/syae055
Laura P A Mulvey, Michael R May, Jeremy M Brown, Sebastian Höhna, April M Wright, Rachel C M Warnock
{"title":"Assessing the Adequacy of Morphological Models using Posterior Predictive Simulations","authors":"Laura P A Mulvey, Michael R May, Jeremy M Brown, Sebastian Höhna, April M Wright, Rachel C M Warnock","doi":"10.1093/sysbio/syae055","DOIUrl":null,"url":null,"abstract":"Reconstructing the evolutionary history of different groups of organisms provides insight into how life originated and diversified on Earth. Phylogenetic trees are commonly used to estimate this evolutionary history. Within Bayesian phylogenetics a major step in estimating a tree is in choosing an appropriate model of character evolution. While the most common character data used is molecular sequence data, morphological data remains a vital source of information. The use of morphological characters allows for the incorporation fossil taxa, and despite advances in molecular sequencing, continues to play a significant role in neontology. Moreover, it is the main data source that allows us to unite extinct and extant taxa directly under the same generating process. We therefore require suitable models of morphological character evolution, the most common being the Mk Lewis model. While it is frequently used in both palaeobiology and neontology, it is not known whether the simple Mk substitution model, or any extensions to it, provide a sufficiently good description of the process of morphological evolution. In this study we investigate the impact of different morphological models on empirical tetrapod data sets. Specifically, we compare unpartitioned Mk models with those where characters are partitioned by the number of observed states, both with and without allowing for rate variation across sites and accounting for ascertainment bias. We show that the choice of substitution model has an impact on both topology and branch lengths, highlighting the importance of model choice. Through simulations, we validate the use of the model adequacy approach, posterior predictive simulations, for choosing an appropriate model. Additionally, we compare the performance of model adequacy with Bayesian model selection. We demonstrate how model selection approaches based on marginal likelihoods are not appropriate for choosing between models with partition schemes that vary in character state space (i.e., that vary in Q-matrix state size). Using posterior predictive simulations, we found that current variations of the Mk model are often performing adequately in capturing the evolutionary dynamics that generated our data. We do not find any preference for a particular model extension across multiple data sets, indicating that there is no ‘one size fits all’ when it comes to morphological data and that careful consideration should be given to choosing models of discrete character evolution. By using suitable models of character evolution, we can increase our confidence in our phylogenetic estimates, which should in turn allow us to gain more accurate insights into the evolutionary history of both extinct and extant taxa.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":"54 1","pages":""},"PeriodicalIF":6.1000,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Systematic Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/sysbio/syae055","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EVOLUTIONARY BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Reconstructing the evolutionary history of different groups of organisms provides insight into how life originated and diversified on Earth. Phylogenetic trees are commonly used to estimate this evolutionary history. Within Bayesian phylogenetics a major step in estimating a tree is in choosing an appropriate model of character evolution. While the most common character data used is molecular sequence data, morphological data remains a vital source of information. The use of morphological characters allows for the incorporation fossil taxa, and despite advances in molecular sequencing, continues to play a significant role in neontology. Moreover, it is the main data source that allows us to unite extinct and extant taxa directly under the same generating process. We therefore require suitable models of morphological character evolution, the most common being the Mk Lewis model. While it is frequently used in both palaeobiology and neontology, it is not known whether the simple Mk substitution model, or any extensions to it, provide a sufficiently good description of the process of morphological evolution. In this study we investigate the impact of different morphological models on empirical tetrapod data sets. Specifically, we compare unpartitioned Mk models with those where characters are partitioned by the number of observed states, both with and without allowing for rate variation across sites and accounting for ascertainment bias. We show that the choice of substitution model has an impact on both topology and branch lengths, highlighting the importance of model choice. Through simulations, we validate the use of the model adequacy approach, posterior predictive simulations, for choosing an appropriate model. Additionally, we compare the performance of model adequacy with Bayesian model selection. We demonstrate how model selection approaches based on marginal likelihoods are not appropriate for choosing between models with partition schemes that vary in character state space (i.e., that vary in Q-matrix state size). Using posterior predictive simulations, we found that current variations of the Mk model are often performing adequately in capturing the evolutionary dynamics that generated our data. We do not find any preference for a particular model extension across multiple data sets, indicating that there is no ‘one size fits all’ when it comes to morphological data and that careful consideration should be given to choosing models of discrete character evolution. By using suitable models of character evolution, we can increase our confidence in our phylogenetic estimates, which should in turn allow us to gain more accurate insights into the evolutionary history of both extinct and extant taxa.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用后验预测模拟评估形态学模型的适当性
重建不同生物类群的进化史有助于深入了解生命如何在地球上起源和多样化。系统发生树通常用于估算这种进化历史。在贝叶斯系统发育学中,估计系统树的一个主要步骤是选择一个合适的特征进化模型。虽然最常用的特征数据是分子序列数据,但形态数据仍然是重要的信息来源。使用形态特征可以纳入化石类群,尽管分子测序技术在不断进步,但形态特征在新生物学中仍然发挥着重要作用。此外,它也是使我们能够将已灭绝类群和现生类群直接整合到同一生成过程中的主要数据来源。因此,我们需要合适的形态特征演化模型,最常见的是 Mk Lewis 模型。虽然该模型在古生物学和新生物学中经常被使用,但简单的 Mk 替换模型或其扩展模型是否能对形态演化过程提供足够好的描述还不得而知。在本研究中,我们研究了不同形态模型对四足动物经验数据集的影响。具体来说,我们比较了未分区的 Mk 模型和按观察到的状态数量对特征进行分区的模型,既考虑到了不同位点的速率变化,也考虑到了确定偏差。我们发现,替代模型的选择对拓扑结构和分支长度都有影响,这突出了模型选择的重要性。通过模拟,我们验证了使用模型充分性方法--后验预测模拟--来选择合适的模型。此外,我们还比较了模型充分性与贝叶斯模型选择的性能。我们证明了基于边际似然的模型选择方法如何不适合在具有不同特征状态空间(即不同 Q 矩阵状态大小)的分区方案的模型之间进行选择。通过后验预测模拟,我们发现 Mk 模型的当前变体往往能充分捕捉到产生数据的进化动态。在多个数据集中,我们没有发现对某一特定模型扩展的偏好,这表明在形态学数据方面没有 "一刀切 "的做法,在选择离散特征演化模型时应慎重考虑。通过使用合适的特征演化模型,我们可以提高系统发生学估计的可信度,从而使我们能够更准确地了解已灭绝类群和现生类群的演化历史。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Systematic Biology
Systematic Biology 生物-进化生物学
CiteScore
13.00
自引率
7.70%
发文量
70
审稿时长
6-12 weeks
期刊介绍: Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.
期刊最新文献
A Double-edged Sword: Evolutionary Novelty along Deep-time Diversity Oscillation in An Iconic Group of Predatory Insects (Neuroptera: Mantispoidea) Are Modern Cryptic Species Detectable in the Fossil Record? A Case Study on Agamid Lizards. Bayesian Selection of Relaxed-clock Models: Distinguishing Between Independent and Autocorrelated Rates. Testing relationships between multiple regional features and biogeographic processes of speciation, extinction, and dispersal Robustness of Divergence Time Estimation Despite Gene Tree Estimation Error: A Case Study of Fireflies (Coleoptera: Lampyridae)
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1