The ideal approach to Bayesian phylogenetic inference is to estimate all parameters of interest jointly in a single hierarchical model. However, this is often not feasible in practice due to the high computational cost. Instead, phylogenetic pipelines generally consist of sequential analyses, whereby a single point estimate from a given analysis is used as input for the next analysis (e.g., a single multiple sequence alignment is used to estimate a gene tree). In this framework, uncertainty is not propagated from step to step, which can lead to inaccurate or spuriously confident results. Here, we formally develop and test a sequential inference approach for Bayesian phylogenetic inference, which uses importance sampling to generate observations for the next step of an analysis pipeline from the posterior distribution produced in the previous step. This sequential inference approach not only accounts for uncertainty between analysis steps, but also allows for greater flexibility in software choice (and hence model availability) and can be computationally more efficient than the traditional joint inference approach when multiple models are being tested. We show that our sequential inference approach is identical in practice to the joint inference approach only if sufficient information in the data is present (a narrow posterior distribution) and/or sufficiently many importance samples are used. Conversely, we show that the common practice of using a single point estimate can be biased, e.g., when a single phylogeny estimate is used to transform an unrooted phylogeny into a time-calibrated phylogeny. We demonstrate the theory of sequential Bayesian inference using both a toy example and an empirical case study of divergence-time estimation in insects using a relaxed clock model from transcriptome data. In the empirical example, we estimate three posterior distributions of branch lengths from the same data (a DNA character matrix with a GTR+Γ+I substitution model, an amino acid data matrix with empirical substitution models, and an amino acid data matrix with the PhyloBayes CAT-GTR model). Finally, we apply three different node-calibration strategies and show that divergence-time estimates are affected by the data source and the underlying substitution process used to estimate branch lengths, as well as by the node-calibration strategy. Thus, our new sequential Bayesian phylogenetic inference provides the opportunity to efficiently test different approaches for divergence-time estimation, including branch-length estimation from other software.
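The importance-sampling idea in the abstract above can be illustrated with a small numerical sketch. This is not the authors' toy example or software: it assumes a hypothetical conjugate normal model (data `y` informs an intermediate quantity `x`, which in turn informs a parameter `theta`), and all names and numeric values are chosen for illustration only. Step 1 samples `x` from its posterior under a step-1 prior; step 2 approximates the likelihood of `theta` by averaging the step-2 density of each sample divided by the step-1 prior density. The sketch also shows the point-estimate shortcut, which treats the step-1 posterior mean as if it were an exact observation.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def summarize(theta, log_like, prior_sd):
    """Posterior mean and sd of theta on a grid, for a Normal(0, prior_sd^2) prior."""
    log_post = log_like + np.log(normal_pdf(theta, 0.0, prior_sd))
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    mean = float(np.sum(w * theta))
    sd = float(np.sqrt(np.sum(w * (theta - mean) ** 2)))
    return mean, sd

def run_toy_example(seed=0, n_samples=5000):
    """Compare joint, sequential (importance sampling), and point-estimate inference."""
    rng = np.random.default_rng(seed)
    y, s = 2.0, 1.0           # assumed observation and its known error sd: y ~ Normal(x, s^2)
    tau = 0.5                 # assumed link between steps: x ~ Normal(theta, tau^2)
    prior_sd_theta = 10.0     # step-2 prior: theta ~ Normal(0, 10^2)
    prior_sd_x = 5.0          # step-1 prior: x ~ Normal(0, 5^2)
    theta = np.linspace(-6.0, 10.0, 401)

    # Joint inference: x integrates out analytically, y | theta ~ Normal(theta, s^2 + tau^2).
    joint = summarize(theta, np.log(normal_pdf(y, theta, np.sqrt(s**2 + tau**2))), prior_sd_theta)

    # Step 1: sample x from its (conjugate normal) posterior given y.
    post_var_x = 1.0 / (1.0 / prior_sd_x**2 + 1.0 / s**2)
    post_mean_x = post_var_x * y / s**2
    x = rng.normal(post_mean_x, np.sqrt(post_var_x), size=n_samples)

    # Step 2, sequential: likelihood(theta) ~ mean_i p(x_i | theta) / step1_prior(x_i).
    ratio = normal_pdf(x[None, :], theta[:, None], tau) / normal_pdf(x, 0.0, prior_sd_x)[None, :]
    sequential = summarize(theta, np.log(ratio.mean(axis=1)), prior_sd_theta)

    # Step 2, point estimate: treat the step-1 posterior mean as an exact observation of x.
    point = summarize(theta, np.log(normal_pdf(post_mean_x, theta, tau)), prior_sd_theta)

    return {"joint": joint, "sequential": sequential, "point": point}

if __name__ == "__main__":
    for name, (mean, sd) in run_toy_example().items():
        print(f"{name:>10}: mean={mean:.3f} sd={sd:.3f}")
```

In this sketch the sequential posterior should track the joint posterior closely, while the point-estimate posterior comes out narrower because the uncertainty in `x` is discarded. Dividing by the step-1 prior density is what keeps that prior from being counted twice in step 2.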
Sequential Bayesian Phylogenetic Inference. Sebastian Höhna, Allison Y Hsiang. Systematic Biology, 2024-05-21. https://doi.org/10.1093/sysbio/syae020
Joshua P Egan, Andrew M Simons, Mohammad Sadegh Alavi-Yeganeh, Michael P Hammer, Prasert Tongnunui, Dahiana Arcila, Ricardo Betancur-R, Devin D Bloom
Migration independently evolved numerous times in animals, with a myriad of ecological and evolutionary implications. In fishes, perhaps the most extreme form of migration is diadromy, the migration between marine and freshwater environments. Key and longstanding questions are: how many times has diadromy evolved in fishes, how frequently do diadromous clades give rise to non-diadromous species, and does diadromy influence lineage diversification rates? Many diadromous fishes have large geographic ranges with constituent populations that use isolated freshwater habitats. This may limit gene flow among some populations, increasing the likelihood of speciation in diadromous lineages relative to non-diadromous lineages. Alternatively, diadromy may reduce lineage diversification rates if migration is associated with enhanced dispersal capacity that facilitates gene flow within and between populations. Clupeiformes (herrings, sardines, shads and anchovies) is a model clade for testing hypotheses about the evolution of diadromy because it includes an exceptionally high proportion of diadromous species and several independent evolutionary origins of diadromy. However, relationships among major clupeiform lineages remain unresolved and existing phylogenies sparsely sampled diadromous species, limiting the resolution of phylogenetically-informed statistical analyses. We assembled a phylogenomic dataset and used multi-species coalescent and concatenation-based approaches to generate the most comprehensive, highly-resolved clupeiform phylogeny to date, clarifying associations among several major clades and identifying recalcitrant relationships needing further examination. We determined that variation in rates of sequence evolution (heterotachy) and base-composition (non-stationarity) had little impact on our results. 
Using this phylogeny, we characterized evolutionary patterns of diadromy and tested for differences in lineage diversification rates between diadromous, marine, and freshwater lineages. We identified thirteen transitions to diadromy, all during the Cenozoic Era (ten origins of anadromy, two origins of catadromy, and one origin of amphidromy), and seven losses of diadromy. Two diadromous lineages rapidly generated non-diadromous species, demonstrating that diadromy is not an evolutionary dead-end. We discovered considerably faster transition rates out of diadromy than to diadromy. The largest lineage diversification rate increase in Clupeiformes was associated with a transition to diadromy, but we uncovered little statistical support for categorically faster lineage diversification rates in diadromous versus non-diadromous fishes. We propose that diadromy may increase the potential for accelerated lineage diversification, particularly in species that migrate long distances. However, this potential may only be realized in certain biogeographic contexts, such as when diadromy allows access to ecosystems in which there is limited competition from
Phylogenomics, Lineage Diversification Rates, and the Evolution of Diadromy in Clupeiformes (Anchovies, Herrings, Sardines, and Relatives). Systematic Biology, 2024-05-17. https://doi.org/10.1093/sysbio/syae022
George P Tiley, Andrew A Crowl, Paul S Manos, Emily B Sessa, Claudia Solís-Lemus, Anne D Yoder, J Gordon Burleigh
Accurately reconstructing the reticulate histories of polyploids remains a central challenge for understanding plant evolution. Although phylogenetic networks can provide insights into relationships among polyploid lineages, inferring networks may be hindered by the complexities of homology determination in polyploid taxa. We use simulations to show that phasing alleles from allopolyploid individuals can improve phylogenetic network inference under the multispecies coalescent by obtaining the true network with fewer loci compared to haplotype consensus sequences or sequences with heterozygous bases represented as ambiguity codes. Phased allelic data can also improve divergence time estimates for networks, which is helpful for evaluating allopolyploid speciation hypotheses and proposing mechanisms of speciation. To achieve these outcomes in empirical data, we present a novel pipeline that leverages a recently developed phasing algorithm to reliably phase alleles from polyploids. This pipeline is especially appropriate for target enrichment data, where depth of coverage is typically high enough to phase entire loci. We provide an empirical example in the North American Dryopteris fern complex that demonstrates insights from phased data as well as the challenges of network inference. We establish that our pipeline (PATÉ: Phased Alleles from Target Enrichment data) is capable of recovering a high proportion of phased loci from both diploids and polyploids. These data may improve network estimates compared to using haplotype consensus assemblies by accurately inferring the direction of gene flow, but statistical non-identifiability of phylogenetic networks poses a barrier to inferring the evolutionary history of reticulate complexes.
Benefits and Limits of Phasing Alleles for Network Inference of Allopolyploid Complexes. Systematic Biology, 2024-05-11. https://doi.org/10.1093/sysbio/syae024
Edward A Myers, Rhett M Rautsaw, Miguel Borja, Jason Jones, Christoph I Grünwald, Matthew L Holding, Felipe Grazziotin, Christopher L Parkinson
Phylogenomics allows us to uncover the historical signal of evolutionary processes through time and estimate phylogenetic networks accounting for these signals. Insight from genome-wide data further allows us to pinpoint the contributions to phylogenetic signal from hybridization, introgression, and ancestral polymorphism across the genome. Here we focus on how these processes have contributed to phylogenetic discordance among rattlesnakes (genera Crotalus and Sistrurus), a group for which there are numerous conflicting phylogenetic hypotheses based on a diverse array of molecular datasets and analytical methods. We address the instability of the rattlesnake phylogeny using genomic data generated from transcriptomes sampled from nearly all known species. These genomic data, analyzed with coalescent and network-based approaches, reveal numerous instances of rapid speciation where individual gene trees conflict with the species tree. Moreover, the evolutionary history of rattlesnakes is dominated by incomplete speciation and frequent hybridization, both of which have likely influenced past interpretations of phylogeny. We present a new framework in which the evolutionary relationships of this group can only be understood in light of genome-wide data and network-based analytical methods. Our data suggest that network radiations, such as that seen within the rattlesnakes, can only be understood in a phylogenomic context, necessitating similar approaches in our attempts to understand evolutionary history in other rapidly radiating species.
Phylogenomic discordance is driven by wide-spread introgression and incomplete lineage sorting during rapid species diversification within rattlesnakes (Viperidae: Crotalus and Sistrurus). Systematic Biology, 2024-05-02. https://doi.org/10.1093/sysbio/syae018
J Luis Leal, Pascal Milesi, Eva Hodková, Qiujie Zhou, Jennifer James, D Magnus Eklund, Tanja Pyhäjärvi, Jarkko Salojärvi, Martin Lascoux
Introgression allows polyploid species to acquire new genomic content from diploid progenitors or from other unrelated diploid or polyploid lineages, contributing to genetic diversity and facilitating adaptive allele discovery. In some cases, high levels of introgression elicit the replacement of large numbers of alleles inherited from the polyploid’s ancestral species, profoundly reshaping the polyploid’s genomic composition. In such complex polyploids, it is often difficult to determine which taxa were the progenitor species and which taxa provided additional introgressive blocks through subsequent hybridization. Here, we use population-level genomic data to reconstruct the phylogenetic history of Betula pubescens (downy birch), a tetraploid species often assumed to be of allopolyploid origin and which is known to hybridize with at least four other birch species. This was achieved by modeling polyploidization and introgression events under the multispecies coalescent and then using an approximate Bayesian computation rejection algorithm to evaluate and compare competing polyploidization models. We provide evidence that B. pubescens is the outcome of an autoploid genome doubling event in the common ancestor of B. pendula and its extant sister species, B. platyphylla, that took place approximately 178,000–188,000 generations ago. Extensive hybridization with B. pendula, B. nana, and B. humilis followed in the aftermath of autopolyploidization, with the relative contribution of each of these species to the B. pubescens genome varying markedly across the species’ range. Functional analysis of B. pubescens loci containing alleles introgressed from B. nana identified multiple genes involved in climate adaptation, while loci containing alleles derived from B. humilis revealed several genes involved in the regulation of meiotic stability and pollen viability in plant species.
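The approximate Bayesian computation (ABC) rejection step mentioned above, used to compare competing models, can be sketched generically as follows. This is not the authors' pipeline and involves no coalescent simulation: the two models `M1` and `M2`, their uniform priors, the single summary statistic (a sample mean), and the tolerance `eps` are all hypothetical stand-ins chosen so the example runs on its own.

```python
import numpy as np

def abc_model_choice(s_obs, n_sims=20000, n_obs=100, eps=0.05, seed=0):
    """ABC rejection for model choice between two hypothetical models.

    M1: mu ~ Uniform(0, 1); M2: mu ~ Uniform(1, 2); data ~ Normal(mu, 1).
    The summary statistic is the sample mean of n_obs simulated values.
    """
    rng = np.random.default_rng(seed)

    # Draw a model index for every simulation (equal prior probability).
    model = rng.integers(0, 2, size=n_sims)          # 0 -> M1, 1 -> M2

    # Draw each model's parameter from its prior.
    mu = rng.uniform(model, model + 1.0)

    # Simulate data under the drawn model and reduce it to the summary statistic.
    data = rng.normal(mu[:, None], 1.0, size=(n_sims, n_obs))
    s_sim = data.mean(axis=1)

    # Rejection step: keep simulations whose summary is within eps of the observed one.
    accepted = np.abs(s_sim - s_obs) < eps
    n_accepted = int(accepted.sum())
    if n_accepted == 0:
        raise ValueError("no simulations accepted; increase eps or n_sims")

    # Approximate posterior model probabilities = share of accepted draws per model.
    p_m1 = float(np.mean(model[accepted] == 0))
    return {"n_accepted": n_accepted, "p_M1": p_m1, "p_M2": 1.0 - p_m1}

if __name__ == "__main__":
    print(abc_model_choice(s_obs=0.3))
```

With an observed summary of 0.3, nearly all accepted simulations should come from `M1`, so its approximate posterior probability is close to one. In a real analysis the simulator, the summary statistics, and the tolerance would all need careful choice and validation.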
Complex Polyploids: Origins, Genomic Composition, and Role of Introgressed Alleles. Systematic Biology, 2024-04-13. https://doi.org/10.1093/sysbio/syae012
Amy R Tims, Peter J Unmack, Michael P Hammer, Culum Brown, Mark Adams, Matthew D McGee
Crater lake fishes are common evolutionary model systems, with recent studies suggesting a key role for gene flow in promoting rapid adaptation and speciation. However, the study of these young lakes can be complicated by human-mediated extinctions. Museum genomics approaches integrating genetic data from recently extinct species are therefore critical to understanding the complex evolutionary histories of these fragile systems. Here, we examine the evolutionary history of an extinct Southern Hemisphere crater lake endemic, the rainbowfish Melanotaenia eachamensis. We undertook comprehensive sampling of extant rainbowfish populations of the Atherton Tablelands of Australia alongside historical museum material to understand the evolutionary origins of the extinct crater lake population and the dynamics of gene flow across the ecoregion. The extinct crater lake species is genetically distinct from all other nearby populations due to historic introgression between two proximate riverine lineages, similar to other prominent crater lake speciation systems, but this historic gene flow has not been sufficient to induce a species flock. Our results suggest that museum genomics approaches can be successfully combined with extant sampling to unravel complex speciation dynamics involving recently extinct species.
Museum genomics reveals the hybrid origin of an extinct crater lake endemic. Systematic Biology, 2024-04-10. https://doi.org/10.1093/sysbio/syae017
Sally Potter, Craig Moritz, Maxine P Piggott, Jason G Bragg, Ana C Afonso Silva, Ke Bi, Christiana McDonald-Spicer, Rustamzhon Turakulov, Mark D B Eldridge
Increased sampling of genomes and populations across closely related species has revealed that levels of genetic exchange during and after speciation are higher than previously thought. One obvious manifestation of such exchange is strong cytonuclear discordance, where the divergence in mitochondrial DNA (mtDNA) differs from that for nuclear genes more (or less) than expected from differences between mtDNA and nuclear DNA (nDNA) in population size and mutation rate. Given genome-scale datasets and coalescent modelling, we can now confidently identify cases of strong discordance and test specifically for historical or recent introgression as the cause. Using population sampling that combines exon capture data from historical museum specimens and recently collected tissues, we showcase how genomic tools can resolve complex evolutionary histories in the brachyotis group of rock-wallabies (Petrogale). In particular, by applying population and phylogenomic approaches, we can assess the role of demographic processes in driving complex evolutionary patterns and the role of ancient introgression and hybridisation. We find that described species are well supported as monophyletic taxa for nDNA genes, but not for mtDNA, with cytonuclear discordance involving at least four operational taxonomic units (OTUs) across four species which diverged 183-278 kya. ABC modelling of nDNA gene trees supports introgression during or after speciation for some taxon pairs with cytonuclear discordance. Given substantial differences in body size between the species involved, this evidence for gene flow is surprising. Heterogeneous patterns of introgression were identified but do not appear to be associated with chromosome differences between species. These and previous results suggest that dynamic past climates across the monsoonal tropics could have promoted reticulation among related species.
Museum skins enable identification of introgression associated with cytonuclear discordance. Systematic Biology, published 2024-04-05. doi:10.1093/sysbio/syae016
Carlos J Pavón-Vázquez, Qaantah Rana, Keaka Farleigh, Erika Crispo, Mimi Zeng, Jeevanie Liliah, Daniel Mulcahy, Alfredo Ascanio, Tereza Jezkova, Adam D Leaché, Tomas Flouri, Ziheng Yang, Christopher Blair
The opposing forces of gene flow and isolation are two major processes shaping genetic diversity. Understanding how these vary across space and time is necessary to identify the environmental features that promote diversification. The detection of considerable geographic structure in taxa from the arid Nearctic has prompted research into the drivers of isolation in the region. Several geographic features have been proposed as barriers to gene flow, including the Colorado River, Western Continental Divide, and a hypothetical Mid-Peninsular Seaway in Baja California. However, recent studies suggest that the role of barriers in genetic differentiation may have been overestimated when compared to other mechanisms of divergence. In this study, we infer historical and spatial patterns of connectivity and isolation in Desert Spiny Lizards (Sceloporus magister) and Baja Spiny Lizards (S. zosteromus), which together form a species complex composed of parapatric lineages with wide distributions in arid western North America. Our analyses incorporate mitochondrial sequences, genomic-scale data, and past and present climatic data to evaluate the nature and strength of barriers to gene flow in the region. Our approach relies on estimates of migration under the multispecies coalescent to understand the history of lineage divergence in the face of gene flow. Results show that the S. magister complex is geographically structured, but we also detect instances of gene flow. The Continental Divide is a strong barrier to gene flow, while the Colorado River is more permeable. Analyses yield conflicting results for the catalyst of differentiation of peninsular lineages in S. zosteromus. Our study shows how large-scale genomic data for thoroughly sampled species can shed new light on biogeography. Furthermore, our approach highlights the need for the combined analysis of multiple sources of evidence to adequately characterize the drivers of divergence.
Gene Flow and Isolation in the Arid Nearctic Revealed by Genomic Analyses of Desert Spiny Lizards. Systematic Biology, published 2024-01-05. doi:10.1093/sysbio/syae001
Determining the link between genomic and phenotypic change is a fundamental goal in evolutionary biology. Insights into this link can be gained by using a phylogenetic approach to test for correlations between rates of molecular and morphological evolution. However, there has been persistent uncertainty about the relationship between these rates, partly because conflicting results have been obtained using various methods that have not been examined in detail. We carried out a simulation study to evaluate the performance of 5 statistical methods for detecting correlated rates of evolution. Our simulations explored the evolution of molecular sequences and morphological characters under a range of conditions. Of the methods tested, Bayesian relaxed-clock estimation of branch rates was able to detect correlated rates of evolution correctly in the largest number of cases. This was followed by correlations of root-to-tip distances, Bayesian model selection, independent sister-pairs contrasts, and likelihood-based model selection. As expected, the power to detect correlated rates increased with the amount of data, both in terms of tree size and number of morphological characters. Likewise, greater among-lineage rate variation in the data led to improved performance of all 5 methods, particularly for Bayesian relaxed-clock analysis when the rate model was mismatched. We then applied these methods to a data set from flowering plants and did not find evidence of a correlation in evolutionary rates between genomic data and morphological characters. The results of our study have practical implications for phylogenetic analyses of combined molecular and morphological data sets, and highlight the conditions under which the links between genomic and phenotypic rates of evolution can be evaluated quantitatively.
Evaluating the Accuracy of Methods for Detecting Correlated Rates of Molecular and Morphological Evolution. Yasmin Asar, Hervé Sauquet, Simon Y W Ho. Systematic Biology, published 2023-12-30. doi:10.1093/sysbio/syad055. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10924723/pdf/
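One of the five approaches named in the abstract above, correlation of root-to-tip distances, can be illustrated with a small sketch. This is not the authors' implementation: the nested-tuple tree encoding, the function names `root_to_tip` and `pearson`, and the toy branch lengths are all assumptions made for illustration. The idea is to sum branch lengths from the root to each tip in a molecular tree and in a morphological tree that share a topology, then correlate the two sets of distances.

```python
from math import sqrt

def root_to_tip(node, dist=0.0, out=None):
    """Return {tip_name: summed branch length from the root}.

    A node is (branch_length, payload): payload is a tip name (str)
    or a list of child nodes. This encoding is hypothetical.
    """
    if out is None:
        out = {}
    length, payload = node
    dist += length
    if isinstance(payload, str):
        out[payload] = dist
    else:
        for child in payload:
            root_to_tip(child, dist, out)
    return out

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy trees with the same topology ((A,B),(C,D)) but different branch lengths.
mol_tree = (0.0, [(0.1, [(0.2, "A"), (0.3, "B")]),
                  (0.5, [(0.1, "C"), (0.4, "D")])])
morph_tree = (0.0, [(1.0, [(2.0, "A"), (4.0, "B")]),
                    (5.0, [(1.0, "C"), (4.0, "D")])])

mol = root_to_tip(mol_tree)
morph = root_to_tip(morph_tree)
tips = sorted(mol)  # align the two distance vectors by tip name
r = pearson([mol[t] for t in tips], [morph[t] for t in tips])
print(f"root-to-tip correlation r = {r:.3f}")
```

Because tips share internal branches, their root-to-tip distances are not independent observations, so a naive p-value for `r` would likely be overconfident; the sketch only shows how the summary quantity is formed.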
The evolutionary implications and frequency of hybridization and introgression are increasingly being recognized across the tree of life. To detect hybridization from multi-locus and genome-wide sequence data, a popular class of methods is based on summary statistics from subsets of 3 or 4 taxa. However, these methods often carry the assumption of a constant substitution rate across lineages and genes, which is commonly violated in many groups. In this work, we quantify the effects of rate variation on the D test (also known as the ABBA-BABA test), the D3 test, and HyDe. All 3 tests are used widely across a range of taxonomic groups, in part because they are very fast to compute. We consider rate variation across species lineages, across genes, their lineage-by-gene interaction, and rate variation across gene-tree edges. We simulated species networks according to a birth-death-hybridization process, so as to capture a range of realistic species phylogenies. For all 3 methods tested, we found a marked increase in the false discovery of reticulation (type-1 error rate) when there is rate variation across species lineages. The D3 test was the most sensitive, with around 80% type-1 error, such that D3 appears to be more sensitive to a departure from the clock than to the presence of reticulation. For all 3 tests, the power to detect hybridization events decreased as the number of hybridization events increased, indicating that multiple hybridization events can obscure one another if they occur within a small subset of taxa. Our study highlights the need to consider rate variation when using site-based summary statistics, and points to the advantages of methods that do not require assumptions on evolutionary rates across lineages or across genes.
Summary Tests of Introgression Are Highly Sensitive to Rate Variation Across Lineages. Lauren E Frankel, Cécile Ané. Systematic Biology, published 2023-12-30. doi:10.1093/sysbio/syad056
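To make the D (ABBA-BABA) statistic discussed in the abstract above concrete, here is a minimal sketch of the standard formula D = (ABBA - BABA) / (ABBA + BABA) for four taxa ordered (P1, P2, P3, outgroup). It is not the code used in the study; the function names and the toy sequences are assumptions for illustration. In practice significance is usually assessed with a resampling scheme such as a block jackknife, which is omitted here.

```python
from collections import Counter

def site_pattern_counts(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns in four aligned sequences.

    The outgroup allele is treated as ancestral ("A"); the other
    allele at the site is derived ("B").
    """
    counts = Counter()
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Skip gaps and ambiguous bases.
        if any(x not in "ACGT" for x in (a, b, c, o)):
            continue
        if a == o and b == c and b != o:
            counts["ABBA"] += 1   # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            counts["BABA"] += 1   # P1 and P3 share the derived allele
    return counts

def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / total

# Toy alignment (hypothetical data): two ABBA sites and one BABA site.
p1, p2, p3, og = "AAGTCG", "ACGTTA", "ACTTTG", "AAGGCA"
counts = site_pattern_counts(p1, p2, p3, og)
d = d_statistic(counts["ABBA"], counts["BABA"])
print(counts["ABBA"], counts["BABA"], round(d, 3))
```

Because the statistic only compares two pattern counts, anything other than gene flow that makes ABBA and BABA sites unequally frequent will also shift D away from zero. That is consistent with the abstract's finding that rate variation across lineages inflates the false-positive rate.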