
Systematic Biology: Latest Articles

Online tree expansion could help solve the problem of scalability in Bayesian phylogenetics.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad045
Jakub Truszkowski, Allison Perrigo, David Broman, Fredrik Ronquist, Alexandre Antonelli

Bayesian phylogenetics is now facing a critical point. Over the last 20 years, Bayesian methods have reshaped phylogenetic inference and gained widespread popularity due to their high accuracy, the ability to quantify the uncertainty of inferences and the possibility of accommodating multiple aspects of evolutionary processes in the models that are used. Unfortunately, Bayesian methods are computationally expensive, and typical applications involve at most a few hundred sequences. This is problematic in the age of rapidly expanding genomic data and increasing scope of evolutionary analyses, forcing researchers to resort to less accurate but faster methods, such as maximum parsimony and maximum likelihood. Does this spell doom for Bayesian methods? Not necessarily. Here, we discuss some recently proposed approaches that could help scale up Bayesian analyses of evolutionary problems considerably. We focus on two particular aspects: online phylogenetics, where new data sequences are added to existing analyses, and alternatives to Markov chain Monte Carlo (MCMC) for scalable Bayesian inference. We identify 5 specific challenges and discuss how they might be overcome. We believe that online phylogenetic approaches and Sequential Monte Carlo hold great promise and could potentially speed up tree inference by orders of magnitude. We call for collaborative efforts to speed up the development of methods for real-time tree expansion through online phylogenetics.

Citations: 0
PARNAS: Objectively Selecting the Most Representative Taxa on a Phylogeny.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad028
Alexey Markin, Sanket Wagle, Siddhant Grover, Amy L Vincent Baker, Oliver Eulenstein, Tavis K Anderson

The use of next-generation sequencing technology has enabled phylogenetic studies with hundreds of thousands of taxa. Such large-scale phylogenies have become a critical component in genomic epidemiology in pathogens such as SARS-CoV-2 and influenza A virus. However, detailed phenotypic characterization of pathogens or generating a computationally tractable dataset for detailed phylogenetic analyses requires objective subsampling of taxa. To address this need, we propose parnas, an objective and flexible algorithm to sample and select taxa that best represent observed diversity by solving a generalized k-medoids problem on a phylogenetic tree. parnas solves this problem efficiently and exactly by novel optimizations and adapting algorithms from operations research. For more nuanced selections, taxa can be weighted with metadata or genetic sequence parameters, and the pool of potential representatives can be user-constrained. Motivated by influenza A virus genomic surveillance and vaccine design, parnas can be applied to identify representative taxa that optimally cover the diversity in a phylogeny within a specified distance radius. We demonstrated that parnas is more efficient and flexible than existing approaches. To demonstrate its utility, we applied parnas to 1) quantify SARS-CoV-2 genetic diversity over time, 2) select representative influenza A virus in swine genes derived from over 5 years of genomic surveillance data, and 3) identify gaps in H3N2 human influenza A virus vaccine coverage. We suggest that our method, through the objective selection of representatives in a phylogeny, provides criteria for quantifying genetic diversity that has application in the rational design of multivalent vaccines and genomic epidemiology. PARNAS is available at https://github.com/flu-crew/parnas.
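To make the k-medoids objective mentioned in the abstract concrete, the following is a minimal sketch that picks the k leaves minimizing the summed tree distance from every leaf to its nearest chosen representative. It is not the parnas algorithm: the abstract says parnas solves the problem exactly and efficiently, whereas this sketch simply enumerates all candidate sets, which is only feasible for tiny trees. The toy tree, its branch lengths, and the names `TREE`, `LEAVES`, `tree_distance`, and `best_medoids` are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
TREE = {
    "n1": ("root", 1.0),
    "n2": ("root", 1.0),
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "C": ("n1", 0.5),
    "D": ("n2", 0.1),
    "E": ("n2", 0.3),
    "F": ("n2", 0.4),
}
LEAVES = ["A", "B", "C", "D", "E", "F"]


def path_to_root(node):
    """Return {ancestor: distance from node} for the node and all its ancestors."""
    dist, out = 0.0, {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def tree_distance(a, b):
    """Path length between two nodes; the minimum over shared ancestors is the LCA path."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def best_medoids(k):
    """Brute-force search for the k leaves minimizing total distance to the nearest medoid."""
    def cost(medoids):
        return sum(min(tree_distance(leaf, m) for m in medoids) for leaf in LEAVES)
    return min(combinations(LEAVES, k), key=cost)


print(best_medoids(2))
```

On this toy tree the search selects one short-branched leaf from each of the two clades. Weighting taxa or restricting the candidate pool, both of which the abstract describes, would amount to multiplying each term of `cost` by a weight or narrowing the set passed to `combinations`.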

Citations: 0
The Major Features of Macroevolution.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad032
L Francisco Henao-Diaz, Matt Pennell

Evolutionary dynamics operating across deep time leave footprints in the shapes of phylogenetic trees. For the last several decades, researchers have used increasingly large and robust phylogenies to study the evolutionary history of individual clades and to investigate the causes of the glaring disparities in diversity among groups. Whereas typically not the focal point of individual clade-level studies, many researchers have remarked on recurrent patterns that have been observed across many different groups and at many different time scales. Whereas previous studies have documented various such regularities in topology and branch length distributions, they have typically focused on a single pattern and used a disparate collection (oftentimes, of quite variable reliability) of trees to assess it. Here we take advantage of modern megaphylogenies and unify previous disparate observations about the shapes embedded in the Tree of Life to create a catalog of the "major features of macroevolution." By characterizing such a large swath of subtrees in a consistent way, we hope to provide a set of phenomena that process-based macroevolutionary models of diversification ought to seek to explain.

Citations: 0
A k-mer-Based Approach for Phylogenetic Classification of Taxa in Environmental Genomic Data.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad037
Julia Van Etten, Timothy G Stephens, Debashish Bhattacharya

In the age of genome sequencing, whole-genome data is readily and frequently generated, leading to a wealth of new information that can be used to advance various fields of research. New approaches, such as alignment-free phylogenetic methods that utilize k-mer-based distance scoring, are becoming increasingly popular given their ability to rapidly generate phylogenetic information from whole-genome data. However, these methods have not yet been tested using environmental data, which often tends to be highly fragmented and incomplete. Here, we compare the results of one alignment-free approach (which utilizes the D2 statistic) to traditional multi-gene maximum likelihood trees in 3 algal groups that have high-quality genome data available. In addition, we simulate lower-quality, fragmented genome data using these algae to test method robustness to genome quality and completeness. Finally, we apply the alignment-free approach to environmental metagenome-assembled genome data of unclassified Saccharibacteria and Trebouxiophyte algae, and single-cell amplified data from uncultured marine stramenopiles to demonstrate its utility with real datasets. We find that in all instances, the alignment-free method produces phylogenies that are comparable to, and often more informative than, those created using the traditional multi-gene approach. The k-mer-based method performs well even when there are significant missing data that include marker genes traditionally used for tree reconstruction. Our results demonstrate the value of alignment-free approaches for classifying novel, often cryptic or rare, species, that may not be culturable or are difficult to access using single-cell methods, but fill important gaps in the tree of life.
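As a rough illustration of the k-mer scoring the abstract refers to, the basic D2 statistic between two sequences is the sum, over all words of length k, of the product of that word's counts in each sequence. The sketch below computes this plain form only. The abstract does not say which D2 variant or normalization the authors use, nor how scores are turned into distances, so the function names `kmer_counts` and `d2` and the example sequences are assumptions for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_x, seq_y, k):
    """Plain D2 statistic: sum over shared k-mers of count_x(w) * count_y(w)."""
    cx, cy = kmer_counts(seq_x, k), kmer_counts(seq_y, k)
    return sum(cx[w] * cy[w] for w in cx if w in cy)


# Two short toy sequences sharing the 2-mers "AC" and "CG".
print(d2("ACGT", "ACGA", 2))
```

Because D2 is a similarity count that grows with sequence length, practical pipelines usually normalize it before building a tree; that step is deliberately left out here.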

Citations: 0
PhyloCoalSimulations: A Simulator for Network Multispecies Coalescent Models, Including a New Extension for the Inheritance of Gene Flow.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad030
John Fogg, Elizabeth S Allman, Cécile Ané

We consider the evolution of phylogenetic gene trees along phylogenetic species networks, according to the network multispecies coalescent process, and introduce a new network coalescent model with correlated inheritance of gene flow. This model generalizes two traditional versions of the network coalescent: with independent or common inheritance. At each reticulation, multiple lineages of a given locus are inherited from parental populations chosen at random, either independently across lineages or with positive correlation according to a Dirichlet process. This process may account for locus-specific probabilities of inheritance, for example. We implemented the simulation of gene trees under these network coalescent models in the Julia package PhyloCoalSimulations, which depends on PhyloNetworks and its powerful network manipulation tools. Input species phylogenies can be read in extended Newick format, either in numbers of generations or in coalescent units. Simulated gene trees can be written in Newick format, and in a way that preserves information about their embedding within the species network. This embedding can be used for downstream purposes, such as to simulate species-specific processes like rate variation across species, or for other scenarios as illustrated in this note. This package should be useful for simulation studies and simulation-based inference methods. The software is available open source with documentation and a tutorial at https://github.com/cecileane/PhyloCoalSimulations.jl.
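One way to picture correlated inheritance at a reticulation is a Pólya-urn draw from a Dirichlet process: each lineage either picks a parental population afresh or copies the choice of an earlier lineage. The sketch below is a generic urn scheme written in Python, not code from PhyloCoalSimulations (which is a Julia package), and the abstract does not give the model's parameterization. The names `sample_parents`, `gamma` (probability of the first parent), and `alpha` (concentration) are assumptions made for illustration.

```python
import random


def sample_parents(n_lineages, gamma, alpha, rng=None):
    """Assign each lineage to parent 0 or 1 with a Polya-urn (Dirichlet process) scheme."""
    rng = rng or random.Random()
    parents = []
    for i in range(n_lineages):
        # With probability alpha / (alpha + i), draw afresh from the base distribution
        # (parent 0 with probability gamma); otherwise copy an earlier lineage's parent.
        if i == 0 or rng.random() < alpha / (alpha + i):
            parents.append(0 if rng.random() < gamma else 1)
        else:
            parents.append(rng.choice(parents))
    return parents


rng = random.Random(1)
print(sample_parents(8, gamma=0.3, alpha=0.0, rng=rng))    # every lineage shares one parent
print(sample_parents(8, gamma=0.3, alpha=1000.0, rng=rng)) # choices are nearly independent
```

In this scheme `alpha = 0` makes all lineages follow the first one, which corresponds to common inheritance, while a large finite `alpha` approaches independent inheritance, matching the two traditional versions that the abstract says the new model generalizes.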

Citations: 0
The ClaDS rate-heterogeneous birth-death prior for full phylogenetic inference in BEAST2.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad027
Joëlle Barido-Sottani, Hélène Morlon

Bayesian phylogenetic inference requires a tree prior, which models the underlying diversification process that gives rise to the phylogeny. Existing birth-death diversification models include a wide range of features, for instance, lineage-specific variations in speciation and extinction (SSE) rates. While across-lineage variation in SSE rates is widespread in empirical datasets, few heterogeneous rate models have been implemented as tree priors for Bayesian phylogenetic inference. As a consequence, rate heterogeneity is typically ignored when reconstructing phylogenies, and rate heterogeneity is usually investigated on fixed trees. In this paper, we present a new BEAST2 package implementing the cladogenetic diversification rate shift (ClaDS) model as a tree prior. ClaDS is a birth-death diversification model designed to capture small progressive variations in birth and death rates along a phylogeny. Unlike previous implementations of ClaDS, which were designed to be used with fixed, user-chosen phylogenies, our package is implemented in the BEAST2 framework and thus allows full phylogenetic inference, where the phylogeny and model parameters are co-estimated from a molecular alignment. Our package provides all necessary components of the inference, including a new tree object and operators to propose moves to the Monte-Carlo Markov chain. It also includes a graphical interface through BEAUti. We validate our implementation of the package by comparing the produced distributions to simulated data and show an empirical example of the full inference, using a dataset of cetaceans.
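The "small progressive variations" in rates can be sketched as a multiplicative random walk down a lineage: at each speciation event a daughter inherits the parent's rate, scaled by a trend and perturbed by lognormal noise. The sketch below is written from the general description of ClaDS-style models and is not taken from the BEAST2 package; the abstract names no parameters, so `alpha` (trend), `sigma` (log-scale noise), and the function names are assumptions.

```python
import math
import random


def daughter_rate(parent_rate, alpha, sigma, rng):
    """Draw a daughter rate: lognormal with log-mean log(alpha * parent_rate), log-sd sigma."""
    return parent_rate * alpha * math.exp(sigma * rng.gauss(0.0, 1.0))


def rates_along_lineage(root_rate, n_splits, alpha, sigma, rng):
    """Follow one lineage through n_splits speciation events, recording its rate each time."""
    rates = [root_rate]
    for _ in range(n_splits):
        rates.append(daughter_rate(rates[-1], alpha, sigma, rng))
    return rates


rng = random.Random(0)
print(rates_along_lineage(root_rate=0.5, n_splits=5, alpha=0.9, sigma=0.1, rng=rng))
```

With `sigma = 0` the rate changes deterministically by a factor `alpha` at each split, and with `alpha = 1` and `sigma = 0` the model collapses to a constant-rate process.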

Citations: 0
Inferring Historical Introgression with Deep Learning.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad033
Yubo Zhang, Qingjie Zhu, Yi Shao, Yanchen Jiang, Yidan Ouyang, Li Zhang, Wei Zhang

Resolving phylogenetic relationships among taxa remains a challenge in the era of big data due to the presence of genetic admixture in a wide range of organisms. Rapidly developing sequencing technologies and statistical tests enable evolutionary relationships to be disentangled at a genome-wide level, yet many of these tests are computationally intensive and rely on phased genotypes, large sample sizes, restricted phylogenetic topologies, or hypothesis testing. To overcome these difficulties, we developed a deep learning-based approach, named ERICA, for inferring genome-wide evolutionary relationships and local introgressed regions from sequence data. ERICA accepts sequence alignments of both population genomic data and multiple genome assemblies, and efficiently identifies discordant genealogy patterns and exchanged regions across genomes when compared with other methods. We further tested ERICA using real population genomic data from Heliconius butterflies that have undergone adaptive radiation and frequent hybridization. Finally, we applied ERICA to characterize hybridization and introgression in wild and cultivated rice, revealing the important role of introgression in rice domestication and adaptation. Taken together, our findings demonstrate that ERICA provides an effective method for teasing apart evolutionary relationships using whole genome data, which can ultimately facilitate evolutionary studies on hybridization and introgression.

Citations: 0
Speciation in Coastal Basins Driven by Staggered Headwater Captures: Dispersal of a Species Complex, Leporinus bahiensis, as Revealed by Genome-wide SNP Data.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad034
Jorge L Ramirez, Carolina B Machado, Paulo Roberto Antunes de Mello Affonso, Pedro M Galetti

Past sea level changes and geological instability along watershed boundaries have largely influenced fish distribution across coastal basins, either by dispersal via palaeodrainages now submerged or by headwater captures, respectively. Accordingly, the South American Atlantic coast encompasses several small and isolated drainages that share a similar species composition, representing a suitable model to infer historical processes. Leporinus bahiensis is a freshwater fish species widespread along adjacent coastal basins over a narrow continental shelf with no evidence of palaeodrainage connections at low sea level periods. Therefore, this study aimed to reconstruct its evolutionary history to infer the role of headwater captures in the dispersal process. To accomplish this, we employed molecular-level phylogenetic and population structure analyses based on Sanger sequences (5 genes) and genome-wide SNP data. Phylogenetic trees based on Sanger data were inconclusive, but SNP data did support the monophyletic status of L. bahiensis. Both COI and SNP data revealed structured populations according to each hydrographic basin. Species delimitation analyses revealed from 3 (COI) to 5 (multilocus approach) MOTUs, corresponding to the sampled basins. An intricate biogeographic scenario was inferred and supported by Approximate Bayesian Computation (ABC) analysis. Specifically, a staggered pattern was revealed and characterized by sequential headwater captures from basins adjacent to upland drainages into small coastal basins at different periods. These headwater captures resulted in dispersal throughout contiguous coastal basins, followed by deep genetic divergence among lineages. To decipher such recent divergences, as herein represented by L. bahiensis populations, we used genome-wide SNP data. Indeed, the combined use of genome-wide SNP data and the ABC method allowed us to reconstruct the evolutionary history and speciation of L. bahiensis. This framework might be useful in disentangling the diversification process in other neotropical fishes subject to a reticulate geological history.

Citations: 0
Online Phylogenetics with matOptimize Produces Equivalent Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Implementations.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad031
Alexander M Kramer, Bryan Thornlow, Cheng Ye, Nicola De Maio, Jakob McBroome, Angie S Hinrichs, Robert Lanfear, Yatish Turakhia, Russell Corbett-Detig

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold. There are currently over 14 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) and pseudo-ML methods may be more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger data sets. Here, we evaluate the performance of de novo and online phylogenetic approaches, as well as ML, pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimization with UShER and matOptimize produces equivalent SARS-CoV-2 phylogenies to some of the most popular ML and pseudo-ML inference tools. MP optimization with UShER and matOptimize is thousands of times faster than presently available implementations of ML, and online phylogenetics is faster than de novo inference. Our results therefore suggest that parsimony-based methods like UShER and matOptimize represent an accurate and more practical alternative to established ML implementations for large SARS-CoV-2 phylogenies and could be successfully applied to other similar data sets with particularly dense sampling and short branch lengths.
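To make the parsimony criterion mentioned in this abstract concrete, the following is a minimal sketch of the textbook Fitch algorithm, which counts the fewest state changes needed to explain an alignment on a fixed rooted binary tree. It is not the UShER or matOptimize implementation; the tree encoding (nested tuples), the function names, and the four-taxon toy alignment are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple, Union

# Assumed encoding: a tree is a leaf name (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):
        # A leaf carries its observed state and needs no change.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change here.
        return shared, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the Fitch cost over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical toy data: four taxa, four sites.
toy_alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))

if __name__ == "__main__":
    print(parsimony_score(tree_ab_cd, toy_alignment))
    print(parsimony_score(tree_ac_bd, toy_alignment))
```

On this toy alignment the first topology needs fewer changes than the second, so a parsimony search would prefer it. Each site is scored in a single pass over the tree, which gives some intuition for why parsimony scoring is cheap compared with likelihood calculations, although the speedups reported in the abstract depend on the actual tools and data.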

Citations: 0
Phylogenomic Analyses Reveal an Allopolyploid Origin of Core Didymocarpinae (Gesneriaceae) Followed by Rapid Radiation.
IF 6.5 Zone 1 Biology Q1 Agricultural and Biological Sciences Pub Date: 2023-11-01 DOI: 10.1093/sysbio/syad029
Lihua Yang, A J Harris, Fang Wen, Zheng Li, Chao Feng, Hanghui Kong, Ming Kang

Allopolyploid plants have long been regarded as possessing genetic advantages under certain circumstances due to the combined effects of their hybrid origins and duplicated genomes. However, the evolutionary consequences of allopolyploidy in lineage diversification remain to be fully understood. Here, we investigate the evolutionary consequences of allopolyploidy using 138 transcriptomic sequences of Gesneriaceae, including 124 newly sequenced, focusing particularly on the largest subtribe Didymocarpinae. We estimated the phylogeny of Gesneriaceae using concatenated and coalescent-based methods based on five different nuclear matrices and 27 plastid genes, focusing on relationships among major clades. To better understand the evolutionary affinities in this family, we applied a range of approaches to characterize the extent and cause of phylogenetic incongruence. We found that extensive conflicts between nuclear and chloroplast genomes and among nuclear genes were caused by both incomplete lineage sorting (ILS) and reticulation, and we found evidence of widespread ancient hybridization and introgression. Using the most highly supported phylogenomic framework, we revealed multiple bursts of gene duplication throughout the evolutionary history of Gesneriaceae. By incorporating molecular dating and analyses of diversification dynamics, our study shows that an ancient allopolyploidization event occurred around the Oligocene-Miocene boundary, which may have driven the rapid radiation of core Didymocarpinae.

Citations: 0