
Systematic Biology: Latest Articles

Clockor2: Inferring Global and Local Strict Molecular Clocks Using Root-to-Tip Regression.
IF 6.1 | Zone 1, Biology | Q1 Evolutionary Biology | Pub Date: 2024-09-05 | DOI: 10.1093/sysbio/syae003
Leo A Featherstone, Andrew Rambaut, Sebastian Duchene, Wytamma Wirth

Molecular sequence data from rapidly evolving organisms are often sampled at different points in time. Sampling times can then be used for molecular clock calibration. The root-to-tip (RTT) regression is an essential tool to assess the degree to which the data behave in a clock-like fashion. Here, we introduce Clockor2, a client-side web application for conducting RTT regression. Clockor2 allows users to quickly fit local and global molecular clocks, thus handling the increasing complexity of genomic datasets that sample beyond the assumption of homogeneous host populations. Clockor2 is efficient, handling trees of up to the order of 10^4 tips, with significant speed increases compared with other RTT regression applications. Although Clockor2 is written as a web application, all data processing happens on the client side, meaning that data never leave the user's computer. Clockor2 is freely available at https://clockor2.github.io/.
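As a rough illustration of what an RTT regression computes, the sketch below fits root-to-tip distance against sampling date by ordinary least squares. The slope is read as the clock rate, the x-intercept as the implied root date, and R² as a measure of clock-like behaviour. The function names and the idea of passing a group label per tip for local clocks are assumptions made for this example; this is not Clockor2's own code or API.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance (y) on sampling date (x)."""
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need matching lists with at least two tips")
    x_bar, y_bar = mean(dates), mean(distances)
    sxx = sum((x - x_bar) ** 2 for x in dates)
    syy = sum((y - y_bar) ** 2 for y in distances)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("need at least two distinct sampling dates")
    slope = sxy / sxx                       # substitutions per site per time unit
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / slope if slope != 0 else None  # x-intercept
    return {"rate": slope, "intercept": intercept,
            "r_squared": r_squared, "root_date": root_date}


def fit_local_clocks(dates, distances, groups):
    """Hypothetical helper: one independent regression per group of tips."""
    by_group = {}
    for date, dist, group in zip(dates, distances, groups):
        xs, ys = by_group.setdefault(group, ([], []))
        xs.append(date)
        ys.append(dist)
    return {g: root_to_tip_regression(xs, ys) for g, (xs, ys) in by_group.items()}
```

A global clock corresponds to one call of `root_to_tip_regression` over all tips; the local-clock helper simply repeats the fit within each group, so groups evolving at different rates get different slopes.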

Systematic Biology, pp. 623-628. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11377183/pdf/
Citations: 0
Distinguishing Cophylogenetic Signal from Phylogenetic Congruence Clarifies the Interplay Between Evolutionary History and Species Interactions.
IF 6.1 | Zone 1, Biology | Q1 Evolutionary Biology | Pub Date: 2024-09-05 | DOI: 10.1093/sysbio/syae013
Benoît Perez-Lamarque, Hélène Morlon

Interspecific interactions, including host-symbiont associations, can profoundly affect the evolution of the interacting species. Given the phylogenies of host and symbiont clades and knowledge of which host species interact with which symbiont, two questions are often asked: "Do closely related hosts interact with closely related symbionts?" and "Do host and symbiont phylogenies mirror one another?" These questions are intertwined and can even collapse under specific situations, such that they are often confused one with the other. However, in most situations, a positive answer to the first question, hereafter referred to as "cophylogenetic signal," does not imply a close match between the host and symbiont phylogenies. It suggests only that past evolutionary history has contributed to shaping present-day interactions, which can arise, for example, through present-day trait matching, or from a single ancient vicariance event that increases the probability that closely related species overlap geographically. A positive answer to the second, referred to as "phylogenetic congruence," is more restrictive as it suggests a close match between the two phylogenies, which may happen, for example, if symbiont diversification tracks host diversification or if the diversifications of the two clades were subject to the same succession of vicariance events. Here we apply a set of methods (ParaFit, PACo, and eMPRess), whose significance is often interpreted as evidence for phylogenetic congruence, to simulations under 3 biologically realistic scenarios of trait matching, a single ancient vicariance event, and phylogenetic tracking with frequent cospeciation events. The latter is the only scenario that generates phylogenetic congruence, whereas the first 2 generate a cophylogenetic signal in the absence of phylogenetic congruence.
We find that tests of global-fit methods (ParaFit and PACo) are significant under the 3 scenarios, whereas tests of event-based methods (eMPRess) are only significant under the scenario of phylogenetic tracking. Therefore, significant results from global-fit methods should be interpreted in terms of cophylogenetic signal and not phylogenetic congruence; such significant results can arise under scenarios when hosts and symbionts had independent evolutionary histories. Conversely, significant results from event-based methods suggest a strong form of dependency between hosts and symbionts evolutionary histories. Clarifying the patterns detected by different cophylogenetic methods is key to understanding how interspecific interactions shape and are shaped by evolution.

Systematic Biology, pp. 613-622.
Citations: 0
Hierarchical Heuristic Species Delimitation under the Multispecies Coalescent Model with Migration
IF 6.5 | Zone 1, Biology | Q1 Evolutionary Biology | Pub Date: 2024-08-20 | DOI: 10.1093/sysbio/syae050
Daniel Kornai, Xiyun Jiao, Jiayi Ji, Tomáš Flouri, Ziheng Yang
The multispecies coalescent (MSC) model accommodates genealogical fluctuations across the genome and provides a natural framework for comparative analysis of genomic sequence data from closely related species to infer the history of species divergence and gene flow. Given a set of populations, hypotheses of species delimitation (and species phylogeny) may be formulated as instances of MSC models (e.g., MSC for one species versus MSC for two species) and compared using Bayesian model selection. This approach, implemented in the program bpp, has been found to be prone to over-splitting. Alternatively, heuristic criteria based on population parameters (such as population split times, population sizes, and migration rates) estimated from genomic data may be used to delimit species. Here we develop hierarchical merge and split algorithms for heuristic species delimitation based on the genealogical divergence index (𝑔𝑑𝑖) and implement them in a Python pipeline called hhsd. We characterize the behavior of the 𝑔𝑑𝑖 under a few simple scenarios of gene flow. We apply the new approaches to a dataset simulated under a model of isolation by distance as well as three empirical datasets. Our tests suggest that the new approaches produced sensible results and were less prone to over-splitting. We discuss possible strategies for accommodating paraphyletic species in the hierarchical algorithm, as well as the challenges of species delimitation based on heuristic criteria.
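To make the heuristic criterion concrete, the sketch below computes the gdi in its simplest setting: two populations that split at time τ with no migration afterwards, where the index for a population with size parameter θ has the closed form gdi = 1 − exp(−2τ/θ). The abstract's models include migration, where no such simple formula applies, so this covers only the no-migration special case. The 0.2 and 0.7 cut-offs are the rule-of-thumb thresholds commonly used with the gdi and are an assumption here, as are the function names; none of this is taken from the hhsd pipeline itself.

```python
import math


def gdi(tau, theta):
    """gdi for one population under a no-migration split: 1 - exp(-2*tau/theta).

    tau   -- divergence time in expected substitutions per site
    theta -- population size parameter (4*N*mu) of the population of interest
    """
    if tau < 0 or theta <= 0:
        raise ValueError("tau must be >= 0 and theta must be > 0")
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify(value, lower=0.2, upper=0.7):
    """Rule-of-thumb reading of a gdi value (thresholds are assumptions)."""
    if value < lower:
        return "same species"
    if value > upper:
        return "distinct species"
    return "ambiguous"
```

In a hierarchical merge procedure, a pair of sister populations whose gdi values fall below the lower threshold would be candidates for merging; in a split procedure, values above the upper threshold would support keeping them apart.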
Citations: 0
MAST: Phylogenetic Inference with Mixtures Across Sites and Trees.
IF 6.1 | Zone 1, Biology | Q1 Evolutionary Biology | Pub Date: 2024-07-27 | DOI: 10.1093/sysbio/syae008
Thomas K F Wong, Caitlin Cherryh, Allen G Rodrigo, Matthew W Hahn, Bui Quang Minh, Robert Lanfear

Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting (ILS), introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call mixtures across sites and trees (MAST). This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of ILS in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of 4 Platyrrhine species for which standard concatenated maximum likelihood (ML) and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e., the largest proportion of sites) to the tree also supported by gene tree approaches. 
These results suggest that the MAST model is able to analyze a concatenated alignment using ML while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
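The core of a multi-tree mixture can be sketched in a few lines. Assuming the per-site likelihood under each pre-specified tree has already been computed (the nested list `site_likelihoods` below, a hypothetical input), the mixture likelihood of a site is the weighted sum over trees, and the tree weights can be re-estimated with a standard expectation-maximization step. This is a generic illustration of mixture-weight estimation, not a description of how IQ-TREE optimizes the MAST model.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Sum over sites of log( sum_j w_j * L(site | tree_j) )."""
    return sum(
        math.log(sum(w * lik for w, lik in zip(weights, row)))
        for row in site_likelihoods
    )


def em_update_weights(site_likelihoods, weights):
    """One EM step: new w_j = average posterior probability of tree j per site."""
    totals = [0.0] * len(weights)
    for row in site_likelihoods:
        site_total = sum(w * lik for w, lik in zip(weights, row))
        for j, (w, lik) in enumerate(zip(weights, row)):
            totals[j] += w * lik / site_total  # posterior of tree j at this site
    n_sites = len(site_likelihoods)
    return [t / n_sites for t in totals]
```

Under this reading, a tree's weight is the expected proportion of sites it explains, which matches the abstract's gloss of the highest weight as "the largest proportion of sites".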

Systematic Biology, pp. 375-391. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11282360/pdf/
Citations: 0
The Effect of Copy Number Hemiplasy on Gene Family Evolution.
IF 6.1 | Zone 1, Biology | Q1 Evolutionary Biology | Pub Date: 2024-07-27 | DOI: 10.1093/sysbio/syae007
Qiuyi Li, Yao-Ban Chan, Nicolas Galtier, Celine Scornavacca

The evolution of gene families is complex, involving gene-level evolutionary events such as gene duplication, horizontal gene transfer, and gene loss, and other processes such as incomplete lineage sorting (ILS). Because of this, topological differences often exist between gene trees and species trees. A number of models have been recently developed to explain these discrepancies, the most realistic of which attempts to consider both gene-level events and ILS. When unified in a single model, the interaction between ILS and gene-level events can cause polymorphism in gene copy number, which we refer to as copy number hemiplasy (CNH). In this paper, we extend the Wright-Fisher process to include duplications and losses over several species, and show that the probability of CNH for this process can be significant. We study how well two unified models approximate the Wright-Fisher process with duplication and loss: multilocus multispecies coalescent (MLMSC), which models CNH, and duplication, loss, and coalescence (DLCoal), which does not. We then study the effect of CNH on gene family evolution by comparing MLMSC and DLCoal. We generate comparable gene trees under both models, showing significant differences in various summary statistics; most importantly, CNH reduces the number of gene copies greatly. If this is not taken into account, the traditional method of estimating duplication rates (by counting the number of gene copies) becomes inaccurate. The simulated gene trees are also used for species tree inference with the summary methods ASTRAL and ASTRAL-Pro, demonstrating that their accuracy, based on CNH-unaware simulations calibrated on real data, may have been overestimated.
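The intuition behind CNH can be shown with a toy simulation: a duplicate that arises in a single individual drifts in frequency, and if a speciation happens while it is still segregating, the descendant species can inherit different copy numbers. The sketch below estimates, by Monte Carlo, the probability that a new neutral duplicate is still polymorphic after a given number of generations in a haploid Wright-Fisher population. Every modelling choice here (haploid, neutral, a single duplication, no loss events, the function name and parameters) is an assumption made for illustration; it is far simpler than the extended process described in the abstract.

```python
import random


def prob_copy_number_polymorphism(pop_size, generations, trials=2000, seed=0):
    """Fraction of trials in which a new duplicate is neither lost nor fixed."""
    rng = random.Random(seed)
    polymorphic = 0
    for _ in range(trials):
        count = 1  # the duplicate starts in a single individual
        for _ in range(generations):
            if count == 0 or count == pop_size:
                break  # lost or fixed: no longer polymorphic
            freq = count / pop_size
            # Wright-Fisher resampling: each offspring carries the duplicate
            # with probability equal to its current frequency.
            count = sum(rng.random() < freq for _ in range(pop_size))
        if 0 < count < pop_size:
            polymorphic += 1
    return polymorphic / trials
```

Calling the function with `generations` set to the waiting time until the next speciation gives a rough feel for how often copy number is still polymorphic at that point in this toy setting.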

Systematic Biology, pp. 355-374.
Citations: 0
Considering Decoupled Phenotypic Diversification Between Ontogenetic Phases in Macroevolution: An Example Using Triggerfishes (Balistidae).
IF 6.1 | Zone 1, Biology | Q1 Evolutionary Biology | Pub Date: 2024-07-27 | DOI: 10.1093/sysbio/syae014
Alex Dornburg, Katerina L Zapfe, Rachel Williams, Michael E Alfaro, Richard Morris, Haruka Adachi, Joseph Flores, Francesco Santini, Thomas J Near, Bruno Frédérich

Across the Tree of Life, most studies of phenotypic disparity and diversification have been restricted to adult organisms. However, many lineages have distinct ontogenetic phases that differ from their adult forms in morphology and ecology. Focusing disproportionately on the evolution of adult forms unnecessarily hinders our understanding of the pressures shaping evolution over time. Non-adult disparity patterns are particularly important to consider for coastal ray-finned fishes, which can have juvenile phases with distinct phenotypes. These juvenile forms are often associated with sheltered nursery environments, with phenotypic shifts between adults and juvenile stages that are readily apparent in locomotor morphology. Whether this ontogenetic variation in locomotor morphology reflects a decoupling of diversification dynamics between life stages remains unknown. Here we investigate the evolutionary dynamics of locomotor morphology between adult and juvenile triggerfishes. We integrate a time-calibrated phylogenetic framework with geometric morphometric approaches and measurement data of fin aspect ratio and incidence, and reveal a mismatch between morphospace occupancy, the evolution of morphological disparity, and the tempo of trait evolution between life stages. Collectively, our results illuminate how the heterogeneity of morpho-functional adaptations can decouple the mode and tempo of morphological diversification between ontogenetic stages.

Systematic Biology, pp. 434-454.
Citations: 0
Improving the gold standard in NCBI GenBank and related databases: DNA sequences from type specimens and type strains.
IF 6.1 | Zone 1, Biology | Q1 Evolutionary Biology | Pub Date: 2024-07-27 | DOI: 10.1093/sysbio/syad068
Susanne S Renner, Mark D Scherz, Conrad L Schoch, Marc Gottschling, Miguel Vences

Scientific names permit humans and search engines to access knowledge about the biodiversity that surrounds us, and names linked to DNA sequences are playing an ever-greater role in search-and-match identification procedures. Here, we analyze how users and curators of the National Center for Biotechnology Information (NCBI) are flagging and curating sequences derived from nomenclatural type material, which is the only way to improve the quality of DNA-based identification in the long run. For prokaryotes, 18,281 genome assemblies from type strains have been curated by NCBI staff and improve the quality of prokaryote naming. For Fungi, type-derived sequences representing over 21,000 species are now essential for fungus naming and identification. For the remaining eukaryotes, however, the numbers of sequences identifiable as type-derived are minuscule, representing only 739 species of arthropods, 1542 vertebrates, and 125 embryophytes. An increase in the production and curation of such sequences will come from (i) sequencing of types or topotypic specimens in museum collections, (ii) the March 2023 rule changes at the International Nucleotide Sequence Database Collaboration requiring more metadata for specimens, and (iii) efforts by data submitters to facilitate curation, including informing NCBI curators about a specimen's type status. We illustrate different type-data submission journeys and provide best-practice examples from a range of organisms. Expanding the number of type-derived sequences in DNA databases, especially of eukaryotes, is crucial for capturing, documenting, and protecting biodiversity.

Systematic Biology, pp. 486-494. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502950/pdf/
Citations: 0
Bayesian Phylogenetic Analysis on Multi-Core Compute Architectures: Implementation and Evaluation of BEAGLE in RevBayes With MPI.
IF 6.1 Zone 1 Biology Q1 EVOLUTIONARY BIOLOGY Pub Date: 2024-07-27 DOI: 10.1093/sysbio/syae005
Killian Smith, Daniel Ayres, René Neumaier, Gert Wörheide, Sebastian Höhna

Phylogenies are central to many research areas in biology and are commonly estimated using likelihood-based methods. Unfortunately, any likelihood-based method, including Bayesian inference, can be restrictively slow for large datasets (with many taxa and/or many sites in the sequence alignment) or complex substitution models. The primary limiting factor when using large datasets and/or complex models in probabilistic phylogenetic analyses is the likelihood calculation, which dominates the total computation time. To address this bottleneck, we incorporated the high-performance phylogenetic library BEAGLE into RevBayes, which enables multi-threading on multi-core CPUs and GPUs, as well as hardware-specific vectorized instructions for faster likelihood calculations. Our new implementation of RevBayes+BEAGLE retains the flexibility and dynamic nature that users expect from vanilla RevBayes. In addition, we implemented native parallelization within RevBayes without an external library using the message passing interface (MPI): RevBayes+MPI. We evaluated our new implementation of RevBayes+BEAGLE using multi-threading on CPUs and 2 different powerful GPUs (NVIDIA Titan V and NVIDIA A100) against our native implementation of RevBayes+MPI. We found good improvements in speedup when multiple cores were used, with up to 20-fold speedup when using multiple CPU cores and over 90-fold speedup when using multiple GPU cores. The improvement depended on the data type used, DNA or amino acids, and the size of the alignment, but less on the size of the tree. We additionally investigated the cost of rescaling partial likelihoods to avoid numerical underflow and showed that unnecessarily frequent and inefficient rescaling can increase runtimes up to 4-fold. Finally, we presented and compared a new approach to store partial likelihoods on branches instead of nodes that can speed up computations up to 1.7 times but comes at twice the memory requirements.
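
The rescaling issue mentioned in this abstract can be illustrated with a toy sketch. This is not RevBayes or BEAGLE code: the function names, the four-state vectors, the uniform root frequencies, and the `rescale_every` parameter are all assumptions made for illustration, and the repeated element-wise product only stands in for the per-node updates of a real pruning algorithm. It shows why an unscaled product of many small partial likelihoods underflows to zero, and how dividing out the largest entry while accumulating its logarithm avoids that.

```python
import math

def likelihood_without_rescaling(factors):
    """Multiply per-node factors directly; the result may underflow to 0.0."""
    partial = [1.0, 1.0, 1.0, 1.0]
    for factor in factors:
        partial = [p * f for p, f in zip(partial, factor)]
    # Uniform root frequencies (an assumption of this toy example).
    return sum(0.25 * p for p in partial)

def log_likelihood_with_rescaling(factors, rescale_every=1):
    """Same product, but every `rescale_every` steps the vector is divided
    by its largest entry and the logarithm of that entry is accumulated."""
    partial = [1.0, 1.0, 1.0, 1.0]
    log_scale = 0.0
    for step, factor in enumerate(factors, start=1):
        partial = [p * f for p, f in zip(partial, factor)]
        if step % rescale_every == 0:
            largest = max(partial)
            partial = [p / largest for p in partial]
            log_scale += math.log(largest)
    return math.log(sum(0.25 * p for p in partial)) + log_scale

# 200 hypothetical nodes, each contributing small per-state factors.
factors = [[1e-3, 2e-3, 5e-4, 1e-3]] * 200
naive = likelihood_without_rescaling(factors)
scaled_often = log_likelihood_with_rescaling(factors, rescale_every=1)
scaled_rarely = log_likelihood_with_rescaling(factors, rescale_every=50)
```

In this sketch `rescale_every=1` performs 200 rescaling passes and `rescale_every=50` performs 4, yet both recover the same log-likelihood, which is the kind of trade-off between safety and runtime that the abstract refers to. Choosing the interval too large would let the vector underflow before it is rescaled.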

Systematic Biology, pp. 455-469.
Citations: 0
Alpine Extremophytes in Evolutionary Turmoil: Complex Diversification Patterns and Demographic Responses of a Halophilic Grass in a Central Asian Biodiversity Hotspot.
IF 6.1 Zone 1 Biology Q1 EVOLUTIONARY BIOLOGY Pub Date: 2024-07-27 DOI: 10.1093/sysbio/syad073
Anna Wróbel, Ewelina Klichowska, Arkadiusz Nowak, Marcin Nobis

Diversification and demographic responses are key processes shaping species' evolutionary history. Yet we still lack a full understanding of the ecological mechanisms that shape genetic diversity at different spatial scales upon rapid environmental changes. In this study, we examined genetic differentiation in an extremophilic grass, Puccinellia pamirica, and factors affecting its population dynamics among the occupied hypersaline alpine wetlands on the arid Pamir Plateau in Central Asia. Using genomic data, we found evidence of fine-scale population structure and gene flow among the localities established across the high-elevation plateau as well as fingerprints of historical demographic expansion. We showed that an increase in the effective population size could coincide with the Last Glacial Period, which was followed by the species' demographic decline during the Holocene. Geographic distance plays a vital role in shaping the spatial genetic structure of P. pamirica alongside isolation-by-environment and habitat fragmentation. Our results highlight a complex history of divergence and gene flow in this species-poor alpine region during the Late Quaternary. We demonstrate that regional climate specificity and a shortage of nonclimate data largely impede predictions of future range changes of the alpine extremophile using ecological niche modeling. This study emphasizes the importance of fine-scale environmental heterogeneity for population dynamics and species distribution shifts.

Systematic Biology, pp. 263-278. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11282368/pdf/
Citations: 0
Artifactual Orthologs and the Need for Diligent Data Exploration in Complex Phylogenomic Datasets: A Museomic Case Study from the Andean Flora.
IF 6.1 Zone 1 Biology Q1 EVOLUTIONARY BIOLOGY Pub Date: 2024-07-27 DOI: 10.1093/sysbio/syad076
Laura A Frost, Ana M Bedoya, Laura P Lagomarsino

The Andes mountains of western South America are a globally important biodiversity hotspot, yet there is a paucity of resolved phylogenies for plant clades from this region. Filling an important gap in our understanding of the world's richest flora, we present the first phylogeny of Freziera (Pentaphylacaceae), an Andean-centered, cloud forest radiation. Our dataset was obtained via hybrid-enriched target sequence capture of Angiosperms353 universal loci for 50 of the ca. 75 spp., obtained almost entirely from herbarium specimens. We identify high phylogenomic complexity in Freziera, including the presence of data artifacts. Via by-eye observation of gene trees, detailed examination of warnings from recently improved assembly pipelines, and gene tree filtering, we identified that artifactual orthologs (i.e., the presence of only one copy of a multicopy gene due to differential assembly) were an important source of gene tree heterogeneity that had a negative impact on phylogenetic inference and support. These artifactual orthologs may be common in plant phylogenomic datasets, where multiple instances of genome duplication are common. After accounting for artifactual orthologs as a source of gene tree error, we identified a significant, but nonspecific signal of introgression using Patterson's D and f4 statistics. Despite phylogenomic complexity, we were able to resolve Freziera into 9 well-supported subclades whose evolution has been shaped by multiple evolutionary processes, including incomplete lineage sorting, historical gene flow, and gene duplication. Our results highlight the complexities of plant phylogenomics, which are heightened in Andean radiations, and show the impact of filtering data processing artifacts and standard filtering approaches on phylogenetic inference.
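
For readers unfamiliar with Patterson's D, the statistic named in this abstract, the following is a minimal sketch of the ABBA-BABA site-pattern count on which it is based. It is not the authors' pipeline: the function name `patterson_d`, the toy sequences, and the simplification to one haploid sequence per taxon with no gaps or ambiguity codes are assumptions made for illustration. For taxa (P1, P2, P3, outgroup), D = (nABBA - nBABA) / (nABBA + nBABA), where A is the outgroup allele and B is the derived allele carried by P3.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Compute Patterson's D from four aligned sequences of equal length.

    A site counts as ABBA when P1 matches the outgroup and P2 matches P3,
    and as BABA when P1 matches P3 and P2 matches the outgroup. Sites where
    P3 carries the outgroup allele are uninformative and skipped.
    """
    abba = 0
    baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if c == o:
            continue  # P3 has the ancestral allele: not informative
        if a == o and b == c:
            abba += 1
        elif a == c and b == o:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites in the alignment")
    return (abba - baba) / (abba + baba)

# Hypothetical six-site alignment: two ABBA sites and one BABA site.
d_value = patterson_d("AAGTCT", "GAGTAC", "GAATAT", "AAGCCC")
```

A value near zero is what incomplete lineage sorting alone would produce, while an excess of one pattern suggests gene flow involving P3. Assessing significance normally requires a resampling scheme such as a block jackknife, which this sketch does not include.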

Systematic Biology, pp. 308-322.
Citations: 0