
Systematic Biology: Latest Articles

Hierarchical Heuristic Species Delimitation under the Multispecies Coalescent Model with Migration
IF 6.5, Zone 1 (Biology), Q1 Evolutionary Biology, Pub Date: 2024-08-20, DOI: 10.1093/sysbio/syae050
Daniel Kornai, Xiyun Jiao, Jiayi Ji, Tomáš Flouri, Ziheng Yang
The multispecies coalescent (MSC) model accommodates genealogical fluctuations across the genome and provides a natural framework for comparative analysis of genomic sequence data from closely related species to infer the history of species divergence and gene flow. Given a set of populations, hypotheses of species delimitation (and species phylogeny) may be formulated as instances of MSC models (e.g., MSC for one species versus MSC for two species) and compared using Bayesian model selection. This approach, implemented in the program bpp, has been found to be prone to over-splitting. Alternatively, heuristic criteria based on population parameters (such as population split times, population sizes, and migration rates) estimated from genomic data may be used to delimit species. Here we develop hierarchical merge and split algorithms for heuristic species delimitation based on the genealogical divergence index (𝑔𝑑𝑖) and implement them in a Python pipeline called hhsd. We characterize the behavior of the 𝑔𝑑𝑖 under a few simple scenarios of gene flow. We apply the new approaches to a dataset simulated under a model of isolation by distance as well as three empirical datasets. Our tests suggest that the new approaches produced sensible results and were less prone to over-splitting. We discuss possible strategies for accommodating paraphyletic species in the hierarchical algorithm, as well as the challenges of species delimitation based on heuristic criteria.
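To make the heuristic criterion in the abstract above more concrete, the following is a minimal sketch of a 𝑔𝑑𝑖 calculation for the simplest case without migration. It assumes the formula gdi = 1 − exp(−2τ/θ) and the rule-of-thumb cutoffs 0.2 and 0.7 from the earlier 𝑔𝑑𝑖 literature; neither is stated in this abstract. The function names are hypothetical and are not part of hhsd or bpp.

```python
import math


def gdi_no_migration(tau: float, theta: float) -> float:
    """Genealogical divergence index without gene flow (assumed formula).

    tau   : population split time, in expected substitutions per site
    theta : population size parameter (4 * N * mu) of the focal population

    The value is the probability that two sequences sampled from the
    focal population coalesce before reaching the split time tau,
    looking backward in time: 1 - exp(-2 * tau / theta).
    """
    if tau < 0 or theta <= 0:
        raise ValueError("tau must be >= 0 and theta must be > 0")
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify_gdi(gdi: float, low: float = 0.2, high: float = 0.7) -> str:
    """Apply rule-of-thumb cutoffs (assumed, not taken from the abstract)."""
    if gdi < low:
        return "single species"
    if gdi > high:
        return "distinct species"
    return "ambiguous"
```

For example, with hypothetical values `tau = 0.005` and `theta = 0.01`, the index is 1 − exp(−1), which falls in the ambiguous range under these cutoffs. A hierarchical merge or split procedure would re-evaluate such an index at each step of the guide tree; the case with migration requires a different calculation that this sketch does not attempt.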
Citations: 0
Phylogenomics and pervasive genome-wide phylogenetic discordance among fin whales (Balaenoptera physalus).
IF 6.1, Zone 1 (Biology), Q1 Evolutionary Biology, Pub Date: 2024-08-19, DOI: 10.1093/sysbio/syae049
Fabricio Furni, Eduardo R Secchi, Camilla Speller, Daniel DenDanto, Christian Ramp, Finn Larsen, Sally Mizroch, Jooke Robbins, Richard Sears, Jorge Urbán R, Martine Bérubé, Per J Palsbøll

Phylogenomics has the power to uncover complex phylogenetic scenarios across the genome. In most cases, no single topology is reflected across the entire genome as the phylogenetic signal differs among genomic regions due to processes, such as introgression and incomplete lineage sorting. Baleen whales are among the largest vertebrates on Earth with a high dispersal potential in a relatively unrestricted habitat, the oceans. The fin whale (Balaenoptera physalus) is one of the most enigmatic baleen whale species, currently divided into four subspecies. It has been a matter of debate whether phylogeographic patterns explain taxonomic variation in fin whales. Here we present a chromosome-level whole genome analysis of the phylogenetic relationships among fin whales from multiple ocean basins. First, we estimated concatenated and consensus phylogenies for both the mitochondrial and nuclear genomes. The consensus phylogenies based upon the autosomal genome uncovered monophyletic clades associated with each ocean basin, aligning with the current understanding of subspecies division. Nevertheless, discordances were detected in the phylogenies based on the Y chromosome, mitochondrial genome, autosomal genome and X chromosome. Furthermore, we detected signs of introgression and pervasive phylogenetic discordance across the autosomal genome. This complex phylogenetic scenario could be explained by a puzzle of introgressive events, not yet documented in fin whales. Similarly, incomplete lineage sorting and low phylogenetic signal could lead to such phylogenetic discordances. Our study reinforces the pitfalls of relying on concatenated or single locus phylogenies to determine taxonomic relationships below the species level by illustrating the underlying nuances which some phylogenetic approaches may fail to capture. We emphasize the significance of accurate taxonomic delineation in fin whales by exploring crucial information revealed through genome-wide assessments.

Citations: 0
Persistent Gene Flow Suggests an Absence of Reproductive Isolation in an African Antelope Speciation Model.
IF 6.1, Zone 1 (Biology), Q1 Evolutionary Biology, Pub Date: 2024-08-14, DOI: 10.1093/sysbio/syae037
Xi Wang, Casper-Emil Tingskov Pedersen, Georgios Athanasiadis, Genís Garcia-Erill, Kristian Hanghøj, Laura D Bertola, Malthe Sebro Rasmussen, Mikkel Schubert, Xiaodong Liu, Zilong Li, Long Lin, Renzo F Balboa, Emil Jørsboe, Casia Nursyifa, Shanlin Liu, Vincent Muwanika, Charles Masembe, Lei Chen, Wen Wang, Ida Moltke, Hans R Siegismund, Anders Albrechtsen, Rasmus Heller

African antelope diversity is a globally unique vestige of a much richer world-wide Pleistocene megafauna. Despite this, the evolutionary processes leading to the prolific radiation of African antelopes are not well understood. Here, we sequenced 145 whole genomes from both subspecies of the waterbuck (Kobus ellipsiprymnus), an African antelope believed to be in the process of speciation. We investigated genetic structure and population divergence and found evidence of a mid-Pleistocene separation on either side of the eastern Great Rift Valley, consistent with vicariance caused by a rain shadow along the so-called 'Kingdon's Line'. However, we also found pervasive evidence of both recent and widespread historical gene flow across the Rift Valley barrier. By inferring the genome-wide landscape of variation among subspecies, we found 14 genomic regions of elevated differentiation, including a locus that may be related to each subspecies' distinctive coat pigmentation pattern. We investigated these regions as candidate speciation islands. However, we observed no significant reduction in gene flow in these regions, nor any indications of selection against hybrids. Altogether, these results suggest a pattern whereby climatically driven vicariance is the most important process driving the African antelope radiation, and suggest that reproductive isolation may not set in until very late in the divergence process. This has a significant impact on taxonomic inference, as many taxa will be in a gray area of ambiguous systematic status, possibly explaining why it has been hard to achieve consensus regarding the species status of many African antelopes. Our analyses demonstrate how population genetics based on low-depth whole genome sequencing can provide new insights that can help resolve how far lineages have gone along the path to speciation.

Citations: 0
PhyloJunction: a computational framework for simulating, developing, and teaching evolutionary models.
IF 6.1, Zone 1 (Biology), Q1 Evolutionary Biology, Pub Date: 2024-08-08, DOI: 10.1093/sysbio/syae048
F Abio K Mendes, Michael J Landis

We introduce PhyloJunction, a computational framework designed to facilitate the prototyping, testing, and characterization of evolutionary models. PhyloJunction is distributed as an open-source Python library that can be used to implement a variety of models, thanks to its flexible graphical modeling architecture and dedicated model specification language. Model design and use are exposed to users via command-line and graphical interfaces, which integrate the steps of simulating, summarizing, and visualizing data. This paper describes the features of PhyloJunction - which include, but are not limited to, a general implementation of a popular family of phylogenetic diversification models - and, moving forward, how it may be expanded to not only include new models, but to also become a platform for conducting and teaching statistical learning.

Citations: 0
Adaptive radiation without independent stages of trait evolution in a group of Caribbean anoles.
IF 6.1, Zone 1 (Biology), Q1 Evolutionary Biology, Pub Date: 2024-08-02, DOI: 10.1093/sysbio/syae041
Brooke Bodensteiner, Edward D Burress, Martha M Muñoz

Adaptive radiation involves diversification along multiple trait axes, producing phenotypically diverse, species-rich lineages. Theory generally predicts that multi-trait evolution occurs via a 'stages' model, with some traits saturating early in a lineage's history, and others diversifying later. Despite its multidimensional nature, however, we know surprisingly little about how different suites of traits evolve during adaptive radiation. Here, we investigated the rate, pattern, and timing of morphological and physiological evolution in the anole lizard adaptive radiation from the Caribbean island of Hispaniola. Rates and patterns of morphological and physiological diversity are largely unaligned, corresponding to independent selective pressures associated with structural and thermal niches. Cold tolerance evolution reflects parapatric divergence across elevation, rather than niche partitioning within communities. Heat tolerance evolution and the preferred temperature evolve more slowly than cold tolerance, reflecting behavioral buffering, particularly in edge-habitat species (a pattern associated with the Bogert effect). In contrast to the nearby island of Puerto Rico, closely related anoles on Hispaniola do not sympatrically partition thermal niche space. Instead, allopatric and parapatric separation across biogeographic and environmental boundaries serves to keep morphologically similar close relatives apart. The phenotypic diversity of this island's adaptive radiation accumulated largely as a by-product of time, with surprisingly few exceptional pulses of trait evolution. A better understanding of the processes that guide multidimensional trait evolution (and nuance therein) will prove key in determining whether the stages model should be considered a common theme of adaptive radiation.

Citations: 0
Bayesian Inference Under the Multispecies Coalescent with Ancient DNA Sequences.
IF 6.1, Zone 1 (Biology), Q1 Evolutionary Biology, Pub Date: 2024-07-30, DOI: 10.1093/sysbio/syae047
Anna A Nagel, Tomáš Flouri, Ziheng Yang, Bruce Rannala

Ancient DNA (aDNA) is increasingly being used to investigate questions such as the phylogenetic relationships and divergence times of extant and extinct species. If aDNA samples are sufficiently old, expected branch lengths (in units of nucleotide substitutions) are reduced relative to contemporary samples. This can be accounted for by incorporating sample ages into phylogenetic analyses. Existing methods that use tip (sample) dates infer gene trees rather than species trees, which can lead to incorrect or biased inferences of the species tree. Methods using a multispecies coalescent (MSC) model overcome these issues. We developed an MSC model with tip dates and implemented it in the program bpp. The method performed well for a range of biologically realistic scenarios, estimating calibrated divergence times and mutation rates precisely. Simulations suggest that estimation precision can be best improved by prioritizing sampling of many loci and more ancient samples. Incorrectly treating ancient samples as contemporary in analyzing simulated data, mimicking a common practice of empirical analyses, led to large systematic biases in model parameters, including divergence times. Two genomic datasets of mammoths and elephants were analyzed, demonstrating the method's empirical utility.
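The point above about reduced branch lengths for old samples can be illustrated with a small calculation. This is a sketch under a strict molecular clock, which is an assumption made here for simplicity; the function name and the numbers in the usage note are hypothetical and do not come from bpp or from the paper.

```python
def expected_tip_distance(node_age_years: float,
                          sample_age_years: float,
                          rate_per_site_per_year: float) -> float:
    """Expected substitutions per site between an ancestral node and a tip.

    Under a strict clock (an assumption of this sketch), a sample that
    died sample_age_years ago has accumulated substitutions only over
    (node_age_years - sample_age_years), so its branch is shorter than
    that of a contemporary sample (sample_age_years = 0).
    """
    if sample_age_years < 0 or sample_age_years > node_age_years:
        raise ValueError("sample age must lie between 0 and the node age")
    if rate_per_site_per_year < 0:
        raise ValueError("rate must be non-negative")
    return rate_per_site_per_year * (node_age_years - sample_age_years)
```

With a hypothetical node age of 1,000,000 years and a rate of 1e-9 substitutions per site per year, a contemporary sample is expected to sit about 1e-3 substitutions per site from the node, while a 100,000-year-old sample sits about 9e-4 from it. Treating the old sample as contemporary ignores that 10% shortfall, which is the kind of misspecification the abstract reports as causing biased parameter estimates.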

Citations: 0
The Effect of Copy Number Hemiplasy on Gene Family Evolution.
IF 6.1, Zone 1 (Biology), Q1 Evolutionary Biology, Pub Date: 2024-07-27, DOI: 10.1093/sysbio/syae007
Qiuyi Li, Yao-Ban Chan, Nicolas Galtier, Celine Scornavacca

The evolution of gene families is complex, involving gene-level evolutionary events such as gene duplication, horizontal gene transfer, and gene loss, and other processes such as incomplete lineage sorting (ILS). Because of this, topological differences often exist between gene trees and species trees. A number of models have been recently developed to explain these discrepancies, the most realistic of which attempts to consider both gene-level events and ILS. When unified in a single model, the interaction between ILS and gene-level events can cause polymorphism in gene copy number, which we refer to as copy number hemiplasy (CNH). In this paper, we extend the Wright-Fisher process to include duplications and losses over several species, and show that the probability of CNH for this process can be significant. We study how well two unified models approximate the Wright-Fisher process with duplication and loss: multilocus multispecies coalescent (MLMSC), which models CNH, and duplication, loss, and coalescence (DLCoal), which does not. We then study the effect of CNH on gene family evolution by comparing MLMSC and DLCoal. We generate comparable gene trees under both models, showing significant differences in various summary statistics; most importantly, CNH reduces the number of gene copies greatly. If this is not taken into account, the traditional method of estimating duplication rates (by counting the number of gene copies) becomes inaccurate. The simulated gene trees are also used for species tree inference with the summary methods ASTRAL and ASTRAL-Pro, demonstrating that their accuracy, based on CNH-unaware simulations calibrated on real data, may have been overestimated.

Citations: 0
MAST: Phylogenetic Inference with Mixtures Across Sites and Trees.
IF 6.1 Zone 1 Biology Q1 EVOLUTIONARY BIOLOGY Pub Date: 2024-07-27 DOI: 10.1093/sysbio/syae008
Thomas K F Wong, Caitlin Cherryh, Allen G Rodrigo, Matthew W Hahn, Bui Quang Minh, Robert Lanfear

Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting (ILS), introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call mixtures across sites and trees (MAST). This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of ILS in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of 4 Platyrrhine species for which standard concatenated maximum likelihood (ML) and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e., the largest proportion of sites) to the tree also supported by gene tree approaches. These results suggest that the MAST model is able to analyze a concatenated alignment using ML while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
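
As a rough illustration of what "estimating the weight of each tree" means in a mixture over trees, the sketch below treats the likelihood of each alignment site as a weighted sum of its likelihoods under the individual trees and fits the weights by expectation-maximization. This is not the IQ-TREE implementation: the function names, the EM scheme, and the toy per-site likelihoods are assumptions made for illustration, and the per-tree site likelihoods are held fixed here, whereas the abstract describes branch lengths and model parameters being estimated as well.

```python
import math


def mixture_log_lik(site_lik, weights):
    """Log-likelihood of the alignment under a mixture of trees.

    site_lik[i][j] is the likelihood of site i under tree j (hypothetical
    input; a real analysis would obtain it from a phylogenetics package).
    """
    return sum(
        math.log(sum(w * l for w, l in zip(weights, row))) for row in site_lik
    )


def estimate_tree_weights(site_lik, max_iter=500, tol=1e-10):
    """EM estimate of tree weights with per-tree site likelihoods held fixed."""
    n_sites, n_trees = len(site_lik), len(site_lik[0])
    weights = [1.0 / n_trees] * n_trees  # start from equal weights
    ll = mixture_log_lik(site_lik, weights)
    for _ in range(max_iter):
        # E-step: accumulate the posterior probability that each site
        # was generated under each tree.
        resp_sums = [0.0] * n_trees
        for row in site_lik:
            total = sum(w * l for w, l in zip(weights, row))
            for j in range(n_trees):
                resp_sums[j] += weights[j] * row[j] / total
        # M-step: each weight becomes the mean posterior across sites.
        weights = [s / n_sites for s in resp_sums]
        new_ll = mixture_log_lik(site_lik, weights)
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break
    return weights, ll


# Toy data (made up): 7 sites favour tree 0 and 3 sites favour tree 1.
toy = [[0.9, 0.1]] * 7 + [[0.1, 0.9]] * 3
weights, ll = estimate_tree_weights(toy)
print([round(w, 3) for w in weights])
```

For this toy input the maximum-likelihood weight of the first tree can be worked out by hand as 0.75, so the printed weights should be close to 0.75 and 0.25. The weights always sum to one, which matches the abstract's reading of a weight as the proportion of sites assigned to a tree.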

Citations: 0
Considering Decoupled Phenotypic Diversification Between Ontogenetic Phases in Macroevolution: An Example Using Triggerfishes (Balistidae).
IF 6.1 Zone 1 Biology Q1 EVOLUTIONARY BIOLOGY Pub Date: 2024-07-27 DOI: 10.1093/sysbio/syae014
Alex Dornburg, Katerina L Zapfe, Rachel Williams, Michael E Alfaro, Richard Morris, Haruka Adachi, Joseph Flores, Francesco Santini, Thomas J Near, Bruno Frédérich

Across the Tree of Life, most studies of phenotypic disparity and diversification have been restricted to adult organisms. However, many lineages have distinct ontogenetic phases that differ from their adult forms in morphology and ecology. Focusing disproportionately on the evolution of adult forms unnecessarily hinders our understanding of the pressures shaping evolution over time. Non-adult disparity patterns are particularly important to consider for coastal ray-finned fishes, which can have juvenile phases with distinct phenotypes. These juvenile forms are often associated with sheltered nursery environments, with phenotypic shifts between adults and juvenile stages that are readily apparent in locomotor morphology. Whether this ontogenetic variation in locomotor morphology reflects a decoupling of diversification dynamics between life stages remains unknown. Here we investigate the evolutionary dynamics of locomotor morphology between adult and juvenile triggerfishes. We integrate a time-calibrated phylogenetic framework with geometric morphometric approaches and measurement data of fin aspect ratio and incidence, and reveal a mismatch between morphospace occupancy, the evolution of morphological disparity, and the tempo of trait evolution between life stages. Collectively, our results illuminate how the heterogeneity of morpho-functional adaptations can decouple the mode and tempo of morphological diversification between ontogenetic stages.

Citations: 0
Improving the gold standard in NCBI GenBank and related databases: DNA sequences from type specimens and type strains.
IF 6.1 Zone 1 Biology Q1 EVOLUTIONARY BIOLOGY Pub Date: 2024-07-27 DOI: 10.1093/sysbio/syad068
Susanne S Renner, Mark D Scherz, Conrad L Schoch, Marc Gottschling, Miguel Vences

Scientific names permit humans and search engines to access knowledge about the biodiversity that surrounds us, and names linked to DNA sequences are playing an ever-greater role in search-and-match identification procedures. Here, we analyze how users and curators of the National Center for Biotechnology Information (NCBI) are flagging and curating sequences derived from nomenclatural type material, which is the only way to improve the quality of DNA-based identification in the long run. For prokaryotes, 18,281 genome assemblies from type strains have been curated by NCBI staff and improve the quality of prokaryote naming. For Fungi, type-derived sequences representing over 21,000 species are now essential for fungus naming and identification. For the remaining eukaryotes, however, the numbers of sequences identifiable as type-derived are minuscule, representing only 739 species of arthropods, 1542 vertebrates, and 125 embryophytes. An increase in the production and curation of such sequences will come from (i) sequencing of types or topotypic specimens in museum collections, (ii) the March 2023 rule changes at the International Nucleotide Sequence Database Collaboration requiring more metadata for specimens, and (iii) efforts by data submitters to facilitate curation, including informing NCBI curators about a specimen's type status. We illustrate different type-data submission journeys and provide best-practice examples from a range of organisms. Expanding the number of type-derived sequences in DNA databases, especially of eukaryotes, is crucial for capturing, documenting, and protecting biodiversity.

Citations: 0