Systematic Biology: Latest Articles, Page 6

Introgressed variants obscure phylogenetic relationships but are not subject to positive selection in Australasian long-tailed parrots.

IF 6.5, Zone 1 (Biology), Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-09-27 DOI: 10.1093/sysbio/syaf066

Brian Tilston Smith,Agusto Luzuriaga-Neira,David Alvarez-Ponce,Kaiya L Provost,Gregory Thom,Leo Joseph

Gene flow often obscures phylogenetic relationships, but the evolutionary significance of introgressed variants is unclear. Here we examine the Australasian long-tailed parrots (Psittaculinae: Polytelini), in which an unexpected sister relationship between Polytelis alexandrae and the genus Aprosmictus, rather than the other Polytelis species, has been observed. We tested whether this relationship was due to ancient introgression using whole genomes. We found that the majority of gene trees had Ap. erythropterus and P. alexandrae as sister taxa, whereas network analysis indicated monophyly of Polytelis, with 48% of gene trees being in phylogenetic conflict due to introgression from Ap. erythropterus into P. alexandrae. Further analyses confirmed that 4-8% of the genome of P. alexandrae was introgressed from Ap. erythropterus, with signals of gene flow occurring throughout the genome. These findings indicate that topologies with P. alexandrae and Ap. erythropterus as sister taxa were biased by gene flow and affirm that Polytelis is monophyletic. Next, we assessed the evolutionary outcomes for introgressed variants and found that, among introgressed protein-coding genes, only two (0.8%) were under positive selection, compared with 99 (1.7%) of non-introgressed genes. Our results indicate that, despite the ubiquity of detectable introgression in phylogenies, many genetic variants flowing between species may play a minor role in molecular adaptations.
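The contrast between the two proportions reported above can be checked with a quick calculation. The following is a minimal sketch in Python, not the authors' analysis. The abstract gives only counts and rounded percentages, so the gene totals (roughly 250 introgressed and roughly 5,824 non-introgressed genes) are back-calculated here and should be treated as approximations. The choice of a two-sided Fisher exact test is likewise an assumption made for illustration.

```python
from math import comb


def implied_total(count: int, percent: float) -> int:
    """Back-calculate an approximate denominator from a count and a rounded percentage."""
    return round(count / (percent / 100.0))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for floating-point ties
    )


if __name__ == "__main__":
    # Assumed totals, reconstructed from "2 (0.8%)" and "99 (1.7%)".
    n_introgressed = implied_total(2, 0.8)
    n_other = implied_total(99, 1.7)
    p_value = fisher_exact_two_sided(2, n_introgressed - 2, 99, n_other - 99)
    print(f"approx. introgressed genes: {n_introgressed}")
    print(f"approx. non-introgressed genes: {n_other}")
    print(f"two-sided Fisher exact p-value: {p_value:.3f}")
```

Because the denominators are reconstructed from rounded percentages, the resulting p-value is only indicative. The paper itself is the authority on the exact gene counts and on whichever test was actually used.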

Citations: 0

Integrating Genomics, Collections, and Community Science to Delimit Species Clarifies the Taxonomy of a Variable Monitor Lizard (Varanus tristis).

IF 6.5, Zone 1 (Biology), Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-09-25 DOI: 10.1093/sysbio/syaf064

Carlos J Pavón-Vázquez,Alison J Fitch,Paul Doughty,Stephen C Donnellan,J Scott Keogh

The accurate characterization of species diversity is a vital prerequisite for ecological and evolutionary research, as well as conservation. Thus, it is necessary to generate robust hypotheses of species limits based on the inference of evolutionary processes. Integrative species delimitation, the inference of species limits based on multiple sources of evidence, can provide unique insight into species diversity and the processes behind it. Here, we show how community observations can be integrated with standard molecular and phenotypic datasets under an integrative framework to identify the processes generating genetic and phenotypic variation. We implement this approach in Varanus tristis, a widespread and variable complex of Australian monitor lizards. Using genomic, phenotypic (linear and geometric morphometrics, coloration), spatial, and environmental data, we show that disparity in this complex is inconsistent with intraspecific variation and instead suggests that speciation has occurred. Based on our results, we provide an updated taxonomy for this complex and identify the processes that may have been responsible for the geographic sorting of variation. Our workflow provides a guideline for the integrative analysis of several types of data to identify the occurrence and causes of speciation. Furthermore, our study highlights the benefits and caveats associated with community science and machine learning, the two tools used here, in taxonomic research.

Citations: 0

Hybridization and Polyploidy shaped the Evolutionary History of a Complex of Cryptic Species in European Woodrushes (Luzula sect. Luzula).

IF 6.5, Zone 1 (Biology), Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-09-25 DOI: 10.1093/sysbio/syaf065

Valentin Heimer,Pau Carnicero,Carolina Carrizo García,Andreas Hilpold,Jasna Dolenc Koce,J Luis Leal,Mingai Li,Claudio Varotto,Peter Schönswetter,Božo Frajman

Polyploidization has played a central role in the evolutionary history of most plant lineages, yet it poses significant challenges for phylogenetic inference, particularly in allopolyploid complexes with reticulate species relationships. Luzula sect. Luzula (Juncaceae) is a taxonomically intricate group characterized by widespread polyploidy, agmatoploidy, and high morphological uniformity. Focusing on the Eastern Alps, a key center of its diversity, we collected 1,002 samples of nine species and applied an integrative framework combining ddRADseq, plastid sequencing, relative genome size estimation, and chromosome counting to disentangle its evolutionary history. We extended previously inferred phylogenetic relationships and assessed gene flow among diploids, establishing a baseline for investigating the origin of polyploids. By analyzing patterns of genotype frequencies and genetic affinities to diploids, we inferred the most likely parental species of polyploids and identified key hybridization events shaping the current taxonomic and karyotypic diversity within this group. Our results reveal weak genetic differentiation among some diploid lineages, likely reflecting gene flow and incomplete lineage sorting. We propose a common allopolyploid origin of two tetraploids, which subsequently gave rise to a third tetraploid and a hexaploid species through interploidy hybridization. Although the parental species of some polyploids remain obscure, our genomic data highlight polyploidy and hybridization as major drivers of speciation in this poorly understood lineage. This study underscores the value of integrative approaches in resolving reticulate plant phylogenies and advances our understanding of polyploid speciation.

Citations: 0

Practical Guide and Review of Fossil Tip-Dating in Phylogenetics.

IF 6.5, Zone 1 (Biology), Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-09-24 DOI: 10.1093/sysbio/syaf050

Nicola S Heckeberg,Alessio Capobianco,Basanta Khakurel,Gustavo Darlim,Sebastian Höhna

Phylogenetic tip-dating has been and still is revolutionizing evolutionary biology in several ways. Fossil tip-dating, where fossils are placed into a phylogeny as tips based on morphological and/or molecular character information, provides a more principled approach to infer time-calibrated phylogenies compared with node-dating. Additionally, phylogenetic trees with fossils as tips become more and more important to elucidate evolutionary processes in macroevolutionary studies, e.g., deciphering diversification patterns and directional phenotypic evolution. Fossil tip-dating is slowly gathering popularity in empirical applications and has progressed substantially since its first demonstration in 2011, with respect to improved statistical models, software and datasets. Nevertheless, executing a phylogenetic fossil tip-dating analysis is complicated and comes with many challenges. Here, we provide an extensive review and overview of methods and models for phylogenetic tip-dating analyses with fossils. We focus both on data collection and preparation as well as on modeling choices. We start with a survey of all published phylogenetic tip-dating studies to date, showing common data and modeling choices as well as trends towards new approaches. Then, we walk readers through sections of molecular evolution, morphological evolution (both for discrete and continuous data), and lineage evolution (the fossilized-birth-death process). In each section, we describe the data and standard models with their underlying assumptions, and provide an outlook and practical recommendations.

Citations: 0

Mechanisms of Community Assembly through the lens of Phylogenetic Diversity: a Critical Reappraisal

IF 6.5, Zone 1 (Biology), Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-09-19 DOI: 10.1093/sysbio/syaf062

Thibault Kasprzyk, Gilles Dauby, Alain Vanderpoorten, Olivier J Hardy

Darwin was one of the first to hypothesize a connection between niche differentiation, competition, and species relatedness, offering an appealing framework to disentangle community assembly processes based on phylogenetic diversity patterns. Community assembly is, however, the result of several processes, including potentially confounding factors associated with dispersal limitations and spatial effects, casting doubt on the application of phylogenetic diversity metrics to infer community assembly processes. We implemented a spatially explicit model involving limited dispersal, drift, trait-based selection, and competition to simulate community composition under competing assembly processes in a landscape with contrasting habitat connectivity. The phylogenetic structure of communities globally varied depending on assembly processes and the combination thereof, validating the assumption, made by a large number of studies but seldom tested in a spatially explicit context, that different assembly processes indeed lead to significantly different patterns of community phylogenetic structure. All the investigated alpha metrics exhibited a poor ability to detect overdispersion under stabilizing processes, and some even unduly recovered a signal of clustering. Some of the most widely used metrics, such as UniFrac, carry a signal that is redundant with non-phylogenetic metrics and hence poorly capture the phylogenetic signal in the data. We identified three metrics, namely Bst or Pst for abundance data and PIst for occurrence data, which best retrieved the correct signal of phylogenetic structure under different assembly processes. Spatial effects may blur the phylogenetic structure of communities and decrease our ability to infer underlying processes. However, meaningful results may be obtained when the appropriate comparisons are made. In particular, phylogenetic clustering under equalizing processes must be tested on inter-habitat comparisons because it is the differential filtering of species between habitats that reveals the impact of equalizing processes. Our simulations further suggest that a significant phylogenetic structure of communities can be retrieved even in species-poor communities, except when the communities being compared are dominated by a single, most abundant species. We therefore conclude with best practices to adequately infer assembly processes with useful phylogenetic diversity metrics.

Citations: 0

Estimating Ancestral States of Complex Characters: a Case Study on the Evolution of Feathers.

IF 5.7, Zone 1 (Biology), Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-09-13 DOI: 10.1093/sysbio/syaf063

Pierre Cockx, Michael J Benton, Joseph N Keating

Feathers are a key novelty underpinning the evolutionary success of birds, yet the origin of feathers remains poorly understood. Debates about feather evolution hinge upon whether filamentous integument has evolved once or multiple times independently on the lineage leading to modern birds. These contradictory results stem from methodological differences in statistical ancestral state estimates. Here we conduct a comprehensive comparison of ancestral state estimation (ASE) methodologies applied to stem-group birds, testing the role of outgroup inclusion, tree time-scaling method, model choice, and character coding strategy. Models are compared based on their Akaike Information Criterion (AIC) scores, mutual information, and the uncertainty of marginal ancestral state estimates. Our results demonstrate that ancestral state estimates of stem-bird integument are strongly influenced by tree time-scaling method, outgroup selection, and model choice, while character coding strategy seems to have less effect on the ancestral estimates produced. We identify the best-fitting and most generalizable models using AIC scores and leave-one-out cross-validation (LOOCV), respectively. Our analyses broadly support the independent origin of filamentous integument in dinosaurs and pterosaurs and support a younger evolutionary origin of feathers than has been suggested previously. In terms of model selection, we observe little correlation between AIC/AICc and LOOCV error, suggesting that, for our dataset, model fit does not reliably predict generalizability. However, both approaches favor models that infer a similar pattern of feather evolution. More globally, our study highlights that special care must be taken in selecting the outgroup, tree, and model when conducting ASE analyses.
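For readers unfamiliar with the model-fit scores mentioned in this abstract, the sketch below computes AIC, the small-sample correction AICc, and Akaike weights from a log-likelihood and a parameter count. It uses the standard textbook formulas and is not taken from the paper. The model names, log-likelihoods, and sample size are invented purely for illustration. What counts as the sample size `n` in an ancestral state analysis is itself a modelling choice, which this sketch treats as an assumption.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * k - 2.0 * log_likelihood


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC with the small-sample correction term 2k(k+1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic(log_likelihood, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores: dict) -> dict:
    """Convert a {model: score} mapping into relative Akaike weights."""
    best = min(scores.values())
    relative = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


if __name__ == "__main__":
    # Hypothetical fits: (log-likelihood, number of free parameters).
    fits = {"model_A": (-50.0, 1), "model_B": (-48.5, 2)}
    n_observations = 20  # assumed sample size, for illustration only
    scores = {m: aicc(lnl, k, n_observations) for m, (lnl, k) in fits.items()}
    for name, weight in akaike_weights(scores).items():
        print(f"{name}: AICc = {scores[name]:.2f}, weight = {weight:.2f}")
```

AIC-type scores measure fit penalized by complexity on the data at hand. Leave-one-out cross-validation instead measures prediction error on held-out observations, which is why the two criteria can rank models differently, as the abstract reports.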

Citations: 0

Estimating waiting distances between genealogy changes under a Multi-Species Extension of the Sequentially Markov Coalescent.

IF 5.7, Zone 1 (Biology), Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-09-09 DOI: 10.1093/sysbio/syaf059

Patrick F McKenzie, Deren A R Eaton

Genomes are composed of a mosaic of segments inherited from different ancestors, each separated by past recombination events. Consequently, genealogical relationships among multiple genomes vary spatially across different genomic regions. Genealogical variation among unlinked (uncorrelated) genomic regions is well described for either a single population (coalescent) or multiple structured populations (multispecies coalescent). However, the expected similarity among genealogies at linked regions of a genome is less well characterized. Recently, an analytical solution was derived for the distribution of the waiting distance for a change in the genealogical tree spatially across a genome for a single population with constant effective population size. Here we describe a generalization of this result, in terms of the distribution of waiting distances between changes in genealogical trees and topologies for multiple structured populations with branch-specific effective population sizes (i.e., under the multispecies coalescent). We implemented our model in the Python package ipcoal and validated its accuracy against stochastic coalescent simulations. Using a novel likelihood framework we show that tree and topology-change waiting distances in an ARG can be used to fit species tree model parameters, demonstrating an application of our model for developing new methods for phylogenetic inference. The Multi-Species Sequentially Markov Coalescent (MS-SMC) model presented here represents a major advance for linking local ancestry inference to hierarchical demographic models.
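
As a rough illustration of the single-genealogy starting point that this abstract generalizes, the sketch below samples the distance along the genome to the next recombination event on a fixed genealogy. It assumes the usual SMC-style setup in which recombination events fall on the tree as a Poisson process, so the distance is exponential with rate r × L, where r is the per-site, per-generation recombination rate and L is the summed branch length in generations. It does not call ipcoal and does not reproduce the MS-SMC solutions: not every recombination event changes the tree or its topology, and those probabilities are what the paper derives. The function names, the three-tip tree, and the parameter values are hypothetical.

```python
import random
from statistics import mean


def recombination_rate_on_tree(branch_lengths, recomb_per_site):
    """Per-site rate at which recombination events hit a fixed genealogy (r * L)."""
    return recomb_per_site * sum(branch_lengths)


def sample_waiting_distance(branch_lengths, recomb_per_site, rng):
    """Distance (in sites) to the next recombination event: Exponential(r * L)."""
    rate = recombination_rate_on_tree(branch_lengths, recomb_per_site)
    return rng.expovariate(rate)


# Hypothetical rooted three-tip genealogy, branch lengths in generations:
# tips A and B (1000 each), their ancestral branch (2000), tip C (3000).
branches = [1000.0, 1000.0, 2000.0, 3000.0]
recomb = 1e-8  # assumed recombination rate per site per generation

rng = random.Random(1)
draws = [sample_waiting_distance(branches, recomb, rng) for _ in range(20000)]
mean_distance = mean(draws)
expected_distance = 1.0 / recombination_rate_on_tree(branches, recomb)

print(f"simulated mean distance: {mean_distance:.0f} sites")
print(f"expected 1 / (r * L):    {expected_distance:.0f} sites")
```

With these assumed values the expected distance is 1 / (1e-8 × 7000), roughly 14,286 sites. The distances to an actual tree change or topology change are longer, because only a fraction of recombination events alter the genealogy.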

Citations: 0

The comparative analysis of lineage-pair traits.

IF 5.7 Zone 1 Biology Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-09-05 DOI: 10.1093/sysbio/syaf061

Sean A S Anderson, Sachin Kaushik, Daniel R Matute

For many questions in ecology and evolution, the most relevant data to consider are attributes of lineage pairs. Comparative tests for causal relationships among traits like 'diet niche overlap', 'divergence time', and 'strength of reproductive isolation (RI)' - measured for pairwise combinations of related species or populations - have led to several groundbreaking insights, but the correct statistical approach for these analyses has never been clear. Lineage-pair traits are non-independent, but unlike the expected covariance among species' traits, which is captured by a phylogenetic covariance matrix arising from a given model, the expected covariance among lineage-pair traits has not been explicitly formulated. Analyses of pairwise-defined data have thus employed untested workarounds for non-independence rather than direct models of lineage-pair covariance, with consequences that are unexplored. Here, we consider how evolutionary relatedness among taxa translates into non-independence among taxonomic pairs. We develop models by which phylogenetic signal in an underlying character generates covariance among pairs in a lineage-pair trait. We incorporate the resulting lineage-pair covariance matrices into modified versions of phylogenetic generalized least squares and a new phylogenetic beta regression for bounded response variables. Both outperform previous approaches in simulation tests. We find that a common heuristic method, node averaging, imparts a greater cost to model performance than does the non-independence it was designed to correct. We re-analyze two empirical datasets to find dramatic improvements in model fit and, in the case of avian hybridization data, an even stronger relationship between pair age and RI than is revealed from uncorrected analysis. We finally present a new tool, the R package phylopairs, that allows empiricists to test relationships among pairwise-defined variables in a way that is statistically robust and more straightforward to implement.
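
To make the non-independence of lineage pairs concrete, here is a minimal sketch under one deliberately simple assumption: the pair trait is the signed difference X_i − X_j of an underlying species trait whose covariance matrix C follows Brownian motion (C[i][j] is the branch length shared by species i and j). By bilinearity, Cov(X_i − X_j, X_k − X_l) = C_ik − C_il − C_jk + C_jl. This is an illustration only; it is not claimed to be one of the models implemented in phylopairs, and the tree, names, and values are hypothetical.

```python
from itertools import combinations

# Brownian-motion covariance for the hypothetical tree ((A:1,B:1):2,C:3):
# every tip is 3 units from the root; A and B share 2 units of history.
species = ["A", "B", "C"]
C = [
    [3.0, 2.0, 0.0],
    [2.0, 3.0, 0.0],
    [0.0, 0.0, 3.0],
]


def pair_covariance(cov, pair1, pair2):
    """Cov(X_i - X_j, X_k - X_l) computed from the species-level covariance."""
    i, j = pair1
    k, l = pair2
    return cov[i][k] - cov[i][l] - cov[j][k] + cov[j][l]


# All unordered species pairs: (A,B), (A,C), (B,C)
pairs = list(combinations(range(len(species)), 2))
pair_cov = [[pair_covariance(C, p, q) for q in pairs] for p in pairs]

for (i, j), row in zip(pairs, pair_cov):
    print(f"{species[i]}-{species[j]}: {row}")
# pair_cov == [[2.0, 1.0, -1.0], [1.0, 6.0, 5.0], [-1.0, 5.0, 6.0]]
```

The large off-diagonal entry between the A-C and B-C pairs (5 against variances of 6) shows how two pairs can be far from independent simply because A and B are close relatives, which is the kind of structure a lineage-pair covariance matrix is meant to capture.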

Citations: 0

Parameter Estimation from Phylogenetic Trees Using Neural Networks and Ensemble Learning.

IF 6.5 Zone 1 Biology Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-09-03 DOI: 10.1093/sysbio/syaf060

Tianjian Qin, Koen J van Benthem, Luis Valente, Rampal S Etienne

Species diversification is characterized by speciation and extinction, the rates of which can, under some assumptions, be estimated from time-calibrated phylogenies. However, maximum likelihood estimation methods (MLE) for inferring rates are limited to simpler models and can show bias, particularly in small phylogenies. Likelihood-free methods to estimate parameters of diversification models using deep learning have started to emerge, but how robust neural network methods are at handling the intricate nature of phylogenetic data remains an open question. Here we present a new ensemble neural network approach to estimate diversification parameters from phylogenetic trees that leverages different classes of neural networks (dense neural network, graph neural network, and long short-term memory recurrent network) and simultaneously learns from graph representations of phylogenies, their branching times and their summary statistics. Our best-performing ensemble neural network (which adjusts the graph neural network result using a recurrent neural network) delivers estimates faster than MLE and shows less sensitivity to tree size for constant-rate and diversity-dependent speciation scenarios. It performs well compared to an existing convolutional network approach. However, like MLE, our approach still fails to recover parameters precisely under a protracted birth-death process. Our analysis suggests that the primary limitation to accurate parameter estimation is the amount of information contained within a phylogeny, as indicated by its size and the strength of effects shaping it. In cases where MLE is unavailable, our neural network method provides a promising alternative for estimating phylogenetic tree parameters. If detectable phylogenetic signals are present, our approach delivers results that are comparable to MLE but without inherent biases.
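
The small-tree bias of maximum likelihood mentioned in this abstract can be seen even in the simplest diversification model. The sketch below assumes a pure-birth (Yule) process started from two crown lineages and stopped at the moment the n-th lineage appears; in that setting the maximum likelihood estimate of the speciation rate is (n − 2) / S, where S is the summed branch length. This is a textbook special case chosen for illustration, not one of the models or the neural-network ensemble evaluated in the paper; the function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate_yule_branch_sum(n_tips, birth_rate, rng):
    """Summed branch length of a Yule tree grown from 2 to n_tips lineages."""
    total = 0.0
    for k in range(2, n_tips):  # k lineages are alive during this interval
        wait = rng.expovariate(k * birth_rate)  # time until the next speciation
        total += k * wait
    return total


def yule_mle(n_tips, branch_sum):
    """Maximum likelihood speciation rate: (n - 2) events over summed branch length."""
    return (n_tips - 2) / branch_sum


def mean_estimate(n_tips, birth_rate, replicates, rng):
    """Average MLE over many simulated trees of a fixed size."""
    return mean(
        yule_mle(n_tips, simulate_yule_branch_sum(n_tips, birth_rate, rng))
        for _ in range(replicates)
    )


rng = random.Random(42)
true_rate = 1.0  # assumed true speciation rate
small = mean_estimate(6, true_rate, 5000, rng)
large = mean_estimate(50, true_rate, 5000, rng)

print(f"mean MLE with  6 tips: {small:.3f} (true rate {true_rate})")
print(f"mean MLE with 50 tips: {large:.3f} (true rate {true_rate})")
```

In this setup the estimate is, in expectation, (n − 2) / (n − 3) times the true rate, so the upward bias is about 33% for 6 tips and about 2% for 50 tips. That is one concrete sense in which likelihood-based rate estimates are sensitive to tree size.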

Citations: 0
