Systematic Biology: latest articles (page 7)

New perspectives in phylogenetic support assessment: using the new Relative Contradiction Index to investigate the phylogenetic controversies in Crocodylia

IF 6.5 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-08-27 DOI: 10.1093/sysbio/syaf058

Paul Aubier, Valentin Rineau, Jorge Cubo, Stéphane Jouve

Numerous tools have been developed since the advent of phylogenetic methods to assess tree robustness. Identifying the degree of contradiction in a phylogenetic matrix, as well as the specific contribution of each taxon and character, is essential for estimating its reliability. In parsimony-based phylogenetic inferences, classically used by paleontologists, a phylogeny results from the interaction of all the characters used in the analysis. Consequently, the support initially provided by the characters in the matrix may differ from that after optimization in the final tree, severing the link between the phylogenetic content of the matrix and that of the final tree. Thus, all methods aimed at measuring support only do so indirectly, and the impact of individual characters or taxa can only be assessed after the analysis. Three-taxon analysis (3ta) is a phylogenetic method that can circumvent these issues by precisely measuring the support of targeted characters and/or taxa directly from the phylogenetic matrix. In 3ta, characters are coded as trees and decomposed into three-taxon statements (3ts). The analysis searches for the largest set of non-contradicting 3ts to compute the optimal phylogeny. Because the analysis is a compatibility procedure, not an optimization procedure, character supports on the tree are independent from one another. This enables direct assessment of support from the matrix, providing meaningful insights into the topology of the optimal trees. Moreover, the decomposition of characters into 3ts allows for precise quantification of the impact of the characters/taxa in the results. In this study, focusing on Crocodylia (a subject of ongoing debate over recent decades), we use 3ta to measure the support of specific characters and/or taxa in the recently published matrix of Rio and Mannion (2021). This conflict revolves around two competing hypotheses – Longirostres and Brevirostres – supporting a different placement of the Gavialoidea clade. We also introduce here the Relative Contradiction Index (RCI) to evaluate node support, a metric that reflects the degree of contradiction in a matrix between competing cladistic hypotheses, ranging from 0.5 (maximum contradiction) to 1 (no contradiction). We show that although the Longirostres hypothesis is the best-supported, it is strongly challenged by the Brevirostres hypothesis (RCI = 0.62). Furthermore, we find that Tomistominae provides 61% of the supporting evidence for the Longirostres hypothesis, such that, when removed, the matrix supports the Brevirostres hypothesis. Individual tomistomines’ contributions vary only from 2% to 7% of the total support to the Longirostres hypothesis. Finally, we show that characters correlated to longirostry only provide a fraction (22%) of the total support to the Longirostres hypothesis. Thus, our method can quantify the impact of specific characters or taxa on a phylogenetic result. This should prove very useful to phylogeneticists, especially when dealing with incomplete material such as fossils.
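
The decomposition of a character into three-taxon statements can be illustrated with a minimal Python sketch. It assumes the simplest case, a single group of taxa sharing a derived state, and it omits the weighting and compatibility search of a real 3ta program. The function names, the taxon labels, and the `contradiction_ratio` formula are hypothetical: the ratio is only one simple quantity that ranges from 0.5 to 1, and the abstract does not give the authors' definition of the RCI.

```python
from itertools import combinations


def three_taxon_statements(taxa, group):
    """Decompose one group of taxa into three-taxon statements (3ts).

    Each statement is (outsider, frozenset({b, c})), read as outsider(b, c):
    b and c are more closely related to each other than either is to the outsider.
    """
    inside = sorted(set(group))
    outside = sorted(set(taxa) - set(group))
    return {
        (outsider, frozenset(pair))
        for pair in combinations(inside, 2)
        for outsider in outside
    }


def contradiction_ratio(support_a, support_b):
    """Hypothetical index: share of the better-supported hypothesis, in [0.5, 1]."""
    total = support_a + support_b
    if total == 0:
        raise ValueError("no statements support either hypothesis")
    return max(support_a, support_b) / total


# Placeholder taxa; the group {C, D} yields the statements A(C,D) and B(C,D).
taxa = ["A", "B", "C", "D"]
statements = three_taxon_statements(taxa, {"C", "D"})
```

Under this toy formula, equal support for two hypotheses gives 0.5 and unopposed support gives 1, which matches the range stated for the RCI but should not be read as its published definition.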

Citations: 0

Phylogenetic Resolution and Conflict in the Species-Rich Flowering Plant Family Leguminosae.

IF 5.7 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-08-19 DOI: 10.1093/sysbio/syaf057

Rong Zhang, Gregory W Stull, Jian-Jun Jin, Yin-Huan Wang, Ying Guo, Zhi-Yun Yang, Hong-Tao Li, Kai-Lun An, Joseph L M Charboneau, Ryan A Folk, Domingos Cardoso, Luciano P de Queiroz, Anne Bruneau, Pamela S Soltis, Douglas E Soltis, Stephen A Smith, De-Zhu Li, Ting-Shuang Yi

The Tree of Life is central to evolutionary biology, yet resolving deep, recalcitrant phylogenetic relationships remains challenging due to complex processes such as incomplete lineage sorting (ILS), hybridization, and polyploidization. Although previous phylogenetic studies have advanced our understanding of Leguminosae (Fabaceae), a species-rich and ecologically diverse family, many deep relationships at the tribal and higher levels remain unresolved. Incorporating newly generated genome skimming data for 231 species with previously issued plastid genomic, mitochondrial genomic and transcriptomic data, we reconstructed a phylogeny of the family using whole plastomes, 39 mitochondrial genes, and 1559 low-copy nuclear genes, achieving dense taxonomic sampling across almost all recognized tribes and major unplaced lineages. Our results supported the monophyly of the six subfamilies and 49 recognized tribes, identified ten clades worthy of recognition as new tribes in subfamily Papilionoideae, and clarified many contentious relationships. However, nuclear-nuclear and cytonuclear conflicts persist at multiple nodes among trees inferred from different datasets and analytical methods. We proposed the most probable resolution for 22 contentious nodes by applying nuclear gene-tree quartet analysis with corroboration from support of nuclear Maximum Likelihood (ML) and ASTRAL trees. Our results indicate ILS significantly contributes to observed phylogenetic conflicts, while gene flow represents an additional and previously underappreciated factor that mainly contributes to cytonuclear conflicts, particularly along the branches of the Angylocalyceae + Dipterygeae + Amburaneae (ADA) clade and Wisterieae. These processes likely underlie recalcitrant phylogenetic relationships, such as those within the 50-kb inversion clade of Papilionoideae. 
Our study uses multiple data partitions and analytical methods to resolve contentious phylogenetic relationships in Leguminosae, resulting in a robust phylogenomic framework to guide further investigations in this economically important and exceptionally diverse family.

Citations: 0

Social environment and the evolution of delayed reproduction in birds.

IF 5.7 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-08-12 DOI: 10.1093/sysbio/syaf056

Liam U Taylor, Josef C Uyeda, Richard O Prum

One puzzling feature of avian life histories is that individuals in many different lineages delay reproduction for several years after they finish growing. Intraspecific field studies suggest that various complex social environments (such as cooperative breeding groups, nesting colonies, and display leks) result in delayed reproduction because they require forms of sociosexual development that extend beyond physical maturation. Here, we formally propose this hypothesis and use a full suite of phylogenetic comparative methods to test it, analyzing the evolution of age at first reproduction (AFR) in females and males across 963 species of birds. Phylogenetic regressions support increased AFR in colonial females and males, cooperatively breeding males, and lekking males. Continuous Ornstein-Uhlenbeck models support distinct evolutionary regimes with increased AFR for all of cooperative, colonial, and lekking lineages. Discrete hidden state Markov models suggest a net increase in delayed reproduction for social lineages, even when accounting for hidden state heterogeneity and the potential reverse influence of AFR on sociality. Our results support the hypothesis that the evolution of sociality reshapes the dynamics of life history evolution in birds. Comparative analyses of even the most broadly generalizable characters, such as AFR, must reckon with unique, heterogeneous, historical events in the evolution of individual lineages.
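
As a rough illustration of what "distinct evolutionary regimes" means under an Ornstein-Uhlenbeck (OU) model, the sketch below uses the standard closed-form mean and variance of an OU process. It shows a trait being pulled toward a regime-specific optimum. The function names, parameter values, and regime optima are hypothetical and are not estimates from the study.

```python
import math


def ou_expected_value(x0, theta, alpha, t):
    """Expected trait value after time t, starting at x0, with optimum theta."""
    return theta + (x0 - theta) * math.exp(-alpha * t)


def ou_variance(sigma, alpha, t):
    """Trait variance after time t when the starting value is known exactly."""
    return sigma ** 2 / (2 * alpha) * (1 - math.exp(-2 * alpha * t))


# Hypothetical optima on a log(AFR) scale; illustrative numbers only.
optima = {"non_social": 0.0, "colonial": 1.0}

# A lineage that starts at the non-social optimum and then evolves
# under the colonial regime drifts toward the colonial optimum.
after_shift = ou_expected_value(0.0, optima["colonial"], alpha=0.5, t=10.0)
```

A larger `alpha` pulls the trait toward the optimum faster, so lineages in different regimes are expected to settle around different trait values.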

Citations: 0

ConvexML: Fast and accurate branch length estimation under irreversible mutation models, illustrated through applications to CRISPR/Cas9-based lineage tracing

IF 6.5 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-08-12 DOI: 10.1093/sysbio/syaf054

Sebastian Prillo, Akshay Ravoor, Nir Yosef, Yun S Song

Branch length estimation is a fundamental problem in Statistical Phylogenetics and a core component of tree reconstruction algorithms. Traditionally, general time-reversible mutation models are employed, and many software tools exist for this scenario. With the advent of CRISPR/Cas9 lineage tracing technologies, there has been significant interest in the study of branch length estimation under irreversible mutation models. Under the CRISPR/Cas9 mutation model, irreversible mutations – in the form of DNA insertions or deletions – are accrued during the experiment, which are then read out at the single-cell level to reconstruct the cell lineage tree. However, most of the analyses of CRISPR/Cas9 lineage tracing data have so far been limited to the reconstruction of single-cell tree topologies, which depict lineage relationships between cells, but not the amount of time that has passed between ancestral cell states and the present. Time-resolved trees, known as chronograms, would allow one to study the evolutionary dynamics of cell populations at an unprecedented level of resolution. Indeed, time-resolved trees would reveal the timing of events on the tree, the relative fitness of subclones, and the dynamics underlying phenotypic changes in the cell population – among other important applications. In this work, we introduce the first scalable and accurate method to refine any given single-cell tree topology into a single-cell chronogram by estimating its branch lengths. To do this, we perform regularized maximum likelihood estimation under a general irreversible mutation model, paired with a conservative version of maximum parsimony that reconstructs only the ancestral states that we are confident about. To deal with the particularities of CRISPR/Cas9 lineage tracing data – such as double-resection events which affect runs of consecutive sites – we avoid making our model more complex and instead opt for using a simple but effective data encoding scheme. 
Similarly, we avoid explicitly modeling the missing data mechanisms – such as heritable missing data – by instead assuming that they are missing completely at random. We stabilize estimates in low-information regimes by using a simple penalized version of maximum likelihood estimation (MLE) using a minimum branch length constraint and pseudocounts. All this leads to a convex MLE problem that can be readily solved in seconds with off-the-shelf convex optimization solvers. We benchmark our method using both simulations and real lineage tracing data, and show that it performs well on several tasks, matching or outperforming competing methods such as TiDeTree and LAML in terms of accuracy, while being 10–100× faster. Notably, our statistical model is simpler and more general, as we do not explicitly model the intricacies of CRISPR/Cas9 lineage tracing data. In this sense, our contribution is twofold: (1) a fast and robust method for branch length estimation under a general irreversible mutation model, and (2) a data encoding scheme specific to CRISPR/Cas9 lineage tracing data that makes it amenable to the general model. Our branch length estimation method, which we call "ConvexML", should be broadly applicable to any evolutionary model with irreversible mutations (ideally, with high diversity) and an approximately ignorable missing data mechanism. ConvexML is available through the ConvexML open-source Python package.
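
To see why convexity makes this kind of estimation easy, consider a one-branch toy problem in Python. It assumes every site mutates irreversibly at rate 1, so a site stays unmutated over a branch of length t with probability exp(-t). This is not the ConvexML package or its API: the function names are hypothetical, pseudocounts are omitted, and the minimum branch length is imposed simply as the lower bound of the search.

```python
import math


def neg_log_lik(t, mutated, total):
    """Negative log-likelihood of branch length t for one branch.

    Unmutated sites contribute log(exp(-t)) = -t each;
    mutated sites contribute log(1 - exp(-t)) each.
    """
    unmutated = total - mutated
    return unmutated * t - mutated * math.log1p(-math.exp(-t))


def estimate_branch_length(mutated, total, min_len=1e-3, max_len=20.0):
    """Minimize the convex objective on [min_len, max_len] by ternary search."""
    lo, hi = min_len, max_len
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if neg_log_lik(m1, mutated, total) < neg_log_lik(m2, mutated, total):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# With 3 of 10 sites mutated, the unconstrained optimum is -log(1 - 3/10).
estimate = estimate_branch_length(mutated=3, total=10)
```

Because the objective is convex in t, any local search reaches the global optimum. A full tree couples many branches through shared constraints, which is where an off-the-shelf convex solver would replace the ternary search used here.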

Citations: 0

Selecting a Window Size for Phylogenomic Analyses of Whole Genome Alignments using AIC

IF 6.5 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-08-12 DOI: 10.1093/sysbio/syaf053

Jeremias Ivan, Paul Frandsen, Robert Lanfear

Gene tree discordance along a set of aligned genomes presents a challenge for phylogenomic methods to identify the non-recombining regions and reconstruct the phylogenetic tree for each region. To address this problem, many studies used the non-overlapping window approach, often with an arbitrary selection of fixed window sizes that potentially include intra-window recombination events. In this study, we propose an information theoretic approach to select a window size that best reflects the underlying histories of the alignment. First, we simulated chromosome alignments that reflected the key characteristics of an empirical dataset and found that the AIC is a good predictor of window size accuracy in correctly recovering the tree topologies of the alignment. To address the issue of missing data in empirical datasets, we designed a stepwise non-overlapping window approach that compares the AIC of two window sizes at a time, retaining only genomic regions that can be analysed using both window sizes. We then applied this method to the genomes of Heliconius butterflies and great apes. We found that the best window sizes for the butterflies’ chromosomes ranged from <125bp to 250bp, which are much shorter than those used in a previous study even though this difference in window size did not significantly change the most common topologies across the genome. On the other hand, the best window sizes for great apes’ chromosomes ranged from 500bp to 1kb with the proportion of the major topology (grouping human and chimpanzee) falling between 60% and 87%, consistent with previous findings. Additionally, we observed a notable impact of gene tree estimation error and concatenation when using small and large windows, respectively. For instance, the proportion of the major topology for great apes was 50% when using 250bp windows, but reached almost 100% for 64kb windows. 
In conclusion, our study highlights the challenges associated with selecting a fixed window size in non-overlapping window analyses and proposes the AIC as a less arbitrary way to select the optimal window size when running non-overlapping window methods on whole-genome alignments.
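
The comparison of two window sizes by AIC can be sketched in a few lines of Python. The sketch assumes each window has already been fitted with its own tree and model, giving a log-likelihood and a count of free parameters, and that both window sizes cover the same genomic region. The function names and all numbers are hypothetical placeholders, not values from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion for one fitted window (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def total_aic(window_fits):
    """Sum the AIC over all windows of one size covering the same region."""
    return sum(aic(lnl, k) for lnl, k in window_fits)


def better_window_size(fits_by_size):
    """Return the window size whose windows give the lowest summed AIC."""
    return min(fits_by_size, key=lambda size: total_aic(fits_by_size[size]))


# Hypothetical fits as (log-likelihood, free parameters) per window.
# Two 500 bp windows and one 1000 bp window cover the same 1 kb region.
fits = {
    500: [(-1200.0, 9), (-1150.0, 9)],
    1000: [(-2390.0, 9)],
}
best = better_window_size(fits)
```

Smaller windows add parameters, since each window gets its own tree, but can fit better when the region contains more than one history. The AIC trades these two effects off against each other.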

Citations: 0

But the Clock, Tick-Tock: An Empirical Case Study Highlights the Preeminence of Relaxed Clock Models in Total-Evidence Dating

IF 6.5 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-08-08 DOI: 10.1093/sysbio/syaf055

Nicolás Mongiardino Koch, Jeffrey R Thompson, Rich Mooi, Greg W Rouse

Phylogenetic clock models translate inferred amounts of evolutionary change (calculated from either genotypes or phenotypes) into estimates of elapsed time, providing a mechanism for time scaling phylogenetic trees. Relaxed-clock models, which accommodate variation in evolutionary rates across branches, are one of the main components of Bayesian dating, yet their consequences for total-evidence phylogenetics have not been thoroughly explored. Here, we combine morphological, molecular (both transcriptomic and Sanger-sequenced), and stratigraphic datasets for all major lineages of echinoids (sea urchins, heart urchins, sand dollars). We then perform total-evidence dated inference under the fossilized birth-death prior, varying two analytical conditions: the choice between autocorrelated and uncorrelated relaxed clocks, which enforce (or not) evolutionary rate inheritance; and the ability to recover fossil terminals as direct ancestors. Our results highlight a previously unnoticed interaction between tree and clock models, with analyses implementing an autocorrelated clock failing to recover any direct ancestors. Nonetheless, even under conditions conducive to the placement of fossil terminals as ancestors, we find this type of relationship to be accommodated without any impact on either topology or node ages. On the other hand, tree topology, fossil placement, divergence times, and downstream macroevolutionary inferences (e.g., ancestral state reconstructions) were all strongly affected by the type of relaxed clock implemented. In regions of the tree where molecular rate variation is pervasive and morphological signal relatively uninformative, fossil tips seem to play little to no role in informing divergence times, and instead passively move in and out of clades depending on the ages imposed upon surrounding nodes by molecular data. 
Our results highlight the extent to which the phylogenetic and macroevolutionary conclusions of total-evidence dated analyses are contingent on the choice of relaxed-clock model, highlighting the need for either careful methodological validation or a thorough assessment of sensitivity. Our efforts continue to illuminate the echinoid tree of life, supporting the erection of the order Apatopygoida to include three living species last sharing a common ancestor with other extant lineages around the time of the Jurassic-Cretaceous boundary. Furthermore, they also illustrate how the phylogenetic placement of extinct clades hinges upon the modelling of molecular data, evidencing the extent to which the fossil record remains subservient to phylogenomics.
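
The contrast between the two clock families named in the abstract can be illustrated with a toy simulation. The following is only a sketch under stated assumptions, not the models used in the study: the four-branch tree, the function names, and the lognormal parameterisation are hypothetical, and the autocorrelated version ignores branch durations for brevity. It shows that uncorrelated rates are independent draws, whereas autocorrelated rates are centred on the rate of the parent branch.

```python
import math
import random

# Hypothetical toy tree (assumption): maps each node to its parent.
PARENT = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}


def uncorrelated_rates(parent, mean_rate, sigma, rng):
    """Draw every branch rate independently (no rate inheritance)."""
    # Shift the log-mean so that the expected rate equals mean_rate.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return {node: rng.lognormvariate(mu, sigma) for node in parent}


def autocorrelated_rates(parent, root_rate, sigma, rng):
    """Draw each branch rate around the rate of its parent branch."""
    rates = {"root": root_rate}

    def rate_of(node):
        if node not in rates:
            ancestral = rate_of(parent[node])
            # Centre the draw on the ancestral rate (rate inheritance).
            mu = math.log(ancestral) - 0.5 * sigma ** 2
            rates[node] = rng.lognormvariate(mu, sigma)
        return rates[node]

    for node in parent:
        rate_of(node)
    # Report branch rates only; the root value is a starting point.
    return {node: rates[node] for node in parent}


if __name__ == "__main__":
    rng = random.Random(1)
    print("uncorrelated:  ", uncorrelated_rates(PARENT, 1.0, 0.5, rng))
    print("autocorrelated:", autocorrelated_rates(PARENT, 1.0, 0.5, rng))
```

Setting `sigma` to zero collapses both versions to a strict clock, which is a convenient sanity check on the parameterisation.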

Citations: 0

Species Diversification in the Sky Islands of Southwestern China Revealed by Genomic, Introgression and Demographic Analyses of Asian Shrew Moles.

IF 6.5 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-07-31 DOI: 10.1093/sysbio/syaf052

Yi-Xian Li, Zhong-Zheng Chen, Quan Li, Tao Zhang, Feng Cheng, Wen-Yu Song, Xue-You Li, Shui-Wang He, Hong-Jiao Wang, Kenneth Otieno Onditi, Xue-Long Jiang

The Mountains of Southwest China, a global biodiversity hotspot, have a unique "sky island" landscape with high diversity of both ancient and recently formed species. While their distribution patterns offer significant insights into diversification processes, the complex geological and climatic history, combined with dynamic histories of gene flow in endemic taxa, make unravelling this history challenging. This study focuses on Asian shrew moles (genus Uropsilus), an ancient group endemic to this region with an unresolved taxonomic system. By combining phylogenomic, introgression and demographic history analyses, we investigated the historical patterns of species diversification in this genus. We detected phylogenetic discordances among rapidly diverged lineages, driven by incomplete lineage sorting, both recent and ancient gene flow, and ghost introgression. The gene flow patterns revealed strong genetic isolation in the Hengduan Mountains region, contrasted by more extensive dispersal or connectivity in areas to its east, while suggesting potential ring-like diversification around the Sichuan Basin. Demographic history indicated that rapidly diverged lineages south of the Yangtze River exhibited significantly different responses to climatic fluctuations compared to other lineages, with the East Asian monsoon likely driving their radiative differentiation and dispersal. Our study demonstrates the impacts of mountain uplift, climatic changes, and the connectivity of sky island refugia in shaping the diverse patterns of species differentiation and their distribution. [phylogenomics; introgression; Asian shrew moles; demographic history].

Citations: 0

Global climate cooling spurred skipper butterfly diversification

IF 6.5 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-07-28 DOI: 10.1093/sysbio/syaf029

Emmanuel F A Toussaint, Fabien L Condamine, Ana Paula dos Santos De Carvalho, David M Plotkin, Emily A Ellis, Kelly M Dexter, Chandra Earl, Kwaku Aduse-Poku, Michael F Braby, Hideyuki Chiba, Riley J Gott, Kiyoshi Maruyama, Ana BB Morais, Chris J Müller, Djunijanti Peggie, Szabolcs Sáfián, Roger Vila, Andrew D Warren, Masaya Yago, Jesse W Breinholt, Marianne Espeland, Naomi E Pierce, David J Lohman, Akito Y Kawahara

Characterizing drivers governing the diversification of species-rich lineages is challenging. Although butterflies are one of the most well-studied groups of insects, there are few comprehensive studies investigating their diversification dynamics. Here, we reconstruct a phylogenomic tree for ca. 1,500 species in the family Hesperiidae, the skippers, to test whether historical global climate change, geographical range evolution, and host-plant association are drivers of diversification. Our findings suggest skippers originated in Laurasia before the Cretaceous-Paleogene mass extinction, in a northern region centered on Beringia before colonizing southern regions coinciding with global climate cooling. Climate cooling also fostered the diversification of skippers throughout the Cenozoic possibly by fueling biome transitions from closed to open ecosystems such as grasslands. An early shift from dicot-feeding to monocot-feeding reduced extinction rates and increased speciation rates, explaining the large diversity of grass-feeding adapted skippers. A dynamic geographic range evolution and host-plant shifts linked with long-term climate change explain skipper butterfly diversification.

Citations: 0

Phylogenetic Analysis of Characters with Dependencies under Maximum Likelihood

IF 6.5 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-07-26 DOI: 10.1093/sysbio/syaf051

Pablo A Goloboff

The dependencies between characters used in phylogenetic analysis (e.g., inapplicabilities, functional dependencies) can be taken into account by using combinations of character states as possible ancestral morphotypes, and using appropriate rates of transformation between such morphotypes. As every morphotype represents a permissible combination of the original character states, this allows easily ruling out specific combinations of character states, and taking into account changes that are either less or more likely to co-occur, or to occur in certain contexts. For inapplicable characters, Goloboff et al. (2021) used morphotypes but proposed obtaining transition probabilities between morphotypes from products of transition probabilities of the original characters and factors to incorporate dependencies. The product of transition probabilities is shown here to be flawed (failing the time-continuity requirement of phylogenetic Markov models, essential for statistical consistency under the model). Tarasov (2023) used the same delimitation of morphotypes but proposed obtaining transition probabilities from rate matrices, synthesized in a stepwise fashion from the hierarchy of dependencies. This paper shows that the rate matrices can easily be created, instead of with a stepwise synthesis, from direct comparisons between legitimate morphotypes (as done by Goloboff and De Laet 2023 for parsimony). Based on a few simple rules, the resulting rate matrices are (for inapplicable characters) identical to those obtained by Tarasov (2023). Additionally, in the computer program TNT, biological dependencies beyond mere inapplicability can be specified by the user with a simple syntax for (combinations of) states in “parent” characters restricting the states that “child” characters can take, using AND and OR conjunctions for elaborate interactions. 
These researcher-defined rules are used to internally convert the original characters into morphotypes, discarding morphotypes made impossible by the rules. In the case of biological dependencies (where, depending on the parent characters, there can be restrictions in the states that dependent characters can take, instead of the character being inapplicable), the rates of transition between morphotypes cannot be calculated solely from comparisons of states differing in both morphotypes; the conditions of dependency must be considered as well.
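
The general idea of building a rate matrix from direct comparisons between legitimate morphotypes can be sketched as follows. This is a minimal illustration under stated assumptions, not the rules of the paper or the behaviour of TNT: the two characters (a parent "tail" character and a dependent "tail colour" character), the rule assigning rates, and all function names are hypothetical. The last function checks that probabilities obtained as exp(Qt) satisfy P(s+t) = P(s)P(t), which is assumed here to be what the time-continuity requirement refers to.

```python
# Hypothetical characters (assumption): parent "tail" is absent (0) or
# present (1); child "tail colour" (0 or 1) is inapplicable ("-") when
# the tail is absent.
CHILD_STATES = ["0", "1"]


def morphotypes():
    """List only the legitimate (parent, child) combinations."""
    return [("0", "-")] + [("1", c) for c in CHILD_STATES]


def rate_matrix(types, parent_rate, child_rate):
    """Fill Q by comparing morphotypes pairwise (illustrative rule)."""
    n = len(types)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(types):
        for j, b in enumerate(types):
            if i == j:
                continue
            # Assumed rule: a change in the parent state uses parent_rate,
            # a change in the child state alone uses child_rate.
            q[i][j] = parent_rate if a[0] != b[0] else child_rate
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    return q


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, terms=40):
    """Approximate P(t) = exp(Q t) with a truncated Taylor series."""
    n = len(q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        step = [[q[i][j] * t / k for j in range(n)] for i in range(n)]
        term = matmul(term, step)  # now equals (Q t)^k / k!
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


def max_continuity_error(q, s, t):
    """Largest entrywise gap between P(s + t) and P(s) P(t)."""
    whole = transition_probs(q, s + t)
    parts = matmul(transition_probs(q, s), transition_probs(q, t))
    return max(abs(whole[i][j] - parts[i][j])
               for i in range(len(q)) for j in range(len(q)))


if __name__ == "__main__":
    q = rate_matrix(morphotypes(), parent_rate=0.5, child_rate=1.0)
    print(max_continuity_error(q, 0.3, 0.4))
```

The truncated series is adequate only for small values of rate times branch length; a production implementation would use a dedicated matrix-exponential routine instead.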

Citations: 0

Correction to: How Important Is Budding Speciation for Comparative Studies?

IF 6.5 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY

Systematic Biology

Pub Date : 2025-07-23 DOI: 10.1093/sysbio/syaf042

Citations: 0