
Systematic Biology: Latest Articles

Robustness of Divergence Time Estimation Despite Gene Tree Estimation Error: A Case Study of Fireflies (Coleoptera: Lampyridae)
IF 6.5 | Zone 1 (Biology) | Q1 EVOLUTIONARY BIOLOGY | Pub Date: 2024-11-13 | DOI: 10.1093/sysbio/syae065
Sebastian Höhna, Sarah E Lower, Pablo Duchen, Ana Catalán
Genomic data have become ubiquitous in phylogenomic studies, including divergence time estimation, but they pose new challenges. These challenges include, among others, biological gene tree discordance, methodological gene tree estimation error, and computational limitations on performing full Bayesian inference under complex models. In this study, we use a recently published firefly (Coleoptera: Lampyridae) anchored hybrid enrichment dataset (AHE; 436 loci for 88 Lampyridae species and 10 outgroup species) as a case study to explore gene tree estimation error and the robustness of divergence time estimation. First, we explored the amount of model violation using posterior predictive simulations because model violations are likely to bias phylogenetic inferences and produce gene tree estimation error. We specifically focused on missing data (distributed either uniformly or systematically) and the distribution of highly variable and conserved sites (either uniformly distributed or clustered). Our assessment of model adequacy showed that standard phylogenetic substitution models are not adequate for any of the 436 AHE loci. We tested whether the model violations and alignment errors indeed resulted in gene tree estimation error by comparing the observed gene tree discordance to gene tree discordance simulated under the multispecies coalescent model. Thus, we show that the inferred gene tree discordance is due not only to biological mechanisms but primarily to inference errors. Lastly, we explored whether divergence time estimation is robust despite the observed gene tree estimation error. We selected four subsets of the full AHE dataset, concatenated each subset, and performed a Bayesian relaxed clock divergence time estimation in RevBayes. The estimated divergence times overlapped for all nodes that are shared between the topologies. Thus, divergence time estimation is robust using any well-selected data subset as long as the topology inference is robust.
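The final robustness check in this abstract (age estimates from the four subsets overlapping at every shared node) can be illustrated with a small sketch. It assumes that "overlapped" refers to credible intervals and that each subset yields one (lower, upper) interval per shared node; the abstract does not spell out the criterion, and the node names and ages below are hypothetical, not values from the study.

```python
from typing import Dict, List, Tuple

# (lower, upper) bounds of a credible interval for a node age, in Ma.
Interval = Tuple[float, float]


def intervals_overlap(intervals: List[Interval]) -> bool:
    """Return True if all intervals share at least one common point.

    On a line, this is equivalent to every pair of intervals overlapping.
    """
    highest_lower = max(lower for lower, _ in intervals)
    lowest_upper = min(upper for _, upper in intervals)
    return highest_lower <= lowest_upper


def nodes_with_disjoint_ages(ages: Dict[str, List[Interval]]) -> List[str]:
    """List the shared nodes whose intervals do not all overlap."""
    return sorted(node for node, ivs in ages.items() if not intervals_overlap(ivs))


# Hypothetical intervals from four data subsets (not taken from the paper).
example_ages = {
    "node_A": [(70.0, 95.0), (72.0, 99.0), (68.0, 90.0), (75.0, 101.0)],
    "node_B": [(20.0, 30.0), (31.0, 40.0), (22.0, 35.0), (25.0, 33.0)],
}

print(nodes_with_disjoint_ages(example_ages))  # ['node_B']
```

An empty list would correspond to the situation the abstract reports, where no shared node has conflicting age estimates across subsets.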
Citations: 0
How to validate a Bayesian evolutionary model.
IF 6.1 | Zone 1 (Biology) | Q1 EVOLUTIONARY BIOLOGY | Pub Date: 2024-11-07 | DOI: 10.1093/sysbio/syae064
Fábio K Mendes, Remco Bouckaert, Luiz M Carvalho, Alexei J Drummond

Biology has become a highly mathematical discipline in which probabilistic models play a central role. As a result, research in the biological sciences is now dependent on computational tools capable of carrying out complex analyses. These tools must be validated before they can be used, but what is understood as validation varies widely among methodological contributions. This may be a consequence of the still embryonic stage of the literature on statistical software validation for computational biology. Our manuscript aims to advance this literature. Here, we describe, illustrate and introduce new good practices for assessing the correctness of a model implementation, with an emphasis on Bayesian methods. We also introduce a suite of functionalities for automating validation protocols. It is our hope that the guidelines presented here help sharpen the focus of discussions on (as well as elevate) expected standards of statistical software for biology.
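One widely used way to assess whether a Bayesian implementation is correct is a coverage check: draw a true parameter from the prior, simulate data under it, compute the posterior, and record how often the 95% credible interval contains the truth. The sketch below is only an illustration of that general idea on a toy conjugate normal model; the abstract does not list the specific protocols the authors describe, and the function name, model, and settings here are assumptions.

```python
import random


def simulate_coverage(n_reps: int = 2000, n_obs: int = 10, seed: int = 1) -> float:
    """Fraction of replicates whose 95% credible interval covers the true mean.

    Toy model (an assumption for illustration):
        mu ~ Normal(0, 1)            prior
        y_i | mu ~ Normal(mu, 1)     likelihood, i = 1..n_obs
    The posterior is Normal(sum(y) / (n_obs + 1), 1 / (n_obs + 1)).
    """
    rng = random.Random(seed)
    z = 1.959964  # two-sided 95% quantile of the standard normal
    hits = 0
    for _ in range(n_reps):
        mu_true = rng.gauss(0.0, 1.0)                 # draw the truth from the prior
        data = [rng.gauss(mu_true, 1.0) for _ in range(n_obs)]
        post_var = 1.0 / (n_obs + 1)                  # prior precision 1 + n_obs
        post_mean = sum(data) * post_var
        half_width = z * post_var ** 0.5
        if post_mean - half_width <= mu_true <= post_mean + half_width:
            hits += 1
    return hits / n_reps


print(f"coverage of 95% intervals: {simulate_coverage():.3f}")
```

If the simulator and the inference code agree, the reported coverage should be close to 0.95; a value far from that points to a bug in one of the two.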

Citations: 0
Evolution of Large Eyes in Stromboidea (Gastropoda): Impact of Photic Environment and Life History Traits.
IF 6.1 | Zone 1 (Biology) | Q1 EVOLUTIONARY BIOLOGY | Pub Date: 2024-11-05 | DOI: 10.1093/sysbio/syae063
Alison R Irwin, Nicholas W Roberts, Ellen E Strong, Yasunori Kano, Daniel I Speiser, Elizabeth M Harper, Suzanne T Williams

Eyes within the marine gastropod superfamily Stromboidea range widely in size, from 0.2 to 2.3 mm - the largest eyes known in any gastropod. Despite this interesting variation, the underlying evolutionary pressures remain unknown. Here, we use the wealth of material available in museum collections to explore the evolution of stromboid eye size and structure. Our results suggest that depth is a key light-limiting factor in stromboid eye evolution; here, increasing water depth is correlated with increasing aperture width relative to lens diameter, and therefore an increasing investment in sensitivity in dim light environments. In the major clade containing all large-eyed stromboid families, species observed active during the day and the night had wider eye apertures relative to lens sizes than species observed active during the day only, thereby prioritising sensitivity over resolution. Species with no consistent diel activity pattern also had smaller body sizes than exclusively day-active species, which may suggest that smaller animals are more vulnerable to shell-crushing predators, and avoid the higher predation pressure experienced by animals active during the day. Within the same major clade, ancestral state reconstruction suggests that absolute eye size increased above 1 mm twice. The unresolved position of Varicospira, however, weakens this hypothesis and further work with additional markers is needed to confirm this result.

Citations: 0
Rapid Evolution of Host Repertoire and Geographic Range in a Young and Diverse Genus of Montane Butterflies.
IF 6.1 | Zone 1 (Biology) | Q1 EVOLUTIONARY BIOLOGY | Pub Date: 2024-11-01 | DOI: 10.1093/sysbio/syae061
Shifang Mo, Yaowei Zhu, Mariana P Braga, David J Lohman, Sören Nylin, Ashraf Moumou, Christopher W Wheat, Niklas Wahlberg, Min Wang, Fangzhou Ma, Peng Zhang, Houshuai Wang

Evolutionary changes in geographic distribution and larval host plants may promote the rapid diversification of montane insects, but this scenario has been rarely investigated. We studied rapid radiation of the butterfly genus Colias, which has diversified in mountain ecosystems in Eurasia, Africa, and the Americas. Based on a dataset of 150 nuclear protein-coding genetic loci and mitochondrial genomes, we constructed a time-calibrated phylogenetic tree of Colias species with broad taxon sampling. We then inferred their ancestral geographic ranges, historical diversification rates, and the evolution of host use. We found that the most recent common ancestor of Colias was likely geographically widespread and originated ~3.5 Ma. The group subsequently diversified in different regions across the world, often in tandem with geographic expansion events. No aspect of elevation was found to have a direct effect on diversification. The genus underwent a burst of diversification soon after the divergence of the Neotropical lineage, followed by an exponential decline in diversification rate toward the present. The ancestral host repertoire included the legume genera Astragalus and Trifolium but later expanded to include a wide range of Fabaceae genera and plants in more distantly related families, punctuated with periods of host range expansion and contraction. We suggest that the widespread distribution of the ancestor of all extant Colias lineages set the stage for diversification by isolation of populations that locally adapted to the various different environments they encountered, including different host plants. In this scenario, elevation is not the main driver but might have accelerated diversification by isolating populations.

Citations: 0
Topology Testing and Demographic Modeling Illuminate a Novel Speciation Pathway in the Greater Caribbean Sea Following the Formation of the Isthmus of Panama.
IF 6.1 | Zone 1 (Biology) | Q1 EVOLUTIONARY BIOLOGY | Pub Date: 2024-10-30 | DOI: 10.1093/sysbio/syae045
Benjamin M Titus, H Lisle Gibbs, Nuno Simões, Marymegan Daly

Recent genomic analyses have highlighted the prevalence of speciation with gene flow in many taxa and have underscored the importance of accounting for these reticulate evolutionary processes when constructing species trees and generating parameter estimates. This is especially important for deepening our understanding of speciation in the sea where fast-moving ocean currents, expanses of deep water, and periodic episodes of sea level rise and fall act as soft and temporary allopatric barriers that facilitate both divergence and secondary contact. Under these conditions, gene flow is not expected to cease completely while contemporary distributions are expected to differ from historical ones. Here, we conduct range-wide sampling for Pederson's cleaner shrimp (Ancylomenes pedersoni), a species complex from the Greater Caribbean that contains three clearly delimited mitochondrial lineages with both allopatric and sympatric distributions. Using mtDNA barcodes and a genomic ddRADseq approach, we combine classic phylogenetic analyses with extensive topology testing and demographic modeling (10 site frequency replicates × 45 evolutionary models × 50 model simulations/replicate = 22,500 simulations) to test species boundaries and reconstruct the evolutionary history of what was expected to be a simple case study. Instead, our results indicate a history of allopatric divergence, secondary contact, introgression, and endemic hybrid speciation that we hypothesize was driven by the final closure of the Isthmus of Panama and the strengthening of the Gulf Stream Current ~3.5 Ma. The history of this species complex recovered by model-based methods that allow reticulation differs from that recovered by standard phylogenetic analyses and is unexpected given contemporary distributions. The geologically and biologically meaningful insights gained by our model selection analyses illuminate what is likely a novel pathway of species formation not previously documented that resulted from one of the most biogeographically significant events in Earth's history.
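The simulation count quoted in the abstract (10 × 45 × 50 = 22,500) can be reproduced by enumerating every combination of replicate, model, and run. The sketch below only checks that arithmetic; the variable names and the idea of representing each run as an index triple are illustrative assumptions, not a description of the authors' pipeline.

```python
from itertools import product

N_SFS_REPLICATES = 10      # site frequency replicates
N_MODELS = 45              # candidate evolutionary (demographic) models
N_RUNS_PER_REPLICATE = 50  # model simulations per replicate

# One (replicate, model, run) triple per simulation.
jobs = list(product(range(N_SFS_REPLICATES),
                    range(N_MODELS),
                    range(N_RUNS_PER_REPLICATE)))

print(len(jobs))  # 22500
```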

Systematic Biology, pp. 758-768.
Citations: 0
The Fundamental Role of Character Coding in Bayesian Morphological Phylogenetics.
IF 6.1 | Zone 1 (Biology) | Q1 EVOLUTIONARY BIOLOGY | Pub Date: 2024-10-30 | DOI: 10.1093/sysbio/syae033
Basanta Khakurel, Courtney Grigsby, Tyler D Tran, Juned Zariwala, Sebastian Höhna, April M Wright

Phylogenetic trees establish a historical context for the study of organismal form and function. Most phylogenetic trees are estimated using a model of evolution. For molecular data, modeling evolution is often based on biochemical observations about changes between character states. For example, there are 4 nucleotides, and we can make assumptions about the probability of transitions between them. By contrast, for morphological characters, we may not know a priori how many character states there are per character, as both extant sampling and the fossil record may be highly incomplete, which leads to an observer bias. For a given character, the state space may be larger than what has been observed in the sample of taxa collected by the researcher. In this case, how many evolutionary rates are needed to even describe transitions between morphological character states may not be clear, potentially leading to model misspecification. To explore the impact of this model misspecification, we simulated character data with varying numbers of character states per character. We then used the data to estimate phylogenetic trees using models of evolution with the correct number of character states and an incorrect number of character states. The results of this study indicate that this observer bias may lead to phylogenetic error, particularly in the branch lengths of trees. If the state space is wrongly assumed to be too large, then we underestimate the branch lengths, and the opposite occurs when the state space is wrongly assumed to be too small.
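The direction of the branch-length bias can be seen in a two-taxon toy calculation. The sketch assumes an equal-rates k-state model (often called Mk) with branch lengths in expected changes per character; the abstract does not name the model, and this pairwise calculation stands in for, rather than reproduces, the authors' Bayesian tree-level simulations. Under that assumption the probability that two sequences differ at a character after distance t is p = ((k-1)/k)(1 - exp(-k t/(k-1))), which can be inverted to estimate t from p for any assumed k.

```python
import math


def mk_prob_differ(t: float, k: int) -> float:
    """Probability that a character differs after distance t under a k-state equal-rates model."""
    limit = (k - 1) / k
    return limit * (1.0 - math.exp(-t / limit))


def mk_branch_length(p: float, k: int) -> float:
    """Distance implied by a proportion p of differing characters, assuming k states."""
    limit = (k - 1) / k
    if not 0.0 <= p < limit:
        raise ValueError("p must lie in [0, (k-1)/k) for the assumed k")
    return -limit * math.log(1.0 - p / limit)


TRUE_K = 3      # hypothetical true number of states
TRUE_T = 0.5    # hypothetical true branch length

p_observed = mk_prob_differ(TRUE_T, TRUE_K)
for assumed_k in (2, 3, 5):
    estimate = mk_branch_length(p_observed, assumed_k)
    print(f"assumed k={assumed_k}: estimated branch length {estimate:.3f}")
```

With the same observed proportion of differences, assuming more states than the truth gives a shorter estimate and assuming fewer gives a longer one, which matches the pattern the abstract reports.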

Systematic Biology, pp. 861-871.
Citations: 0
Whole Genomes Reveal Evolutionary Relationships and Mechanisms Underlying Gene-Tree Discordance in Neodiprion Sawflies.
IF 6.1 | Zone 1 (Biology) | Q1 EVOLUTIONARY BIOLOGY | Pub Date: 2024-10-30 | DOI: 10.1093/sysbio/syae036
Danielle K Herrig, Ryan D Ridenbaugh, Kim L Vertacnik, Kathryn M Everson, Sheina B Sim, Scott M Geib, David W Weisrock, Catherine R Linnen

Rapidly evolving taxa are excellent models for understanding the mechanisms that give rise to biodiversity. However, developing an accurate historical framework for comparative analysis of such lineages remains a challenge due to ubiquitous incomplete lineage sorting (ILS) and introgression. Here, we use a whole-genome alignment, multiple locus-sampling strategies, and summary-tree and single nucleotide polymorphism-based species-tree methods to infer a species tree for eastern North American Neodiprion species, a clade of pine-feeding sawflies (Order: Hymenoptera; Family: Diprionidae). We recovered a well-supported species tree that, except for three uncertain relationships, was robust to different strategies for analyzing whole-genome data. Nevertheless, underlying gene-tree discordance was high. To understand this genealogical variation, we used multiple linear regression to model site concordance factors estimated in 50-kb windows as a function of several genomic predictor variables. We found that site concordance factors tended to be higher in regions of the genome with more parsimony-informative sites, fewer singletons, less missing data, lower GC content, more genes, lower recombination rates, and lower D-statistics (less introgression). Together, these results suggest that ILS, introgression, and genotyping error all shape the genomic landscape of gene-tree discordance in Neodiprion. More generally, our findings demonstrate how combining phylogenomic analysis with knowledge of local genomic features can reveal mechanisms that produce topological heterogeneity across genomes.
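The regression step described here (site concordance factor per window modeled as a linear function of several genomic predictors) can be sketched with ordinary least squares. Everything below is an illustrative assumption: the data are synthetic, only two of the predictors named in the abstract are used, and the coefficients are chosen simply so that more missing data and higher recombination lower the simulated concordance.

```python
import random
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors: List[List[float]], response: List[float]) -> List[float]:
    """Least-squares fit; returns [intercept, slope_1, slope_2, ...]."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # normal equations (X'X) beta = X'y


# Synthetic 50-kb windows (hypothetical values, not data from the study).
rng = random.Random(0)
windows, scf = [], []
for _ in range(500):
    missing_fraction = rng.random()
    recombination_rate = rng.random()
    windows.append([missing_fraction, recombination_rate])
    scf.append(60.0 - 10.0 * missing_fraction - 5.0 * recombination_rate
               + rng.gauss(0.0, 1.0))

intercept, b_missing, b_recomb = fit_ols(windows, scf)
print(f"intercept={intercept:.2f}, missing={b_missing:.2f}, recombination={b_recomb:.2f}")
```

Negative fitted slopes for both predictors would correspond to the pattern in the abstract, where concordance is higher in windows with less missing data and lower recombination.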

Systematic Biology, pp. 839-860.
Citations: 0
Toward a Semi-Supervised Learning Approach to Phylogenetic Estimation.
IF 6.1 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY Pub Date: 2024-10-30 DOI: 10.1093/sysbio/syae029
Daniele Silvestro, Thibault Latrille, Nicolas Salamin

Models have always been central to inferring molecular evolution and to reconstructing phylogenetic trees. Their use typically involves the development of a mechanistic framework reflecting our understanding of the underlying biological processes, such as nucleotide substitutions, and the estimation of model parameters by maximum likelihood or Bayesian inference. However, deriving and optimizing the likelihood of the data is not always possible under complex evolutionary scenarios or even tractable for large datasets, often leading to unrealistic simplifying assumptions in the fitted models. To overcome this issue, we coupled stochastic simulations of genome evolution with a new supervised deep-learning model to infer key parameters of molecular evolution. Our model is designed to directly analyze multiple sequence alignments and estimate per-site evolutionary rates and divergence without requiring a known phylogenetic tree. The accuracy of our predictions matched that of likelihood-based phylogenetic inference when rate heterogeneity followed a simple gamma distribution, but it strongly exceeded it under more complex patterns of rate variation, such as codon models. Our approach is highly scalable and can be efficiently applied to genomic data, as we showed on a dataset of 26 million nucleotides from the clownfish clade. Our simulations also showed that the integration of per-site rates obtained by deep learning within a Bayesian framework led to significantly more accurate phylogenetic inference, particularly with respect to the estimated branch lengths. We thus propose that future advancements in phylogenetic analysis will benefit from a semi-supervised learning approach that combines deep-learning estimation of substitution rates, which allows for more flexible models of rate variation, and probabilistic inference of the phylogenetic tree, which guarantees interpretability and a rigorous assessment of statistical support.

Systematic Biology, pp. 789-806. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639169/pdf/
Citations: 0
Expectation-Maximization enables Phylogenetic Dating under a Categorical Rate Model.
IF 6.1 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY Pub Date: 2024-10-30 DOI: 10.1093/sysbio/syae034
Uyen Mai, Eduardo Charvel, Siavash Mirarab

Dating phylogenetic trees to obtain branch lengths in time units is essential for many downstream applications but has remained challenging. Dating requires inferring substitution rates that can change across the tree. While we can assume to have information about a small subset of nodes from the fossil record or sampling times (for fast-evolving organisms), inferring the ages of the other nodes essentially requires extrapolation and interpolation. Assuming a distribution of branch rates, we can formulate dating as a constrained maximum likelihood (ML) estimation problem. While ML dating methods exist, their accuracy degrades in the face of model misspecification, where the assumed parametric statistical distribution of branch rates vastly differs from the true distribution. Notably, most existing methods assume rigid, often unimodal, branch rate distributions. A second challenge is that the likelihood function involves an integral over the continuous domain of the rates, often leading to difficult non-convex optimization problems. To tackle both challenges, we propose a new method called Molecular Dating using Categorical-models (MD-Cat). MD-Cat uses a categorical model of rates inspired by non-parametric statistics and can approximate a large family of models by discretizing the rate distribution into k categories. Under this model, we can use the Expectation-Maximization algorithm to co-estimate rate categories and branch lengths in time units. Our model has fewer assumptions about the true distribution of branch rates than parametric models such as Gamma or LogNormal distribution. Our results on two simulated and real datasets of Angiosperms and HIV and a wide selection of rate distributions show that MD-Cat is often more accurate than the alternatives, especially on datasets with exponential or multimodal rate distributions.
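
To make the idea of a categorical rate model fitted by Expectation-Maximization concrete, here is a deliberately simplified sketch. It is not MD-Cat: it assumes branch durations are already known and that substitution counts on each branch are Poisson with mean rate × duration, so it estimates only the k rate categories and their weights, whereas the method in the abstract co-estimates rates and branch lengths in time units. The function name and arguments are hypothetical.

```python
import math


def em_categorical_rates(counts, durations, k=2, n_iter=100):
    """Fit k rate categories to per-branch substitution counts by EM.

    Toy model (an assumption of this sketch, not the published model):
    counts[i] ~ Poisson(rate * durations[i]), where each branch draws its
    rate from one of k categories. Durations are treated as known.
    """
    n = len(counts)
    # Start the category rates at evenly spaced quantiles of the empirical rates.
    empirical = sorted(c / t for c, t in zip(counts, durations))
    rates = [max(empirical[int((j + 0.5) * n / k)], 1e-9) for j in range(k)]
    weights = [1.0 / k] * k
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability that branch i belongs to category j.
        resp = []
        for c, t in zip(counts, durations):
            logp = [
                math.log(w) + c * math.log(r * t) - r * t - math.lgamma(c + 1)
                for w, r in zip(weights, rates)
            ]
            top = max(logp)  # subtract the maximum for numerical stability
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: weighted Poisson rate and mixing weight for each category.
        for j in range(k):
            mass = sum(row[j] for row in resp)
            exposure = sum(row[j] * t for row, t in zip(resp, durations))
            events = sum(row[j] * c for row, c in zip(resp, counts))
            if exposure > 0:
                rates[j] = max(events / exposure, 1e-9)
            weights[j] = max(mass / n, 1e-12)

    return rates, weights, resp
```

Calling `em_categorical_rates(counts, durations, k=3)` on a list of per-branch substitution counts and matching durations returns the fitted category rates, their weights, and the per-branch category probabilities. Because the categories are free parameters, a bimodal set of branch rates can be captured without assuming a Gamma or LogNormal shape.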

Systematic Biology, pp. 823-838. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11524793/pdf/
Citations: 0
Towards Reliable Detection of Introgression in the Presence of Among-Species Rate Variation.
IF 6.1 Zone 1 (Biology) Q1 EVOLUTIONARY BIOLOGY Pub Date: 2024-10-30 DOI: 10.1093/sysbio/syae028
Thore Koppetsch, Milan Malinsky, Michael Matschiner

The role of interspecific hybridization has recently seen increasing attention, especially in the context of diversification dynamics. Genomic research has now made it abundantly clear that both hybridization and introgression (the exchange of genetic material through hybridization and backcrossing) are far more common than previously thought. Besides cases of ongoing or recent genetic exchange between taxa, an increasing number of studies report "ancient introgression", referring to results of hybridization that took place in the distant past. However, it is not clear whether commonly used methods for the detection of introgression are applicable to such old systems, given that most of these methods were originally developed for analyses at the level of populations and recently diverged species, affected by recent or ongoing genetic exchange. In particular, the assumption of constant evolutionary rates, which is implicit in many commonly used approaches, is more likely to be violated as evolutionary divergence increases. To test the limitations of introgression detection methods when applied to old systems, we simulated thousands of genomic datasets under a wide range of settings, with varying degrees of among-species rate variation and introgression. Using these simulated datasets, we showed that some commonly applied statistical methods, including the D-statistic and certain tests based on sets of local phylogenetic trees, can produce false-positive signals of introgression between divergent taxa that have different rates of evolution. These misleading signals are caused by the presence of homoplasies occurring at different rates in different lineages. To distinguish between the patterns caused by rate variation and genuine introgression, we developed a new test that is based on the expected clustering of introgressed sites along the genome and implemented this test in the program Dsuite.
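
For readers unfamiliar with the D-statistic mentioned in this abstract, the sketch below shows the standard ABBA-BABA form, D = (ABBA − BABA) / (ABBA + BABA), computed from one aligned haploid sequence per taxon. It is an illustration only: the function name is hypothetical, it is not the Dsuite implementation, and it does not include the clustering-based test the authors propose.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D for the tree (((P1, P2), P3), Outgroup).

    Each argument is an aligned DNA string. The outgroup base is treated as
    the ancestral allele. Returns (D, n_abba, n_baba).
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if not {a, b, c, o} <= valid:
            continue  # skip gaps and ambiguous bases
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele

    total = abba + baba
    if total == 0:
        return float("nan"), abba, baba
    return (abba - baba) / total, abba, baba
```

Under incomplete lineage sorting alone, ABBA and BABA sites are expected in equal numbers, so D is expected to be near zero. The abstract's point is that unequal substitution rates in P1 and P2 can also unbalance the two counts through homoplasies, so a nonzero D between divergent taxa is not by itself evidence of introgression.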

Systematic Biology, pp. 769-788. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639170/pdf/
Citations: 0