{"title":"在复杂的系统发生组数据集中伪造直系同源物和勤奋数据探索的必要性:来自安第斯植物区系的博物学案例研究。","authors":"Laura A Frost, Ana M Bedoya, Laura P Lagomarsino","doi":"10.1093/sysbio/syad076","DOIUrl":null,"url":null,"abstract":"<p><p>The Andes mountains of western South America are a globally important biodiversity hotspot, yet there is a paucity of resolved phylogenies for plant clades from this region. Filling an important gap in our understanding of the World's richest flora, we present the first phylogeny of Freziera (Pentaphylacaceae), an Andean-centered, cloud forest radiation. Our dataset was obtained via hybrid-enriched target sequence capture of Angiosperms353 universal loci for 50 of the ca. 75 spp., obtained almost entirely from herbarium specimens. We identify high phylogenomic complexity in Freziera, including the presence of data artifacts. Via by-eye observation of gene trees, detailed examination of warnings from recently improved assembly pipelines, and gene tree filtering, we identified that artifactual orthologs (i.e., the presence of only one copy of a multicopy gene due to differential assembly) were an important source of gene tree heterogeneity that had a negative impact on phylogenetic inference and support. These artifactual orthologs may be common in plant phylogenomic datasets, where multiple instances of genome duplication are common. After accounting for artifactual orthologs as source of gene tree error, we identified a significant, but nonspecific signal of introgression using Patterson's D and f4 statistics. Despite phylogenomic complexity, we were able to resolve Freziera into 9 well-supported subclades whose evolution has been shaped by multiple evolutionary processes, including incomplete lineage sorting, historical gene flow, and gene duplication. Our results highlight the complexities of plant phylogenomics, which are heightened in Andean radiations, and show the impact of filtering data processing artifacts and standard filtering approaches on phylogenetic inference.</p>","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"308-322"},"PeriodicalIF":6.1000,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Artifactual Orthologs and the Need for Diligent Data Exploration in Complex Phylogenomic Datasets: A Museomic Case Study from the Andean Flora.\",\"authors\":\"Laura A Frost, Ana M Bedoya, Laura P Lagomarsino\",\"doi\":\"10.1093/sysbio/syad076\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The Andes mountains of western South America are a globally important biodiversity hotspot, yet there is a paucity of resolved phylogenies for plant clades from this region. Filling an important gap in our understanding of the World's richest flora, we present the first phylogeny of Freziera (Pentaphylacaceae), an Andean-centered, cloud forest radiation. Our dataset was obtained via hybrid-enriched target sequence capture of Angiosperms353 universal loci for 50 of the ca. 75 spp., obtained almost entirely from herbarium specimens. We identify high phylogenomic complexity in Freziera, including the presence of data artifacts. Via by-eye observation of gene trees, detailed examination of warnings from recently improved assembly pipelines, and gene tree filtering, we identified that artifactual orthologs (i.e., the presence of only one copy of a multicopy gene due to differential assembly) were an important source of gene tree heterogeneity that had a negative impact on phylogenetic inference and support. These artifactual orthologs may be common in plant phylogenomic datasets, where multiple instances of genome duplication are common. After accounting for artifactual orthologs as source of gene tree error, we identified a significant, but nonspecific signal of introgression using Patterson's D and f4 statistics. Despite phylogenomic complexity, we were able to resolve Freziera into 9 well-supported subclades whose evolution has been shaped by multiple evolutionary processes, including incomplete lineage sorting, historical gene flow, and gene duplication. Our results highlight the complexities of plant phylogenomics, which are heightened in Andean radiations, and show the impact of filtering data processing artifacts and standard filtering approaches on phylogenetic inference.</p>\",\"PeriodicalId\":22120,\"journal\":{\"name\":\"Systematic Biology\",\"volume\":\" \",\"pages\":\"308-322\"},\"PeriodicalIF\":6.1000,\"publicationDate\":\"2024-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Systematic Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/sysbio/syad076\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EVOLUTIONARY BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Systematic Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/sysbio/syad076","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EVOLUTIONARY BIOLOGY","Score":null,"Total":0}
引用次数: 0
摘要
南美洲西部的安第斯山脉是全球重要的生物多样性热点地区,但该地区植物支系的系统发生却很少。我们首次提出了以安第斯山脉为中心的云林辐射植物--Freziera(五枫香科)的系统发生,填补了我们对世界上最丰富植物区系了解的一个重要空白。我们的数据集是通过rid-enriched target sequence capture of Angiosperms获得的。这些数据几乎全部来自标本馆标本。我们在 Freziera 中发现了高度的系统发生复杂性,包括数据伪造的存在。通过亲眼观察基因树、详细检查最近改进的组装管道发出的警告以及基因树过滤,我们发现伪造的直系同源物(即由于差异组装导致多拷贝基因只有一个拷贝)是基因树异质性的一个重要来源,对系统发生推断和支持有负面影响。在植物系统发生组数据集中,这些人为的直向同源物可能很常见,因为在植物系统发生组数据集中,多个基因组重复的情况很普遍。在考虑了作为基因树误差来源的伪造直系同源物之后,我们利用 Patterson's D 和 f4 统计发现了一个显著但非特异性的引种信号。尽管系统发生组十分复杂,但我们仍能将 Freziera 分解为 9 个支持度较高的亚支系,其进化受多种进化过程的影响,包括不完全的世系分类、历史基因流和基因复制。我们的研究结果凸显了植物系统发生组学的复杂性,而安第斯地区的辐射则使这种复杂性更加突出,同时也显示了过滤数据处理人工痕迹和标准过滤方法对系统发生推断的影响。
Artifactual Orthologs and the Need for Diligent Data Exploration in Complex Phylogenomic Datasets: A Museomic Case Study from the Andean Flora.
The Andes mountains of western South America are a globally important biodiversity hotspot, yet there is a paucity of resolved phylogenies for plant clades from this region. Filling an important gap in our understanding of the World's richest flora, we present the first phylogeny of Freziera (Pentaphylacaceae), an Andean-centered, cloud forest radiation. Our dataset was obtained via hybrid-enriched target sequence capture of Angiosperms353 universal loci for 50 of the ca. 75 spp., obtained almost entirely from herbarium specimens. We identify high phylogenomic complexity in Freziera, including the presence of data artifacts. Via by-eye observation of gene trees, detailed examination of warnings from recently improved assembly pipelines, and gene tree filtering, we identified that artifactual orthologs (i.e., the presence of only one copy of a multicopy gene due to differential assembly) were an important source of gene tree heterogeneity that had a negative impact on phylogenetic inference and support. These artifactual orthologs may be common in plant phylogenomic datasets, where multiple instances of genome duplication are common. After accounting for artifactual orthologs as source of gene tree error, we identified a significant, but nonspecific signal of introgression using Patterson's D and f4 statistics. Despite phylogenomic complexity, we were able to resolve Freziera into 9 well-supported subclades whose evolution has been shaped by multiple evolutionary processes, including incomplete lineage sorting, historical gene flow, and gene duplication. Our results highlight the complexities of plant phylogenomics, which are heightened in Andean radiations, and show the impact of filtering data processing artifacts and standard filtering approaches on phylogenetic inference.
期刊介绍:
Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.