Limbs are a defining characteristic of tetrapods, yet numerous taxa, primarily among amphibians and reptiles, have independently lost limbs as an adaptation to new ecological niches. To elucidate the genetic factors contributing to this convergent limb loss, we present a 12 Gb chromosome-level assembly of the Banna caecilian (Ichthyophis bannanicus), a limbless amphibian. Our comparative analysis, which includes the reconstruction of amphibian karyotype evolution, reveals constrained gene length evolution in a subset of developmental genes across three large genomes. Investigation of limb development genes uncovered the loss of Grem1 in caecilians and Tulp3 in snakes. Interestingly, caecilians and snakes share a significantly larger number of convergent degenerated conserved noncoding elements than limbless lizards, which have a shorter evolutionary history of limb loss. These convergent degenerated conserved noncoding elements overlap significantly with genomic regions that are active during mouse limb development and are conserved in limbed species, suggesting an essential role in limb patterning in the tetrapod common ancestor. While most convergent degenerated conserved noncoding elements emerged in the jawed vertebrate ancestor, coinciding with the origin of paired appendages, more recent degenerated conserved noncoding elements also contribute to limb development, as demonstrated through functional experiments. Our study provides novel insights into the regulatory elements associated with limb development and loss, offering an evolutionary perspective on the genetic basis of morphological specialization.
{"title":"Convergent Degenerated Regulatory Elements Associated with Limb Loss in Limbless Amphibians and Reptiles.","authors":"Chenglong Zhu, Shengyou Li, Daizhen Zhang, Jinjin Zhang, Gang Wang, Botong Zhou, Jiangmin Zheng, Wenjie Xu, Zhengfei Wang, Xueli Gao, Qiuning Liu, Tingfeng Xue, Huabin Zhang, Chunhui Li, Baoming Ge, Yuxuan Liu, Qiang Qiu, Huixian Zhang, Jinghui Huang, Boping Tang, Kun Wang","doi":"10.1093/molbev/msae239","journal":{"name":"Molecular biology and evolution"},"publicationDate":"2024-11-01","publicationTypes":"Journal Article"}
Lisa Y Mesrop, Geetanjali Minsky, Michael S Drummond, Jessica A Goodheart, Stephen R Proulx, Todd H Oakley
Evolutionary innovations in chemical secretion, such as the production of secondary metabolites, pheromones, and toxins, profoundly impact ecological interactions across a broad diversity of life. These secretory innovations may involve a "legacy-plus-innovation" mode of evolution, whereby new biochemical pathways are integrated with conserved secretory processes to create novel products. Among secretory innovations, bioluminescence is important because it evolved convergently many times to influence predator-prey interactions, while often producing courtship signals linked to increased rates of speciation. However, whether deeply conserved secretory genes are used in secretory bioluminescence remains unexplored. Here, we show that in the ostracod Vargula tsujii, the evolutionarily novel c-luciferase gene is co-expressed with many conserved genes, including those related to toxin production and high-output protein secretion. Our results demonstrate that the legacy-plus-innovation mode of secretory evolution, previously applied to the sensory modalities of olfaction, gustation, and nociception, also encompasses light-producing signals generated by bioluminescent secretions. This extension broadens the paradigm of secretory diversification to include not only chemical signals but also bioluminescent light as an important medium of ecological interaction and evolutionary innovation.
{"title":"Ancient Secretory Pathways Contributed to the Evolutionary Origin of an Ecologically Impactful Bioluminescence System.","authors":"Lisa Y Mesrop, Geetanjali Minsky, Michael S Drummond, Jessica A Goodheart, Stephen R Proulx, Todd H Oakley","doi":"10.1093/molbev/msae216","journal":{"name":"Molecular biology and evolution"},"publicationDate":"2024-11-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539039/pdf/"}
Aleksandra Marconi, Grégoire Vernaz, Achira Karunaratna, Maxon J Ngochera, Richard Durbin, M Emília Santos
Neural crest (NC) is a vertebrate-specific embryonic progenitor cell population that underlies important vertebrate features such as the craniofacial skeleton and pigmentation patterns. Despite the wide-ranging variation of NC-derived traits across vertebrates, the contribution of NC to species diversification remains underexplored. Here, leveraging the adaptive diversity of African Great Lakes' cichlid species, we combined comparative transcriptomics and population genomics to investigate the evolution of the NC genetic program in the context of their morphological divergence. Our analysis revealed substantial differences in transcriptional landscapes across somitogenesis, an embryonic period coinciding with NC development and migration. These differences involved dozens of genes with described functions in the vertebrate NC gene regulatory network, several of which showed signatures of positive selection. Among candidates showing between-species expression divergence, we focused on teleost-specific paralogs of the NC-specifier sox10 (sox10a and sox10b) as prime candidates to influence NC development. These genes, expressed in NC cells, displayed remarkable spatio-temporal variation in cichlids, suggesting their contribution to interspecific morphological differences, such as craniofacial structures and pigmentation. Finally, through CRISPR/Cas9 mutagenesis, we demonstrated the functional divergence between cichlid sox10 paralogs, with the acquisition of a novel skeletogenic function by sox10a. When compared with the teleost models zebrafish and medaka, our findings reveal that the sox10 duplication, although retained in most teleost lineages, had variable functional fates across their phylogeny. Altogether, our study suggests that NC-related processes, particularly those controlled by the sox10 paralogs, are involved in generating morphological diversification between species, and it lays the groundwork for further investigations into the mechanisms underpinning vertebrate NC diversification.
{"title":"Genetic and Developmental Divergence in the Neural Crest Program between Cichlid Fish Species.","authors":"Aleksandra Marconi, Grégoire Vernaz, Achira Karunaratna, Maxon J Ngochera, Richard Durbin, M Emília Santos","doi":"10.1093/molbev/msae217","journal":{"name":"Molecular biology and evolution"},"publicationDate":"2024-11-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11558072/pdf/"}
As phylogenomic datasets have grown in size, researchers have developed new ways to measure biological variation and to assess statistical support for specific branches. Larger datasets have more sites and loci and therefore less sampling variance. While we can more accurately measure the mean signal in these datasets, lower sampling variance is often reflected in uniformly high measures of branch support, such as the bootstrap and posterior probability, which limits their utility. Larger datasets have also revealed substantial biological variation in the topologies found across individual loci, such that the single species tree inferred by most phylogenetic methods represents a limited summary of the data for many purposes. In contrast to measures of statistical support, the degree of underlying topological variation among loci should be approximately constant regardless of the size of the dataset. "Concordance factors" (CFs) and similar statistics have therefore become increasingly important tools in phylogenetics. In this review, we explain why CFs should be thought of as descriptors of topological variation rather than as measures of statistical support, and argue that they provide important information about the predictive power of the species tree not contained in measures of support. We review a growing suite of statistics for measuring concordance, compare them in a common framework that reveals their interrelationships, and demonstrate how to calculate them using an example from birds. We also discuss how measures of topological variation might change in the future as we move beyond estimating a single "tree of life" toward estimating the myriad evolutionary histories underlying genomic variation.
{"title":"The Meaning and Measure of Concordance Factors in Phylogenomics.","authors":"Robert Lanfear, Matthew W Hahn","doi":"10.1093/molbev/msae214","journal":{"name":"Molecular biology and evolution"},"publicationDate":"2024-11-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11532913/pdf/"}
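As a rough illustration of a concordance factor as a descriptor of topological variation, the sketch below computes, for each branch of a species tree, the fraction of gene trees that contain the same grouping. It makes simplifying assumptions that are ours and not the review's: trees are rooted and written as nested Python tuples, every gene tree contains every taxon, and a branch is identified by the clade it subtends. The names `clades` and `gene_concordance` are hypothetical helpers, not part of any phylogenetics package.

```python
from typing import Dict, FrozenSet, List, Set, Union

# A tree is either a tip label or a tuple of subtrees (illustrative encoding).
Tree = Union[str, tuple]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the set of clades (tip-label sets) defined by internal nodes."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(leaves(child) for child in node))
        found.add(tips)
        return tips

    leaves(tree)
    return found


def gene_concordance(species_tree: Tree, gene_trees: List[Tree]) -> Dict[FrozenSet[str], float]:
    """Fraction of gene trees containing each non-root clade of the species tree."""
    all_tips = max(clades(species_tree), key=len)  # the root clade
    gene_clades = [clades(g) for g in gene_trees]
    result = {}
    for clade in clades(species_tree):
        if clade == all_tips:
            continue  # the root clade is present in every tree by construction
        matches = sum(1 for gc in gene_clades if clade in gc)
        result[clade] = matches / len(gene_trees)
    return result


if __name__ == "__main__":
    species = ((("A", "B"), "C"), "D")
    genes = [
        ((("A", "B"), "C"), "D"),
        (("A", "C"), ("B", "D")),
        ((("A", "B"), "C"), "D"),
        (("A", "B"), ("C", "D")),
    ]
    for clade, cf in sorted(gene_concordance(species, genes).items(), key=lambda kv: len(kv[0])):
        print(sorted(clade), cf)
```

The value for a branch does not move toward 1 as more loci are added; it estimates a proportion, which is why it behaves differently from bootstrap or posterior support on large datasets.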
{"title":"Correction to: Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications.","authors":"","doi":"10.1093/molbev/msae230","journal":{"name":"Molecular biology and evolution","volume":"41 11"},"publicationDate":"2024-11-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11540144/pdf/"}
Felsenstein's bootstrap is the most commonly used method to measure branch support in phylogenetics. Current sequencing technologies can result in massive sampling of taxa (e.g. SARS-CoV-2). In this case, the sequences are very similar, the trees are short, and the branches correspond to a small number of mutations (possibly 0). Nevertheless, these trees contain a strong signal, with unresolved parts but a low rate of false branches. With such data, Felsenstein's bootstrap is not satisfactory. Due to the frequentist nature of bootstrap sampling, the expected support of a branch corresponding to a single mutation is ∼63%, even though it is highly likely to be correct. Here, we propose a Bayesian version of the phylogenetic bootstrap in which sites are assigned uninformative prior probabilities. The branch support can then be interpreted as a posterior probability. We do not view the alignment as a small subsample of a large sample of sites, but rather as containing all available information (e.g. as with complete viral genomes, which are becoming routine). We give formulas for expected supports under the assumption of perfect phylogeny, in both the frequentist and Bayesian frameworks, where a branch corresponding to a single mutation now has an expected support of ∼90%. Simulations show that these theoretical results are robust to realistic data. Analyses on low-homoplasy viral and nonviral datasets show that Bayesian bootstrap support is easier to interpret, with high supports for branches very likely to be correct. As homoplasy increases, the two supports become closer and strongly correlated.
{"title":"The Bayesian Phylogenetic Bootstrap and its Application to Short Trees and Branches.","authors":"Frédéric Lemoine, Olivier Gascuel","doi":"10.1093/molbev/msae238","journal":{"name":"Molecular biology and evolution"},"publicationDate":"2024-11-01","publicationTypes":"Journal Article"}
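The ∼63% figure for a single-mutation branch is consistent with a standard property of resampling with replacement: a given site appears at least once in a bootstrap replicate of n sites with probability 1 − (1 − 1/n)^n, which tends to 1 − 1/e ≈ 0.632. The sketch below checks this numerically. It assumes, as a simplification of the perfect-phylogeny setting mentioned in the abstract, that the branch is recovered exactly when its one supporting site is drawn; the function names are hypothetical, and the Bayesian variant proposed in the paper is not reproduced here.

```python
import math
import random


def inclusion_probability(n_sites: int) -> float:
    """Exact probability that one specific site is drawn at least once
    when n_sites alignment columns are resampled with replacement."""
    return 1.0 - (1.0 - 1.0 / n_sites) ** n_sites


def simulated_support(n_sites: int, n_replicates: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability.

    Site 0 stands in for the single site supporting the branch."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        if any(rng.randrange(n_sites) == 0 for _ in range(n_sites)):
            hits += 1
    return hits / n_replicates


if __name__ == "__main__":
    for n in (10, 100, 1000, 30000):
        print(n, round(inclusion_probability(n), 4))
    print("limit 1 - 1/e:", round(1.0 - math.exp(-1.0), 4))
    print("simulated, n = 200:", simulated_support(200, 2000))
```

Because the limit does not depend on alignment length, adding more invariant sites does not raise the expected frequentist support of such a branch.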
Kalle Leppälä, Flavio Augusto da Silva Coelho, Michaela Richter, Victor A Albert, Charlotte Lindqvist
Over the past 15 years, the D-statistic, a four-taxon test for organismal admixture (hybridization, or introgression) which incorporates single nucleotide polymorphism data with allelic patterns ABBA and BABA, has seen considerable use. This statistic seeks to discern significant deviation from either a given species tree assumption, or from the balanced incomplete lineage sorting that could otherwise defy this species tree. However, while the D-statistic can successfully discriminate admixture from incomplete lineage sorting, it is not a simple matter to determine the directionality of admixture using only four-leaf tree models. As such, methods have been developed that use five leaves to evaluate admixture. Among these, the DFOIL method ("FOIL", a mnemonic for "First-Outer-Inner-Last"), which tests allelic patterns on the "symmetric" tree S=(((1,2),(3,4)),5), succeeds in finding admixture direction for many five-taxon examples. However, DFOIL does not make full use of all symmetry, nor can DFOIL function properly when ancient samples are included because of the reliance on singleton patterns (such as BAAAA and ABAAA). Here, we take inspiration from DFOIL to develop a new and completely general family of five-leaf admixture tests, dubbed Δ-statistics, that can either incorporate or exclude the singleton allelic patterns depending on individual taxon and age sampling choices. We describe two new shapes that are also fully testable, namely the "asymmetric" tree A=((((1,2),3),4),5) and the "quasisymmetric" tree Q=(((1,2),3),(4,5)), which can considerably supplement the "symmetric" S=(((1,2),(3,4)),5) model used by DFOIL. We demonstrate the consistency of Δ-statistics under various simulated scenarios, and provide empirical examples using data from black, brown and polar bears, the latter also including two ancient polar bear samples from previous studies. 
Recently, DFOIL and one of these ancient samples were used to argue for a dominant polar bear → brown bear introgression direction. However, we find, using both this ancient polar bear and our own, that by far the strongest signal using both DFOIL and Δ-statistics on tree S is actually bidirectional gene flow of indistinguishable direction. Further experiments on trees A and Q instead highlight what were likely two phases of admixture: one with stronger brown bear → polar bear introgression in ancient times, and a more recent phase with predominant polar bear → brown bear directionality.
Code and documentation are available at https://github.com/KalleLeppala/Delta-statistics.
{"title":"Five-leaf Generalizations of the D-statistic Reveal the Directionality of Admixture.","authors":"Kalle Leppälä, Flavio Augusto da Silva Coelho, Michaela Richter, Victor A Albert, Charlotte Lindqvist","doi":"10.1093/molbev/msae198","journal":{"name":"Molecular biology and evolution"},"publicationDate":"2024-11-01","publicationTypes":"Journal Article"}
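For orientation, the classic four-taxon D-statistic that this work generalizes can be written as D = (nABBA − nBABA) / (nABBA + nBABA) on the tree (((P1,P2),P3),O). The sketch below counts the two patterns from aligned sequences. It assumes one haploid sequence per taxon, biallelic sites, and an outgroup allele that defines the ancestral state "A"; the function names are hypothetical. It does not implement DFOIL or the Δ-statistics, whose code is in the authors' repository.

```python
from typing import Iterable, Tuple

VALID_BASES = set("ACGT")


def count_abba_baba(sites: Iterable[Tuple[str, str, str, str]]) -> Tuple[int, int]:
    """Count ABBA and BABA site patterns for taxa ordered (P1, P2, P3, outgroup)."""
    abba = baba = 0
    for p1, p2, p3, out in sites:
        alleles = {p1, p2, p3, out}
        # Keep only biallelic sites with unambiguous bases.
        if len(alleles) != 2 or not alleles <= VALID_BASES:
            continue
        if p3 == out:
            continue  # P3 must carry the derived allele in both patterns
        if p1 == out and p2 == p3:
            abba += 1  # derived allele shared by P2 and P3
        elif p2 == out and p1 == p3:
            baba += 1  # derived allele shared by P1 and P3
    return abba, baba


def d_statistic(sites: Iterable[Tuple[str, str, str, str]]) -> float:
    """Return (ABBA - BABA) / (ABBA + BABA); raise if no informative sites."""
    abba, baba = count_abba_baba(sites)
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    p1 = "AAGTCAG"
    p2 = "ACGTTAT"
    p3 = "ACGCTGG"
    og = "AAGCCAT"
    print(count_abba_baba(zip(p1, p2, p3, og)))
    print(d_statistic(zip(p1, p2, p3, og)))
```

A value near zero is what balanced incomplete lineage sorting predicts; the sign indicates which of P1 or P2 shares excess derived alleles with P3, but not the direction of gene flow, which is the gap the five-leaf tests address.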
Xinxin Li, Min Wang, Ming Zou, Xiaotong Guan, Shaohua Xu, Weitao Chen, Chongnv Wang, Yiyu Chen, Shunping He, Baocheng Guo
Whole-genome duplication (WGD), or polyploidization, is a major contributor to biodiversity. However, the establishment and survival of WGDs are often considered to be stochastic, since elucidating the processes of WGD establishment remains challenging. In the current study, we explored the processes leading to polyploidy establishment in snow carp (Cyprinidae: Schizothoracinae), a predominant component of the ichthyofauna of the Tibetan Plateau and its surrounding areas. Using large-scale genomic data from isoform sequencing, we analyzed ohnolog genealogies and divergence in hundreds to thousands of gene families across major snow carp lineages. Our findings demonstrated that independent autopolyploidization subsequent to speciation was prevalent, while autopolyploidization followed by speciation also occurred in the diversification of snow carp. This was further supported by matrilineal divergence and drainage evolution evidence. Contrary to the long-standing hypothesis that ancient polyploidization preceded the diversification of snow carp, we determined that polyploidy in extant snow carp was established by recurrent autopolyploidization events during the Pleistocene. These findings indicate that the diversification of extant snow carp resembles a coordinated duet: first, the uplift of the Tibetan Plateau orchestrated the biogeography and diversification of their diploid progenitors; then, the extensive Pliocene-Pleistocene climate changes acted as relay runners, further fueling diversification through recurrent autopolyploidization. Overall, this study not only reveals a hitherto unrecognized recent WGD lineage in vertebrates but also advances current understanding of WGD processes, emphasizing that WGD establishment is a nonstochastic event, emerging from numerous adaptations to environmental challenges and recurring throughout evolutionary history rather than merely in plants.
{"title":"Recent and Recurrent Autopolyploidization Fueled Diversification of Snow Carp on the Tibetan Plateau.","authors":"Xinxin Li, Min Wang, Ming Zou, Xiaotong Guan, Shaohua Xu, Weitao Chen, Chongnv Wang, Yiyu Chen, Shunping He, Baocheng Guo","doi":"10.1093/molbev/msae221","journal":{"name":"Molecular biology and evolution"},"publicationDate":"2024-11-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11542630/pdf/"}
Logan S Whitehouse, Dylan D Ray, Daniel R Schrider
As population genetic data increase in size, new data structures such as tree sequences have been developed to store genetic information efficiently. These data structures are computationally and storage efficient but are not interchangeable with the data structures used by many population genetic inference methods, such as convolutional neural networks applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network that learns directly from tree sequence topology and node data, allowing neural networks to be applied without the intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard convolutional neural network approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that a graph convolutional network can learn directly from tree sequences and perform well on these common population genetic inference tasks, with accuracies roughly matching or even exceeding those of a convolutional neural network-based method. As tree sequences become more widely used in population genetic research, we expect further development and optimization of this work to provide a foundation for population genetic inference.
{"title":"Tree Sequences as a General-Purpose Tool for Population Genetic Inference.","authors":"Logan S Whitehouse, Dylan D Ray, Daniel R Schrider","doi":"10.1093/molbev/msae223","DOIUrl":"10.1093/molbev/msae223","url":null,"PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":11.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142504320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
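The tree-sequence abstract above describes a graph convolutional network that learns from tree topology and node data without converting to an alignment. As a rough illustration only, the sketch below applies one standard graph-convolution step (the symmetric-normalized propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)) to a single toy local tree given as (parent, child) edges. The node features (node time and a sample flag), the identity weights, and the function name `gcn_layer` are assumptions made for this example; the abstract does not specify the authors' architecture or features, and a real tree sequence holds many local trees that share nodes along the genome.

```python
import math


def gcn_layer(edges, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(features)
    n_in = len(features[0])
    n_out = len(weights[0])

    # Undirected adjacency with self-loops, built from (parent, child) edges.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for parent, child in edges:
        adj[parent][child] = 1.0
        adj[child][parent] = 1.0

    # Symmetric degree normalization.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in adj]
    norm = [
        [inv_sqrt_deg[i] * adj[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]

    # Aggregate each node's own and neighboring features.
    agg = [
        [sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(n_in)]
        for i in range(n)
    ]

    # Linear map followed by ReLU.
    return [
        [max(0.0, sum(agg[i][f] * weights[f][o] for f in range(n_in)))
         for o in range(n_out)]
        for i in range(n)
    ]


# Toy local tree (hypothetical): nodes 0-2 are samples, 3 and 4 are ancestors.
edges = [(3, 0), (3, 1), (4, 2), (4, 3)]
# Assumed node features: [node time, is-sample flag].
features = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.5, 0.0], [1.2, 0.0]]
# Identity weights so the aggregated values can be inspected directly.
weights = [[1.0, 0.0], [0.0, 1.0]]

hidden = gcn_layer(edges, features, weights)
for node_id, row in enumerate(hidden):
    print(node_id, [round(v, 3) for v in row])
```

After one step, each node's representation mixes its own features with those of its parent and children, so samples 0 and 1, which share the same parent, end up with identical representations. Stacking such layers lets information travel further along the genealogy; in practice the weights would be learned, and a library such as PyTorch Geometric would typically replace this hand-written layer.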