Hend Abu-Elmakarem, Oscar A MacLean, Frank Venter, Lindsey J Plenderleith, Richard L Culleton, Beatrice H Hahn, Paul M Sharp
Genes encoded within organelle genomes often evolve at rates different from those in the nuclear genome. Here, we analyzed the relative rates of nucleotide substitution in the mitochondrial, apicoplast and nuclear genomes in four different lineages of Plasmodium species (malaria parasites) infecting mammals. The rates of substitution in the three genomes exhibit substantial variation among lineages, with the relative rates of nuclear and mitochondrial DNA being particularly divergent between the Laverania (including Plasmodium falciparum) and Vivax lineages (including Plasmodium vivax). Consideration of synonymous and nonsynonymous substitution rates suggests that their variation is largely due to changes in mutation rates, with constraints on amino acid replacements remaining more similar among lineages. Mitochondrial DNA mutation rate variations among lineages may reflect differences in the long-term average lengths of the sexual and asexual stages of the life cycle. These rate variations have far-reaching implications for the use of molecular clocks to date Plasmodium evolution.
Remarkable evolutionary rate variations among lineages and among genome compartments in malaria parasites of mammals. Molecular Biology and Evolution, 2024-11-21. https://doi.org/10.1093/molbev/msae243
In recent years, advances in image processing and machine learning have fueled a paradigm shift in detecting genomic regions under natural selection. Early machine learning techniques employed population-genetic summary statistics as features, which focus on specific genomic patterns expected under adaptive and neutral processes. Though such engineered features are important when training data is limited, the ease with which simulated data can now be generated has led to the recent development of approaches that take in image representations of haplotype alignments and automatically extract important features using convolutional neural networks. Digital image processing methods termed α-molecules are a class of techniques for multi-scale representation of objects that can extract a diverse set of features from images. One such α-molecule method, termed wavelet decomposition, lends greater control over high-frequency components of images. Another α-molecule method, termed curvelet decomposition, is an extension of the wavelet concept that considers events occurring along curves within images. We show that application of these α-molecule techniques to extract features from image representations of haplotype alignments yields a high true positive rate and accuracy in detecting hard and soft selective sweep signatures from genomic data with both linear and nonlinear machine learning classifiers. Moreover, we find that such models are easy to visualize and interpret, with performance rivaling that of contemporary deep learning approaches for detecting sweeps.
Digital image processing to detect adaptive evolution. Md Ruhul Amin, Mahmudul Hasan, Michael DeGiorgio. Molecular Biology and Evolution, 2024-11-20. https://doi.org/10.1093/molbev/msae242
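The wavelet decomposition mentioned in the abstract above can be illustrated with a minimal sketch. It applies one level of an averaging Haar transform to a small 0/1 haplotype matrix (rows are haplotypes, columns are variant sites). The Haar basis, the function names, and the toy matrix are assumptions chosen for illustration; the abstract does not state which wavelet family or preprocessing the authors use.

```python
def haar_step(values):
    """One level of the averaging Haar transform on an even-length sequence."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    averages = [(a + b) / 2 for a, b in pairs]  # low-frequency part
    details = [(a - b) / 2 for a, b in pairs]   # high-frequency part
    return averages + details


def haar_2d(image):
    """Apply one Haar level to every row, then to every column.

    Assumes an even number of rows and columns.
    """
    row_pass = [haar_step(row) for row in image]
    col_pass = [haar_step(list(col)) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]  # transpose back to row-major


# Hypothetical 4 x 4 haplotype alignment (0 = ancestral, 1 = derived allele).
toy_alignment = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

coefficients = haar_2d(toy_alignment)
# Flattened coefficients could serve as the feature vector for a classifier.
features = [c for row in coefficients for c in row]
```

After one level, the top-left quadrant of `coefficients` holds the mean of each 2 × 2 block of the image, and the remaining quadrants hold the detail (high-frequency) coefficients. Keeping or discarding those quadrants is one way to control how much fine-scale structure reaches a linear or nonlinear classifier.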
Multiple rounds of whole-genome duplication (WGD) followed by diploidization have occurred throughout the evolutionary history of angiosperms. Much work has been done to model the genomic consequences and evolutionary significance of WGD. While researchers have historically modeled polyploids as either allopolyploids or autopolyploids, the variety of natural polyploids spans a continuum of differentiation across multiple parameters, such as the extent of polysomic vs. disomic inheritance, and the degree of genetic differentiation between the ancestral lineages. Here we present a forward-time polyploid genome evolution simulator called SpecKS. SpecKS models polyploid speciation as originating from a 2D continuum, whose dimensions account for both the level of genetic differentiation between the ancestral parental genomes and the time lag between ancestral speciation and their subsequent reunion in the derived polyploid. Using extensive simulations, we demonstrate that changes in initial conditions along either dimension of the 2D continuum deterministically affect the shape of the Ks histogram. Our findings indicate that the error in the common method of estimating WGD time from the Ks histogram peak scales with the degree of allopolyploidy, and we present an alternative, accurate estimation method that is independent of the degree of allopolyploidy. Lastly, we use SpecKS to derive tests that infer both the lag time between parental divergence and WGD time, and the diversity of the ancestral species, from an input Ks histogram. We apply the latter test to transcriptomic data from over 200 species across the plant kingdom, the results of which are concordant with the prevailing theory that the majority of angiosperm lineages are derived from diverse parental genomes and may be of allopolyploid origin.
Accurate Inference of the Polyploid Continuum using Forward-time Simulations. Tamsen Dunn, Arun Sethuraman. Molecular Biology and Evolution, 2024-11-16. https://doi.org/10.1093/molbev/msae241
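The "common method" that the abstract above critiques, dating a WGD from the peak of the Ks histogram, can be sketched as follows. This is not SpecKS and not the authors' alternative estimator. It only shows the conventional calculation T = Ks / (2r), where the factor 2 reflects substitutions accumulating along both paralog lineages. The function names, the bin width, the toy Ks values, and the substitution rate are all hypothetical.

```python
from collections import Counter


def ks_histogram_peak(ks_values, bin_width=0.1):
    """Return the midpoint of the most populated Ks bin."""
    counts = Counter(int(ks / bin_width) for ks in ks_values)
    modal_bin, _ = max(counts.items(), key=lambda item: item[1])
    return (modal_bin + 0.5) * bin_width


def wgd_age_from_peak(ks_peak, subs_per_site_per_year):
    """Conventional estimate T = Ks / (2 r)."""
    return ks_peak / (2 * subs_per_site_per_year)


# Hypothetical Ks values for paralog pairs; most fall in the 0.4-0.5 bin.
toy_ks = [0.05, 0.42, 0.44, 0.47, 0.48, 0.61, 0.93]
ASSUMED_RATE = 1e-8  # synonymous substitutions per site per year (illustrative)

peak = ks_histogram_peak(toy_ks)
age_years = wgd_age_from_peak(peak, ASSUMED_RATE)
```

As the abstract reports, the error of this peak-based estimate scales with the degree of allopolyploidy, so the sketch is best read as the baseline being corrected rather than as a recommended procedure.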
Bats possess a range of distinctive characteristics, including flight, echolocation, impressive longevity, and the ability to harbor various zoonotic pathogens. Additionally, they account for the second-highest species diversity among mammalian orders, yet their phylogenetic relationships and demographic history remain underexplored. Here, we generated de novo assembled genomes for 17 bat species and two of their mammalian relatives (the Amur hedgehog and Chinese mole shrew), with 12 genomes reaching chromosome-level assembly. Comparative genomics and ChIP-seq assays identified newly gained genomic regions in bats potentially linked to the regulation of gene activity and expression. Notably, some antiviral infection-related genes under positive selection exhibited cancer-suppressing activity, evidencing a link between virus tolerance and cancer resistance in bats. By integrating published bat genome assemblies, phylogenetic reconstruction established the proximity of noctilionoid bats to vesper bats. Interestingly, we found two distinct patterns of ancient population dynamics in bats, and population changes since the Last Glacial Maximum do not reflect species phylogenetic relationships. These findings enrich our understanding of the adaptive mechanisms and demographic history of bats.
Comparative genomics provides insights into adaptive evolution and demographics of bats. Gaoming Liu, Qi Pan, Pingfen Zhu, Xinyu Guo, Zhan Zhang, Zihao Li, Yaolei Zhang, Xiaoxiao Zhang, Jiahao Wang, Weiqiang Liu, Chunyan Hu, Yang Yu, Xiao Wang, Weixiao Chen, Meng Li, Wenhua Yu, Xin Liu, Inge Seim, Guangyi Fan, Xuming Zhou. Molecular Biology and Evolution, 2024-11-12. https://doi.org/10.1093/molbev/msae208
Christopher Dalldorf, Ying Hefner, Richard Szubin, Josefin Johnsen, Elsayed Mohamed, Gaoyuan Li, Jayanth Krishnan, Adam M Feist, Bernhard O Palsson, Daniel C Zielinski
The Transcriptional Regulatory Network (TRN) in bacteria is thought to rapidly evolve in response to selection pressures, modulating transcription factor (TF) activities and interactions. To probe the limits and mechanisms surrounding the short-term adaptability of the TRN, we generated, evolved, and characterized knockout (KO) strains in E. coli for 11 regulators selected based on measured growth impact on glucose minimal media. All but one knockout strain (Δlrp) were able to recover growth and did so requiring few convergent mutations. We found that the TF knockout adaptations could be divided into four categories: 1) Strains (ΔargR, ΔbasR, Δlon, ΔzntR, Δzur) that recovered growth without any regulator-specific adaptations, likely due to minimal activity of the regulator on the growth condition, 2) Strains (ΔcytR, ΔmlrA, ΔybaO) that recovered growth without TF-specific mutations but with differential expression of regulators whose regulons overlap that of the knocked-out TF, 3) Strains (Δcrp, Δfur) that recovered growth using convergent mutations within their regulatory networks, including regulated promoters and connected regulators, and 4) A strain (Δlrp) that was unable to fully recover growth, seemingly due to the broad connectivity of the TF within the TRN. Analyzing growth capabilities in evolved and unevolved strains indicated that growth adaptation can restore fitness to diverse substrates, often despite a lack of TF-specific mutations. This work reveals the breadth of TRN adaptive mechanisms and suggests these mechanisms can be anticipated based on the network and functional context of the perturbed TFs.
Diversity of transcriptional regulatory adaptation in E. coli. Molecular Biology and Evolution, 2024-11-12. https://doi.org/10.1093/molbev/msae240
Limbs are a defining characteristic of tetrapods, yet numerous taxa, primarily among amphibians and reptiles, have independently lost limbs as an adaptation to new ecological niches. To elucidate the genetic factors contributing to this convergent limb loss, we present a 12 Gb chromosome-level assembly of the Banna caecilian (Ichthyophis bannanicus), a limbless amphibian. Our comparative analysis, which includes the reconstruction of amphibian karyotype evolution, reveals constrained gene length evolution in a subset of developmental genes across three large genomes. Investigation of limb development genes uncovered the loss of Grem1 in caecilians and Tulp3 in snakes. Interestingly, caecilians and snakes share a significantly larger number of convergent degenerated conserved non-coding elements (dCNEs) than limbless lizards, which have a shorter evolutionary history of limb loss. These convergent dCNEs overlap significantly with active genomic regions during mouse limb development and are conserved in limbed species, suggesting their essential role in limb patterning in the tetrapod common ancestor. While most convergent dCNEs emerged in the jawed vertebrate ancestor, coinciding with the origin of paired appendages, more recent dCNEs also contribute to limb development, as demonstrated through functional experiments. Our study provides novel insights into the regulatory elements associated with limb development and loss, offering an evolutionary perspective on the genetic basis of morphological specialization.
Convergent degenerated regulatory elements associated with limb loss in limbless amphibians and reptiles. Chenglong Zhu, Shengyou Li, Daizhen Zhang, Jinjin Zhang, Gang Wang, Botong Zhou, Jiangmin Zheng, Wenjie Xu, Zhengfei Wang, Xueli Gao, Qiuning Liu, Tingfeng Xue, Huabin Zhang, Chunhui Li, Baoming Ge, Yuxuan Liu, Qiang Qiu, Huixian Zhang, Jinghui Huang, Boping Tang, Kun Wang. Molecular Biology and Evolution, 2024-11-12. https://doi.org/10.1093/molbev/msae239
Felsenstein's bootstrap is the most commonly used method to measure branch support in phylogenetics. Current sequencing technologies can result in massive sampling of taxa (e.g. SARS-CoV-2). In this case, the sequences are very similar, the trees are short, and the branches correspond to a small number of mutations (possibly 0). Nevertheless, these trees contain a strong signal, with unresolved parts but a low rate of false branches. With such data, Felsenstein's bootstrap is not satisfactory. Due to the frequentist nature of bootstrap sampling, the expected support of a branch corresponding to a single mutation is ∼63%, even though it is highly likely to be correct. Here we propose a Bayesian version of the phylogenetic bootstrap in which sites are assigned uninformative prior probabilities. The branch support can then be interpreted as a posterior probability. We do not view the alignment as a small subsample of a large sample of sites, but rather as containing all available information (e.g., as with complete viral genomes, which are becoming routine). We give formulas for expected supports under the assumption of perfect phylogeny, in both the frequentist and Bayesian frameworks, where a branch corresponding to a single mutation now has an expected support of ∼90%. Simulations show that these theoretical results are robust to realistic data. Analyses on low-homoplasy viral and non-viral datasets show that Bayesian bootstrap support is easier to interpret, with high supports for branches very likely to be correct. As homoplasy increases, the two supports become closer and strongly correlated.
The Bayesian Phylogenetic Bootstrap, Application to Short Trees and Branches. Frédéric Lemoine, Olivier Gascuel. Molecular Biology and Evolution, 2024-11-08. https://doi.org/10.1093/molbev/msae238
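The ∼63% figure quoted in the abstract above for Felsenstein's bootstrap can be checked with a short sketch. It assumes a perfect phylogeny, so that a branch backed by a single mutation is recovered in a replicate exactly when the one supporting site is drawn at least once among n resampled sites. That probability is 1 − (1 − 1/n)^n, which tends to 1 − 1/e ≈ 0.632 as n grows. The function names and the alignment lengths are illustrative assumptions. The Bayesian ∼90% figure is not reproduced here because the abstract does not give the formula behind it.

```python
import math
import random


def expected_felsenstein_support(n_sites):
    """P(a given site appears at least once when n_sites are drawn with replacement)."""
    return 1 - (1 - 1 / n_sites) ** n_sites


def simulated_support(n_sites, n_replicates, seed=0):
    """Fraction of bootstrap replicates that contain the single supporting site."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        # Site 0 is taken to carry the only mutation supporting the branch.
        if any(rng.randrange(n_sites) == 0 for _ in range(n_sites)):
            hits += 1
    return hits / n_replicates


limit = 1 - math.exp(-1)                          # large-alignment limit
analytic = expected_felsenstein_support(30000)    # hypothetical genome-length alignment
empirical = simulated_support(n_sites=200, n_replicates=2000)
```

For any realistic alignment length the analytic value is already indistinguishable from the limit, which is why the expected support stays near 63% however certain the branch is.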
Wanxing Xu, Jiaojiao Liu, Xiaoxi Zhang, Jia Wen, Qidi Feng, Yang Gao, Yuwen Pan, Yan Lu, Asifullah Khan, Shuhua Xu
While whole-genome sequencing has been applied extensively to investigate the genetic diversity of global populations, ethnic minority groups in Pakistan are generally underrepresented. In particular, little is known about the genetic origin and highland adaptation of the Pamirian Wakhi people. According to Chinese historical records, the geographical location and language usage of the Wakhi may be closely related to those of the Xinjiang Tajiks (XJT). In this study, based on high-coverage (∼30×) whole-genome sequencing of eight Wakhi and 25 XJT individuals, we performed data analyses together with worldwide populations to gain insights into their genetic composition, demography, and adaptive evolution to the highland environment. The Wakhi derived more than 85% of their ancestry from West Eurasian populations (European ∼44.5%, South Asian ∼42.2%) and 10% from East Eurasian populations (Siberian ∼6.0%, East Asian ∼4.3%). Modeling the admixture history of the Wakhi indicated that the early West-East admixture occurred approximately 3,875-2,250 years ago and that the recent admixture occurred 750-375 years ago. We identified selection signatures across EGLN3: in particular, a distinctive evolutionary signature was observed, and an underlying selected haplotype showed a higher frequency (87.5%) in the Wakhi than in the nearby XJT and other highlanders. Interestingly, we found high-frequency archaic sequences in the Wakhi genome, which overlapped with several genes related to cellular signal transduction, including MAGI2, previously associated with high-altitude adaptation. Our analysis indicates that the Wakhi are distinct from the XJT and Tajikistan Tajiks, and sheds light on the Wakhi's ancestral origin and the genetic basis of their high-altitude adaptation.
Multiple-wave admixture and adaptive evolution of the Pamirian Wakhi people. Molecular Biology and Evolution, 2024-11-07. https://doi.org/10.1093/molbev/msae237
Ruopeng Xie, Dillon C Adam, Shu Hu, Benjamin J Cowling, Olivier Gascuel, Anna Zhukova, Vijaykrishna Dhanasekaran
Phylodynamics is central to understanding infectious disease dynamics through the integration of genomic and epidemiological data. Despite advancements, including the application of deep learning to overcome computational limitations, significant challenges persist due to data inadequacies and statistical unidentifiability of key parameters. These issues are particularly pronounced in poorly resolved phylogenies, commonly observed in outbreaks such as SARS-CoV-2. In this study, we conducted a thorough evaluation of PhyloDeep, a deep learning inference tool for phylodynamics, assessing its performance on poorly resolved phylogenies. Our findings reveal the limited predictive accuracy of PhyloDeep (and other state-of-the-art approaches) in these scenarios. However, models trained on poorly resolved, realistically simulated trees demonstrate improved predictive power, despite not being infallible, especially in scenarios with superspreading dynamics, whose parameters are challenging to capture accurately. Notably, we observe markedly improved performance through the integration of minimal contact tracing data, which refines poorly resolved trees. Applying this approach to a sample of SARS-CoV-2 sequences partially matched to contact tracing from Hong Kong yields informative estimates of superspreading potential, extending beyond the scope of contact tracing data alone. Our findings demonstrate the potential for enhancing phylodynamic analysis through complementary data integration, ultimately increasing the precision of epidemiological predictions crucial for public health decision making and outbreak control.
{"title":"Integrating contact tracing data to enhance outbreak phylodynamic inference: a deep learning approach.","authors":"Ruopeng Xie, Dillon C Adam, Shu Hu, Benjamin J Cowling, Olivier Gascuel, Anna Zhukova, Vijaykrishna Dhanasekaran","doi":"10.1093/molbev/msae232","DOIUrl":"https://doi.org/10.1093/molbev/msae232","url":null,"PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":11.0,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142575151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D Madern, F Halgand, C Houée-Levin, A-B Dufour, S Coquille, S Ansanay-Alex, S Sacquin-Mora, C Brochier-Armanet
Malate dehydrogenases (MalDH) (EC 1.1.1.37), which catalyze the interconversion of malate and oxaloacetate in the tricarboxylic acid cycle, are a relevant model for the study of enzyme evolution and adaptation. Likewise, a recent study showed that Methanococcales, a major lineage of Archaea, is a good model to study the molecular processes of proteome thermoadaptation in prokaryotes. Here, we use ancestral sequence reconstruction and paleoenzymology to characterize both ancient and extant MalDHs. We observe a good correlation between inferred optimal growth temperatures (OGTs) and experimental optimal temperatures for activity (A-Topt). In particular, we show that the MalDH present in the ancestor of Methanococcales was hyperthermostable and had an A-Topt of 80°C, consistent with a hyperthermophilic lifestyle. This ancestor gave rise to two lineages with different thermal constraints, one remaining hyperthermophilic while the other underwent several independent adaptations to colder environments. Surprisingly, the enzymes of the first lineage have retained a thermoresistant behavior (i.e., strong thermostability and high A-Topt), whereas the ancestor of the second lineage shows strong thermostability but a reduced A-Topt. Using mutants, we mimic the adaptation trajectory towards mesophily and show that introducing a few mutations can significantly reduce the A-Topt without altering the thermostability of the enzyme. Finally, we reveal an unexpected link between thermostability and the ability to resist γ-irradiation-induced unfolding.
{"title":"The characterization of ancient Methanococcales malate dehydrogenases reveals that strong thermal stability prevents unfolding under intense γ-irradiation.","authors":"D Madern, F Halgand, C Houée-Levin, A-B Dufour, S Coquille, S Ansanay-Alex, S Sacquin-Mora, C Brochier-Armanet","doi":"10.1093/molbev/msae231","DOIUrl":"https://doi.org/10.1093/molbev/msae231","url":null,"PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":11.0,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142568125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}