Pub Date: 2024-04-18 DOI: 10.1007/s00239-024-10172-1
Leo Douglas Creasey, Eran Tauber
Hypothesizing that CpG codon dyads, formed by consecutive codons containing a cytosine-guanine pair (NNC-GNN), may play a crucial role in gene function, we conducted an extensive analysis to investigate their distribution and conservation within mammalian genes. Our findings reveal that genes characterized by a high density of CpG codon dyads are notably associated with homeobox domains and RNA polymerase II transcription factors. Conversely, genes exhibiting low CpG codon dyad density have links to DNA damage repair and mitosis. Importantly, our study identifies a remarkable increase in expressed genes that harbor CpG during embryonic development, suggesting their potential involvement in gene regulation at these developmental stages. These results underscore the functional significance of CpG codon dyads in DNA methylation and gene expression, further demonstrating the coevolution of consecutive codons and their contribution to codon usage bias.
Title: Interconnected Codons: Unravelling the Epigenetic Significance of Flanking Sequences in CpG Dyads. Journal of Molecular Evolution. https://doi.org/10.1007/s00239-024-10172-1
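To make the NNC-GNN definition in the abstract concrete, the following is a minimal sketch of how a CpG codon dyad density could be computed for a single in-frame coding sequence. The function name `cpg_dyad_density` and the normalisation (dyads divided by the number of consecutive codon pairs) are assumptions made for illustration; the abstract does not state how the authors normalise density.

```python
def cpg_dyad_density(cds: str) -> float:
    """Fraction of consecutive codon pairs that form an NNC-GNN dyad.

    Assumes `cds` is an in-frame coding sequence; trailing bases that do
    not fill a complete codon are ignored. The per-pair normalisation is
    an illustrative assumption, not taken from the paper.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    pairs = list(zip(codons, codons[1:]))
    if not pairs:
        return 0.0
    # A dyad is a codon ending in C followed by a codon starting with G.
    dyads = sum(1 for first, second in pairs
                if first[2] == "C" and second[0] == "G")
    return dyads / len(pairs)


if __name__ == "__main__":
    # Codons: ATG GCC GAA TAA; only GCC-GAA spans a CpG across the boundary.
    print(cpg_dyad_density("ATGGCCGAATAA"))
```

Counting only the codon boundary (position 3 of one codon, position 1 of the next) is what separates a codon dyad from an ordinary CpG count, which would also include CpGs lying inside a single codon.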
Pub Date: 2024-04-18 DOI: 10.1007/s00239-024-10166-z
Zhenxin Fan, Rusong Zhang, Anbo Zhou, Jody Hey, Yang Song, Naoki Osada, Yuzuru Hamada, Bisong Yue, Jinchuan Xing, Jing Li
The genus Macaca is widely distributed, occupies a variety of habitats, shows diverse phenotypic characteristics, and is one of the best-studied genera of nonhuman primates. Here, we report five re-sequenced Macaca genomes, including one M. cyclopis, one M. fuscata, one M. thibetana, one M. silenus, and one M. sylvanus. Together with published genomes of other macaque species, we combined 20 genome sequences of 10 macaque species to investigate gene introgression and genetic differences among the species. The network analysis of the SNV-fragment trees indicates a reticular phylogeny of macaque species. Combining the results from various analytical methods, we identified extensive ancient introgression events among macaque species. Multiple introgression signals between different species groups were also observed, such as between fascicularis group species and silenus group species. However, gene flow signals between the fascicularis and sinica groups were not as strong as those between the fascicularis group and the silenus group. On the other hand, the unidirectional gene flow in M. arctoides probably occurred between the progenitor of M. arctoides and the common ancestor of the fascicularis group. Our study also shows that the genetic backgrounds and genetic diversity of different macaques vary dramatically among species, even among populations of the same species. In conclusion, using whole genome sequences and multiple methods, we have studied the evolutionary history of the genus Macaca and provided evidence for extensive introgression among the species.
Title: Genomic Evidence for the Complex Evolutionary History of Macaques (Genus Macaca). Journal of Molecular Evolution. https://doi.org/10.1007/s00239-024-10166-z
Pub Date: 2024-04-01 Epub Date: 2024-02-28 DOI: 10.1007/s00239-024-10156-1
Katherine D Chau, Frances E Hauser, Alexander Van Nynatten, Jacob M Daane, Matthew P Harris, Belinda S W Chang, Nathan R Lovejoy
Ecological and evolutionary transitions offer an excellent opportunity to examine the molecular basis of adaptation. Fishes of the order Beloniformes include needlefishes, flyingfishes, halfbeaks, and allies, and comprise over 200 species occupying a wide array of habitats, from the marine epipelagic zone to tropical rainforest rivers. These fishes also exhibit a diversity of diets, including piscivory, herbivory, and zooplanktivory. We investigated how diet and habitat affected the molecular evolution of cone opsins, which play a key role in bright light and colour vision and are tightly linked to ecology and life history. We analyzed a targeted-capture dataset to reconstruct the evolutionary history of beloniforms and assemble cone opsin sequences. We implemented codon-based clade models of evolution to examine how molecular evolution was affected by habitat and diet. We found high levels of positive selection in medium- and long-wavelength beloniform opsins, with piscivores showing increased positive selection in medium-wavelength opsins and zooplanktivores showing increased positive selection in long-wavelength opsins. In contrast, short-wavelength opsins showed purifying selection. While marine/freshwater habitat transitions have an effect on opsin molecular evolution, we found that diet plays a more important role. Our study suggests that evolutionary transitions along ecological axes produce complex adaptive interactions that affect patterns of selection on genes that underlie vision.
Title: Multiple Ecological Axes Drive Molecular Evolution of Cone Opsins in Beloniform Fishes. Journal of Molecular Evolution, pp. 93-103.
Pub Date: 2024-04-01 Epub Date: 2024-03-12 DOI: 10.1007/s00239-024-10161-4
Michael Schmutzer, Pouria Dasmeh, Andreas Wagner
Virtually all enzymes catalyse more than one reaction, a phenomenon known as enzyme promiscuity. It is unclear whether promiscuous enzymes are more often generalists that catalyse multiple reactions at similar rates or specialists that catalyse one reaction much more efficiently than other reactions. In addition, the factors that shape whether an enzyme evolves to be a generalist or a specialist are poorly understood. To address these questions, we follow a three-pronged approach. First, we examine the distribution of promiscuity in empirical enzymes reported in the BRENDA database. We find that the promiscuity distribution of empirical enzymes is bimodal. In other words, a large fraction of promiscuous enzymes are either generalists or specialists, with few intermediates. Second, we demonstrate that enzyme biophysics is not sufficient to explain this bimodal distribution. Third, we devise a constraint-based model of promiscuous enzymes undergoing duplication and facing selection pressures favouring subfunctionalization. The model posits the existence of constraints between the catalytic efficiencies of an enzyme for different reactions and is inspired by empirical case studies. The promiscuity distribution predicted by our constraint-based model is consistent with the empirical bimodal distribution. Our results suggest that subfunctionalization is possible and beneficial only in certain enzymes. Furthermore, the model predicts that conflicting constraints and selection pressures can cause promiscuous enzymes to enter a 'frustrated' state, in which competing interactions limit the specialisation of enzymes. We find that frustration can be both a driver and an inhibitor of enzyme evolution by duplication and subfunctionalization. In addition, our model predicts that frustration becomes more likely as enzymes catalyse more reactions, implying that natural selection may prefer catalytically simple enzymes. 
In sum, our results suggest that frustration may play an important role in enzyme evolution.
Title: Frustration can Limit the Adaptation of Promiscuous Enzymes Through Gene Duplication and Specialisation. Journal of Molecular Evolution, pp. 104-120. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10978624/pdf/
Cyanobacteria are recognised for their pivotal roles in aquatic ecosystems, serving as primary producers and major agents in diazotrophic processes. Currently, the primary focus of cyanobacterial research lies in gaining a more detailed understanding of these well-established ecosystem functions. However, their involvement and impact on other crucial biogeochemical cycles remain understudied. This knowledge gap is partially attributed to the challenges associated with culturing cyanobacteria in controlled laboratory conditions and the limited understanding of their specific growth requirements. This can be circumvented partially by the culture-independent methods which can shed light on the genomic potential of cyanobacterial species and answer more profound questions about the evolution of other key biogeochemical functions. In this study, we assembled 83 cyanobacterial genomes from metagenomic data generated from environmental DNA extracted from a brackish water lagoon (Chilika Lake, India). We taxonomically classified these metagenome-assembled genomes (MAGs) and found that about 92.77% of them are novel genomes at the species level. We then annotated these cyanobacterial MAGs for all the encoded functions using KEGG Orthology. Interestingly, we found two previously unreported functions in Cyanobacteria, namely, DNRA (Dissimilatory Nitrate Reduction to Ammonium) and DMSP (Dimethylsulfoniopropionate) synthesis in multiple MAGs using nirBD and dsyB genes as markers. We validated their presence in several publicly available cyanobacterial isolate genomes. Further, we identified incongruities between the evolutionary patterns of species and the marker genes and elucidated the underlying reasons for these discrepancies. This study expands our overall comprehension of the contribution of cyanobacteria to the biogeochemical cycling in coastal brackish ecosystems.
Title: Cyanobacterial Genomes from a Brackish Coastal Lagoon Reveal Potential for Novel Biogeochemical Functions and Their Evolution. Authors: Manisha Ray, Shivakumara Manu, Gurdeep Rastogi, Govindhaswamy Umapathy. Journal of Molecular Evolution, pp. 121-137. Pub Date: 2024-04-01 DOI: 10.1007/s00239-024-10159-y
Pub Date: 2024-04-01 Epub Date: 2024-03-15 DOI: 10.1007/s00239-024-10160-5
Riccardo G Kyriacou, Peter O Mulhair, Peter W H Holland
The proportions of A:T and G:C nucleotide pairs are often unequal and can vary greatly between animal species and along chromosomes. The causes and consequences of this variation are incompletely understood. The recent release of high-quality genome sequences from the Darwin Tree of Life and other large-scale genome projects provides an opportunity for GC heterogeneity to be compared across a large number of insect species. Here we analyse GC content along chromosomes, and within protein-coding genes and codons, of 150 insect species from four holometabolous orders: Coleoptera, Diptera, Hymenoptera, and Lepidoptera. We find that protein-coding sequences have higher GC content than the genome average, and that Lepidoptera generally have higher GC content than the other three insect orders examined. GC content is higher in small chromosomes in most Lepidoptera species, but this pattern is less consistent in other orders. GC content also increases towards subtelomeric regions within protein-coding genes in Diptera, Coleoptera and Lepidoptera. Two species of Diptera, Bombylius major and B. discolor, have very atypical genomes with ubiquitous increase in AT content, especially at third codon positions. Despite dramatic AT-biased codon usage, we find no evidence that this has driven divergent protein evolution. We argue that the GC landscape of Lepidoptera, Diptera and Coleoptera genomes is influenced by GC-biased gene conversion, strongest in Lepidoptera, with some outlier taxa affected drastically by counteracting processes.
Title: GC Content Across Insect Genomes: Phylogenetic Patterns, Causes and Consequences. Journal of Molecular Evolution, pp. 138-152. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10978632/pdf/
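The quantities compared in this abstract, overall GC content and GC content at third codon positions (often called GC3), can be sketched as follows. The function names and the choice to skip ambiguous bases such as N are assumptions made for illustration and are not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """GC fraction over unambiguous bases (A, C, G, T) only."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence.

    Bases beyond the last complete codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    return gc_content(third_positions)


if __name__ == "__main__":
    cds = "ATGGCCGAATAA"  # codons: ATG GCC GAA TAA
    print(gc_content(cds), gc3_content(cds))
```

Third codon positions are mostly synonymous, so GC3 tends to track mutational and conversion biases more closely than the overall GC content of a coding sequence does.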
Pub Date: 2024-04-01 Epub Date: 2024-03-14 DOI: 10.1007/s00239-024-10158-z
Zachery W Dickson, G Brian Golding
Protein low complexity regions (LCRs) are compositionally biased amino acid sequences, many of which have significant evolutionary impacts on the proteins which contain them. They are mutationally unstable, experiencing higher rates of indels and substitutions than higher complexity regions. LCRs also impact the expression of their proteins, likely through multiple effects along the path from gene transcription, through translation, and eventual protein degradation. It has been observed that proteins which contain LCRs are associated with elevated transcript abundance (TAb), despite having lower protein abundance. We have gathered and integrated human data to investigate the co-evolution of TAb and LCRs through ancestral reconstructions and model inference using an approximate Bayesian calculation based method. We observe that on short evolutionary timescales TAb evolution is significantly impacted by changes in LCR length, with insertions driving TAb down. In contrast, the observed data are best explained by indel rates in LCRs which are unaffected by shifts in TAb. Our work demonstrates a coupling between LCR and TAb evolution, and the utility of incorporating multiple responses into evolutionary analyses.
Title: Evolution of Transcript Abundance is Influenced by Indels in Protein Low Complexity Regions. Journal of Molecular Evolution, pp. 153-168.
Pub Date: 2024-04-01 Epub Date: 2024-03-07 DOI: 10.1007/s00239-024-10163-2
Johannes Jaeger
A recent publication in Nature has generated much heated discussion about evolution, its tendency towards increasing diversity and complexity, and its potential status above and beyond the known laws of fundamental physics. The argument at the heart of this controversy concerns assembly theory, a method to detect and quantify the influence of higher-level emergent causal constraints in computational worlds made of basic objects and their combinations. In this short essay, I briefly review the theory, its basic principles and potential applications. I then go on to critically examine its authors' assertions, concluding that assembly theory has merit but is not nearly as novel or revolutionary as claimed. It certainly does not provide any new explanation of biological evolution or natural selection, or a new grounding of biology in physics. In this regard, the presentation of the paper is starkly distorted by hype, which may explain some of the outrage it created.
Title: Assembly Theory: What It Does and What It Does Not Do. Journal of Molecular Evolution, pp. 87-92. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10978598/pdf/
Pub Date : 2024-03-19 DOI: 10.1007/s00239-024-10162-3
Michael A. Sennett, Douglas L. Theobald
Ancestral sequence reconstruction (ASR) is a phylogenetic method widely used to analyze the properties of ancient biomolecules and to elucidate mechanisms of molecular evolution. Despite its increasingly widespread application, the accuracy of ASR is currently unknown, as it is generally impossible to compare resurrected proteins to the true ancestors. Which evolutionary models are best for ASR? How accurate are the resulting inferences? Here we answer these questions using a cross-validation method to reconstruct each extant sequence in an alignment with ASR methodology, a method we term “extant sequence reconstruction” (ESR). We thus can evaluate the accuracy of ASR methodology by comparing ESR reconstructions to the corresponding known true sequences. We find that a common measure of the quality of a reconstructed sequence, the average probability, is indeed a good estimate of the fraction of correct amino acids when the evolutionary model is accurate or overparameterized. However, the average probability is a poor measure for comparing reconstructions from different models, because, surprisingly, a more accurate phylogenetic model often results in reconstructions with lower probability. While better (more predictive) models may produce reconstructions with lower sequence identity to the true sequences, better models nevertheless produce reconstructions that are more biophysically similar to true ancestors. In addition, we find that a large fraction of sequences sampled from the reconstruction distribution may have fewer errors than the single most probable (SMP) sequence reconstruction, despite the fact that the SMP has the lowest expected error of all possible sequences. Our results emphasize the importance of model selection for ASR and the usefulness of sampling sequence reconstructions for analyzing ancestral protein properties. 
ESR is a powerful method for validating the evolutionary models used for ASR and can be applied in practice to any phylogenetic analysis of real biological sequences. Most significantly, ESR uses ASR methodology to provide a general method by which the biophysical properties of resurrected proteins can be compared to the properties of the true protein.
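The relationship between the average probability, the SMP sequence, and sampled reconstructions can be made concrete with a toy sketch. This is not the authors' implementation: the per-site posterior distributions below are made-up numbers, and the function names (`smp_sequence`, `average_probability`, `sample_sequence`, `fraction_correct`) are hypothetical helpers introduced only for illustration. The sketch assumes sites are independent and that the posteriors are well calibrated, in which case the average probability of a sequence equals its expected fraction of correct residues.

```python
import random

# Hypothetical toy posteriors: one dict per alignment site, residue -> probability.
posteriors = [
    {"A": 0.9, "G": 0.1},
    {"L": 0.6, "I": 0.4},
    {"K": 0.5, "R": 0.3, "Q": 0.2},
    {"D": 0.8, "E": 0.2},
]

def smp_sequence(posteriors):
    """Single most probable (SMP) sequence: the top residue at every site."""
    return "".join(max(site, key=site.get) for site in posteriors)

def average_probability(posteriors, seq):
    """Mean posterior probability of the residues in seq.

    If the posteriors are calibrated, this is the expected fraction correct.
    """
    return sum(site[res] for site, res in zip(posteriors, seq)) / len(seq)

def sample_sequence(posteriors, rng):
    """Draw one sequence by sampling each site independently from its posterior."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()))[0]
        for site in posteriors
    )

def fraction_correct(seq, truth):
    """Fraction of positions at which seq matches the known true sequence."""
    return sum(a == b for a, b in zip(seq, truth)) / len(truth)

smp = smp_sequence(posteriors)                    # "ALKD"
expected = average_probability(posteriors, smp)   # (0.9 + 0.6 + 0.5 + 0.8) / 4

# Assumption for the demo: the "true" sequence is itself a draw from the
# posteriors, i.e. the model is exactly right.
rng = random.Random(0)
truth = sample_sequence(posteriors, rng)
samples = [sample_sequence(posteriors, rng) for _ in range(1000)]
better = sum(
    fraction_correct(s, truth) > fraction_correct(smp, truth) for s in samples
)
print(smp, round(expected, 3), better / len(samples))
```

In an ESR-style check, `truth` would instead be the held-out extant sequence, so `fraction_correct` can be compared directly with `average_probability` for each model under consideration.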
Extant Sequence Reconstruction: The Accuracy of Ancestral Sequence Reconstructions Evaluated by Extant Sequence Cross-Validation. Michael A. Sennett, Douglas L. Theobald. Journal of Molecular Evolution, 2024-03-19. DOI: 10.1007/s00239-024-10162-3
Pub Date : 2024-03-19 DOI: 10.1007/s00239-024-10164-1
The bacterial strain SECRCQ15ᵀ was isolated from seeds of Chenopodium quinoa in Spain. Phylogenetic, chemotaxonomic, and phenotypic analyses, as well as genome similarity indices, support the classification of the strain as a novel species of the genus Ferdinandcohnia, for which we propose the name Ferdinandcohnia quinoae sp. nov. To investigate the speciation features of strain SECRCQ15ᵀ, we performed a comparative genomic analysis of its genome and those of the type strains of species from the genus Ferdinandcohnia. We found several genes related to plant growth-promoting mechanisms in the SECRCQ15ᵀ genome. We also found that the singletons of F. quinoae SECRCQ15ᵀ are mainly related to carbohydrate use, a common trait of plant-associated bacteria. To further characterize speciation events in this strain, we identified genes undergoing diversifying selection (e.g., genes encoding ribosomal proteins) and functions likely lost through pseudogenization. We also found that this novel species contains 138 plant-associated gene-cluster functions that are unique within the genus Ferdinandcohnia. These features may explain both the ecological and the taxonomic differentiation of this new taxon.
Speciation Features of Ferdinandcohnia quinoae sp. nov to Adapt to the Plant Host. Journal of Molecular Evolution, 2024-03-19. DOI: 10.1007/s00239-024-10164-1