Pub Date: 2024-09-19 DOI: 10.1007/s00239-024-10207-7
Rodrigo Jácome
Many polymerases and other proteins are endowed with a catalytic domain belonging to the nucleotidyltransferase fold, which has also been deemed the non-canonical palm domain, in which three conserved acidic residues coordinate two divalent metal ions. Tertiary structure-based evolutionary analyses provide valuable information when the phylogenetic signal contained in the primary structure is blurry or has been lost, as is the case with these proteins. Pairwise structural comparisons of proteins with a nucleotidyltransferase fold were performed in the PDBefold web server: the RMSD, the number of superimposed residues, and the Qscore were obtained. The structural alignment score (RMSD × 100/number of superimposed residues) and the 1-Qscore were calculated, and distance matrices were constructed, from which a dendrogram and a phylogenetic network were drawn for each score. The dendrograms and the phylogenetic networks display well-defined clades, reflecting high levels of structural conservation within each clade, not mirrored by primary sequence. The conserved structural core among all these proteins consists of the catalytic nucleotidyltransferase fold, which is surrounded by different functional domains. Hence, many of the clades include proteins that bind different substrates or partake in non-related functions. Enzymes endowed with a nucleotidyltransferase fold are present in all domains of life, and participate in essential cellular and viral functions, which suggests that this domain is very ancient. Despite the loss of evolutionary traces in their primary structure, tertiary structure-based analyses allow us to delve into the evolution and functional diversification of the NT fold.
Title: Structural and Evolutionary Analysis of Proteins Endowed with a Nucleotidyltransferase, or Non-canonical Palm, Catalytic Domain. Journal: Journal of Molecular Evolution.
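The two distance measures named in the abstract above, the structural alignment score (RMSD × 100 / number of superimposed residues) and 1-Qscore, can be illustrated with a minimal sketch. The function names, the protein labels, and every numeric value below are hypothetical and chosen only for illustration; they are not taken from the paper or from PDBefold output, and the paper's own pipeline for drawing dendrograms and networks is not reproduced here.

```python
def structural_alignment_score(rmsd, n_aligned):
    """SAS = RMSD x 100 / number of superimposed residues (lower means more similar)."""
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    return rmsd * 100.0 / n_aligned


def q_distance(q_score):
    """Turn a Q-score similarity in [0, 1] into a distance, 1 - Q."""
    return 1.0 - q_score


def distance_matrix(names, pairwise, metric):
    """Build a symmetric matrix (list of lists) from per-pair comparison values."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0.0] * len(names) for _ in names]
    for (a, b), values in pairwise.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = metric(values)
    return matrix


# Hypothetical pairwise comparison results (assumed values, illustration only).
names = ["protA", "protB", "protC"]
pairwise = {
    ("protA", "protB"): {"rmsd": 2.0, "n_aligned": 100, "q": 0.50},
    ("protA", "protC"): {"rmsd": 3.0, "n_aligned": 60, "q": 0.25},
    ("protB", "protC"): {"rmsd": 2.5, "n_aligned": 80, "q": 0.40},
}

sas_matrix = distance_matrix(
    names, pairwise,
    lambda v: structural_alignment_score(v["rmsd"], v["n_aligned"]),
)
q_matrix = distance_matrix(names, pairwise, lambda v: q_distance(v["q"]))
```

Either matrix could then be handed to a clustering or network tool of the reader's choice; which tool the author used is not stated in the abstract.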
Pub Date: 2024-09-13 DOI: 10.1007/s00239-024-10200-0
Guillermina Hill-Terán, Julieta Petrich, Maria Lorena Falcone Ferreyra, Manuel J. Aybar, Gabriela Coux
Treacher Collins syndrome (TCS) is a genetic disorder affecting facial development, primarily caused by mutations in the TCOF1 gene. TCOF1, along with NOLC1, plays important roles in ribosomal RNA transcription and processing. Previously, a zebrafish model of TCS successfully recapitulated the main characteristics of the syndrome by knocking down the expression of a gene on chromosome 13 (coding for Uniprot ID B8JIY2), which was identified as the TCOF1 orthologue. However, database updates renamed this gene as nolc1 and the zebrafish database (ZFIN) identified a different gene on chromosome 14 as the TCOF1 orthologue (coding for Uniprot ID E7F9D9). NOLC1 and TCOF1 are large proteins with unstructured regions and repetitive sequences that complicate alignments and comparisons. The additional whole genome duplication of teleosts adds further difficulty. In this study, we present evidence that NOLC1 and TCOF1 are paralogs, and that the zebrafish gene on chromosome 14 is a low-complexity LisH domain-containing factor that displays homology to NOLC1 but lacks essential sequence features to accomplish TCOF1 nucleolar functions. Our analysis also supports the idea that zebrafish, as has been suggested for other non-tetrapod vertebrates, lack the TCOF1 gene that is associated with the tripartite nucleolus. Using BLAST searches in a group of teleost genomes, we identified fish-specific sequences similar to the zebrafish E7F9D9 protein. We propose naming them “LisH-containing Low Complexity Proteins” (LLCP). Interestingly, the gene on chromosome 13 (nolc1) displays the sequence features, developmental expression patterns, and phenotypic impact of depletion that are characteristic of TCOF1 functions. These findings suggest that in teleost fish, the nucleolar functions described for both NOLC1 and TCOF1, mediated by their repeated motifs, are carried out by a single gene, nolc1. Our study, which is mainly based on computational tools available as free web-based algorithms, could help to solve similar conflicts regarding gene orthology in zebrafish.
Title: Untangling Zebrafish Genetic Annotation: Addressing Complexities and Nomenclature Issues in Orthologous Evaluation of TCOF1 and NOLC1. Journal: Journal of Molecular Evolution.
Pub Date: 2024-09-12 DOI: 10.1007/s00239-024-10192-x
J. A. Carlisle, D. H. Gurbuz, W. J. Swanson
Many reproductive proteins show signatures of rapid evolution through sequence divergence and duplication. These features of reproductive genes may complicate the detection of orthologs across taxa, making it difficult to connect studies in model systems to human biology. In mice, ZP3r/sp56 is a binding partner to the egg coat protein ZP3 and may mediate induction of the acrosome reaction, a crucial step in fertilization. In rodents, ZP3r, as a member of the Regulators of Complement Activation cluster, is surrounded by paralogs, some of which have been shown to be evolving under positive selection. Although primate egg coats also contain ZP3, sequence divergence paired with paralogous relationships with neighboring genes has complicated the accurate identification of the human ZP3r ortholog. Here, we phylogenetically and syntenically resolve that the human ortholog of ZP3r is the pseudogene C4BPAP1. We investigate the evolution of this gene within primates. We observe independent pseudogenization events of ZP3r in all apes with the exception of orangutans, and independent pseudogenization events in many monkey species. In both rodents and the primates that retain it, ZP3r contains positively selected sites. We hypothesize that redundant mechanisms mediate ZP3 recognition in mammals and that ZP3r’s relative importance to ZP recognition varies across species.
Title: Recurrent Independent Pseudogenization Events of the Sperm Fertilization Gene ZP3r in Apes and Monkeys. Journal: Journal of Molecular Evolution.
Pub Date: 2024-09-11 DOI: 10.1007/s00239-024-10198-5
Shinde Nikhil, Habeeb Shaikh Mohideen, Raja Natesan Sella
Sorghum (Sorghum bicolor (L.) Moench) is a multipurpose crop grown for food, fodder, and bioenergy production. Its cultivated varieties, along with their wild counterparts, contribute to the core genetic pool. Despite the availability of several re-sequenced sorghum genomes, a variable portion of sorghum genomes is not reported during reference genome assembly and annotation. The present analysis used 223 publicly available RNA-seq datasets from seven sweet sorghum cultivars to construct a superTranscriptome. This approach yielded 45,864 Representative Transcript Assemblies (RTAs) that showed Presence/Absence Variation (PAV) across 15 published sorghum genomes. We found that 301 superTranscripts were exclusive to sweet sorghum, including 58 de novo genes encoding core and linker histones, zinc finger domains, glucosyl transferases, cellulose synthase, etc. The superTranscriptome added 2,802 new protein-coding genes to the Sweet Sorghum Reference Genome (SSRG), of which 559 code for different transcription factors (TFs). Our analysis revealed that MULE-like transposases were abundant in the sweet sorghum genome and could play a hidden role in the evolution of sweet sorghum. We observed large deletions in the D locus and terminal deletions in four other NAC-encoding loci in the SSRG compared to its wild progenitor (353), suggesting that non-functional NAC genes contributed to trait development in sweet sorghum. Moreover, superTranscript-based methods for Differential Exon Usage (DEU) and Differential Gene Expression (DGE) analyses were more accurate than those based on the SSRG. This study demonstrates that the superTranscriptome can enhance our understanding of fundamental sorghum mechanisms, improve genome annotations, and potentially even replace the reference genome.
Title: Unveiling the Genomic Symphony: Identification Cultivar-Specific Genes and Enhanced Insights on Sweet Sorghum Genomes Through Comprehensive superTranscriptomic Analysis. Journal: Journal of Molecular Evolution.
Pub Date: 2024-09-11 DOI: 10.1007/s00239-024-10199-4
Mario Rivas, George E. Fox
The Last Common Ancestor (LCA) is understood as a hypothetical population of organisms from which all extant living creatures are thought to have descended. Its biology and environment have been and continue to be the subject of discussions within the scientific community. Since the first bacterial genomes were obtained, multiple attempts to reconstruct the genetic content of the LCA have been made. In this review, we compare 10 of the most extensive reconstructions of the gene content possessed by the LCA as they relate to aspects of the translation machinery. Although each reconstruction has its own methodological biases and many disagree on the metabolic nature of the LCA, all indicate, to some extent, that several components of the translation machinery are among the most conserved genetic elements. The datasets from each reconstruction clearly show that the LCA already had a largely complete translational system with a genetic code already in place and therefore was not a progenote. Among these features, several ribosomal proteins, translation factors like IF2, EF-G, and EF-Tu, and both class I and class II aminoacyl tRNA synthetases were found in essentially all reconstructions. Due to the limitations of the various methodologies, some features, such as the occurrence of post-transcriptionally modified rRNA bases, are not fully addressed. However, conserved as it is, non-universal ribosomal features found in various reconstructions indicate that the LCA’s translation machinery was still evolving, thereby acquiring the domain-specific features in the process. Although progenotes from the pre-LCA likely no longer exist, recent results obtained by unraveling the early history of the ribosome and other genetic processes can provide insight into the nature of the pre-LCA world.
Title: On the Nature of the Last Common Ancestor: A Story from its Translation Machinery. Journal: Journal of Molecular Evolution.
Pub Date: 2024-09-11 DOI: 10.1007/s00239-024-10204-w
Bokai K. Zhang, Leonard Gines
This literature review presents a new direction for developing better cancer treatments and preventive measures. The larger the body of an organism, the more numerous its cells, which theoretically leads to a higher risk of cancer. However, observational studies suggest a lack of correlation between body size and cancer risk, which is known as Peto’s paradox. The corollary of Peto’s paradox is that large organisms must be cancer-resistant. Further investigation of the anti-cancer mechanisms in each species could be rewarding, and the main focus of this literature review is how the anti-cancer mechanisms found in wild animals can help develop more effective cancer treatments in humans. Due to a lack of research on and understanding of the exact molecular mechanisms in the species studied, only a few that have been extensively researched (elephants and rodents) have made substantive contributions to human oncology. A new research direction is to investigate the positively selected genes that are related to cancer resistance and see if homologous genes are present in humans. Despite the great obstacle of applying anti-cancer mechanisms from phylogenetically distant species to the human body, this research direction of gaining insights through investigating cancer-resisting evolutionary adaptations in wild animals has great potential in human oncology research.
Title: Analysis of Cancer-Resisting Evolutionary Adaptations in Wild Animals and Applications for Human Oncology. Journal: Journal of Molecular Evolution.
Pif is a shell matrix protein (SMP) identified in the nacreous layer of Pinctada fucata (Pfu) and comprises two proteins, Pif97 and Pif80. Pif97 contains von Willebrand factor A (VWA) and chitin-binding domains, whereas Pif80 can bind calcium carbonate crystals. The VWA domain is conserved in the SMPs of various mollusk species; however, their phylogenetic relationship remains obscure. Furthermore, although the VWA domain participates in protein-protein interactions, its role in shell formation has not been established. Accordingly, in the current study, we investigate the phylogenetic relationship between PfuPif and other VWA domain-containing proteins in major mollusk species. The shell-related proteins containing VWA domains formed a large clade (the Pif/BMSP family) and were classified into eight subfamilies with unique sequence features, expression patterns, and taxa diversity. Furthermore, a pull-down assay using recombinant proteins containing the VWA domain of PfuPif97 revealed that the VWA domain interacts with five nacreous layer-related SMPs of P. fucata, including Pif80 and nacrein. Collectively, these results suggest that the VWA domain is important in the formation of organic complexes and participates in shell mineralisation.
Title: Diversification of von Willebrand Factor A and Chitin-Binding Domains in Pif/BMSPs Among Mollusks. Authors: Keisuke Shimizu, Lumi Negishi, Hitoshi Kurumizaka, Michio Suzuki. Pub Date: 2024-08-01 DOI: 10.1007/s00239-024-10180-1 Journal: Journal of Molecular Evolution, pages 415-431. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11291548/pdf/
Pub Date: 2024-08-01 Epub Date: 2024-06-17 DOI: 10.1007/s00239-024-10179-8
Steven K Chen, Jing Liu, Alexander Van Nynatten, Benjamin M Tudor-Price, Belinda S W Chang
Empirical studies of genotype-phenotype-fitness maps of proteins are fundamental to understanding the evolutionary process, in elucidating the space of possible genotypes accessible through mutations in a landscape of phenotypes and fitness effects. Yet, comprehensively mapping molecular fitness landscapes remains challenging since all possible combinations of amino acid substitutions for even a few protein sites are encoded by an enormous genotype space. High-throughput mapping of genotype space can be achieved using large-scale screening experiments known as multiplexed assays of variant effect (MAVEs). However, to accommodate such multi-mutational studies, the size of MAVEs has grown to the point where a priori determination of sampling requirements is needed. To address this problem, we propose calculations and simulation methods to approximate minimum sampling requirements for multi-mutational MAVEs, which we combine with a new library construction protocol to experimentally validate our approximation approaches. Analysis of our simulated data reveals how sampling trajectories differ between simulations of nucleotide versus amino acid variants and among mutagenesis schemes. For this, we show quantitatively that marginal gains in sampling efficiency demand increasingly greater sampling effort when sampling for nucleotide sequences over their encoded amino acid equivalents. We present a new library construction protocol that efficiently maximizes sequence variation, and demonstrate using ultradeep sequencing that the library encodes virtually all possible combinations of mutations within the experimental design. Insights learned from our analyses together with the methodological advances reported herein are immediately applicable toward pooled experimental screens of arbitrary design, enabling further assay upscaling and expanded testing of genotype space.
{"title":"Sampling Strategies for Experimentally Mapping Molecular Fitness Landscapes Using High-Throughput Methods.","authors":"Steven K Chen, Jing Liu, Alexander Van Nynatten, Benjamin M Tudor-Price, Belinda S W Chang","doi":"10.1007/s00239-024-10179-8","DOIUrl":"10.1007/s00239-024-10179-8","PeriodicalId":16366,"journal":{"name":"Journal of Molecular Evolution","volume":" ","pages":"402-414"},"PeriodicalIF":2.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141419508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
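As a rough illustration of why sampling requirements for the multi-mutational libraries described in the abstract above grow so quickly, the following Python sketch applies the classical coupon-collector result to genotype spaces of different sizes. It is not the authors' calculation or simulation method. It assumes every variant is drawn with equal probability, it treats amino acid combinations (20 states per site) and fully randomized codon combinations (64 states per site) as two separate uniformly sampled sets, and all function names are hypothetical.

```python
import math


def genotype_space(sites: int, states_per_site: int) -> int:
    """Number of distinct variants when each site can take any of the given states."""
    return states_per_site ** sites


def expected_draws_for_full_coverage(n_variants: int) -> float:
    """Coupon-collector expectation N * H_N: mean draws needed to observe every variant once.

    Assumes uniform sampling with replacement (an idealization of a pooled library).
    """
    harmonic = sum(1.0 / k for k in range(1, n_variants + 1))
    return n_variants * harmonic


def draws_for_expected_coverage(n_variants: int, fraction: float) -> int:
    """Smallest n such that the expected fraction of variants seen, 1 - (1 - 1/N)^n, reaches `fraction`."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if n_variants < 2:
        return 1
    return math.ceil(math.log(1.0 - fraction) / math.log1p(-1.0 / n_variants))


if __name__ == "__main__":
    sites = 3  # hypothetical design: three fully randomized positions
    for label, states in (("amino acid", 20), ("codon", 64)):
        n = genotype_space(sites, states)
        print(
            f"{label}: {n} variants, "
            f"{draws_for_expected_coverage(n, 0.95)} draws for 95% expected coverage, "
            f"{expected_draws_for_full_coverage(n):.0f} draws expected for full coverage"
        )
```

Under these simplifying assumptions, the gap between the draws needed for 95% expected coverage and those needed for full coverage shows the diminishing returns of additional sampling, and the larger codon space requires more draws than the amino acid space. Real libraries have unequal variant frequencies and degenerate codons, so these figures should be read only as idealized estimates.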
Pub Date : 2024-08-01 DOI: 10.1007/s00239-024-10183-y
Luis Delaye, Lizbeth Román-Padilla
{"title":"Correction: Untangling the Evolution of the Receptor-Binding Motif of SARS-CoV-2.","authors":"Luis Delaye, Lizbeth Román-Padilla","doi":"10.1007/s00239-024-10183-y","DOIUrl":"10.1007/s00239-024-10183-y","url":null,"abstract":"","PeriodicalId":16366,"journal":{"name":"Journal of Molecular Evolution","volume":" ","pages":"525-526"},"PeriodicalIF":2.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11291600/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141446311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-01 Epub Date: 2024-06-11 DOI: 10.1007/s00239-024-10181-0
Jan Gerwin, Julián Torres-Dowdall, Thomas F Brown, Axel Meyer
Gene duplication is one of the most important sources of novel genotypic diversity and of the subsequent evolution of phenotypic diversity. Determining the evolutionary history and functional changes of duplicated genes is crucial for a comprehensive understanding of adaptive evolution. The evolutionary history of visual opsin genes is very dynamic, with repeated duplication events followed by sub- or neofunctionalization. While duplication of the green-sensitive opsin rh2 is common in teleost fish, fewer cases of multiple duplication events of the red-sensitive opsin lws are known. In this study, we investigate the visual opsin gene repertoire of the anabantoid fishes, focusing on the five lws opsin genes found in the genus Betta. We determine the evolutionary history of the lws opsin gene by taking advantage of whole-genome sequences of nine anabantoid species, including the newly assembled genome of Betta imbellis. Our results show that at least two independent duplications of lws occurred in the Betta lineage. The analysis of amino acid sequences of the lws paralogs of Betta revealed high levels of diversification in four of the seven transmembrane regions of the lws protein. Amino acid substitutions at two key tuning sites are predicted to lead to differentiation of absorption maxima (λmax) between the paralogs within Betta. Finally, eye transcriptomics of B. splendens at different developmental stages revealed expression shifts between paralogs for all cone opsin classes. The lws genes are expressed according to their relative position in the lws opsin cluster throughout ontogeny. We conclude that temporal collinearity of lws expression might have facilitated subfunctionalization of lws in Betta and of teleost opsins in general.
{"title":"Expansion and Functional Diversification of Long-Wavelength-Sensitive Opsin in Anabantoid Fishes.","authors":"Jan Gerwin, Julián Torres-Dowdall, Thomas F Brown, Axel Meyer","doi":"10.1007/s00239-024-10181-0","DOIUrl":"10.1007/s00239-024-10181-0","PeriodicalId":16366,"journal":{"name":"Journal of Molecular Evolution","volume":" ","pages":"432-448"},"PeriodicalIF":2.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11291592/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141300795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}