Pub Date: 2025-12-01 | Epub Date: 2025-10-14 | DOI: 10.1007/s00239-025-10274-4
Maeva Perez, Katherine Hurm, David A Liberles
The Quest for Orthologs has focused on identifying orthologs from the perspective that they are more likely to have retained function over a given evolutionary distance than paralogs (or xenologs) have, enabling the transfer of functional annotation. It has become clear that function is defined by biochemistry that is under selective pressure. Quantitative descriptions of function are available within this framework and may offer understanding that is not provided by more qualitative descriptions of function. Changes in selected biochemistry, mutational processes, and selective strength can all lead to quantitative changes in function. This is discussed for proteins that have been subjected to gene duplication and for proteins that have evolved simply through the speciation process.
Title: Understanding Functional Evolution in Orthologs and Paralogs | Journal: Journal of Molecular Evolution, pp. 730-739 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756302/pdf/
Pub Date: 2025-12-01 | Epub Date: 2025-12-29 | DOI: 10.1007/s00239-025-10286-0
Yuting Xiao, Maureen Stolzer, Larry Wasserman, Dannie Durand
Reconstruction of the ancestral protein repertoire offers valuable insights into the tempo and mode of protein content evolution, but can be highly sensitive to model choice. We used a phylogenetic Birth-Death-Gain model to investigate the evolution of the metazoan protein domain repertoire. Domains, protein modules with a distinct structure and function, represent the basic components of the protein repertoire. Given a species tree and a census of protein domain families in present-day species, we estimated the most likely rates of domain family origination, duplication, and loss. Rates were allowed to vary across species lineages and domain families, decoupling these factors. Statistical hierarchical clustering of family-specific rates reveals groups of domains evolving in concert. Moreover, we observe a strong and significant association between family rate and family function. Interestingly, families with functions associated with metazoan innovations tend to have the fastest rates. We further inferred the expected ancestral domain content and the history of domain family gains, losses, expansions, and contractions in each species lineage. Our analysis reveals an ongoing process of domain family replacement and resizing, consistent with extensive remodeling of the protein domain repertoire. This stands in contrast to recent reports of widespread loss during metazoan evolution, which were obtained with more constrained models. The use of a powerful, probabilistic Birth-Death-Gain model reveals an unexpected level of genomic plasticity.
Title: Evolution of the Metazoan Protein Domain Repertoire Revealed by a Birth-Death-Gain Model | Journal: Journal of Molecular Evolution, pp. 777-799 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756402/pdf/
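The Birth-Death-Gain process described in the abstract above can be illustrated on a single branch. The sketch below is a generic linear birth-death-gain (birth-death-immigration) simulation written for illustration only; it is not the authors' phylogenetic implementation, which also works over a species tree and lets rates vary across lineages and families. The function names and the rate parameters `dup`, `loss`, and `gain` are assumptions. It pairs a Gillespie-style simulation with the closed-form mean of the same process so the two can be checked against each other.

```python
import math
import random


def simulate_branch(n0, t, dup, loss, gain, rng):
    """Simulate one domain family's copy number along a branch of length t.

    Hypothetical linear birth-death-gain process (Gillespie algorithm): each
    copy duplicates at rate `dup` and is lost at rate `loss`, and new copies
    originate at a constant rate `gain` regardless of family size.
    """
    n, clock = n0, 0.0
    while True:
        total = n * (dup + loss) + gain
        if total == 0:
            return n  # no copies and no gain: nothing can happen
        clock += rng.expovariate(total)  # waiting time to the next event
        if clock > t:
            return n
        u = rng.random() * total  # choose which event occurred
        if u < n * dup:
            n += 1  # duplication
        elif u < n * (dup + loss):
            n -= 1  # loss
        else:
            n += 1  # gain (origination of a new copy)


def expected_size(n0, t, dup, loss, gain):
    """Mean family size of the same process, solving dE/dt = (dup - loss) E + gain."""
    r = dup - loss
    if r == 0:
        return n0 + gain * t
    growth = math.exp(r * t)
    return n0 * growth + gain / r * (growth - 1.0)
```

For example, averaging 20,000 simulated branches drawn from one seeded `random.Random` with `n0=2, t=1.0, dup=0.5, loss=0.3, gain=0.2` should land close to `expected_size(2, 1.0, 0.5, 0.3, 0.2)`, which is about 2.66.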
Pub Date: 2025-12-01 | Epub Date: 2025-11-20 | DOI: 10.1007/s00239-025-10290-4
Natasha Glover, David A Liberles
This special issue from the Quest for Orthologs community highlights ongoing challenges in detecting and characterizing orthologous genes in the annotation of protein functions. Eleven articles are presented describing ongoing research in this area.
Title: Perspectives on Orthology During the Quest for Orthologs | Journal: Journal of Molecular Evolution, pp. 699-701 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756304/pdf/
Pub Date: 2025-11-28 | DOI: 10.1007/s00239-025-10291-3
Ksenia Macias Calix, Antara Anika Piya, Raquel Assis
Understanding the relationship between protein structures and their interactions is a fundamental biological problem. Here we broadly tackle this problem by examining associations between protein structural features and interaction patterns in rodents and yeast, two highly divergent taxa from different kingdoms. In both taxa, we uncover positive correlations between the intrinsic disorder of interacting proteins, consistent with a prior study showing stronger affinity between proteins with similar structures. However, closer examination reveals that these relationships are restricted to proteins involved in evolutionarily conserved interactions, or interologs. We also find that interologs generally exhibit more similar protein structures and less evolutionary structural divergence than non-interologs, supporting the hypothesis that conserved interactions are associated with structural convergence of interacting proteins. Further analyses show that interologs are typically less intrinsically disordered and play more central functional roles than non-interologs, suggesting that these structural similarities may help preserve stable interactions involved in essential biological processes. Overall, this study underscores the interconnected evolution of protein structures and their interactions, illustrating how the optimization of protein fitness landscapes for both structural and functional stability may promote structural convergence across divergent taxa.
Title: Correlated Evolution Drives Structural Convergence of Interacting Proteins | Journal: Journal of Molecular Evolution
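One simple way to quantify a correlation like the one reported between the intrinsic disorder of interacting proteins is a rank correlation computed across interacting pairs. The abstract does not state which statistic the authors used; the sketch below assumes Spearman's rho, with average ranks for ties, and the disorder fractions in it are made-up toy values, not data from the study.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical disorder fractions for five interacting pairs (protein A, protein B).
disorder_a = [0.10, 0.25, 0.40, 0.55, 0.70]
disorder_b = [0.12, 0.20, 0.45, 0.50, 0.80]
print(spearman(disorder_a, disorder_b))  # 1.0: perfectly concordant ranks
```

Running `spearman` separately on interolog and non-interolog pairs would mirror the comparison the abstract describes, where the correlation holds only for interologs.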
Pub Date: 2025-11-17 | DOI: 10.1007/s00239-025-10287-z
Glen Stecher, Michael Suleski, Qiqing Tao, Koichiro Tamura, Sudhir Kumar
The Molecular Evolutionary Genetics Analysis (MEGA) software is widely used for molecular evolutionary and phylogenetic analyses. We present MEGA version 12.1, a cross-platform release that operates natively on macOS (Intel and Apple M-series processors) and modern Linux distributions. This version incorporates all the methodological and computational improvements of MEGA 12 for Microsoft Windows, including techniques that markedly reduce computational time during maximum likelihood (ML) analyses. These features include a filtered best-fit ML model test that bypasses evaluating derivative models unlikely to be optimal, an adaptive bootstrap test of phylogeny that automatically determines the necessary number of replicates, and fine-grained parallelization of ML algorithms for better multi-core performance. MEGA 12.1 has an enhanced graphical user interface, supporting high-resolution displays and improving analysis progress reporting and result visualization. A significant addition in MEGA 12.1 is an improved Calibration Editor that integrates seamlessly with the TimeTree database of molecular divergence times for easy retrieval of calibration points for molecular dating. This version also supports full cross-platform session file compatibility, allowing seamless sharing of analysis sessions across macOS, Linux, and Windows. These updates enhance accessibility, computational efficiency, and usability of MEGA across diverse computing environments. MEGA 12.1 is available for free at https://www.megasoftware.net.
Title: MEGA 12.1: Cross-Platform Release for macOS and Linux Operating Systems | Journal: Journal of Molecular Evolution
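The abstract does not describe how MEGA's adaptive bootstrap decides how many replicates are needed, so the following is only a generic illustration of the idea: keep adding bootstrap replicates until the estimated support for a clade is precise enough. It is not MEGA's algorithm or API. The callback `clade_in_replicate`, the thresholds, and the batch size are hypothetical, and the precision criterion is simply the binomial standard error of the support proportion.

```python
import math


def adaptive_bootstrap_support(clade_in_replicate, min_reps=50, max_reps=1000,
                               batch=50, max_se=0.02):
    """Estimate bootstrap support for one clade, adding replicates in batches.

    Stops once at least `min_reps` replicates are done and the binomial
    standard error of the support estimate is <= `max_se`, or when
    `max_reps` is reached. Returns (support, replicates_used).

    `clade_in_replicate(i)` is a hypothetical callback: it should build the
    tree for bootstrap replicate i and return True if the clade is present.
    """
    if max_reps < 1 or batch < 1:
        raise ValueError("max_reps and batch must be positive")
    hits = n = 0
    while n < max_reps:
        stop = min(n + batch, max_reps)
        hits += sum(1 for i in range(n, stop) if clade_in_replicate(i))
        n = stop
        support = hits / n
        se = math.sqrt(support * (1 - support) / n)
        if n >= min_reps and se <= max_se:
            break
    return support, n
```

A design caveat: when every replicate agrees (support of 0 or 1), the binomial standard error is zero, so this rule stops at `min_reps`; a production method would likely need a more careful precision criterion.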
Pub Date: 2025-11-17 | DOI: 10.1007/s00239-025-10288-y
Nobuyuki Inomata, Yohey Terai, Junko Kusumi, Kosuke M Teshima, Ixchel F Mandagi, Sjamsu A Lawelle, Kawilarang W A Masengi, Sayaka Mitsumoto, Saki Hashizume
The green-sensitive opsin (RH2) family has experienced considerably more gene duplication and loss events than other opsin families during the evolution of teleost fishes. Although the evolutionary patterns of RH2 genes have previously been investigated in Oryzias species belonging to the three major species groups and inhabiting various areas, the evolutionary mechanisms underlying the diversification of the endemic species on Sulawesi Island, whose common ancestor colonized the region ~20 million years ago, remain unclear. In this study, we determined the nucleotide sequences of RH2 genes from 21 individuals of 19 Oryzias species (Adrianichthyidae) on Sulawesi. In RH2-A, we identified four amino acid sites (positions 94, 112, 166, and 198) with ω (dN/dS) values above 5.3, indicating strong positive selection. Notably, substitutions at three of these sites are known to affect the absorption spectra and occurred independently on separate phylogenetic branches during species divergence. In the RH2-B and RH2-C genes, identical amino acid residues were shared within individuals and among species, suggesting parallel mutations and/or gene conversion events. Moreover, five amino acid substitutions between the RH2-B and RH2-C genes were fixed before the colonization of Sulawesi, and four of these substitutions were associated with fine spectral tuning. While RH2-B and RH2-C have undergone concerted evolution in species outside Sulawesi, the paralogs on Sulawesi have evolved divergently. This divergence appeared to result from newly arisen mutations in either RH2-B or RH2-C during speciation. In the RH2 genes, a number of amino acid substitutions at distinct sites led to shifts in the absorption spectrum. In particular, RH2-A contains positively selected residues involved in spectral tuning, suggesting that these substitutions may have contributed to adaptive evolution. Our findings provide new insights into the evolutionary dynamics of RH2 gene diversification.
Title: Molecular Evolution of the Green-Sensitive Opsins (RH2) in Sulawesi Oryzias Species with a Single Origin | Journal: Journal of Molecular Evolution
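The ω values cited in the abstract above are ratios of the nonsynonymous to the synonymous substitution rate (dN/dS), where ω > 1 suggests positive selection. The abstract does not say which method produced its site-wise estimates (such estimates usually come from likelihood-based codon site models), so the sketch below only illustrates the ratio itself: a whole-sequence counting estimate with a Jukes-Cantor correction that takes already-counted differences and sites as input. The function names and the counts in the example are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined outside 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def omega(n_diffs, n_sites, s_diffs, s_sites):
    """dN/dS from counted nonsynonymous and synonymous differences and sites."""
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0:
        raise ValueError("no synonymous differences: omega is undefined")
    return dn / ds


# Hypothetical counts: 60 of 300 nonsynonymous sites and 10 of 100 synonymous sites differ.
print(round(omega(60, 300, 10, 100), 2))  # 2.17
```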
Pub Date: 2025-11-17 | DOI: 10.1007/s00239-025-10285-1
Gennadi V Glinsky
Transposable elements (TEs) have played a pivotal role in shaping the regulatory architecture of mammalian genomes. This contribution reports multiple lines of evidence suggesting that TEs have made a significant impact on brain development by providing sequences for thousands of transcription factor binding sites (TFBS). TE-encoded TFBS have scaffolded brain developmental regulatory regions (BDRRs) across mammalian evolution. TFBS density within BDRRs has markedly increased along the evolutionary trajectory from mouse to macaque to chimpanzee, reaching its highest levels in modern humans. This increase in density is accompanied by the preferential selection of specific TFs that actually bind genomic regulatory sequences. Consequently, humans and chimpanzees exhibit distinct repertoires of BDRR-bound TFs, which contribute to divergent developmental trajectories across hundreds of brain regions ranging from subcortical to telencephalic structures, including the basal ganglia (12 regions), midbrain (48), thalamus and prethalamus (85), hindbrain and cerebellum (25), limbic system and amygdala (25), and neurodevelopmental structures (26). Despite the diversity of sequences contributed by different TEs, they encode TFBS for a relatively small set of ~700 TFs that act as central nodes organizing these regulatory landscapes. This provides a unifying framework for understanding both conserved and species-specific patterns of primate brain development, and suggests that TF networks seeded by TEs are key drivers of human neurodevelopmental innovation. Differential enrichment analyses of human versus chimpanzee BDRRs identified 25 human BDRR-bound TFs that give rise to transcriptional signatures composed of small TF subsets with significantly increased expression in 202 neuroanatomical structures.
These observations point to a regulatory paradigm in which small sets of highly expressed genes that are significantly enriched in distinct human brain regions are selected from genes encoding TFs bound to human-specific BDRRs, thus linking "neuroanatomical transcriptional signatures" of brain structures to the TFs governing brain development. Together, our findings highlight TE-derived TFBS as central architects of primate brain evolution, providing both mechanistic insight and avenues for future discovery.
Title: Transposable Elements Seed Transcription Factor Binding Sites to Sequence-Specific Double-Stranded DNA Binding TF Networks Contributing to Governance of Primate Brain Evolution | Journal: Journal of Molecular Evolution
Pub Date: 2025-11-06 | DOI: 10.1007/s00239-025-10283-3
Gustavo Caetano-Anollés
Nearly 60 years ago, Eck and Dayhoff (Science 152:363-366, 1966) aligned the amino acids of the first and second halves of a ferredoxin sequence, revealing a symmetric CX₂CX₂CX₃CX₁₈CX₂CX₂CX₃C spacing signature, in which Xₙ denotes n intervening residues. This symmetry, along with other cyclic patterns, suggested that a tandem duplication shaped ferredoxin evolution and that the ancestral sequence may have drawn from a reduced amino acid repertoire. Here, I revisit the duplication model using the deep learning-based AlphaFold2 ab initio pipeline, benchmarked against the I-TASSER threading tool. Predicted ancestral structures were obtained with high confidence, and some aligned to the two halves of a reference ferredoxin (PDB entry 1CIF) with acceptable RMSD and TM-score values. A chronology of loops and structural domains further identified which duplicate was ancestral, reinforcing the antiquity of the fold. Loops and domains also dissected the evolution of the [4Fe-4S] ferredoxin superfamily. The resulting structural models provided strong support for the tandem duplication hypothesis and the idea that modular units underpinned early molecular evolution. However, they also challenged the notion that the duplication event arose from a reduced amino acid alphabet. This work revisits Eck and Dayhoff's seminal insights and commemorates Dayhoff's pioneering contributions on the centenary of her birth.
Title: Revisiting Eck and Dayhoff's Building Block Model of Ferredoxin Evolution on Dayhoff's 100th Birthday | Journal: Journal of Molecular Evolution
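A cysteine spacing signature like the one Eck and Dayhoff described can be searched for with a regular expression. This sketch assumes the full symmetric signature is two copies of CX2CX2CX3C separated by 18 residues (C X2 C X2 C X3 C X18 C X2 C X2 C X3 C), where X is any residue. The function names and the synthetic test sequence are illustrative only and are not taken from the paper.

```python
import re

# Gaps (the n in each Xn) between consecutive cysteines, assuming the full
# symmetric signature C X2 C X2 C X3 C X18 C X2 C X2 C X3 C.
FULL_GAPS = (2, 2, 3, 18, 2, 2, 3)
HALF_GAPS = (2, 2, 3)  # one CX2CX2CX3C half of the signature


def signature_regex(gaps):
    """Build a regex such as C.{2}C.{2}C.{3}C, where '.' stands for any residue."""
    return re.compile("C" + "".join(f".{{{n}}}C" for n in gaps))


def find_signature(seq, gaps=FULL_GAPS):
    """Return 0-based start positions of non-overlapping signature matches."""
    return [m.start() for m in signature_regex(gaps).finditer(seq.upper())]


# Synthetic sequence (not a real ferredoxin): two halves 18 residues apart.
half = "CAACAACAAAC"
seq = "M" + half + "A" * 18 + half + "G"
print(find_signature(seq))             # [1]
print(find_signature(seq, HALF_GAPS))  # [1, 30]
```

Searching with the half-signature gaps reports each of the two cysteine clusters separately, which corresponds to the two halves that Eck and Dayhoff aligned.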
Pub Date: 2025-11-06 | DOI: 10.1007/s00239-025-10284-2
Angelo Pavesi
The discovery of translated alternative open reading frames (altORFs) in protein-coding regions has expanded the coding potential of viral, prokaryotic, and eukaryotic genes. Experimental and computational approaches indicate that overlapping coding regions occur in mammals. In this study, I used a prediction method based on five criteria to detect novel altORFs in human genes from the COSMIC Cancer Gene Census Database. Beyond the well-characterized examples of human cancer-specific antigens expressed from altORFs, the vast catalogue of nucleotide substitutions across cancer genes in the COSMIC database is also likely to harbor previously uncharacterized altORFs. Under the five prediction criteria, I found 251 novel altORFs, 41 of which are highly conserved in mammals and 60 of which arise uniquely from nucleotide substitutions in the primary ORF of cancer genes. I found experimental evidence from mass spectrometry and ribosome profiling databases for 38% of the 251 novel altORFs. In particular, I found three altORFs in the proto-oncogene RET, three expressed altORFs in the isocitrate dehydrogenase-2 gene, and one large expressed altORF (498 nt) in the mutated TP53 gene. This study may have clinical implications, because cancer antigens may include antigens derived from the translation of currently unannotated open reading frames. The altORFs detected in this study could be candidates for future experimental validation.
{"title":"Systematic Detection of Alternative Open Reading Frames (altORFs) in Cancer Driver Genes.","authors":"Angelo Pavesi","doi":"10.1007/s00239-025-10284-2","DOIUrl":"https://doi.org/10.1007/s00239-025-10284-2","url":null,"abstract":"<p><p>The discovery of translated alternative open reading frames (altORFs) in protein-coding regions has expanded the coding potential of viral, prokaryotic and eukaryotic genes. Experimental and computational approaches indicate that overlapping coding regions occur in mammals. In this study, I used a prediction method based on five criteria to detect novel altORFs in the human genes taken from the COSMIC Cancer Gene Census Database. Apart from the well characterized examples of human cancer-specific antigens expressed from altORFs, the vast catalogue of nucleotide substitutions across cancer genes (the COSMIC database) is also likely to harbor previously uncharacterized altORFs. Under the five prediction criteria, I found 251 novel altORFs, 41 of which are highly conserved in mammals and 60 uniquely resulting from nucleotide substitutions in the primary ORF of cancer genes. I found experimental evidence for 38% of the 251 novel altORFs from mass spectrometry and ribosome profiling databases. In particular, I found three altORFs in the proto-oncogene RET, three expressed altORFs in the isocitrate dehydrogenase-2 gene, and one expressed large altORF (498 nt) in the mutated TP53 gene. This study may offer clinical perspectives, because a potential source of cancer antigens may include antigens derived from translation of currently unannotated open reading frames. The altORFs detected in this study could be candidates for future experimental validation.</p>","PeriodicalId":16366,"journal":{"name":"Journal of Molecular Evolution","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145452093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-14; DOI: 10.1007/s00239-025-10278-0
Makarim Elfadil M Osman, Amina I Dirar, Mohanad A Ibrahim, Rieham Sallah H Osman, Doaa Awad Yassin Ali, Somia Elmosharaf Elrayah Yousif, Hana Badreldin Mohamed Abakar, Nada Hassan M Haj, Emadeldin Hassan E Konozy
Lectins are a diverse class of proteins that play crucial roles in plant defense, stress responses, and various physiological processes. However, comprehensive comparative analyses of lectin gene families across closely related Phaseolus species are lacking, and the evolutionary and stress-responsive dynamics of these genes remain poorly understood. This study presents a genome-wide analysis of lectin genes in three Phaseolus species: P. vulgaris, P. lunatus, and P. acutifolius. Using genomic data from the Phytozome database, we identified 132, 132, and 134 putative lectin genes, respectively, across eight lectin families, with the legume, GNA, and Nictaba families being the most abundant. Domain architecture analysis revealed a broad structural spectrum, from simple merolectins to complex chimerolectins with multiple domains. Expansion analysis indicated that the lectin families expanded primarily through tandem and dispersed duplications, with similar syntenic profiles across the three species. Codon-based evolutionary models revealed that, although most lectin genes are under purifying selection, several duplicated pairs from specific families (legume, Pl-Nictaba, and Pa-Hevein) show site-specific and episodic positive selection, suggesting adaptive divergence linked to functional specialization. Expression profiling under abiotic (salinity, cold) and biotic (bacterial, fungal, insect) stress conditions revealed differential regulation of lectin genes; 14 genes showed pleiotropic effects, being upregulated under several stresses. Regulatory analysis identified transcription factors from the AP2, B3, Dof, ERF, MYB, and TCP families as potential upstream regulators of these pleiotropic genes, together forming complex cis-regulatory networks that integrate environmental and developmental signals. This study provides new insights into the structural diversity, evolutionary dynamics, and stress-responsive roles of lectins in Phaseolus species and identifies promising targets for improving stress resilience in legume crops.
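The abstract does not detail its codon-based models. As a rough illustration of the quantity such models estimate, the sketch below computes ω = dN/dS for one pair of aligned coding sequences with the Nei-Gojobori (1986) counting method. This is a simpler stand-in for likelihood-based codon models, not the study's procedure. By the usual reading, ω < 1 indicates purifying selection and ω > 1 indicates positive selection. The function names are hypothetical, and two conventions are assumptions: changes to a stop codon count as nonsynonymous sites, and p-distances are corrected with Jukes-Cantor.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def syn_sites(codon: str) -> float:
    """Synonymous sites of one codon (changes to a stop count as nonsynonymous)."""
    aa = CODE[codon]
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == aa:
                    s += 1 / 3
    return s


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, nonsynonymous) differences averaged over stop-free pathways."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in itertools.permutations(diff):
        current, sd, nd = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        else:  # no break: every step avoided stops, so the pathway counts
            syn_total += sd
            non_total += nd
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p: float) -> float:
    """Correct a p-distance for multiple hits; undefined for p >= 0.75."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86_dn_ds(seq1: str, seq2: str) -> tuple[float, float, float]:
    """Return (dN, dS, omega) for two aligned, gap-free, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            raise ValueError("remove stop codons before comparing")
        s = (syn_sites(c1) + syn_sites(c2)) / 2  # sites averaged over both sequences
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or no nonsynonymous sites")
    dn, ds = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    omega = dn / ds if ds > 0 else math.nan  # undefined when dS is zero
    return dn, ds, omega


if __name__ == "__main__":
    # Toy pair with one synonymous change (CTG -> CTC, both Leu).
    print([round(x, 3) for x in ng86_dn_ds("CTGAAAGGGTTT", "CTCAAAGGGTTT")])
    # [0.0, 0.477, 0.0]
```

A single pairwise ω averages over all codons, so it can miss positive selection confined to a few residues or a single lineage. The site-specific and episodic tests mentioned in the abstract address this by letting ω vary across codons or branches.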
{"title":"Lectin Gene Families in Three Phaseolus Species: Genome-Wide Identification, Evolutionary Analysis, Pleiotropic Effect, and Regulation Under Multiple Stress Conditions.","authors":"Makarim Elfadil M Osman, Amina I Dirar, Mohanad A Ibrahim, Rieham Sallah H Osman, Doaa Awad Yassin Ali, Somia Elmosharaf Elrayah Yousif, Hana Badreldin Mohamed Abakar, Nada Hassan M Haj, Emadeldin Hassan E Konozy","doi":"10.1007/s00239-025-10278-0","DOIUrl":"https://doi.org/10.1007/s00239-025-10278-0","url":null,"abstract":"<p><p>Lectins are a diverse class of proteins that play crucial roles in plant defense, stress responses, and various physiological processes. However, comprehensive comparative analyses of lectin gene families across closely related Phaseolus species are lacking, and the evolutionary and stress-responsive dynamics of these genes remain poorly understood. This study provides a comprehensive genome-wide analysis of lectin genes in three Phaseolus species: P. vulgaris, P. lunatus, and P. acutifolius. Using genomic data from the Phytozome database, we identified 132, 132, and 134 putative lectin genes, respectively, across 8 lectin families, with the legume, GNA, and Nictaba families being the most abundant. Domain architecture analysis revealed a broad structural spectrum, from simple merolectins to complex chimerolectins with multiple domains. Expansion analysis indicated that lectin family expansion primarily occurred through tandem and dispersed duplications with similar syntenic profiles across the Phaseolus species. Codon-based evolutionary models revealed that while most lectin genes are under purifying selection, several duplicated pairs from specific families (i.e., legume, Pl-Nictaba, and Pa-Hevein) show site-specific and episodic positive selection, suggesting adaptive divergence linked to functional specialization. Expression profiling under abiotic (salinity, cold) and biotic (bacterial, fungal, insect) stress conditions revealed differential regulation of lectin genes, with multiple genes (14) exhibiting pleiotropic effects through upregulation under several stresses. Regulatory analysis identified transcription factors from AP2, B3, Dof, ERF, MYB, and TCP families as potential upstream regulators of these pleiotropic genes, forming complex cis-regulatory networks integrating environmental and developmental signals. This study provides novel insights into the structural diversity, evolutionary dynamics, and stress-responsive roles of lectins in Phaseolus species and identifies promising targets for improving stress resilience in legume crops.</p>","PeriodicalId":16366,"journal":{"name":"Journal of Molecular Evolution","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145286369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}