Pub Date: 2025-12-03 | DOI: 10.1007/s00239-025-10289-x
Zhen Zhao, Junye Ma, Qun Yang, Gert Wörheide, Dirk Erpenbeck
Mitochondrial introns have a patchy distribution in sponge lineages. Here, we report a group-II-intron in Eunapius rarus (Demospongiae, Spongillidae), which constitutes the first report of a mitochondrial intron in freshwater sponges. Group-II-introns are self-splicing ribozymes and are particularly rare among sponge mitochondrial genomes. The intron contains complete open reading frames (ORFs), including typical intron-encoded proteins (IEPs). Phylogenetic analysis reveals that the intron is more closely related to those found in brown algae than to other sponge group-II-introns, indicating that it was acquired independently of the introns in other sponges. Remarkably, the congeneric E. fragilis does not possess this intron in its mitochondrial genome. However, we found pseudogenic copies of the E. rarus group-II-intron in the nuclear genome of E. fragilis, which documents, for the first time in sponges, group-II-intron presence together with transposition of pseudogenic copies into the nuclear genome. Our results show that a group-II-intron must have been present in the last common ancestor of both Eunapius mt genomes and was subsequently lost in E. fragilis, rather than having been acquired independently. Consequently, our findings explain the patchy distribution of introns in sponges as a result of frequent losses in addition to multiple acquisitions.
Title: First Report on Presence of Mitochondrial Introns in Freshwater Sponges, and Pseudogenic Evidence of Their Loss.
Journal: Journal of Molecular Evolution
DOI URL: https://doi.org/10.1007/s00239-025-10289-x
Pub Date: 2025-12-01 | Epub Date: 2025-10-29 | DOI: 10.1007/s00239-025-10280-6
Martin Schoenstein, Pauline Mermillod, Arnaud Kress, Odile Lecompte, Yannis Nevers
Phylogenetic profiling, the analysis of the presence-absence of orthologs in a set of species, is a way to infer functional association between genes through co-evolutionary patterns. Since its inception, numerous methods have been described to construct phylogenetic profiles, evaluate their similarity, or identify clusters of co-evolving genes. However, few of these methods are available as downloadable software. We present Profylo, a phylogenetic profiling toolkit made available as an open-source Python 3.0 package. Profylo implements seven methods for comparing phylogenetic profiles and four algorithms for identifying co-evolving clusters, as well as tools to help with their analysis, including visualization features. We take advantage of the variety of methods implemented in Profylo to benchmark their ability to predict functional relationships between human genes, using different datasets. Finally, we demonstrate the utility of the package with an example case study of the presence-absence of all protein-coding genes in the human genome. Profylo is available on GitHub at https://github.com/MartinSchoenstein/Profylo.
Title: Profylo: A Python Package for Phylogenetic Profile Comparison and Analysis.
Journal: Journal of Molecular Evolution, pp. 806-819
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756282/pdf/
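To make the idea of comparing presence-absence profiles concrete, here is a minimal sketch in plain Python. It does not use Profylo's API, and the abstract does not say which seven comparison methods the package implements; the Jaccard index and Pearson correlation are used here only as two commonly cited choices. The gene names and profile values are invented for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical presence-absence profiles: one 0/1 entry per species,
# with the same species order for every gene. All values are invented.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 1],
    "geneC": [0, 0, 1, 0, 1, 1],
}


def jaccard(p, q):
    """Species where both genes are present / species where either is present."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def pearson(p, q):
    """Pearson correlation of two profiles; 0.0 if either profile is constant."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    return cov / sqrt(var_p * var_q) if var_p and var_q else 0.0


# Score every gene pair; high scores suggest similar evolutionary histories.
for g1, g2 in combinations(profiles, 2):
    j = jaccard(profiles[g1], profiles[g2])
    r = pearson(profiles[g1], profiles[g2])
    print(f"{g1} vs {g2}: jaccard={j:.2f} pearson={r:.2f}")
```

Pairs with similar profiles score high on both measures, while complementary profiles (one gene present exactly where the other is absent) score zero on Jaccard and negative on Pearson, which is why the choice of measure matters when benchmarking.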
Pub Date: 2025-12-01 | Epub Date: 2025-09-25 | DOI: 10.1007/s00239-025-10268-2
Dong-Ha Oh, Alexander Astashyn, Barbara Robbertse, Nuala A O'leary, W Ray Anderson, Laurie Breen, Eric Cox, Olga Ermolaeva, Robert Falk, Vichet Hem, J Bradley Holmes, Patrick Masterson, Kelly M McGarvey, Eyal Mozes, John P Torcivia, Mirian T N Tsuchiya, Craig Wallin, Françoise Thibaud-Nissen, Terence D Murphy, Vamsi K Kodali
Orthologs are fundamental for enabling comparative genomics analyses that further our understanding of eukaryotic biology. The unprecedented increase in the availability of high-quality eukaryotic genomes necessitates scalable and accurate methods for orthology inference. The National Center for Biotechnology Information (NCBI) developed "NCBI Orthologs", a resource and a computational pipeline designed to meet this challenge within the NCBI RefSeq framework. This system integrates protein similarity, nucleotide alignment, and microsynteny to achieve high-precision ortholog assignments across diverse eukaryotes. The pipeline leverages high-quality RefSeq annotations and processes genomes individually, ensuring scalability. Resulting ortholog data, organized into gene-level anchored sets, enables propagation of functional annotation information and facilitates comparative genomics. Critically, these data are integrated into the NCBI Gene resource, providing users with access from various entry points. The NCBI Datasets resource provides an intuitive interface to explore orthologous relationships on the web and allows bulk data download via the web, command-line tools, and an API. We detail the methodology, including anchor species selection and the decision tree used to arrive at high-confidence one-to-one orthology relationships. NCBI Orthologs is a valuable resource for facilitating functional annotation efforts and enhancing our understanding of eukaryotic gene evolution.
Title: NCBI Orthologs: Public Resource and Scalable Method for Computing High-Precision Orthologs Across Eukaryotic Genomes.
Journal: Journal of Molecular Evolution, pp. 843-859
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756343/pdf/
Pub Date: 2025-12-01 | Epub Date: 2025-10-13 | DOI: 10.1007/s00239-025-10276-2
Felix Langschied, Ruben Iruegas, Mateusz Sikora, Roberto Covino, Ingo Ebersberger
Orthologs, evolutionarily related genes that diverged through speciation, are mutually the closest related sequences in different species. Consequently, they are ideal candidates for identifying functionally equivalent genes across taxa, a prerequisite for transferring gene function information from model to non-model organisms in silico. However, orthologs are not immune to functional divergence. Failing to recognize such divergent instances results in spurious functional annotation transfer. Here, we propose to treat the functional equivalence of orthologs as a null hypothesis that must be critically tested rather than assumed. This requires integrating several lines of evidence to evaluate both changes in an ortholog's network of molecular interactions and alterations in its biochemical activity. We outline how such activity shifts can be assessed using increasingly fine-grained analyses, including comparisons of protein feature architectures and predicted 3D structures. While some orthology resources incorporate aspects of this evidence, such assessments are often manual and not scalable. We argue for a systematic, multi-level perspective to detect functional divergence prior to annotation transfer. To support the broader adoption of this approach, we offer methodological recommendations and practical examples that demonstrate the value of this framework in large-scale comparative genomics.
Title: A Multi-level Perspective on the Evolution of Orthologs and Their Functions.
Journal: Journal of Molecular Evolution, pp. 720-729
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756255/pdf/
Pub Date: 2025-12-01 | Epub Date: 2025-09-26 | DOI: 10.1007/s00239-025-10271-7
Ali Yazdizadeh Kharrazi, Adrian M Altenhoff, Nikolai Romashchenko, Christophe Dessimoz, Sina Majidian
OrthoXML is a standard file format for orthology data. It provides a standardized structure for describing orthologous and paralogous relationships while allowing the user to store custom and database-specific data in the same format. Although many orthology databases use it to export data, there is no comprehensive toolkit for working with this format. Here, we introduce OrthoXML-tools ( https://github.com/DessimozLab/orthoxml-tools ), a comprehensive toolkit for loading and manipulating OrthoXML files and exporting them to other formats. We demonstrate its capabilities and report its performance on our benchmarks.
Title: OrthoXML-Tools: A Toolkit for Manipulating OrthoXML Files for Orthology Data.
Journal: Journal of Molecular Evolution, pp. 800-805
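As a rough illustration of the structure that OrthoXML describes, the sketch below reads a tiny hand-written document with Python's standard `xml.etree.ElementTree`. It does not use the OrthoXML-tools API. The namespace and the element names (`species`, `gene`, `orthologGroup`, `paralogGroup`, `geneRef`) reflect my understanding of the OrthoXML schema and should be checked against the specification; the sample document, gene identifiers, and helper names are invented.

```python
import xml.etree.ElementTree as ET

# Assumed OrthoXML namespace; verify against the schema version in use.
NS = {"ox": "http://orthoXML.org/2011/"}

# Invented sample: two human genes that are paralogs of each other,
# both orthologous to one mouse gene.
SAMPLE = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3"
          origin="example" originVersion="1">
  <species name="Homo sapiens" NCBITaxId="9606">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="1" geneId="HUMAN_G1"/>
        <gene id="2" geneId="HUMAN_G2"/>
      </genes>
    </database>
  </species>
  <species name="Mus musculus" NCBITaxId="10090">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="3" geneId="MOUSE_G1"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="HOG1">
      <geneRef id="3"/>
      <paralogGroup>
        <geneRef id="1"/>
        <geneRef id="2"/>
      </paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>"""


def gene_names(root):
    """Map each internal gene id to its external geneId attribute."""
    return {g.get("id"): g.get("geneId") for g in root.iterfind(".//ox:gene", NS)}


def group_members(group, names):
    """List every gene under a group, descending into nested groups."""
    return [names[ref.get("id")] for ref in group.iterfind(".//ox:geneRef", NS)]


root = ET.fromstring(SAMPLE)
names = gene_names(root)

# Top-level ortholog groups and all genes they contain, in document order.
top_groups = {
    g.get("id"): group_members(g, names)
    for g in root.iterfind("ox:groups/ox:orthologGroup", NS)
}
print(top_groups)
```

Because groups nest, the same traversal can be applied to any inner `orthologGroup` or `paralogGroup` element to recover the genes at that level only.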
Pub Date: 2025-12-01 | Epub Date: 2025-11-21 | DOI: 10.1007/s00239-025-10277-1
Garance Sarton-Lohéac, Nikolai Romashchenko, Clément Marie Train, Sina Majidian, Natasha Glover
With the rapid advancement of large-scale sequencing initiatives, the need for efficient and accurate methods for inferring orthologous and paralogous relationships has never been more critical. Hierarchical orthologous groups (HOGs) provide a powerful solution to this challenge, offering a precise, scalable framework to study gene families and their evolutionary histories across diverse species. In this review, we introduce the concept of HOGs and explore their advantages over traditional methods. Next, we highlight key applications of HOGs, including their use in representing gene families, inferring ancestral genomes, tracking gene gain and loss events, functional annotation, and phylogenetic profiling. We overview the process of constructing HOGs and discuss the challenges and limitations of HOG inference. The HOG framework provides a clear and structured approach to organizing homologous genes, making it possible to gain deeper insights into gene family and species evolution.
Title: Reconstructing Evolutionary Histories with Hierarchical Orthologous Groups.
Journal: Journal of Molecular Evolution, pp. 740-764
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756263/pdf/
Pub Date: 2025-12-01 | Epub Date: 2025-11-22 | DOI: 10.1007/s00239-025-10279-z
Christopher M Williams, Paul D Thomas
The identification of orthologs plays an important role in comparative genomics and function inference. Here, we present OrthoGrafter, a bioinformatics tool that takes one or more query sequences and infers a set of orthologs drawn from the collection of 143 well-annotated species in the PANTHER database of reconciled gene trees. OrthoGrafter takes sets of graft points output by the widely used TreeGrafter software (either the standalone package or InterProScan) and, for each one, outputs a list of predicted orthologous genes from the grafted PANTHER tree (with the ability to additionally output paralog and xenolog sets). If the taxonomic identifier for the query is also provided, OrthoGrafter incorporates the novel step of adjusting the graft point to facilitate consistent taxonomic assignment for the graft within the reconciled gene family. We demonstrate that this step improves ortholog inference, as measured by correlation with orthologs provided by the OMA database. OrthoGrafter is lightweight and uses precomputed results to enable rapid ortholog prediction for large sample groups. It is available at https://github.com/pantherdb/OrthoGrafter.
Title: OrthoGrafter: Rapid Identification of Orthologs from Precomputed Placement in Phylogenetic Trees.
Journal: Journal of Molecular Evolution, pp. 820-829
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756373/pdf/
Pub Date: 2025-12-01 | Epub Date: 2025-10-14 | DOI: 10.1007/s00239-025-10272-6
Sina Majidian, Armin Hadziahmetovic, Felix Langschied, Stefano Pascarelli, Silvia Prieto-Baños, Jorge Rojas-Vargas, Edward L Braun, Christophe Dessimoz, Abdoulaye Baniré Diallo, Dannie Durand, Gang Fang, Toni Gabaldón, Natasha Glover, David A Liberles, Claire McWhite, Erik L L Sonnhammer, Paul D Thomas, Aïda Ouangraoua, Irene Julca
The rapid advancement of DNA sequencing technologies and computational algorithms has led to an unprecedented surge in genomic data, driven by several large-scale sequencing projects worldwide. Orthology plays a crucial role in understanding evolutionary patterns of genes and their functions. At the last Quest for Orthologs meeting (Montréal, Canada, 2024), we discussed recent advances in orthology inference, with a focus on the impact of artificial intelligence (AI), protein structures, RNA splicing isoforms, and protein domain evolution, together with other evolutionary considerations. A long-standing challenge in the field is the functional annotation of paralogs, for which we present novel approaches. The meeting also emphasised strategies for integrating diverse genetic features into the concept of orthology, encouraging frameworks that account for elements like alternative splicing, domain organisation, and regulatory sequences. We discuss various applications of orthology and paralogy to environmental research, agriculture, and comparative genomics. Additionally, we report recent progress in orthology inference methodologies and resources. This work represents a collaborative synthesis of insights and innovations presented at the 8th Quest for Orthologs meeting, highlighting current progress while outlining future directions for orthology research.
Title: Quest for Orthologs in the era of Data Deluge and AI: Challenges and Innovations in Orthology Prediction and Data Integration.
Journal: Journal of Molecular Evolution, pp. 702-719
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756340/pdf/
Pub Date: 2025-12-01 | Epub Date: 2025-11-22 | DOI: 10.1007/s00239-025-10282-4
Robert Shaw, Samuel D Love, Claire D McWhite
Protein Language Models (PLMs) have emerged as powerful tools for representing protein sequences. We explore how embeddings (numeric vector representations) from pretrained PLMs can serve as direct numeric proxies for protein structure and function without requiring additional training or fine-tuning. In a proof-of-concept study of 22 cross-species complementation triplets (a gold standard for functional similarity in which genes from one species are tested for their ability to rescue gene deletions in another species), we find that ESM-C 600M embeddings summarized into pooled sliced-Wasserstein embeddings achieved high discrimination of subtle functional differences. This pooling method captures distributional properties of amino acid embeddings by comparing them against reference points using optimal transport theory. While our limited sample size precludes definitive conclusions about whether PLM embeddings systematically outperform sequence-based methods in detecting protein functional similarity, our preliminary results demonstrate the potential of using protein embeddings for functional analysis. Our exploratory analysis of orthology relationships suggests that embedding similarity may correlate with functional conservation, with the least diverged ortholog showing higher embedding similarity in approximately two-thirds of cases. Analyzing the Ortholog Conjecture (that orthologs maintain greater functional similarity than paralogs at equivalent sequence divergence), we do not observe clear differences between one-to-one ortholog and inparalog embedding similarities. Finally, we propose integrating PLMs with phylogenetic methods in a hybrid approach that leverages their complementary strengths: PLM-derived numeric embeddings for rapid homology detection and phylogenetics for evolutionary precision. We introduce embedding-tree versus gene-tree discordance as a potential metric to detect functional divergence between closely related proteins. Integrating protein embeddings with sequence analysis may enable a more nuanced understanding of protein function and evolutionary dynamics.
Title: Evaluating Pretrained Protein Language Model Embeddings as Proxies for Functional Similarity.
Journal: Journal of Molecular Evolution, pp. 765-776
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756192/pdf/
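The sliced-Wasserstein idea mentioned in the abstract above can be sketched generically: project two sets of vectors onto random directions, compare the sorted (quantile) values along each direction, and average. The code below is a plain-Python illustration of a sliced-Wasserstein distance between two sets of per-residue vectors of possibly different lengths. It is not the authors' pooled sliced-Wasserstein embedding (which, per the abstract, compares against reference points), it does not call ESM-C, and the function names, parameters, and toy vectors are all assumptions made for this example.

```python
import math
import random


def quantiles(values, levels):
    """Linearly interpolated empirical quantiles of a 1-D sample."""
    s = sorted(values)
    out = []
    for t in levels:
        pos = t * (len(s) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out


def sliced_wasserstein(xs, ys, n_proj=64, n_q=32, seed=0):
    """Approximate sliced 2-Wasserstein distance between two vector sets.

    xs, ys: lists of equal-dimension vectors (e.g. one vector per residue).
    n_proj random unit directions and n_q quantile levels are illustrative
    defaults, not values taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(xs[0])
    levels = [i / (n_q - 1) for i in range(n_q)]
    total = 0.0
    for _ in range(n_proj):
        # Draw a random direction and normalise it to unit length.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        # Project both sets onto the direction and compare their quantiles.
        px = [sum(a * b for a, b in zip(x, d)) for x in xs]
        py = [sum(a * b for a, b in zip(y, d)) for y in ys]
        qx, qy = quantiles(px, levels), quantiles(py, levels)
        total += sum((a - b) ** 2 for a, b in zip(qx, qy)) / n_q
    return math.sqrt(total / n_proj)


# Toy "per-residue embeddings" for two hypothetical proteins of different length.
prot_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
prot_b = [[2.0, 0.0], [3.0, 0.0], [2.0, 1.0], [2.5, 0.5]]
print(sliced_wasserstein(prot_a, prot_b))
```

Working with quantiles rather than raw sorted values is what lets the two proteins differ in length; a fixed seed keeps the random directions, and therefore the result, reproducible.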