
Journal of Molecular Evolution: Latest Articles

First Report on Presence of Mitochondrial Introns in Freshwater Sponges, and Pseudogenic Evidence of Their Loss.
IF 1.8 | Zone 3, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2025-12-03 | DOI: 10.1007/s00239-025-10289-x
Zhen Zhao, Junye Ma, Qun Yang, Gert Wörheide, Dirk Erpenbeck

Mitochondrial introns have a patchy distribution in sponge lineages. Here, we report the finding of a group-II-intron in Eunapius rarus (Demospongiae, Spongillidae), which constitutes the first report of a mitochondrial intron in freshwater sponges. Group-II-introns are self-splicing ribozymes and are particularly rare among sponge mitochondrial genomes. The intron contains complete open reading frames (ORFs), including typical intron-encoded proteins (IEPs). Phylogenetic analysis reveals that the intron is more closely related to those found in brown algae than to other sponge group-II-introns, indicating that it was acquired independently of those in other sponges. Remarkably, the congeneric E. fragilis does not possess this intron in its mitochondrial genome. However, we found pseudogenic copies of the E. rarus group-II-intron in the nuclear genome of E. fragilis, which documents, for the first time in sponges, the presence of a group-II-intron and the transposition of its pseudogenic copies into the nuclear genome. Our results show that a group-II-intron must have been present in the last common ancestor of both Eunapius mt genomes and was subsequently lost in E. fragilis, rather than being acquired independently. Consequently, our findings explain the patchy distribution of introns in sponges as a result of frequent losses in addition to multiple acquisitions.

Citations: 0
Profylo: A Python Package for Phylogenetic Profile Comparison and Analysis.
IF 1.8 | Zone 3, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-10-29 | DOI: 10.1007/s00239-025-10280-6
Martin Schoenstein, Pauline Mermillod, Arnaud Kress, Odile Lecompte, Yannis Nevers

Phylogenetic profiling, the analysis of the presence and absence of orthologs across a set of species, is a way to infer functional associations between genes through co-evolutionary patterns. Since its inception, numerous methods have been described to construct phylogenetic profiles, evaluate their similarity, or identify clusters of co-evolving genes. However, few of these methods are available as downloadable software. We present Profylo, a phylogenetic profiling toolkit made available as an open-source Python 3.0 package. Profylo implements seven methods for comparing phylogenetic profiles and four algorithms for identifying co-evolving clusters, as well as tools to help with their analysis, including visualization features. We take advantage of the variety of methods implemented in Profylo to benchmark their ability to predict functional relationships between human genes, using different datasets. Finally, we demonstrate the utility of the package with an example case study of the presence-absence of all protein-coding genes in the human genome. Profylo is available on GitHub at https://github.com/MartinSchoenstein/Profylo.
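
As a rough illustration of what comparing two phylogenetic profiles involves, the sketch below scores presence-absence vectors with Jaccard similarity and Hamming distance. It does not use Profylo's API: the function names and toy profiles are assumptions made for this example, and these two generic metrics may or may not be among the seven methods the package implements.

```python
from typing import Sequence


def jaccard_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Species where both genes are present, divided by species where either is."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of species in which the two profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical toy profiles over six species (1 = ortholog present, 0 = absent).
gene_a = [1, 1, 0, 1, 0, 1]
gene_b = [1, 1, 0, 1, 0, 0]
gene_c = [0, 0, 1, 0, 1, 0]

if __name__ == "__main__":
    print(jaccard_similarity(gene_a, gene_b), hamming_distance(gene_a, gene_b))
    print(jaccard_similarity(gene_a, gene_c), hamming_distance(gene_a, gene_c))
```

In this toy data, gene_a and gene_b share most of their presences and would be candidates for functional association, while gene_a and gene_c are complementary. Real analyses typically also account for the species phylogeny, which these two metrics ignore.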

Pages: 806-819 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756282/pdf/
Citations: 0
Quantifying Hierarchical Conflicts in Homology Statements.
IF 1.8 | Zone 3, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-12-27 | DOI: 10.1007/s00239-025-10281-5
Krister M Swenson, Afif Elghraoui, Faramarz Valafar, Siavash Mirarab, Mathias Weller
Pages: 830-842
Citations: 0
NCBI Orthologs: Public Resource and Scalable Method for Computing High-Precision Orthologs Across Eukaryotic Genomes.
IF 1.8 | Zone 3, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-09-25 | DOI: 10.1007/s00239-025-10268-2
Dong-Ha Oh, Alexander Astashyn, Barbara Robbertse, Nuala A O'leary, W Ray Anderson, Laurie Breen, Eric Cox, Olga Ermolaeva, Robert Falk, Vichet Hem, J Bradley Holmes, Patrick Masterson, Kelly M McGarvey, Eyal Mozes, John P Torcivia, Mirian T N Tsuchiya, Craig Wallin, Françoise Thibaud-Nissen, Terence D Murphy, Vamsi K Kodali

Orthologs are fundamental for enabling comparative genomics analyses that further our understanding of eukaryotic biology. The unprecedented increase in the availability of high-quality eukaryotic genomes necessitates scalable and accurate methods for orthology inference. The National Center for Biotechnology Information (NCBI) developed "NCBI Orthologs", a resource and a computational pipeline designed to meet this challenge within the NCBI RefSeq framework. This system integrates protein similarity, nucleotide alignment, and microsynteny to achieve high-precision ortholog assignments across diverse eukaryotes. The pipeline leverages high-quality RefSeq annotations and processes genomes individually, ensuring scalability. Resulting ortholog data, organized into gene-level anchored sets, enables propagation of functional annotation information and facilitates comparative genomics. Critically, these data are integrated into the NCBI Gene resource, providing users with access from various entry points. The NCBI Datasets resource provides an intuitive interface to explore orthologous relationships on the web and allows bulk data download via the web, command-line tools, and an API. We detail the methodology, including anchor species selection and the decision tree used to arrive at high-confidence one-to-one orthology relationships. NCBI Orthologs is a valuable resource for facilitating functional annotation efforts and enhancing our understanding of eukaryotic gene evolution.

Pages: 843-859 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756343/pdf/
Citations: 0
A Multi-level Perspective on the Evolution of Orthologs and Their Functions.
IF 1.8 | Zone 3, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-10-13 | DOI: 10.1007/s00239-025-10276-2
Felix Langschied, Ruben Iruegas, Mateusz Sikora, Roberto Covino, Ingo Ebersberger

Orthologs, evolutionarily related genes that diverged through speciation, are mutually the closest related sequences in different species. Consequently, they are ideal candidates for identifying functionally equivalent genes across taxa, a prerequisite for transferring gene function information from model to non-model organisms in silico. However, orthologs are not immune to functional divergence. Failing to recognize such divergent instances results in spurious functional annotation transfer. Here, we propose to treat the functional equivalence of orthologs as a null hypothesis that must be critically tested rather than assumed. This requires integrating several lines of evidence to evaluate both changes in an ortholog's network of molecular interactions and alterations in its biochemical activity. We outline how such activity shifts can be assessed using increasingly fine-grained analyses, including comparisons of protein feature architectures and predicted 3D structures. While some orthology resources incorporate aspects of this evidence, such assessments are often manual and not scalable. We argue for a systematic, multi-level perspective to detect functional divergence prior to annotation transfer. To support the broader adoption of this approach, we offer methodological recommendations and practical examples that demonstrate the value of this framework in large-scale comparative genomics.

Pages: 720-729 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756255/pdf/
Citations: 0
OrthoXML-Tools: A Toolkit for Manipulating OrthoXML Files for Orthology Data.
IF 1.8 | Zone 3, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-09-26 | DOI: 10.1007/s00239-025-10271-7
Ali Yazdizadeh Kharrazi, Adrian M Altenhoff, Nikolai Romashchenko, Christophe Dessimoz, Sina Majidian

OrthoXML is a standard file format for orthology data. It provides a standardized structure for describing orthologous and paralogous relationships while allowing the user to store custom and database-specific data in the same format. Although many orthology databases use it as a way to export data, there is no comprehensive toolkit for working with this format. Here, we introduce OrthoXML-tools (https://github.com/DessimozLab/orthoxml-tools), a comprehensive toolkit for loading, manipulating, and exporting OrthoXML files to other formats. We show its capabilities and performance on our benchmarks.
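
To make the format concrete, the sketch below reads a tiny OrthoXML document using only the Python standard library. It does not use the OrthoXML-tools API, and the species, database, and gene names in the embedded document are made up for illustration; the element names and namespace follow the OrthoXML schema as commonly exported.

```python
import xml.etree.ElementTree as ET

OX = "http://orthoXML.org/2011/"
NS = {"ox": OX}

# Hypothetical minimal OrthoXML document; all names and ids are invented.
DOC = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3" origin="example" originVersion="1">
  <species name="Species A" NCBITaxId="1">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="1" protId="A1"/>
        <gene id="2" protId="A2"/>
      </genes>
    </database>
  </species>
  <species name="Species B" NCBITaxId="2">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="3" protId="B1"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="g1">
      <geneRef id="3"/>
      <paralogGroup>
        <geneRef id="1"/>
        <geneRef id="2"/>
      </paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>
"""


def gene_table(root):
    """Map each internal gene id to (species name, protein id)."""
    table = {}
    for species in root.findall("ox:species", NS):
        for gene in species.iterfind(".//ox:gene", NS):
            table[gene.get("id")] = (species.get("name"), gene.get("protId"))
    return table


def top_level_groups(root):
    """For each top-level orthologGroup, list every gene id nested inside it."""
    groups = []
    for group in root.findall("ox:groups/ox:orthologGroup", NS):
        groups.append([ref.get("id") for ref in group.iter(f"{{{OX}}}geneRef")])
    return groups


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    genes = gene_table(root)
    for members in top_level_groups(root):
        print([genes[gene_id] for gene_id in members])
```

Here the nested paralogGroup records that A1 and A2 are paralogs of each other, and both are orthologous to B1. Flattening the group, as top_level_groups does, discards that distinction, which is one reason a dedicated toolkit is useful.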

Pages: 800-805
Citations: 0
Reconstructing Evolutionary Histories with Hierarchical Orthologous Groups.
IF 1.8 | Zone 3, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-11-21 | DOI: 10.1007/s00239-025-10277-1
Garance Sarton-Lohéac, Nikolai Romashchenko, Clément Marie Train, Sina Majidian, Natasha Glover

With the rapid advancement of large-scale sequencing initiatives, the need for efficient and accurate methods for inferring orthologous and paralogous relationships has never been more critical. Hierarchical orthologous groups (HOGs) provide a powerful solution to this challenge, offering a precise, scalable framework to study gene families and their evolutionary histories across diverse species. In this review, we introduce the concept of HOGs and explore their advantages over traditional methods. Next, we highlight key applications of HOGs, including their use in representing gene families, inferring ancestral genomes, tracking gene gain and loss events, functional annotation, and phylogenetic profiling. We overview the process of constructing HOGs and discuss the challenges and limitations of HOG inference. The HOG framework provides a clear and structured approach to organizing homologous genes, making it possible to gain deeper insights into gene family and species evolution.
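
One way to picture the nesting that gives HOGs their name is a small recursive data structure, sketched below. The class, its field names, and the toy gene family are all assumptions made for illustration and do not correspond to any particular HOG inference tool or file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hog:
    """A group of genes descended from one ancestral gene at a taxonomic level."""
    level: str                                            # taxonomic level of the group
    genes: List[str] = field(default_factory=list)        # extant genes attached directly
    subhogs: List["Hog"] = field(default_factory=list)    # nested groups at lower levels

    def all_genes(self) -> List[str]:
        """Every extant gene in this group, including those in nested groups."""
        out = list(self.genes)
        for sub in self.subhogs:
            out.extend(sub.all_genes())
        return out

    def groups_at(self, level: str) -> List["Hog"]:
        """All groups defined at the given taxonomic level, searched top-down."""
        if self.level == level:
            return [self]
        found = []
        for sub in self.subhogs:
            found.extend(sub.groups_at(level))
        return found


# Hypothetical family: one vertebrate-level group that contains two
# mammal-level groups, as would arise from a duplication before mammals diverged.
family = Hog("Vertebrata", genes=["fish_g1"], subhogs=[
    Hog("Mammalia", genes=["human_g1", "mouse_g1"]),
    Hog("Mammalia", genes=["human_g2", "mouse_g2"]),
])

if __name__ == "__main__":
    print(family.all_genes())
    print(len(family.groups_at("Mammalia")))
```

Under this toy model, the number of groups at a level approximates the number of copies of the gene in the ancestral genome at that level, so one copy at the vertebrate level and two at the mammal level reads as a gene gain in between.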

Pages: 740-764 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756263/pdf/
Citations: 0
OrthoGrafter: Rapid Identification of Orthologs from Precomputed Placement in Phylogenetic Trees.
IF 1.8 | Zone 3, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-11-22 | DOI: 10.1007/s00239-025-10279-z
Christopher M Williams, Paul D Thomas

The identification of orthologs plays an important role in comparative genomics and function inference. Here, we present OrthoGrafter, a bioinformatics tool that takes one or more query sequences and infers a set of orthologs drawn from the collection of 143 well-annotated species in the PANTHER database of reconciled gene trees. OrthoGrafter takes sets of graft points output by the widely used TreeGrafter software (either the standalone package or InterProScan) and, for each one, outputs a list of predicted orthologous genes from the grafted PANTHER tree (with the ability to additionally output paralog and xenolog sets). If the taxonomic identifier for the query is also provided, OrthoGrafter incorporates the novel step of adjusting the graft point to facilitate consistent taxonomic assignment for the graft within the reconciled gene family; we demonstrate that this step improves ortholog inference, as measured by correlation with orthologs provided by the OMA database. Lightweight and relying on precomputed results to enable rapid ortholog prediction for large sample groups, OrthoGrafter is available at https://github.com/pantherdb/OrthoGrafter.
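
The general idea of reading orthologs off a reconciled gene tree can be sketched as follows: two genes are orthologs when their last common ancestor in the tree is a speciation node. This is the textbook tree-based definition, not OrthoGrafter's implementation; the Node class, the event labels, and the toy tree are assumptions for illustration, and xenologs (horizontal transfer) are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    event: Optional[str] = None            # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Names of all leaf genes below (or at) this node."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


def orthologs_of(root: Node, query: str) -> List[str]:
    """Leaves whose last common ancestor with `query` is a speciation node."""
    if query not in root.leaves():
        raise ValueError(f"{query!r} is not a leaf of this tree")
    result = []
    node = root
    while node.children:
        # Follow the child subtree that contains the query gene.
        on_path = next(c for c in node.children if query in c.leaves())
        if node.event == "speciation":
            # Everything in the sibling subtrees diverged from the query here.
            for child in node.children:
                if child is not on_path:
                    result.extend(child.leaves())
        node = on_path
    return sorted(result)


# Hypothetical reconciled tree: a duplication in the tetrapod ancestor
# produced copies "a" and "b", each later split by the human/mouse speciation.
tree = Node("root", "speciation", [
    Node("fish_1"),
    Node("dup", "duplication", [
        Node("spec_a", "speciation", [Node("human_a"), Node("mouse_a")]),
        Node("spec_b", "speciation", [Node("human_b"), Node("mouse_b")]),
    ]),
])

if __name__ == "__main__":
    print(orthologs_of(tree, "human_a"))
    print(orthologs_of(tree, "fish_1"))
```

In this toy tree, human_a is orthologous to mouse_a and fish_1 but paralogous to human_b and mouse_b, while fish_1 is orthologous to all four tetrapod genes. For a grafted query, the same walk would start from the graft point instead of an existing leaf.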

Pages: 820-829 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756373/pdf/
Citations: 0
Quest for Orthologs in the era of Data Deluge and AI: Challenges and Innovations in Orthology Prediction and Data Integration.
IF 1.8 | Zone 3, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Pub Date: 2025-12-01 | Epub Date: 2025-10-14 | DOI: 10.1007/s00239-025-10272-6
Sina Majidian, Armin Hadziahmetovic, Felix Langschied, Stefano Pascarelli, Silvia Prieto-Baños, Jorge Rojas-Vargas, Edward L Braun, Christophe Dessimoz, Abdoulaye Baniré Diallo, Dannie Durand, Gang Fang, Toni Gabaldón, Natasha Glover, David A Liberles, Claire McWhite, Erik L L Sonnhammer, Paul D Thomas, Aïda Ouangraoua, Irene Julca

The rapid advancement of DNA sequencing technologies and computational algorithms has led to an unprecedented surge in genomic data, driven by several large-scale sequencing projects worldwide. Orthology plays a crucial role in understanding evolutionary patterns of genes and their functions. At the last Quest for Orthologs meeting (Montréal, Canada, 2024), we discussed recent advances in orthology inference, with a focus on the impact of artificial intelligence (AI), protein structures, RNA splicing isoforms, and protein domain evolution, together with other evolutionary considerations. A long-standing challenge in the field is the functional annotation of paralogs, for which we present novel approaches. The meeting also emphasised strategies for integrating diverse genetic features into the concept of orthology, encouraging frameworks that account for elements like alternative splicing, domain organisation, and regulatory sequences. We discuss various applications of orthology and paralogy to environmental research, agriculture, and comparative genomics. Additionally, we report recent progress in orthology inference methodologies and resources. This work represents a collaborative synthesis of insights and innovations presented at the 8th Quest for Orthologs meeting, highlighting current progress while outlining future directions for orthology research.

Journal of Molecular Evolution, pp. 702-719. Pub Date: 2025-12-01. DOI: 10.1007/s00239-025-10272-6. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756340/pdf/
Citations: 0
Evaluating Pretrained Protein Language Model Embeddings as Proxies for Functional Similarity.
IF 1.8 Zone 3 Biology Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-12-01 Epub Date: 2025-11-22 DOI: 10.1007/s00239-025-10282-4
Robert Shaw, Samuel D Love, Claire D McWhite

Protein Language Models (PLMs) have emerged as powerful tools for representing protein sequences. We explore how embeddings (numeric vector representations) from pretrained PLMs can serve as direct numeric proxies for protein structure and function without requiring additional training or fine-tuning. In a proof-of-concept study of 22 cross-species complementation triplets (a gold standard for functional similarity, in which genes from one species are tested for their ability to rescue gene deletions in another species), we find that ESM-C 600M embeddings summarized into pooled sliced-Wasserstein embeddings achieved high discrimination of subtle functional differences. This pooling method captures distributional properties of amino acid embeddings by comparing them against reference points using optimal transport theory. While our limited sample size precludes definitive conclusions about whether PLM embeddings systematically outperform sequence-based methods in detecting protein functional similarity, our preliminary results demonstrate the potential of using protein embeddings for functional analysis. Our exploratory analysis of orthology relationships suggests that embedding similarity may correlate with functional conservation, with the least diverged ortholog showing higher embedding similarity in approximately two-thirds of cases. Analyzing the Ortholog Conjecture (that orthologs maintain greater functional similarity than paralogs at equivalent sequence divergence), we do not observe clear differences between one-to-one ortholog and inparalog embedding similarities. Finally, we propose integrating PLMs with phylogenetic methods in a hybrid approach that leverages their complementary strengths: PLM-derived numeric embeddings for rapid homology detection and phylogenetics for evolutionary precision. We introduce embedding-tree versus gene-tree discordance as a potential metric to detect functional divergence between closely related proteins. Integrating protein embeddings with sequence analysis may enable a more nuanced understanding of protein function and evolutionary dynamics.
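
The pooling step the abstract describes (comparing per-residue embeddings against reference points via optimal transport) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the number of slices, the number of reference points, and the random matrices standing in for ESM-C per-residue embeddings are all assumptions made for illustration. Each protein's residue embeddings are projected onto random unit directions, and along each direction the protein's quantiles are compared with the quantiles of a shared reference set, giving a fixed-length vector regardless of protein length.

```python
import numpy as np


def random_directions(n_slices, dim, seed=0):
    """Draw n_slices random unit vectors in R^dim (the 'slices')."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_slices, dim))
    return theta / np.linalg.norm(theta, axis=1, keepdims=True)


def sliced_wasserstein_pool(residue_emb, reference, directions):
    """Pool an (n_residues, dim) matrix into a vector of length n_slices * n_ref.

    Along every direction, the 1-D quantile function of the projected residues
    is evaluated at n_ref levels and the reference quantiles are subtracted.
    """
    n_ref = reference.shape[0]
    levels = (np.arange(n_ref) + 0.5) / n_ref                      # quantile levels in (0, 1)
    q_x = np.quantile(residue_emb @ directions.T, levels, axis=0)  # (n_ref, n_slices)
    q_r = np.quantile(reference @ directions.T, levels, axis=0)    # (n_ref, n_slices)
    # Scaling makes the Euclidean distance between two pooled vectors an
    # estimate of the sliced Wasserstein-2 distance between the residue sets.
    return (q_x - q_r).ravel() / np.sqrt(q_x.size)


# Hypothetical stand-ins for per-residue PLM embeddings of two proteins.
rng = np.random.default_rng(1)
dim = 16
directions = random_directions(32, dim)
reference = rng.normal(size=(10, dim))
protein_a = rng.normal(size=(120, dim))  # 120 residues
protein_b = rng.normal(size=(87, dim))   # 87 residues

pooled_a = sliced_wasserstein_pool(protein_a, reference, directions)
pooled_b = sliced_wasserstein_pool(protein_b, reference, directions)
print(pooled_a.shape, pooled_b.shape)          # same length despite different protein lengths
print(np.linalg.norm(pooled_a - pooled_b))     # distance usable as a dissimilarity score
```

Because the pooled vectors share one reference set and one set of directions, proteins of different lengths become directly comparable, and the result does not depend on residue order within the projected distributions. Mean pooling, by contrast, would keep only the first moment of each distribution.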

Journal of Molecular Evolution, pp. 765-776. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756192/pdf/
Citations: 0