Identifying recurrent changes in biological sequences is important to multiple aspects of biological research, from understanding the molecular basis of convergent phenotypes to pinpointing the causative sequence changes that give rise to antibiotic resistance and disease. Here, we present RECUR, a method for identifying recurrent amino acid substitutions from multiple sequence alignments that is fast, easy to use, and scalable to thousands of sequences. We demonstrate that RECUR's recurrence detection achieves 100% accuracy on simulated data with known evolutionary histories. We further show that RECUR is robust to realistic levels of tree inference error. Finally, we apply RECUR to a large set of surface glycoprotein (S) sequences from SARS-CoV-2. This analysis identified widespread recurrent evolution throughout the protein, with significant enrichment in the exposed receptor-binding S1 subunit and at the interface with the human angiotensin-converting enzyme 2 (hACE2). In contrast, recurrent substitutions were depleted at the trimeric interface of the S protein. In silico modelling showed that recurrent substitutions had no directional effect on stability at either interface, but effects at the hACE2 interface were significantly more variable. Multiple substitutions with large destabilizing effects on hACE2 binding have been linked to immune escape, while others represented reversions back to the reference sequence, suggesting that recurrent evolution at this interface reflects opposing selective pressures balancing receptor binding with immune evasion. A standalone implementation of the algorithm is available under the GPLv3 license at https://github.com/OrthoFinder/RECUR.
Elizabeth H J Robbins, Yi Liu, Steven Kelly. "RECUR: identifying recurrent amino acid substitutions from multiple sequence alignments." Molecular Biology and Evolution, 2026-02-02. doi:10.1093/molbev/msag036. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12930092/pdf/
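The idea of a recurrent substitution can be illustrated with a minimal sketch. It is not RECUR's implementation: it assumes the tree and the ancestral sequences at internal nodes have already been inferred by some other tool, and the function name, node names, and toy sequences are all hypothetical. It simply counts how many separate branches carry the same residue change at the same alignment column and performs no significance testing.

```python
from collections import defaultdict


def find_recurrent_substitutions(edges, sequences, min_branches=2):
    """Count identical substitutions that occur on several branches.

    edges: list of (parent, child) node names describing the tree.
    sequences: dict mapping every node name to its aligned sequence
               (internal nodes hold pre-inferred ancestral sequences;
               this is an assumption of the sketch).
    Returns {(site, parent_residue, child_residue): branch_count} for
    substitutions seen on at least `min_branches` branches.
    """
    counts = defaultdict(int)
    for parent, child in edges:
        pairs = zip(sequences[parent], sequences[child])
        for site, (before, after) in enumerate(pairs, start=1):
            # Skip unchanged columns and gaps.
            if before != after and "-" not in (before, after):
                counts[(site, before, after)] += 1
    return {sub: n for sub, n in counts.items() if n >= min_branches}


# Hypothetical toy tree: K2R arises independently on two branches,
# T3S arises once.
edges = [("root", "n1"), ("root", "n2"),
         ("n1", "A"), ("n1", "B"), ("n2", "C"), ("n2", "D")]
sequences = {"root": "MKT", "n1": "MKT", "n2": "MKT",
             "A": "MRT", "B": "MKT", "C": "MRT", "D": "MKS"}

recurrent = find_recurrent_substitutions(edges, sequences)
print(recurrent)  # {(2, 'K', 'R'): 2}
```

Counting per branch rather than per tip is the design choice that matters here: a substitution inherited by many descendants of one branch counts once, so only independent origins are tallied.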
Ornamental medaka strains derived from wild Japanese medaka (Oryzias latipes species complex) are bred worldwide. Over 200 years of selective breeding have produced more than 700 strains with a wide variety of phenotypes, including diverse body coloration, scales, eyeball morphology, and fin and body shapes. In this study, we first identified and described 34 phenotypes in ornamental medaka strains. To understand the genomic basis of this phenotypic diversity and the domestication process, we performed whole-genome sequencing on 181 individuals from 86 ornamental Japanese medaka strains. Population genomic analyses revealed that modern ornamental medaka strains are genetically closer to the wild Southern Japan population of the Kansai-Setouchi regions, pointing to this population as the likely origin of ornamental strains. In addition, the gene loci poc1a, tyr, nme2a, and gabrr2b have undergone selection during domestication. We performed genome-wide association study (GWAS) analyses for 29 phenotypes observed in ornamental medaka strains and identified strong candidate genes for some phenotypes, including kcnq5a for hirenaga and swallow, bmp5 for deme, adcy5 for orochi, and kitlga for aurora. We found that loss of exon 8 of adcy5 caused melanism, a dark body color phenotype, in medaka, providing molecular insight into this phenomenon in vertebrates and into human familial dyskinesia. In addition, the predominant candidate GWAS peaks encompassed a total of 3,328 genes associated with 26 phenotypes. Our findings highlight the potential of population genomics to explore genotype-phenotype correlations and the genomic basis of body coloration and morphogenesis in medaka.
Tetsuo Kon, Rui Tang, Koto Kon-Nanjo, Soma Tomihara, Soichiro Fushiki, Wakana Fujii, Mifuyu Sera, Yusuke Takehana, Hideki Noguchi, Atsushi Toyoda, Kiyoshi Naruse, Yoshihiro Omori. "Genomic consequences of domestication and the diversification of body coloration and morphology in ornamental medaka strains." Molecular Biology and Evolution, 2026-02-02. doi:10.1093/molbev/msag021. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12915790/pdf/
Alexander Mackintosh, Maxence Brault, Denis Roze, Martin Lascoux, Sylvain Glémin
The effect of natural selection on linked sites has been suggested to be a major determinant of genetic diversity. While it is in principle possible to estimate this effect from genome sequence data, interactions between selection, demography and inbreeding are expected to make inference less reliable. Here, we investigate whether the genome-wide reduction in diversity due to background selection ($\bar{B}$) can be accurately estimated when populations are at demographic non-equilibrium and/or reproduce by partial self-fertilization. We show that the classic-BGS model is surprisingly robust to both demographic non-equilibrium and low rates of selfing, although both processes do lead to biased estimation of the distribution of fitness effects (DFE) of deleterious mutations. A high rate of selfing leads to poor estimation of both $\bar{B}$ and DFE parameters. We propose an alternative approach where background selection, demography and partial selfing are jointly estimated from windowed site frequency spectra. This approach resolves most of the bias observed under the classic-BGS model and can also generate estimates of past demography that account for the effect of background selection and partial selfing. We apply the approach to genome sequence data from Capsella grandiflora and Capsella orientalis, which have contrasting mating systems and display a forty-fold difference in nucleotide diversity. Our results suggest that background selection has a weak effect on levels of genetic diversity in the outcrosser C. grandiflora ($\bar{B} = 0.89$) and a more substantial effect in the predominantly selfing species C. orientalis ($\bar{B} = 0.44$), but that background selection alone cannot explain their disparity in genetic diversity.
"Estimating the Reduction in Genetic Diversity from Background Selection under Non-equilibrium Demography and Partial Selfing." Molecular Biology and Evolution, 2026-02-02. doi:10.1093/molbev/msag004. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12902155/pdf/
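The closing claim of this abstract can be checked with a few lines of arithmetic. The sketch below assumes the usual reading of the background-selection statistic, namely that observed diversity equals the reported mean B multiplied by the diversity expected without background selection. The variable names are illustrative, and "forty-fold" is taken as exactly 40.

```python
# Mean B values and the diversity contrast quoted in the abstract.
B_GRANDIFLORA = 0.89   # outcrosser, weak background selection
B_ORIENTALIS = 0.44    # predominant selfer, stronger background selection
OBSERVED_FOLD = 40.0   # "forty-fold" difference in nucleotide diversity

# Assumption of this sketch: pi_observed = B * pi_without_BGS, so the
# diversity ratio attributable to background selection is a ratio of Bs.
bgs_fold = B_GRANDIFLORA / B_ORIENTALIS

# Fold difference that remains after removing the background-selection
# contribution (demography, selfing, and other factors).
residual_fold = OBSERVED_FOLD / bgs_fold

print(f"fold difference explained by BGS: {bgs_fold:.2f}")
print(f"fold difference left unexplained: {residual_fold:.1f}")
```

Under that assumption, background selection accounts for roughly a two-fold difference, leaving close to a twenty-fold gap to other processes. This matches the statement that background selection alone cannot explain the disparity.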
Zhuoran Kuang, Xiaojie Yang, Na Wan, Jiaqi Chen, Qijiao Duan, Bowen Li, Xi Liu, Xiaolong Liang, Xinfeng Liu, Wenyu Liu, Eviatar Nevo, Kexin Li
Chromosomal fusion and fission are widespread across species, yet the underlying genomic mechanisms and their evolutionary implications remain poorly understood. Here, we present high-quality chromosome-level genome assemblies for two closely related subterranean rodent species, Eospalax rufescens and E. rothschildi. Through comparative genomic and synteny analyses, we identified two species-specific chromosomal fusions in E. rothschildi, likely mediated by ectopic recombination through repetitive elements and by mutations affecting genome stability. Despite minimal changes in base-level genomic features, the fused chromosomes are associated with altered three-dimensional (3D) chromatin architecture, including increased chromatin entropy, topologically associating domain (TAD) rearrangement, and compartment switching. Reduced gene flow on the fused chromosomes suggests a role in reproductive isolation. Additionally, molecular signals of relaxed selection and adaptive evolution in pathways related to DNA repair, chromatin dynamics, and environmental sensing highlight the interplay between structural and ecological factors in shaping divergence. Together, our findings provide a mechanistic and evolutionary framework linking chromosomal fusions with genome architecture remodeling, epigenetic changes, and barriers to gene flow in mammals, offering a valuable resource for future evolutionary genomics studies.
"Genomic insights into chromosomal fusion and its evolutionary implications for zokors." Molecular Biology and Evolution, 2026-02-02. doi:10.1093/molbev/msag032. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12911924/pdf/
Jing Xue, Lei Tao, Junwei Cao, Liang Wu, Guang Li, Cai Li
Germline de novo mutations (DNMs) are the ultimate source of heritable variation, yet their patterns in highly heterozygous genomes remain poorly understood. Amphioxus, an early-branching chordate with exceptionally high genomic heterozygosity (3.2% to 4.2% in sequenced species), offers a unique model to explore mutational dynamics in such contexts. It is unclear whether high heterozygosity in amphioxus is due to a large effective population size, an increased mutation rate, or both. Here, we perform deep short-read whole-genome sequencing of a two-generation pedigree of the amphioxus Branchiostoma floridae, comprising two parents and 104 offspring, and develop a framework that uses allele-aware parental assemblies as the reference to accurately identify DNMs. We detect 242 high-confidence DNMs, yielding a genome-wide mutation rate of $5.89 \times 10^{-9}$ per base per generation, which is comparable to that of vertebrates. Combining this estimate with observed nucleotide diversity, we obtain an effective population size of ∼1.7 million, indicating that the elevated heterozygosity mainly results from a large effective population size. We observe no sex bias when considering all DNMs but a paternal-origin bias for early-occurring ones. Amphioxus harbors a much smaller fraction of CpG>TpG DNMs relative to vertebrates, attributable to its low methylation levels. We also investigate putative postzygotic mutations in the offspring, revealing an unexpected paternal-origin bias. These findings suggest some distinct mutational mechanisms in amphioxus. Our study not only provides the first DNM measurement for amphioxus but also offers a generalizable strategy for studying DNMs in highly heterozygous genomes, facilitating mutation rate studies across chordates and other lineages.
"Germline de novo mutation rate of the highly heterozygous amphioxus genome." Molecular Biology and Evolution, 2026-02-02. doi:10.1093/molbev/msag017. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12862219/pdf/
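The step from mutation rate to effective population size can be sketched as follows. The abstract does not say which estimator was used, so the sketch assumes the standard neutral expectation for a diploid population, π = 4·Ne·μ. The diversity value of 0.04 is an illustrative placeholder chosen from within the 3.2% to 4.2% heterozygosity range quoted above, not a figure reported in the abstract. The function name is hypothetical.

```python
def effective_population_size(nucleotide_diversity, mutation_rate):
    """Solve pi = 4 * Ne * mu for Ne (diploid, neutral equilibrium).

    This relation is an assumption of the sketch; the study may have
    used a different estimator.
    """
    return nucleotide_diversity / (4.0 * mutation_rate)


MU = 5.89e-9        # per base per generation, from the abstract
PI_ASSUMED = 0.04   # illustrative placeholder, NOT a reported value

ne = effective_population_size(PI_ASSUMED, MU)
print(f"Ne is roughly {ne:,.0f}")
```

With these inputs the result is on the order of 1.7 million, consistent with the estimate in the abstract. Because the mutation rate is similar to that of vertebrates, the high diversity has to come from a large Ne rather than an elevated mutation rate.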
The process of migration and colonization is important in evolution; for example, modern humans experienced multiple waves of migration out of Africa. However, no existing data cover the spatio-temporal patterns sufficiently to be truly informative. Metastatic cancer provides a unique in vivo model to study these processes through rapid somatic evolution. Here, we apply a high-resolution sampling technique (Dense 3D Crypt-scale Sampling) to analyze hundreds of spatially mapped micro-samples from the primary colorectal cancer and liver metastases in two representative cases. This is analogous to recording the "out-of-Africa" events in two replicates. Our results indicate that liver metastases arise from polyphyletic and polyclonal seeding events in which multiple, genetically distinct clones colonize a new site together. Following colonization, these multi-clonal populations can evolve into distinct spatial architectures: segregated territories formed by cells with low motility, or highly intermixed patterns driven by high motility. The colonization (or seeding) process begins within the first third of the primary tumor's progression, creating a large number of widespread but clinically undetectable micrometastatic colonies. These findings support a model where metastatic competence is not an intrinsic trait of a single "winner" clone but an emergent property of multiple concurrent clones. Collectively, our work supports metastasis as a multi-stage process initiated early in tumor development, characterized by continuous polyclonal dissemination and the formation of spatially distinct clonal architectures. This general pattern may echo the ecology of migration and colonization in organismal evolution.
Qihang Chen, Senmao Li, Xianrui Wu, Qing Xu, Ranran Zhu, Yongsen Ruan, Ao Lan, Zihan Liu, Jiarui Weng, Yanjiang Zhao, Xiying Xu, Xinyue Qi, Jinhong Lai, Leyi Xiao, Ping Lan, Chung-I Wu, Bingjie Chen. "Full spatio-temporal analyses of migration and colonization in evolution - dense 3D mapping of cancer metastases provides new insights." Molecular Biology and Evolution, 2026-02-02. doi:10.1093/molbev/msag008. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866917/pdf/
Marcos Ramos-González, Víctor Ramos-González, Emma Serrano-Pérez, Christina Arvanitidou, Jorge Hernández-García, Mercedes García-González, Francisco J Romero-Campero
Since DNA sequencing has become commonplace, the development of efficient methods and tools to explore gene sequences has become indispensable. In particular, despite photosynthetic eukaryotes constituting the largest percentage of terrestrial biomass, computational functional characterization of gene sequences in these organisms still predominantly relies on comparisons with Arabidopsis thaliana and other angiosperms. This paper introduces PharaohFUN, a web application designed for the evolutionary and functional analysis of protein sequences in photosynthetic eukaryotes, leveraging orthology relationships between them. PharaohFUN incorporates a homogeneous representative sampling of key species in this group, bridging clades that have traditionally been studied separately, thus establishing a comprehensive evolutionary framework to draw conclusions about sequence evolution and function. For this purpose, it incorporates modules for exploring gene tree evolutionary history, expansion and contraction events, ancestral states, domain identification, multiple sequence alignments, and diverse functional annotation. It also offers different search modes to facilitate its use and increase its reach within the community. Tests were performed on the whole transcription factor toolbox of A. thaliana and on the CCA1 protein to assess its utility for both large-scale and fine-grained phylogenetic studies. These exemplify how PharaohFUN accurately traces the corresponding evolutionary histories of these proteins by unifying results for land plants and for streptophyte and chlorophyte microalgae. Thus, PharaohFUN democratizes access to these kinds of analyses in photosynthetic organisms for every user, regardless of their prior training in bioinformatics.
"PharaohFUN: phylogenomic analysis for plant protein history and function elucidation." Molecular Biology and Evolution, 2026-02-02. doi:10.1093/molbev/msag011. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866927/pdf/
Gang Wang, Tao Zhu, Xinye Zhang, Xufang Ren, Anqi Chen, Zhonghua Ning, Marcel van Tuinen, Lujiang Qu
The evolutionary history of waterfowl (Anseriformes) has long been a focal point of avian research. However, previous phylogenetic investigations have focused primarily on morphology or mitochondrial DNA or have lacked sufficient taxon sampling. Given the observed phylogenetic incongruence and incomplete resolution, waterfowl phylogenetic branching patterns remain uncertain at various taxonomic ranks. To further validate phylogenetic relationships among higher waterfowl taxa and assess the presence of conflicting signal, we assembled and analyzed 24 waterfowl genomes representing all waterfowl families and several subfamilies. Using both newly acquired and previously obtained genomes, we constructed and analyzed seven DNA data classes, which yielded highly resolved phylogenetic trees, including a time-calibrated tree. Most of these trees consistently and completely resolved the phylogenetic relationships of the included waterfowl species. Nevertheless, our analysis across chromosomes uncovered four instances of incongruent phylogenetic signal. After minimizing tree estimation error by focusing on the whole-genome alignment dataset and by sequence simulation, our analyses revealed that incomplete lineage sorting and gene introgression contributed to essentially all of the gene-tree discordance. The variable impact of both factors across distinct waterfowl nodes reflects an underlying complexity that warrants further interpretation. This study not only presents a strongly supported and well-resolved phylogenetic backbone for the major waterfowl lineages, but also provides foundational data for subsequent comparative genomics studies of an expanded set of waterfowl taxa.
{"title":"Phylogeny of waterfowl (Anseriformes) constructed using genome sequences provides insights into topological incongruences.","authors":"Gang Wang, Tao Zhu, Xinye Zhang, Xufang Ren, Anqi Chen, Zhonghua Ning, Marcel van Tuinen, Lujiang Qu","doi":"10.1093/molbev/msag018","DOIUrl":"10.1093/molbev/msag018","url":null,"PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":5.3,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12902361/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146011332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicolas Ochsner, Judith Bouman, Timothy Vaughan, Tanja Stadler, Sebastian Bonhoeffer, Roland Regoes
Phylodynamic methods are widely used to infer the population dynamics of viruses between and within hosts. For HIV-1, these methods have been used to estimate migration rates between different anatomical compartments within a host. These methods typically assume that the genomic regions used for reconstruction are evolving without selective pressure, even though other parts of the viral genome are known to experience strong selection. In this study, we investigate how selection affects phylodynamic migration rate estimates. To this end, we developed a novel agent-based simulation tool, virolution, to simulate the evolution of a virus within two anatomical compartments of a host. Using this tool, we generated viral sequences and genealogies assuming both neutral evolution and selection governed by an empirically supported distribution of fitness effects that is concordant in both compartments. We found that, under the selection regime, migration rates are significantly overestimated with a stochastic mixture model and a structured coalescent model in the Bayesian inference framework BEAST2. Our results reveal that commonly used phylogeographic methods, which assume neutral evolution, can significantly bias migration rate estimates in selective regimes. This study underscores the need for assessing the robustness of phylodynamic analysis with respect to more realistic selection regimes.
{"title":"Viral Simulation Reveals Overestimation Bias in Within-Host Phylodynamic Migration Rate Estimates Under Selection.","authors":"Nicolas Ochsner, Judith Bouman, Timothy Vaughan, Tanja Stadler, Sebastian Bonhoeffer, Roland Regoes","doi":"10.1093/molbev/msag014","DOIUrl":"10.1093/molbev/msag014","url":null,"PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":5.3,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12911929/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146100580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Habtom Kiros Bitsue, Chaonan Tang, Zhaohui Yang
Ancient and modern genomic data provide insights into continuous human migrations and subsequent admixture and gene flow throughout human history. These demographic events and natural selection contribute to the genetic and phenotypic variation that gives African populations their unique characteristics. These genomic data have provided scientists with insights into complex migratory events, patterns of admixture, and the spatial distribution of ancestral lineages. For example, the return migration from western Eurasia to Africa introduced pastoralism, and the remarkable expansion of Bantu-speaking groups brought agricultural practices to a wider area of eastern and southern Africa. In addition, the continent's vast and diverse environmental conditions, together with its complex human history and higher levels of genetic diversity, contribute to varying degrees of susceptibility and resistance to complex diseases. Given the complex demographic histories and multi-ethnic genomic diversity of African populations, it remains essential to deepen our understanding of the genetic basis of complex traits and diseases. This review provides an overview of insights into population admixture and complex disease states based on data from ancient and modern genomes. These include the major waves of population movement and patterns of admixture that influence the diverse, complex traits observed among populations within the African continent. Overall, this review provides deep insight into prehistoric demographic events and the genomic profiles of modern Africans, and highlights the importance of integrated international cooperation to strengthen African genomics research.
{"title":"Insights into human history and complex traits from the genomes of African populations.","authors":"Habtom Kiros Bitsue, Chaonan Tang, Zhaohui Yang","doi":"10.1093/molbev/msaf329","DOIUrl":"10.1093/molbev/msaf329","url":null,"PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":5.3,"publicationDate":"2026-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12810205/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145810601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}