Understanding the evolutionary dynamics of cell populations requires models that link observed phylogenetic patterns to the underlying processes of cell division, death, and mutation. Classical phylodynamic inference methods, developed primarily for macroevolutionary settings, assume that mutations accrue in calendar time and often rely on a molecular clock. Here, we introduce a framework that ties mutations to discrete birth (division) events. In this setting, mutations accumulate via a compound Poisson process, capturing both visible and hidden cell divisions within the reconstructed phylogenetic tree. We present a computationally efficient dynamic programming algorithm to compute the likelihood based on tree topologies with associated mutations, integrating over latent variables such as branch durations and unobserved cell divisions. Our method is applicable to large-scale single-cell datasets, and we demonstrate its utility on simulated data and on single-cell phylogenies of hematopoietic stem cells.
Phylodynamics of Somatic Evolution: A Likelihood-Based Approach for Cellular Reproduction. Tobias Dieselhorst, Johannes Berg. Molecular Biology and Evolution, published 2026-02-02. https://doi.org/10.1093/molbev/msag002. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12877876/pdf/
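The idea of tying mutations to division events rather than to calendar time can be illustrated with a small simulation. The sketch below is not the authors' likelihood algorithm. It assumes, for illustration only, that a branch of the reconstructed tree ends in one visible division and carries a Poisson number of hidden divisions at a constant rate, and that each division adds a Poisson number of mutations. The names `mutations_on_branch`, `hidden_division_rate`, and `mutations_per_division` are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (suited to small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mutations_on_branch(duration, hidden_division_rate, mutations_per_division, rng):
    """Simulate the mutation count on one branch of a reconstructed tree.

    Illustrative assumption (not taken from the paper): the branch ends in one
    visible division and carries a Poisson number of hidden divisions with mean
    hidden_division_rate * duration. Each division adds a Poisson number of
    mutations with mean mutations_per_division.
    """
    hidden = sample_poisson(hidden_division_rate * duration, rng)
    divisions = 1 + hidden
    return sum(sample_poisson(mutations_per_division, rng) for _ in range(divisions))


def expected_mutations(duration, hidden_division_rate, mutations_per_division):
    """Mean count under the assumptions above: E[divisions] * E[mutations per division]."""
    return (1 + hidden_division_rate * duration) * mutations_per_division


rng = random.Random(0)
draws = [mutations_on_branch(1.5, 2.0, 1.2, rng) for _ in range(20000)]
print(sum(draws) / len(draws), expected_mutations(1.5, 2.0, 1.2))
```

Under these assumptions the simulated mean should sit close to the analytic mean, and the count depends on the number of divisions along the branch rather than directly on elapsed time.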
Ziqi Deng, Claudia Sanchis-López, Ana Hernández-Plaza, Adrián A Davín, Jaime Huerta-Cepas
Profiling biological traits along gene or species tree topologies is a well-established approach in comparative genomics, widely employed to infer gene function from co-evolutionary patterns (phylogenetic profiling), reconstruct ancestral states, and uncover ecological associations. However, existing profiling tools are typically tailored to specific use cases, have limited scalability for large datasets, and lack robust methods to aggregate or summarize traits at internal tree nodes. Here, we present TreeProfiler, a tool for automated annotation and interactive exploration of hundreds of features along large gene and species trees, with seamless summarization of mapped traits at internal nodes. TreeProfiler supports the profiling of custom continuous and discrete traits, as well as ancestral character reconstruction and phylogenetic signal tests. It also integrates commonly used genomic features, including multiple sequence alignments, protein domain architectures, and functional annotations. We demonstrate TreeProfiler's utility beyond traditional phylogenetic profiling, as well as its ability to efficiently handle massive datasets, by analyzing the functional diversification of the methyl-accepting chemotaxis protein (MCP) family comprising over 400,000 genomic and metagenomic sequences, and by profiling the relative abundance of 124,295 bacterial and archaeal species across 51 biomes. TreeProfiler is open-source and freely available at https://github.com/compgenomicslab/TreeProfiler.
TreeProfiler: Large-scale metadata profiling along gene and species trees. Molecular Biology and Evolution, published 2026-02-02. https://doi.org/10.1093/molbev/msag028
Jing Xue, Lei Tao, Junwei Cao, Liang Wu, Guang Li, Cai Li
Germline de novo mutations (DNMs) are the ultimate source of heritable variation, yet their patterns in highly heterozygous genomes remain poorly understood. Amphioxus, an early-branching chordate with exceptionally high genomic heterozygosity (3.2% to 4.2% in sequenced species), offers a unique model to explore mutational dynamics in such contexts. It is unclear whether high heterozygosity in amphioxus is due to a large effective population size, an increased mutation rate, or both. Here, we perform deep short-read whole-genome sequencing of a two-generation pedigree of the amphioxus Branchiostoma floridae comprising two parents and 104 offspring and develop a framework based on allele-aware parental assemblies as the reference to accurately identify DNMs. We detect 242 high-confidence DNMs, yielding a genome-wide mutation rate of 5.89 × 10⁻⁹ per base per generation, which is comparable to that of vertebrates. Combining this estimate with observed nucleotide diversity, we obtain an effective population size of ∼1.7 million, indicating that the elevated heterozygosity mainly results from a large effective population size. We observe no sex bias when considering all DNMs but a paternal-origin bias for early-occurring ones. Amphioxus harbors a much smaller fraction of CpG>TpG DNMs relative to vertebrates, attributable to its low methylation levels. We also investigate putative postzygotic mutations in the offspring, revealing an unexpected paternal-origin bias. These findings suggest some distinct mutational mechanisms in amphioxus. Our study not only provides the first DNM measurement for amphioxus but also offers a generalizable strategy for studying DNMs in highly heterozygous genomes, facilitating mutation rate studies across chordates and other lineages.
Germline de novo mutation rate of the highly heterozygous amphioxus genome. Molecular Biology and Evolution, published 2026-02-02. https://doi.org/10.1093/molbev/msag017. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12862219/pdf/
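The step from a mutation rate and nucleotide diversity to an effective population size can be checked with a few lines of arithmetic. The sketch below assumes the standard neutral-equilibrium relation for diploids, π = 4·Ne·μ. The abstract does not state which estimator the authors used, and it does not report π, so the value `PI_ASSUMED = 0.04` is an illustrative assumption chosen from within the reported 3.2% to 4.2% heterozygosity range.

```python
def effective_population_size(nucleotide_diversity, mutation_rate):
    """Solve pi = 4 * Ne * mu for Ne (diploid, neutral, equilibrium assumptions)."""
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    return nucleotide_diversity / (4.0 * mutation_rate)


MU = 5.89e-9       # per base per generation, as reported in the abstract
PI_ASSUMED = 0.04  # illustrative assumption, not a value reported in the abstract

ne = effective_population_size(PI_ASSUMED, MU)
print(f"Ne is roughly {ne:,.0f}")
```

With these inputs the result is about 1.7 million, in line with the estimate quoted in the abstract. A different π would shift the estimate proportionally.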
The process of migration and colonization is important in evolution; for example, modern humans experienced multiple waves of migrations out of Africa. However, no data cover the spatio-temporal patterns sufficiently to be truly informative. Metastatic cancer provides a unique in vivo model to study these processes through rapid somatic evolution. Here, we apply the high-resolution sampling technique (Dense 3D Crypt-scale Sampling) to analyze hundreds of spatially mapped micro-samples from the primary colorectal cancer and liver metastases in two representative cases. This would be analogous to recording the "out-of-Africa" events in two repeats. Our results support that liver metastases arise from polyphyletic and polyclonal seeding events where multiple, genetically distinct clones colonize a new site together. Following colonization, these multi-clonal populations can evolve into distinct spatial architectures: segregated territories formed by cells with low motility, or highly intermixed patterns driven by high motility. The colonization (or seeding) process begins within the first third of the primary tumor's progression, creating a large number of widespread but clinically undetectable micrometastatic colonies. These findings support a model where metastatic competence is not an intrinsic trait of a single "winner" clone but an emergent property of multiple concurrent clones. Collectively, our work supports metastasis as a multi-stage process initiated early in tumor development, characterized by continuous polyclonal dissemination and the formation of spatially distinct clonal architectures. This general pattern may echo the ecology of migration and colonization in organismal evolution.
Full spatio-temporal analyses of migration and colonization in evolution-dense 3D mapping of cancer metastases provides new insights. Qihang Chen, Senmao Li, Xianrui Wu, Qing Xu, Ranran Zhu, Yongsen Ruan, Ao Lan, Zihan Liu, Jiarui Weng, Yanjiang Zhao, Xiying Xu, Xinyue Qi, Jinhong Lai, Leyi Xiao, Ping Lan, Chung-I Wu, Bingjie Chen. Molecular Biology and Evolution, published 2026-02-02. https://doi.org/10.1093/molbev/msag008. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866917/pdf/
Marcos Ramos-González, Víctor Ramos-González, Emma Serrano-Pérez, Christina Arvanitidou, Jorge Hernández-García, Mercedes García-González, Francisco J Romero-Campero
Since DNA sequencing has become commonplace, the development of efficient methods and tools to explore gene sequences has become indispensable. In particular, despite photosynthetic eukaryotes constituting the largest percentage of terrestrial biomass, computational functional characterization of gene sequences in these organisms still predominantly relies on comparisons with Arabidopsis thaliana and other angiosperms. This paper introduces PharaohFUN, a web application designed for the evolutionary and functional analysis of protein sequences in photosynthetic eukaryotes, leveraging orthology relationships between them. PharaohFUN incorporates a homogeneous representative sampling of key species in this group, bridging clades that have traditionally been studied separately, thus establishing a comprehensive evolutionary framework to draw conclusions about sequence evolution and function. For this purpose, it incorporates modules for exploring gene tree evolutionary history, expansion and contraction events, ancestral states, domain identification, multiple sequence alignments, and diverse functional annotation. It also incorporates different search modes to facilitate its use and increase its reach within the community. Tests were performed on the whole transcription factor toolbox of A. thaliana and on the CCA1 protein to assess its utility for both large-scale and fine-grained phylogenetic studies. These exemplify how PharaohFUN accurately traces the corresponding evolutionary histories of these proteins by unifying results for land plants and for streptophyte and chlorophyte microalgae. Thus, PharaohFUN democratizes access to these kinds of analyses in photosynthetic organisms for every user, independently of their prior training in bioinformatics.
PharaohFUN: phylogenomic analysis for plant protein history and function elucidation. Molecular Biology and Evolution, published 2026-02-02. https://doi.org/10.1093/molbev/msag011. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866927/pdf/
Nicolas Ochsner, Judith Bouman, Timothy Vaughan, Tanja Stadler, Sebastian Bonhoeffer, Roland Regoes
Phylodynamic methods are widely used to infer the population dynamics of viruses between and within hosts. For HIV-1, these methods have been used to estimate migration rates between different anatomical compartments within a host. These methods typically assume that the genomic regions used for reconstruction are evolving without selective pressure, even though other parts of the viral genome are known to experience strong selection. In this study, we investigate how selection affects phylodynamic migration rate estimates. To this end, we developed a novel agent-based simulation tool, virolution, to simulate the evolution of a virus within two anatomical compartments of a host. Using this tool, we generated viral sequences and genealogies assuming both neutral evolution and purifying selection that is concordant in both compartments. We found that, under the selection regime, migration rates are significantly overestimated with a stochastic mixture model and a structured coalescent model in the Bayesian inference framework BEAST2. Our results reveal that commonly used phylogeographic methods, which assume neutral evolution, can significantly bias migration rate estimates in selective regimes.
Viral Simulation Reveals Overestimation Bias in Within-Host Phylodynamic Migration Rate Estimates Under Selection. Molecular Biology and Evolution, published 2026-02-02. https://doi.org/10.1093/molbev/msag014
Ornamental medaka strains derived from wild Japanese medaka (Oryzias latipes species complex) are bred worldwide. Over 200 years of selective breeding have produced over 700 strains with a wide variety of phenotypes, including diverse body coloration, scales, eyeball morphology, and fin and body shapes. In this study, we first identified and described 34 phenotypes in ornamental medaka strains. To understand the genomic basis of this phenotypic diversity and the domestication process, we performed whole-genome sequencing on 181 individuals of 86 ornamental Japanese medaka strains. Population genomic analyses revealed that modern ornamental medaka strains are genetically closer to the wild Southern Japan population of the Kansai-Setouchi regions, suggesting that this population is the origin of ornamental strains. In addition, the gene loci poc1a, tyr, nme2a, and gabrr2b have undergone selection during domestication. We performed GWAS analysis for 29 phenotypes observed in ornamental medaka strains and identified strong candidate genes for some phenotypes, including kcnq5a for hirenaga and swallow, bmp5 for deme, adcy5 for orochi, and kitlga for aurora. We found that loss of exon 8 of adcy5 caused melanism, a dark body color phenotype, in medaka, providing molecular insight into this phenomenon in vertebrates and into human inherent dyskinesia. In addition, we uncovered the predominant candidate peaks of GWAS, including a total of 3,328 genes associated with 26 phenotypes. Our findings highlight the potential of population genomics to explore genotype-phenotype correlations and the genomic basis of body coloration and morphogenesis in medaka.
Genomic consequences of domestication and the diversification of body coloration and morphology in ornamental medaka strains. Tetsuo Kon, Rui Tang, Koto Kon-Nanjo, Soma Tomihara, Soichiro Fushiki, Wakana Fujii, Mifuyu Sera, Yusuke Takehana, Hideki Noguchi, Atsushi Toyoda, Kiyoshi Naruse, Yoshihiro Omori. Molecular Biology and Evolution, published 2026-01-30. https://doi.org/10.1093/molbev/msag021
The genetics of complex traits has been fundamentally transformed by the steep reduction in short-read sequencing costs, which has reversed the relative costs of genotyping versus phenotyping. We explore this new scientific landscape by examining key experimental strategies that leverage inexpensive sequencing, including low-coverage whole-genome sequencing with imputation (lcWGS+I) for genotyping large cohorts. Although somewhat limited in outbred populations, lcWGS+I can be extremely effective in multi-parent populations (MPPs) and in founder-unknown closed colonies, where imputation accuracy can exceed 98%. We further explore pooled-sequencing (Pool-seq) approaches for dissecting complex traits, such as Evolve and Resequence (E&R) for tracking adaptive changes in allele frequency over several generations, and Extreme QTL (X-QTL) mapping that identifies loci by contrasting pooled samples from phenotypic extremes. We show that X-QTL mapping in MPPs, by testing for shifts in founder haplotype frequencies across small genomic windows, can be extremely powerful and cost-effective. Finally, we discuss methods where sequencing reads serve as the phenotype itself. DNA barcoding enables massive-scale fitness assays, while the "*-seq" toolkit (e.g., RNA-seq, ATAC-seq) allows for mapping molecular QTLs, though this introduces a significant multiple testing burden. Systems leveraging certain breeding designs in concert with low-cost sequencing can greatly accelerate progress towards a mechanistic understanding of the genotype-phenotype relationship.
Leveraging Low-Cost Short-Read Sequencing: Revolutionizing Complex Trait Genetics. Sarah N Ruckman, Anthony D Long. Molecular Biology and Evolution, published 2026-01-28. https://doi.org/10.1093/molbev/msag025
Daven C Presgraves, R Kelly Dawe, Kelly A Dyer, Lila Fishman, Soumitra A Bhide, Sasha L Bradshaw, Meghan J Brady, Alejandro Burga, Cécile Courret, Brandon L Fagen, Ana Beatriz Stein Machado Ferretti, Reka K Kelemen, Jun Kitano, Yiran Liu, Emiliano Martí, Theresa Erlenbach, Josephine A Reinhardt, Laura Ross, Jan-Niklas Runge, Callie M Swanepoel, Beatriz Vicoso, Aaron A Vogan, Anna K Lindholm, Amanda M Larracuente, Robert L Unckless
Meiotic drivers are selfish genetic elements that gain transmission advantages by distorting equal, Mendelian segregation. For decades, biologists have considered meiotic drivers as interesting, albeit esoteric, case studies. It is now clear, however, that meiotic drive is more common and phylogenetically widespread than previously supposed. Indeed, intensive study of a few well-known cases has begun to reveal the evolutionary genomic consequences of meiotic drive. We argue here that many features of genome evolution, content, and organization that are seemingly inexplicable by organismal adaptation or nearly neutral processes are instead best accounted for by recurrent histories of meiotic drive. We review how meiotic drive can affect the evolution of sequences, gene copy numbers, genes with functions in meiosis and gametogenesis, signatures of "selection", chromosome rearrangements, and karyotype evolution. We also explore the interactions of meiotic drive elements with other classes of selfish genetic elements, including satellite DNAs and transposable elements, and with the endogenous host genes involved in drive suppression. Finally, we argue that some aspects of drive-mediated genome evolution are now sufficiently well established that we might reverse the direction of discovery: rather than asking how drive affects genome evolution, we can use genome data to discover new putative drive elements.
{"title":"The Evolutionary Genomics of Meiotic Drive.","authors":"Daven C Presgraves, R Kelly Dawe, Kelly A Dyer, Lila Fishman, Soumitra A Bhide, Sasha L Bradshaw, Meghan J Brady, Alejandro Burga, Cécile Courret, Brandon L Fagen, Ana Beatriz Stein Machado Ferretti, Reka K Kelemen, Jun Kitano, Yiran Liu, Emiliano Martí, Theresa Erlenbach, Josephine A Reinhardt, Laura Ross, Jan-Niklas Runge, Callie M Swanepoel, Beatriz Vicoso, Aaron A Vogan, Anna K Lindholm, Amanda M Larracuente, Robert L Unckless","doi":"10.1093/molbev/msag020","DOIUrl":"https://doi.org/10.1093/molbev/msag020","url":null,"PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":5.3,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146053110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gang Wang, Tao Zhu, Xinye Zhang, Xufang Ren, Anqi Chen, Zhonghua Ning, Marcel van Tuinen, Lujiang Qu
The evolutionary history of waterfowl (Anseriformes) has long been a focal point of avian research. However, previous phylogenetic investigations have focused primarily on morphology or mitochondrial DNA or have lacked sufficient taxon sampling. Given the observed phylogenetic incongruence and incomplete resolution, waterfowl phylogenetic branching patterns remain uncertain at various taxonomic ranks. To further validate phylogenetic relationships among higher waterfowl taxa and assess the presence of conflicting signal, we assembled and analyzed 24 waterfowl genomes representing all waterfowl families and several subfamilies. Utilizing both newly acquired and previously obtained genomes, we constructed and analyzed seven DNA data classes, which yielded highly resolved phylogenetic trees, including a time-calibrated tree. Most of these trees consistently and completely resolved the phylogenetic relationships of the included waterfowl species. Despite these efforts, our analysis across chromosomes uncovered four instances of phylogenetically incongruent signal. After minimizing tree estimation error by focusing on the whole-genome alignment (WGA) dataset and by sequence simulation, analyses revealed that incomplete lineage sorting (ILS) and gene introgression contributed to essentially all gene-tree discordance. The variable impact of both factors across distinct waterfowl nodes reflects an underlying complexity that warrants further interpretation. This study not only presents a strongly supported and well-resolved phylogenetic backbone for the major waterfowl lineages, but also provides foundational data for subsequent comparative genomics studies of a more expanded set of waterfowl taxa.
{"title":"Phylogeny of waterfowl (Anseriformes) constructed using genome sequences provides insights into topological incongruences.","authors":"Gang Wang, Tao Zhu, Xinye Zhang, Xufang Ren, Anqi Chen, Zhonghua Ning, Marcel van Tuinen, Lujiang Qu","doi":"10.1093/molbev/msag018","DOIUrl":"https://doi.org/10.1093/molbev/msag018","url":null,"PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":5.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146011332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}