Molecular Biology and Evolution: latest articles, page 8

SMBE Secretary's Report 2025.

IF 5.3 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular biology and evolution

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf280

Emmanuelle Lerat

Citations: 0

The loss of a supergene in obligately polygynous Formica wood ant species.

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf320

Hanna Sigeman, Ina Satokangas, Matthieu de Lamarre, Patrick Krapf, Pierre Nouhaud, Riddhi Deshmukh, Heikki Helanterä, Michel Chapuisat, Jonna Kulmuni, Lumi Viljakainen

Some of the most striking examples of phenotypic variation within species are controlled by supergenes. However, most research on supergenes has focused on their emergence and long-term maintenance, leaving the later stages of their life cycle largely unexplored. Specifically, what happens to a derived supergene haplotype when the trait it controls reaches fixation? Here we address this question using the ancient supergene system of Formica ants, in which monogynous (single-queen) colonies typically carry only the ancestral haplotype M, while the derived haplotype P is exclusive to polygynous (multiple-queen) colonies. Through comparative population genomics of 264 individuals from all seven European wood ant species, we found that the P haplotype was present in only one of the three obligately polygynous species (Formica polyctena). In the other two (Formica aquilonia and Formica paralugubris), the P haplotype was completely missing except for duplicated P-specific paralogs of two genes, Zasp52 and TTLL2, with Zasp52 being directly involved in wing muscle development. We hypothesize that these genes play a direct role in polygyny and contribute to differences in body size and/or dispersal behavior between monogynous and polygynous queens. A complete lack of P/P genotypes among the 261 workers suggests strong selection against such genotypes. While our analyses did not reveal evidence of increased mutation load on the P haplotype, it is possible that this skew in genotype distributions is driven by a few loci with strong fitness effects. We propose that selection to escape P-associated fitness costs underlies the loss of this haplotype in obligately polygynous wood ants.

Citations: 0

Evolution of the Tri-PDZ Domain in PSD95 (DLG-4 Gene).

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf309

Riya Nilkant, Lisa Y Mesrop, Samuel Lobo, Onur Sakarya, Joan E Shea, Scott Shell, Soojin V Yi, Kenneth S Kosik

Some genes encoding proteins within the co-evolved pre- and postsynaptic compartments are present in genomes that long precede the origin of the synapse within the animal kingdom. PSD-95, encoded by the gene DLG4, is one of the most abundant synaptic proteins. It is a MAGUK family member that shares a conserved domain structure composed of one or multiple PDZ domains, a Src homology 3 (SH3) domain, and a guanylate kinase (GK) domain. Here, we trace the phylogeny of the tri-PDZ domains in DLG4 back to their deep ancestral origin in Filozoa, which includes animals and their nearest unicellular relatives. PDZ domain architecture appears to be a strong organizing feature of this gene lineage, which originated with a single ancestral PDZ3-like domain in Capsaspora owczarzaki from which PDZ1 and PDZ2 were derived. The strong conservation of individual PDZ domain identities was captured by Evolutionary Scale Modeling (ESM2) across the boundary to the animal kingdom, corroborating the distinct clades formed by the divergence of PDZ1, PDZ2, and PDZ3 in the phylogeny. CRIPT, a PDZ3 ligand, is present in all Filozoa genomes studied here. AlphaFold2 Multimer demonstrates conserved binding function; however, conserved binding does not completely depend on either sequence motifs or hydrophobicity profiles. Rather, the most conserved feature is the presence of hydrogen bonds at the 0 and -2 positions of the ligand, an ancient foundational innovation for PDZ3 ligand interaction. Hydrogen bonds may loosen the sequence requirements for binding, allowing a more extensive search space for protein-protein interactions that enhance fitness before the mutations that secure those interactions occur.

Citations: 0

Accelerated Mitochondrial Genome Evolution in Parasitic Barnacles Driven by Adaptive and Non-adaptive Responses.

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf303

Jibom Jung, Siliang Song, Myeong-Yeon Kim, Haena Kwak, Benny K K Chan, Sun-Shin Cha, Ui Wook Hwang, Joong-Ki Park

Parasitic lifestyles often impose profound evolutionary pressures, affecting molecular evolution through both adaptive and non-adaptive mechanisms. Among barnacles (subclass Cirripedia), the obligate parasitic Rhizocephala differ markedly from their filter-feeding thoracican relatives in morphology, ecology, and life history. However, how the shift to parasitism has shaped mitochondrial genome evolution within Cirripedia remains unclear. Here, we present the first comprehensive comparative analysis of mitochondrial genomes between parasitic and non-parasitic barnacles, including three newly sequenced and one unpublished species of parasitic Rhizocephala, a clade whose mitochondrial genomes had not been characterized until now. Phylogenomic and molecular evolutionary analyses reveal that Rhizocephala species exhibit extremely long branches likely attributed to the clade-specific tempo (high substitution rate) and mode (selection pressure) of mtDNA sequence evolution associated with their parasitic lifestyle. A two-cluster molecular clock test reveals significantly elevated substitution rates across rhizocephalans, consistent with reduced effective population sizes (Ne) linked to their opportunistic, host-dependent life cycles. We also detect signatures of positive selection in protein-coding genes encoding key components of the electron transport chain complexes III and IV. Structural modeling highlights amino acid substitutions at functionally critical sites for electron transfer and proton pumping, suggesting adaptive modifications to mitochondrial bioenergetics under hypoxic conditions within host tissues. Together, our findings underscore that both non-adaptive (genetic drift, relaxed selection) and adaptive (positive selection) processes have driven the rapid sequence divergence of mitochondrial genomes in parasitic Rhizocephala. Further experimental study is needed to elucidate how mitochondrial and nuclear-encoded subunits of oxidative phosphorylation coevolve in this specialized parasitic group.

Citations: 0

Correction to: UPrimer: A Clade-Specific Primer Design Program Based on Nested-PCR Strategy and Its Applications in Amplicon Capture Phylogenomics.

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf317

Citations: 0

Modeling the Evolution of Ultraconserved Elements by Indels.

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf299

Priscila Biller

Ultraconserved elements are segments of DNA that are identical or nearly identical in distantly related species. Finding 100% identity over long evolutionary times is unexpected, but pioneering research in human-mouse pairwise alignment uncovered something even more puzzling: these elements are not as rare as previously suspected. Furthermore, their sizes follow a power-law distribution, a feature that cannot be explained by standard models of genome evolution, in which conservation is expected to decay exponentially. Although power-law behavior has been reported and investigated in a wide variety of biological and physical contexts, from cell division to protein family evolution, why it appears in the size distribution of ultraconserved elements remains elusive. To address this question, I propose a model of DNA sequence evolution by mutations of arbitrary length based on a classical integro-differential equation that arises in various applications in biology. The model captures the ultraconserved size distribution observed in pairwise alignments between human and 40 other vertebrates, encompassing more than 400 million years of evolution, from chimpanzee to zebrafish. I also show that the model can be used to predict other important aspects of genome evolution, such as indel rates and conservation in functional classes.
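The expectation of exponential decay mentioned in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that every aligned site diverges independently with the same probability; under that assumption the lengths of perfectly identical runs are geometric, so long runs become exponentially rare. The function names and parameter values are hypothetical and are not taken from the paper, and the sketch does not implement the paper's indel model.

```python
import random


def conserved_run_lengths(n_sites, p_diverge, seed=0):
    """Simulate a pairwise alignment in which each site differs independently
    with probability p_diverge; return the lengths of maximal identical runs."""
    rng = random.Random(seed)
    runs, current = [], 0
    for _ in range(n_sites):
        if rng.random() < p_diverge:
            # A mismatch closes the current run of identical sites, if any.
            if current > 0:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current > 0:
        runs.append(current)
    return runs


def survival(runs, length):
    """Observed fraction of runs that span at least `length` sites."""
    return sum(r >= length for r in runs) / len(runs)


def geometric_tail(p_diverge, length):
    """Expected fraction of runs of at least `length` sites when sites are
    independent: each extra site survives with probability 1 - p_diverge."""
    return (1 - p_diverge) ** (length - 1)


if __name__ == "__main__":
    runs = conserved_run_lengths(200_000, 0.1, seed=1)
    for length in (1, 10, 20, 40):
        print(length, round(survival(runs, length), 4),
              round(geometric_tail(0.1, length), 4))
```

A power-law tail would fall off much more slowly at large lengths than the geometric tail printed here, which is the discrepancy the abstract describes.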

Citations: 0

Quick Analysis of Sedimentary Ancient DNA Using quicksand.

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf305

Merlin Szymanski, Johann Visagie, Frédéric Romagné, Matthias Meyer, Janet Kelso

Ancient DNA extracted from the sediments of archaeological sites (sedaDNA) can provide fine-grained information about the composition of past ecosystems and human site use, even in the absence of visible remains. However, the growing amount of available sequencing data and the nature of the data obtained from archaeological sediments pose several computational challenges, among them the rapid and accurate taxonomic classification of sequences. While alignment-based taxonomic classifiers remain the standard in sedaDNA analysis pipelines, they are too computationally expensive for the processing of large numbers of sedaDNA sequences. In contrast, alignment-free methods offer fast classification but suffer from higher false-positive rates. To address these limitations, we developed quicksand, an open-source Nextflow pipeline designed for rapid and accurate taxonomic classification of mammalian mitochondrial DNA in sedaDNA samples. quicksand combines fast alignment-free classification using KrakenUniq with post-classification mapping, filtering, and ancient DNA authentication. Based on simulations and reanalyses of published datasets, we demonstrate that quicksand achieves accuracy and sensitivity comparable to or better than existing methods, while significantly reducing runtime. quicksand offers an easy workflow for large-scale screening of sedaDNA samples for archaeological research and is freely available at https://github.com/mpieva/quicksand.

Citations: 0

Efficient Estimation of Nucleotide Diversity and Divergence Using Callable Loci (and More).

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf282

Cade Mirchandani, Erik Enbody, Timothy B Sackton, Russ Corbett-Detig

The increasing scale of population genomic datasets presents computational challenges in estimating summary statistics such as nucleotide diversity (π) and divergence (dxy). Accurate estimates of diversity require knowledge of missing data, and existing tools require all-site VCFs. However, generating these files is computationally expensive for large datasets. Here, we introduce Callable Loci And More (clam), a tool that leverages callable loci (determined from depth information) to estimate population genetic statistics using a variant-only VCF. This approach offers improvements in storage footprint and computational performance compared to contemporary methods. We validate clam's accuracy using simulated data, demonstrating that it produces estimates of π, dxy, and fixation index (FST) identical to those from all-site VCF approaches. We then benchmark clam using a large muskox dataset and demonstrate that it produces accurate estimates of π while substantially reducing runtime requirements compared to current best-practice methods. clam provides an efficient and scalable alternative for population genomic analyses, facilitating the study of increasingly large and diverse datasets. clam is available as a standalone program and is integrated into snpArcher for efficient, reproducible population genomic analysis.
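Why the number of callable sites matters can be shown with a minimal sketch of the standard per-site estimator of π for biallelic sites. It makes a simplifying assumption that every sampled chromosome is callable at every callable site, and the function names and input layout are hypothetical. It is not clam's implementation or interface.

```python
def pairwise_diff_fraction(alt_count, n_chrom):
    """Fraction of chromosome pairs that differ at one biallelic site,
    given alt_count alternate alleles among n_chrom sampled chromosomes."""
    if n_chrom < 2:
        return 0.0
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def nucleotide_diversity(variant_sites, n_callable_sites):
    """Per-site nucleotide diversity.

    variant_sites: iterable of (alt_count, n_chrom) tuples, one per variant
        record, as could be tallied from a variant-only VCF.
    n_callable_sites: number of sites judged callable (variant or invariant);
        invariant callable sites add zero differences but still count in the
        denominator.
    """
    if n_callable_sites <= 0:
        raise ValueError("n_callable_sites must be positive")
    total = sum(pairwise_diff_fraction(k, n) for k, n in variant_sites)
    return total / n_callable_sites


if __name__ == "__main__":
    # Two variant sites among 1,000 callable sites, four chromosomes sampled.
    variants = [(1, 4), (2, 4)]
    print(nucleotide_diversity(variants, 1000))
    # Dividing by the variant count alone would inflate the estimate.
    print(nucleotide_diversity(variants, len(variants)))
```

The second call shows the failure mode the abstract alludes to: without a count of invariant but callable sites, the denominator is wrong and π is overestimated.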

Citations: 0

Robust and Efficient Confidence Limits for Phylogenomic Inference of Organismal Relationships.

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf296

Sudip Sharma, Sudhir Kumar

Phylogenomic data are indispensable for establishing reliable relationships needed to build a robust Tree of Life. The superalignment approach concatenates hundreds or thousands of genomic segments, providing a straightforward, computationally efficient, and effective means of inferring phylogenies. However, the standard bootstrap method can produce overly confident support for incorrect inferences based on superalignments. It fails to account for the heterogeneity in phylogenetic signals across the data, which is caused by incomplete lineage sorting (ILS), data errors, and other biological processes. To detect such erroneous inferences, researchers need to produce and deliberate on the concordance of inferences derived from many complex and computationally demanding analyses that require knowledge of data partitions. This study demonstrates that analyzing phylogenomic subsamples with bootstrap upsampling overcomes the overconfidence drawback of the superalignment approach. We found that bootstrapping multiple small, randomly selected site subsets can detect phylogenetic variation across the dataset, similar to what is detected using data partitions. We present the Net Bootstrap Support (NBS) approach, which accounts for this phylogenetic variation in the estimates of bootstrap confidence. NBS values showed comparable performance to multispecies coalescent analyses in the presence of ILS and surpassed them for datasets simulated with gene tree estimation errors. NBS analyses of phylogenomic data from rodents, fungi, and carnivorous plants corroborated the performance observed in simulated datasets and even mitigated overconfidence resulting from some data errors. NBS calculations are computationally efficient, with low memory consumption and high computational time savings, making the NBS approach well suited for big data molecular phylogenetics on both desktops and high-performance computing systems.
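The resampling step described above can be sketched as follows, under one reading of "subsamples with bootstrap upsampling": draw a small random subset of alignment columns without replacement, then resample from that subset with replacement until each replicate is as long as the full alignment. The function name, parameters, and return layout are hypothetical. Tree inference on each replicate and the NBS calculation itself are not specified in the abstract and are not attempted here.

```python
import random


def upsampled_site_replicates(n_sites, subset_size, n_subsets,
                              n_replicates, seed=0):
    """Return, for each random site subset, the subset and a list of
    bootstrap replicates (lists of column indices) upsampled to n_sites."""
    if not 0 < subset_size <= n_sites:
        raise ValueError("subset_size must be between 1 and n_sites")
    rng = random.Random(seed)
    results = []
    for _ in range(n_subsets):
        # Small subsample of alignment columns, drawn without replacement.
        subset = rng.sample(range(n_sites), subset_size)
        # Bootstrap upsampling: draw with replacement from the subset until
        # the replicate matches the length of the full alignment.
        replicates = [rng.choices(subset, k=n_sites)
                      for _ in range(n_replicates)]
        results.append((subset, replicates))
    return results


if __name__ == "__main__":
    plan = upsampled_site_replicates(n_sites=10_000, subset_size=500,
                                     n_subsets=3, n_replicates=2, seed=42)
    for subset, replicates in plan:
        print(len(subset), [len(r) for r in replicates])
```

Each list of column indices would then be used to build a replicate alignment for a tree-inference program. Because every replicate draws on only a small subset of sites, different subsets can expose conflicting phylogenetic signal that a single full-length bootstrap would average away.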

Citations: 0

Selection Estimation from Genetic Time-Series Data: Effects of Limited Sampling and Genetic Drift.

IF 5.3 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular biology and evolution

Pub Date : 2025-11-28 DOI: 10.1093/molbev/msaf301

Qingbei Cheng, Muhammad Saqib Sohail, Matthew R McKay

Estimating selection from genetic time-series data is fundamental to understanding evolutionary dynamics. Accurate selection inference is confounded by multiple noise sources, including limited sampling of populations and genetic drift. To characterize how these uncertainties collectively affect estimator performance, we analyze a mathematically tractable selection coefficient estimator derived under the marginal path likelihood (MPL) framework. We identify a parameter, the integrated mutant allele variance, as a key quantity determining estimator precision. Our analysis reveals that variance integration mitigates sampling and genetic drift errors at different rates, with drift typically becoming the dominant source of error in longer trajectories. The increased robustness of MPL-based estimation to sampling is surprising, since MPL is derived from a model that neglects this effect. Our findings offer insights into how incorporating temporal information reduces multiple sources of noise when estimating selection coefficients.
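
To make the role of the integrated allele variance concrete, here is a minimal single-locus sketch. It assumes a simplified estimator in which the net frequency change is divided by the time-integrated variance x(1 - x), with mutation ignored; this may differ from the exact estimator analyzed in the paper. The function names, the Wright-Fisher simulation, and the regularization parameter `reg` are illustrative assumptions.

```python
import numpy as np

def simulate_wright_fisher(x0, s, pop_size, n_gen, rng):
    """Mutant-allele frequency trajectory under selection and genetic drift."""
    x = np.empty(n_gen + 1)
    x[0] = x0
    for t in range(n_gen):
        p = x[t] * (1 + s) / (1 + s * x[t])           # selection step
        x[t + 1] = rng.binomial(pop_size, p) / pop_size  # drift step
    return x

def sample_frequencies(freqs, n_samples, rng):
    """Limited sampling: observe n_samples individuals at each time point."""
    return rng.binomial(n_samples, freqs) / n_samples

def estimate_selection(freqs, times, reg=0.0):
    """Net frequency change divided by the integrated allele variance.

    Simplified single-locus form (an assumption of this sketch); `reg` is a
    hypothetical regularization term added to the denominator.
    """
    freqs = np.asarray(freqs, dtype=float)
    dt = np.diff(np.asarray(times, dtype=float))
    integrated_variance = np.sum(dt * freqs[:-1] * (1 - freqs[:-1]))
    denom = integrated_variance + reg
    if denom == 0:
        raise ValueError("integrated variance is zero; set reg > 0")
    return (freqs[-1] - freqs[0]) / denom

rng = np.random.default_rng(1)
true_traj = simulate_wright_fisher(0.1, 0.05, 1000, 100, rng)
observed = sample_frequencies(true_traj, 50, rng)
s_hat = estimate_selection(observed, np.arange(len(observed)))
```

Because the denominator accumulates over the trajectory, longer or more variable trajectories enlarge it, which is the intuition behind the abstract's claim that variance integration dampens noise.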

Citations: 0