Laura Villegas, Lucy Jimenez, Joëlle van der Sprong, Oleksandr Holovachov, Ann-Marie Waldvogel, Philipp H. Schiffer
Nematodes are among the most diverse animals, yet only around 28,000 of an estimated one million species have been morphologically described. Their small size, morphological simplicity, and cryptic diversity complicate phylogenetic analyses. Traditional morphological and single-locus molecular approaches often lack resolution for both recent and ancient divergences. To address these limitations, we developed the first ultraconserved elements (UCEs) probe sets for two nematode families: Panagrolaimidae, a group of non-model organisms with limited genomic resources when compared to model taxa, and Rhabditidae, which includes the model species Caenorhabditis elegans. Our probe sets targeted 1612 loci for Panagrolaimidae and 100,397 for Rhabditidae. In vitro testing recovered up to 1457 loci in Panagrolaimidae, supporting robust phylogenetic reconstruction. Results were largely consistent with previous analyses, except for one strain reclassified as Neocephalobus halophilus BSS8. Using machine learning, we determined the minimum number of loci needed for accurate genus-level classification. For Rhabditidae, XGBoost achieved high accuracy with just 46 loci. For Panagrolaimidae, 39 loci were most informative. Our UCE-based approach offers a scalable and cost-effective framework for phylogenomics, enhancing taxonomic resolution and evolutionary inference in nematodes. It is well suited for biodiversity assessments and shallow, field-based sequencing, expanding research possibilities across this ecologically important phylum.
{"title":"Ultraconserved Elements and Machine Learning Classifiers Enable Robust Phylogenetics and Taxonomy in Model and Non-Model Nematodes","authors":"Laura Villegas, Lucy Jimenez, Joëlle van der Sprong, Oleksandr Holovachov, Ann-Marie Waldvogel, Philipp H. Schiffer","doi":"10.1111/1755-0998.70046","DOIUrl":"10.1111/1755-0998.70046","url":null,"abstract":"<p>Nematodes are among the most diverse animals, yet only around 28,000 of an estimated one million species have been morphologically described. Their small size, morphological simplicity, and cryptic diversity complicate phylogenetic analyses. Traditional morphological and single-locus molecular approaches often lack resolution for both recent and ancient divergences. To address these limitations, we developed the first ultraconserved elements (UCEs) probe sets for two nematode families: Panagrolaimidae, a group of non-model organisms with limited genomic resources when compared to model taxa, and Rhabditidae, which includes the model species <i>Caenorhabditis elegans</i>. Our probe sets targeted 1612 loci for Panagrolaimidae and 100,397 for Rhabditidae. In vitro testing recovered up to 1457 loci in Panagrolaimidae, supporting robust phylogenetic reconstruction. Results were largely consistent with previous analyses, except for one strain reclassified as <i>Neocephalobus halophilus</i> BSS8. Using machine learning, we determined the minimum number of loci needed for accurate genus-level classification. For Rhabditidae, XGBoost achieved high accuracy with just 46 loci. For Panagrolaimidae, 39 loci were most informative. Our UCE-based approach offers a scalable and cost-effective framework for phylogenomics, enhancing taxonomic resolution and evolutionary inference in nematodes. 
It is well suited for biodiversity assessments and shallow, field-based sequencing, expanding research possibilities across this ecologically important phylum.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70046","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145249163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sterling L. Wright, Muslih Abdul-Aziz, Grace N. Blaha, Christine K. Ta, Abigail Gancz, Iyunoluwa J. Ademola-Popoola, Anna Szécsényi-Nagy, Paul C. Sereno, Laura S. Weyrich
Ancient DNA (aDNA) analysis of archaeological dental calculus has provided a wealth of insights into ancient health, demography and lifestyles. However, the workflow for ancient metagenomics is still evolving, raising concerns about reproducibility. Few systematic investigations have examined how DNA extraction methods and library preparation protocols influence ancient oral microbiome recovery, despite evidence from modern populations suggesting that they do. This leaves a gap in our understanding of how wet-lab protocols impact aDNA recovery from dental calculus. In this study, we apply two DNA extraction and two library preparation methods in the aDNA field on dental calculus samples from Hungary and Niger. Samples from each context have similar chronological ages, but differences in their levels of aDNA preservation are notable, providing additional insights into how the efficacy of wet-lab protocols is impacted by sample preservation. Several metrics were employed to assess intra- and inter-sample variability, such as DNA fragment length recovery, GC content, clonality, endogenous content, DNA deamination and microbial composition. Our findings indicate that both DNA extraction and library preparation protocols can considerably impact ancient DNA recovery from archaeological dental calculus. Furthermore, no single protocol consistently outperformed the others across all assessments, and the effectiveness of specific protocol combinations depended on the preservation of the sample. These findings highlight the challenges of meta-analyses and underscore the need to account for technical variability. Lastly, our study raises the question of whether the field should strive to standardise methods for comparability or optimise protocols based on sample preservation and specific research objectives.
{"title":"Wet Lab Protocols Matter: Choice of DNA Extraction and Library Preparation Protocols Bias Ancient Oral Microbiome Recovery","authors":"Sterling L. Wright, Muslih Abdul-Aziz, Grace N. Blaha, Christine K. Ta, Abigail Gancz, Iyunoluwa J. Ademola-Popoola, Anna Szécsényi-Nagy, Paul C. Sereno, Laura S. Weyrich","doi":"10.1111/1755-0998.70054","DOIUrl":"10.1111/1755-0998.70054","url":null,"abstract":"<p>Ancient DNA (aDNA) analysis of archaeological dental calculus has provided a wealth of insights into ancient health, demography and lifestyles. However, the workflow for ancient metagenomics is still evolving, raising concerns about reproducibility. Few systematic investigations have examined how DNA extraction methods and library preparation protocols influence ancient oral microbiome recovery, despite evidence from modern populations suggesting that they do. This leaves a gap in our understanding of how wet-lab protocols impact aDNA recovery from dental calculus. In this study, we apply two DNA extraction and two library preparation methods in the aDNA field on dental calculus samples from Hungary and Niger. Samples from each context have similar chronological ages, but differences in their levels of aDNA preservation are notable, providing additional insights into how the efficacy of wet-lab protocols is impacted by sample preservation. Several metrics were employed to assess intra- and inter-sample variability, such as DNA fragment length recovery, GC content, clonality, endogenous content, DNA deamination and microbial composition. Our findings indicate that both DNA extraction and library preparation protocols can considerably impact ancient DNA recovery from archaeological dental calculus. Furthermore, no single protocol consistently outperformed the others across all assessments, and the effectiveness of specific protocol combinations depended on the preservation of the sample. 
These findings highlight the challenges of meta-analyses and underscore the need to account for technical variability. Lastly, our study raises the question of whether the field should strive to standardise methods for comparability or optimise protocols based on sample preservation and specific research objectives.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70054","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145231309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anne Loos, Ella Doykova, Jiangzhao Qian, Florian Kümmel, Heba Ibrahim, Levente Kiss, Ralph Panstruga, Stefan Kusch
Obligate biotrophic plant pathogens like the powdery mildew fungi commit to a closely dependent relationship with their plant hosts and have lost the ability to grow and reproduce independently. Thus, at present, these organisms are not amenable to in vitro cultivation, which is a prerequisite for effective genetic modification and functional molecular studies. Saprotrophic fungi of the family Arachnopezizaceae are the closest known extant relatives of the powdery mildew fungi and may hold great potential for studying genetic components of their obligate biotrophic lifestyle. Here, we established telomere-to-telomere genome assemblies for two representatives of this family, Arachnopeziza aurata and A. aurelia. Both species harbour haploid genomes that are composed of 16 chromosomes at a genome size of 43.1 and 46.3 million base pairs, respectively, which, in contrast to most powdery mildew genomes that are transposon-enriched, show a repeat content below 5% and signs of repeat-induced point mutation. Both species could be grown in liquid culture and on standard solid media and were sensitive to common fungicides such as hygromycin and fenhexamid. We successfully expressed a red fluorescent protein and hygromycin resistance in A. aurata following polyethylene glycol-mediated protoplast transformation, demonstrating that Arachnopeziza species are amenable to genetic alterations, which may be expanded to include gene replacement, gene modification, and gene complementation in the future. With this work, we established a potential model system that promises to sidestep the need for genetic modification of powdery mildew fungi by using Arachnopeziza species as a proxy to uncover the molecular functions of powdery mildew proteins.
{"title":"Saprotrophic Arachnopeziza Species as New Resources to Study the Obligate Biotrophic Lifestyle of Powdery Mildew Fungi","authors":"Anne Loos, Ella Doykova, Jiangzhao Qian, Florian Kümmel, Heba Ibrahim, Levente Kiss, Ralph Panstruga, Stefan Kusch","doi":"10.1111/1755-0998.70045","DOIUrl":"10.1111/1755-0998.70045","url":null,"abstract":"<p>Obligate biotrophic plant pathogens like the powdery mildew fungi commit to a closely dependent relationship with their plant hosts and have lost the ability to grow and reproduce independently. Thus, at present, these organisms are not amenable to in vitro cultivation, which is a prerequisite for effective genetic modification and functional molecular studies. Saprotrophic fungi of the family <i>Arachnopezizaceae</i> are the closest known extant relatives of the powdery mildew fungi and may hold great potential for studying genetic components of their obligate biotrophic lifestyle. Here, we established telomere-to-telomere genome assemblies for two representatives of this family, <i>Arachnopeziza aurata</i> and <i>A. aurelia</i>. Both species harbour haploid genomes that are composed of 16 chromosomes at a genome size of 43.1 and 46.3 million base pairs, respectively, which, in contrast to most powdery mildew genomes that are transposon-enriched, show a repeat content below 5% and signs of repeat-induced point mutation. Both species could be grown in liquid culture and on standard solid media and were sensitive to common fungicides such as hygromycin and fenhexamid. We successfully expressed a red fluorescent protein and hygromycin resistance in <i>A. aurata</i> following polyethylene glycol-mediated protoplast transformation, demonstrating that <i>Arachnopeziza</i> species are amenable to genetic alterations, which may be expanded to include gene replacement, gene modification, and gene complementation in the future. 
With this work, we established a potential model system that promises to sidestep the need for genetic modification of powdery mildew fungi by using <i>Arachnopeziza</i> species as a proxy to uncover the molecular functions of powdery mildew proteins.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70045","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145224854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tram Vi, Katarina C. Stuart, Hui Zhen Tan, Audald Lloret-Villas, Anna W. Santure
Low-coverage sequencing (LCS) followed by genotype imputation has become a cost-efficient approach for obtaining whole-genome SNPs. Several imputation methods for LCS data have been developed over the last decade. However, comparisons of their accuracy in inferring missing genotypes and their effectiveness for downstream analysis such as population genetics have not been comprehensively studied. In the present study, we assessed the imputation performance of five different tools: GLIMPSE2, GeneImp, QUILT2, STITCH and Beagle5.4, using populations simulated by SLiM4 that represent different levels of genetic relatedness and inbreeding. Imputation accuracy was calculated at the level of variant, haplotype and sample. The effectiveness of using imputed genotypes in recovering genetic structure, relatedness, inbreeding coefficients and demographic history was subsequently evaluated. The imputation accuracy of different methods was further tested in a real population of 283 hihi (stitchbird) samples. Our results suggest a high accuracy of all the tested methods on populations with high levels of genetic relatedness. However, in populations with low relatedness, the imputation accuracy differed across different tools and impacted the results of some downstream analyses. The simulation and imputation pipeline presented here can help determine the most suitable imputation method for different population scenarios.
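The abstract reports imputation accuracy at the variant, haplotype and sample level but does not say which statistics were used. As a minimal sketch, assuming two commonly used measures (genotype concordance and the squared correlation between true genotypes and imputed dosages), per-variant or per-sample accuracy could be computed as below. The function names and the 0/1/2 genotype coding are illustrative assumptions, not taken from the study.

```python
import math


def genotype_concordance(true_gt, imputed_gt):
    """Fraction of matching calls among positions where both calls are present.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    """
    pairs = [(t, i) for t, i in zip(true_gt, imputed_gt)
             if t is not None and i is not None]
    if not pairs:
        return math.nan
    return sum(t == i for t, i in pairs) / len(pairs)


def dosage_r2(true_gt, imputed_dosage):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either vector has no variance (e.g. a monomorphic site).
    """
    pairs = [(t, d) for t, d in zip(true_gt, imputed_dosage)
             if t is not None and d is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_t = sum(t for t, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_d = sum((d - mean_d) ** 2 for _, d in pairs)
    if var_t == 0 or var_d == 0:
        return math.nan
    return cov * cov / (var_t * var_d)
```

Passing one variant's genotypes across all samples gives a variant-level value, and passing one sample's genotypes across all variants gives a sample-level value. Haplotype-level accuracy would need phased data and is not covered by this sketch.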
{"title":"Assessing Genotype Imputation Methods for Low-Coverage Sequencing Data in Populations With Differing Relatedness and Inbreeding Levels","authors":"Tram Vi, Katarina C. Stuart, Hui Zhen Tan, Audald Lloret-Villas, Anna W. Santure","doi":"10.1111/1755-0998.70049","DOIUrl":"10.1111/1755-0998.70049","url":null,"abstract":"<p>Low-coverage sequencing (LCS) followed by genotype imputation has become a cost-efficient approach for obtaining whole-genome SNPs. Several imputation methods for LCS data have been developed over the last decade. However, comparisons of their accuracy in inferring missing genotypes and their effectiveness for downstream analysis such as population genetics have not been comprehensively studied. In the present study, we assessed the imputation performance of five different tools: GLIMPSE2, GeneImp, QUILT2, STITCH and Beagle5.4, using populations simulated by SLiM4 that represent different levels of genetic relatedness and inbreeding. Imputation accuracy was calculated at the level of variant, haplotype and sample. The effectiveness of using imputed genotypes in recovering genetic structure, relatedness, inbreeding coefficients and demographic history was subsequently evaluated. The imputation accuracy of different methods was further tested in a real population of 283 hihi (stitchbird) samples. Our results suggest a high accuracy of all the tested methods on populations with high levels of genetic relatedness. However, in populations with low relatedness, the imputation accuracy differed across different tools and impacted the results of some downstream analyses. 
The simulation and imputation pipeline presented here can help determine the most suitable imputation method for different population scenarios.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70049","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145184420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PopCluster (Wang 2024) represents a significant advancement in population structure analysis software, addressing key computational and methodological challenges that have limited the application of clustering methods to modern genomic datasets. The software, developed by Wang (2024), implements novel likelihood-based algorithms that substantially improve both speed and accuracy compared to existing methods like STRUCTURE and ADMIXTURE. Its most notable features include memory-efficient handling of millions of markers and individuals through 2-bit encoding and distributed computing via MPI, sophisticated treatment of unbalanced sampling through a scaling scheme, and the ability to handle both biallelic and multiallelic markers within a unified framework. PopCluster demonstrates particular strengths when analysing datasets with many assumed populations, weak differentiation between clusters, or highly unbalanced sample sizes, situations where current methods often fail. The software's multi-platform availability, integrated GUI for Windows users, and built-in simulation module further enhance its utility for researchers. As genomic datasets continue to grow in size and complexity, PopCluster provides essential capabilities for revealing fine-scale population structure that would otherwise remain hidden. I discuss the software's innovations in the context of current challenges in molecular ecology and highlight its potential applications in conservation genetics, domestication studies, and understanding complex admixture patterns.
Since its inception, a major focus of population genetics has been on identifying and explaining population structure: the non-random distribution of genetic variation among individuals and populations. A variety of mechanisms can lead sexually reproducing populations to form two or more distinct multi-locus genotypic clusters, which may then evolve and adapt independently, leading to further divergence and even speciation, but may also admix and exchange genetic material. Indeed, Mallet (1995) suggested that the maintenance of distinct genotypic clusters in sympatry should be used as a formal definition of species delimitation.
Since the seminal methodological developments of Pritchard et al. (2000) in creating the software Structure, the identification of genotypic clusters and admixture among them from multi-locus sequence data has become central to a variety of disciplines within the broad framework of molecular ecology. Examples include domestication studies (Matsuoka et al. 2002), human population genetics (1000 Genomes Project Consortium 2015; Allentoft et al. 2024), conservation genetics (Miller et al. 2012), and speciation research (Friedrich et al. 2023). The importance of clustering methods continues to grow as large whole genome datasets become available, allowing highly detailed re
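The importance of clustering methods continues to grow as large whole-genome datasets become available, allowing highly detailed recovery of clusters and admixture. Historical phylogenomic approaches that reconstruct temporal patterns of divergence and subsequent admixture (e.g., TreeMix; Pickrell and Pritchard 2012) are becoming increasingly common, but the original concept of identifying contemporary clusters remains highly important, not least because of its ease of use and interpretation.
As large genomic datasets continue to accumulate, increasing speed and computational efficiency without sacrificing accuracy has become a top priority in developing new genotypic clustering software. Significant progress has been made in this direction, including ADMIXTURE (Alexander et al. 2009) and sNMF (Frichot et al. 2014), and the recently added PopCluster, developed by Wang (2024), further improves the speed, accuracy and accessibility of genotypic clustering and admixture analysis.
A major focus of PopCluster is the efficient use of available memory on both local computers and distributed clusters, allowing whole-genome datasets to be analysed on a laptop and up to millions of loci from millions of individuals on high-performance clusters. Wang (2024) shows that PopCluster can handle larger datasets than ADMIXTURE, one of the most popular current alternatives, and is faster in most cases. Another focus is multi-platform use: the software runs on Windows, Mac and Linux. The GUI available on Windows adds user-friendliness for users with less experience of building coding pipelines. A further feature that I personally find useful is the file conversion tool, which can, for example, convert VCF to the Structure-style file format.
Clustering software runs into major problems when the sample size per cluster (usually not known in advance) is small or unbalanced, when the assumed K (number of clusters) is large, or when the level of differentiation between clusters is low. PopCluster focuses particularly on improving estimation in these situations. As shown in Figure 1 of Wang (2024), PopCluster performs markedly better than all other software under the most extreme conditions. However, not everything can be fully automated, and users still bear some responsibility for choosing the most appropriate model settings. Wang introduces a "scaling" scheme that lets users specify in advance how unbalanced their sample is in terms of the number of individuals per cluster. Although choosing the correct scaling can improve estimation, the degree of imbalance is usually not known in advance. Users must therefore use a common-sense approach to decide whether their chosen scaling value produces sensible results. It is still necessary to choose K in advance, to run each K multiple times to deal with stochastic model fitting and multimodal likelihood surfaces, and to run multiple values of K in order to compare model fit statistically and determine the appropriate number of clusters. This process can be automated, and with PopCluster each run is fast, but for large datasets it can still add up to substantial run time.
I would like to add a technical point that is not limited to PopCluster. The original Structure software identified clusters by searching for Hardy-Weinberg and linkage equilibrium, whereas more recent software does not include such explicit population genetic requirements. As Wang emphasises, this means that loci do not need to be unlinked, so whole-genome data can be used. Indeed, in many cases adding more loci increases the chance of identifying true fine-scale population structure. LD pruning remains a common step in many analysis pipelines, but it is unnecessary from a statistical point of view and, given the gains in computational efficiency, usually also unnecessary as a way of reducing the data to a manageable size.
There has not yet been a direct comparison between PopCluster and another recent fast clustering program, Neural ADMIXTURE (Dominguez Mantes et al. 2023). However, both offer clear computational improvements over ADMIXTURE. PopCluster is fast, memory-efficient, multi-platform, highly accurate and user-friendly, making it a welcome addition to the molecular ecology software toolbox.
The author declares no conflicts of interest.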
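The memory saving from 2-bit encoding mentioned in the abstract can be illustrated with a small sketch. It assumes biallelic genotypes coded as alternate-allele counts (0, 1, 2) with one spare code for missing data, packed four to a byte. The code assignment, bit order and function names are illustrative assumptions; the source does not describe PopCluster's actual memory layout or how multiallelic markers are stored.

```python
MISSING = 3  # assumed code for a missing genotype (the fourth 2-bit value)


def pack_genotypes(genotypes):
    """Pack genotypes (0, 1, 2 or None for missing) into bytes, four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for idx, g in enumerate(genotypes):
        if g is None:
            code = MISSING
        elif g in (0, 1, 2):
            code = g
        else:
            raise ValueError(f"unsupported genotype: {g!r}")
        # Each genotype occupies two bits; position within the byte is idx % 4.
        packed[idx // 4] |= code << (2 * (idx % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotypes from a packed byte string."""
    out = []
    for idx in range(n):
        code = (packed[idx // 4] >> (2 * (idx % 4))) & 0b11
        out.append(None if code == MISSING else code)
    return out
```

Stored this way, one million genotypes take about 250 kB, compared with 1 MB at one byte per genotype, which is the kind of reduction that makes very large marker sets fit in memory.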
{"title":"PopCluster Improves Accessibility, Speed and Accuracy of Available Genotypic Clustering Software","authors":"Richard Ian Bailey","doi":"10.1111/1755-0998.70050","DOIUrl":"10.1111/1755-0998.70050","url":null,"abstract":"<p>PopCluster (Wang <span>2024</span>) represents a significant advancement in population structure analysis software, addressing key computational and methodological challenges that have limited the application of clustering methods to modern genomic datasets. The software, developed by Wang (<span>2024</span>), implements novel likelihood-based algorithms that substantially improve both speed and accuracy compared to existing methods like STRUCTURE and ADMIXTURE. Its most notable features include memory-efficient handling of millions of markers and individuals through 2-bit encoding and distributed computing via MPI, sophisticated treatment of unbalanced sampling through a scaling scheme, and the ability to handle both biallelic and multiallelic markers within a unified framework. PopCluster demonstrates particular strengths when analysing datasets with many assumed populations, weak differentiation between clusters, or highly unbalanced sample sizes—situations where current methods often fail. The software's multi-platform availability, integrated GUI for Windows users, and built-in simulation module further enhance its utility for researchers. As genomic datasets continue to grow in size and complexity, PopCluster provides essential capabilities for revealing fine-scale population structure that would otherwise remain hidden. 
I discuss the software's innovations in the context of current challenges in molecular ecology and highlight its potential applications in conservation genetics, domestication studies, and understanding complex admixture patterns.</p><p>Since its inception, a major focus of population genetics has been on identifying and explaining population structure—the non-random distribution of genetic variation among individuals and populations. A variety of mechanisms can lead sexually reproducing populations to form two or more distinct multi-locus genotypic clusters, which may then evolve and adapt independently, leading to further divergence and even speciation, but may also admix and exchange genetic material. Indeed, Mallet (<span>1995</span>) suggested that the maintenance of distinct genotypic clusters in sympatry should be used as a formal definition of species delimitation.</p><p>Since the seminal methodological developments of Pritchard et al. (<span>2000</span>) in creating the software Structure, the identification of genotypic clusters and admixture among them from multi-locus sequence data has become central to a variety of disciplines within the broad framework of molecular ecology. Examples include domestication studies (Matsuoka et al. <span>2002</span>), human population genetics (1000 Genomes Project Consortium <span>2015</span>; Allentoft et al. <span>2024</span>), conservation genetics (Miller et al. <span>2012</span>), and speciation research (Friedrich et al. <span>2023</span>). 
The importance of clustering methods continues to grow as large whole genome datasets become available, allowing highly detailed re","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70050","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145172219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The study of intraspecific genetic variation in environmental DNA samples has recently gained momentum following its demonstration as an effective method to study population-level processes (Andres et al. 2023). Although allele frequencies can be inferred from the distribution of allele sequence coverage within a sample, the number of detected alleles can be used to estimate the number of contributors (NOC), a long-standing issue in forensic science. This development enables the estimation of the absolute abundance of a species, opening up new possibilities for population monitoring and ecological and evolutionary studies. Although the approach is promising, no dedicated tools existed that provided a straightforward way to implement it. In this issue of Molecular Ecology Resources, Liggan et al. (2025) make a welcome contribution to the field by introducing two new R packages that facilitate the estimation of multi-locus allelic diversity and of the NOC from the sequencing of microsatellites of mixed samples obtained from environmental DNA. The Amplicomsat R package determines the observed allele count (based on sequence length and sequence identity) from sequence-based microsatellite genotyping (Figure 1A). The GenotypeQuant R package estimates the NOC given the number of observed alleles within mixed samples and allele frequencies of a reference population (Figure 1B). The authors conducted extensive testing of the developed method using simulation and empirical work in the laboratory and in the field, providing a convincing demonstration of its strengths and limitations, but also useful guidelines for future applications to other biological models or to address a broad range of scientific inquiries. Importantly, these advances can help support ongoing global biodiversity monitoring efforts.
The results reported by Liggan et al. (2025) are promising because the method performed well with easy-to-gather molecular data. Using 11 microsatellites and 40 individuals as references, it revealed a total of 177 alleles differing by their size or 297 microhaplotypes based on allele sequence identity (averaging 16 and 27 alleles per locus, respectively), with satisfactory accuracy with up to 20 contributors (Liggan et al. 2025). This was the case for the bull kelp gametophyte study (Figure 2), where empirical validation was conducted on eight contributors, and field sample estimates ranged from one to 12 contributors. However, more complex mixtures (> 50 contributors) as well as missing genotypes due to degraded DNA in environmental samples were more challenging, calling for caution when applying the method in these specific cases. Improving PCR efficiency or library preparation can help generate more data from difficult samples, as suggested by the authors. It is worth noting, however, that Liggan et al. (2025) were able to estimate the NOC from field samples with r
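It is worth noting, however, that Liggan et al. (2025) were able to estimate the NOC from field samples with reasonable confidence even at a high sequencing failure rate (50% missing genotypes in the field samples). This illustrates how the method can be applied successfully in realistic scenarios and adds to earlier case studies that made significant advances in the field (Andres et al. 2021). In the authors' case study, the additional information provided by allele sequence identity (compared with allele size) was essential for recovering the expected correlation between the NOC and the sampled surface area. This empirical result illustrates the wealth of information gained by considering all polymorphisms within sequenced amplicons, encoded as microhaplotypes. In a recent application that targeted 74 nuclear loci using GT-seq with bioinformatic analysis of SNPs, Shi et al. (2025) identified 252 unique microhaplotypes (averaging 3.4 alleles per locus) in 565 Chinook salmon. This carefully curated dataset provided sufficient power to resolve mixtures of up to 10 individuals, with minimal room to accommodate the high rate of missing genotypes in degraded DNA from digestive tracts.
The number of observed alleles is particularly important for accurate inference. Integrating highly polymorphic variants such as microsatellites is therefore ideal for maximising the number of alleles within a short DNA fragment. Such compact markers are also more likely to amplify by PCR when environmental DNA is degraded. Notably, Liggan et al. (2025) sampled specific microhabitats where their study organism was known to occur, which increased the likelihood of capturing DNA from the target species. Applying the approach to highly diluted DNA dispersed in the environment may be more challenging.
The simulations and empirical validation by Liggan et al. (2025) clearly show that, in cases involving more complex DNA mixtures or less polymorphic markers, performance can be improved by increasing the number of markers, the number of individuals in the reference sample, or the number of detected alleles. The last option may involve sequencing longer markers, which increases the cumulative number of detected polymorphisms and hence the total number of observed alleles, as shown in humans, where hundreds of macrohaplotypes were predicted for 8 kb markers spanning multiple microsatellites (Ge et al. 2021). As the number of observable alleles increases, more power is gained to resolve complex DNA mixtures without being affected by locus saturation (Andres et al. 2023). The GenotypeQuant R package therefore represents a welcome improvement over previous implementations, as it tolerates large numbers of alleles while offering greater computational efficiency for handling complex mixtures and high polymorphism.
Another way forward is to reduce genotyping errors so that rare variants can be detected (Andres et al. 2021), which could be achieved with emerging digital sequencing (Andersson et al. 2024). By tagging each original DNA molecule with a unique molecular index at an early protocol step, digital sequencing makes it possible to determine a consensus sequence for each original DNA strand and greatly improves the distinction between rare variants and errors introduced during PCR and sequencing (Carlson et al. 2015). Although there are many ways to implement digital sequencing, probably only a few are suited to degraded, low-concentration eDNA, which calls for further dedicated testing and development. Hybridisation capture sequencing (Ai et al. 2025) has been used successfully to estimate the abundance of two goby species by focusing on substitutions in enriched mitochondrial DNA, and may be more effective than PCR-based methods for degraded DNA.
Even without further technical improvements, microhaplotype data can now be processed easily with the tools developed by Liggan et al. (2025), for instance to estimate the number of individuals releasing gametes into the environment during mating. This opens new possibilities for understanding complex ecological processes, such as plant pollination studied through insect-carried pollen (Kämper et al. 2025) or wind-dispersed pollen captured by passive air samplers (Lin et al. 2025). Such new data will provide fresh information about plant reproductive landscapes. As the authors point out, their approach extends the capacity of eDNA to monitor populations of species with complex life histories, or species that are too microscopic, elusive or rare to be monitored with traditional direct sampling methods.
Moreover, the advances made by Liggan et al. (2025) matter for supporting global biodiversity monitoring efforts. A method that simultaneously determines community-level species diversity and intraspecific diversity across multiple species remains a much sought-after goal for evolutionary and conservation biologists. Although many challenges remain, such as accurately estimating precise allele frequencies or summary statistics such as heterozygosity from eDNA samples, the methodological advances made by Liggan et al. (2025) and others (Andres et al. 2021) are paving the way towards this goal.
Because allelic variation is a key feature in evolutionary biology and conservation (Allendorf et al. 2024), observed allele counts from species-specific microsatellites sequenced in environmental samples provide highly valuable information from a conservation perspective. First, the number of observed alleles can serve as a proxy for allelic richness, one of the six essential biodiversity variables for monitoring genetic composition (Hoban et al. 2022). Second, observed allele counts can reveal alleles that are rare or private to particular populations, giving insight into their genetic distinctiveness (Kalinowski 2004). Finally, multispecies observed allele counts can be incorporated into systematic conservation planning tools to identify priority areas for conserving intraspecific genetic diversity (Paz-Vinas et al. 2018). The NOC, and the individual densities derived from it, can help estimate census size (Nc), a key metric for population monitoring. These estimates can help in calculating the United Nations Convention on Biological Diversity's Kunming-Montreal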
{"title":"Counting the Invisible: New Tools to Estimate the Number of Contributors From Sequence-Based Microsatellite Genotyping of Environmental DNA Samples","authors":"Olivier Lepais, Ivan Paz-Vinas","doi":"10.1111/1755-0998.70051","DOIUrl":"10.1111/1755-0998.70051","url":null,"abstract":"<p>The study of intraspecific genetic variation in environmental DNA samples has recently gained momentum following its demonstration as an effective method to study population-level processes (Andres et al. <span>2023</span>). Although allele frequencies can be inferred from the distribution of allele sequence coverage within a sample, the number of detected alleles can be used to estimate the number of contributors (NOC), a long-standing issue in forensic science. This development enables the estimation of the absolute abundance of a species, opening up new possibilities for population monitoring and ecological and evolutionary studies. Although promising, no dedicated tools providing a straightforward way to implement it existed. In this issue of <i>Molecular Ecology Resources</i>, Liggan et al. (<span>2025</span>) make a welcome contribution to the field by introducing two new R packages that facilitate the estimation of multi-locus allelic diversity and of the NOC from the sequencing of microsatellites of mixed samples obtained from environmental DNA. The <span>Amplicomsat</span> R package determines the observed allele count (based on sequence length and sequence identity) from sequence-based microsatellite genotyping (Figure 1A). The <span>GenotypeQuant</span> R package estimates the NOC given the number of observed alleles within mixed samples and allele frequencies of a reference population (Figure 1B). 
The authors conducted extensive testing of the developed method using simulation and empirical work in the laboratory and in the field, providing a convincing demonstration of its strengths and limitations, but also useful guidelines for future applications to other biological models or to address a broad range of scientific inquiries. Importantly, these advances can help support ongoing global biodiversity monitoring efforts.</p><p>The results reported by Liggan et al. (<span>2025</span>) are promising because the method performed well with easy-to-gather molecular data. Using 11 microsatellites and 40 individuals as references, it revealed a total of 177 alleles differing by their size or 297 microhaplotypes based on allele sequence identity (averaging 16 and 27 alleles per locus, respectively), with satisfactory accuracy with up to 20 contributors (Liggan et al. <span>2025</span>). This was the case for the bull kelp gametophyte study (Figure 2), where empirical validation was conducted on eight contributors, and field sample estimates ranged from one to 12 contributors. However, more complex mixtures (> 50 contributors) as well as missing genotypes due to degraded DNA in environmental samples were more challenging, calling for caution when applying the method in these specific cases. Improving PCR efficiency or library preparation can help generate more data from difficult samples, as suggested by the authors. It is worth noting; however, that Liggan et al. 
(<span>2025</span>) were able to estimate the NOC from field samples with r","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70051","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145172187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
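The perspective above describes estimating the number of contributors (NOC) from the number of alleles observed in a mixed sample together with allele frequencies from a reference population. As a rough illustration of that idea only, and not the algorithm implemented in GenotypeQuant, the Python sketch below picks the NOC whose expected number of distinct alleles best matches the observed counts under random sampling. The function names, the haploid default, and the toy frequencies are all assumptions made for this example.

```python
from typing import Dict, List


def expected_distinct_alleles(freqs: List[float], n_contributors: int, ploidy: int = 1) -> float:
    """Expected number of distinct alleles at one locus when
    n_contributors * ploidy gene copies are drawn at random from
    the reference allele frequencies (assumed sampling model)."""
    copies = n_contributors * ploidy
    # An allele with frequency p is missed by all copies with probability (1 - p) ** copies.
    return sum(1.0 - (1.0 - p) ** copies for p in freqs)


def estimate_noc(
    observed_counts: Dict[str, float],
    reference_freqs: Dict[str, List[float]],
    max_noc: int = 50,
    ploidy: int = 1,
) -> int:
    """Return the NOC in 1..max_noc that minimises the squared difference
    between observed and expected allele counts, summed over loci.
    This is a simple moment-matching stand-in, not a published estimator."""
    best_n, best_err = 1, float("inf")
    for n in range(1, max_noc + 1):
        err = sum(
            (observed_counts[locus] - expected_distinct_alleles(reference_freqs[locus], n, ploidy)) ** 2
            for locus in observed_counts
        )
        if err < best_err:
            best_n, best_err = n, err
    return best_n


# Hypothetical reference frequencies and observed allele counts for two loci.
reference = {"locus_1": [0.1] * 10, "locus_2": [0.25] * 4}
observed = {"locus_1": 4, "locus_2": 3}
print(estimate_noc(observed, reference))
```

The sketch ignores genotyping error and missing genotypes, both of which the text identifies as practical difficulties with degraded environmental DNA, so it should be read as a conceptual aid rather than a usable estimator.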
Yi Liu, Hang Zong, Yaowu Xing, Xi Jiao, Zhuoya Liu, Yusheng Niu, Zhiling Yang, Shimeng Liu, Yongqiang Wang, Haodong Zhao, Xianqing Chen, Zhenzhu Li, Xiao Wang, Jing Cai, Wen Wang, Zhongkai Wang
Arabica coffee (Coffea arabica) dominates global coffee production, accounting for over 60% of the world's coffee trade. The Mundo Novo cultivar, predominantly grown in Yunnan, China, represents a significant germplasm resource. However, the absence of a high-quality reference genome has hindered comprehensive genetic research and in-depth investigation of secondary metabolic pathways in Arabica. In this study, we present the first near telomere-to-telomere (T2T) genome assembly of Arabica, achieved through the integration of PacBio HiFi, Oxford Nanopore ultra-long, and Hi-C sequencing technologies, representing the highest-quality Arabica genome to date. Phylogenetic analysis of N-methyltransferases (NMTs), the key enzymes responsible for caffeine biosynthesis, revealed their independent evolution across caffeine-producing clades including coffee, cacao, and tea. Furthermore, GO enrichment analysis of expanded gene families at the Arabica ancestral node, combined with fruit-specific transcriptomic profiling, revealed that glycosyltransferases likely play a critical role in the secondary metabolism of Arabica. Notably, functional characterisation demonstrated that a uridine diphosphate glycosyltransferase (UGT) from the UGT29 subfamily, which has a higher gene copy number in Arabica subgenome C than in its ancestor, can directly convert Rebaudioside A (Reb A) into Rebaudioside M (Reb M) through a single-step enzymatic glycosylation. This direct pathway represents a crucial advancement over conventional multi-UGT biosynthetic routes to Reb M, a highly desirable sweetener with limited natural abundance. Taken together, this study not only provides a valuable genomic resource for studying the unique secondary metabolic processes in C. arabica but also advances research on the synthetic biological production of the valuable sweetener Reb M.
{"title":"A Near Telomere-To-Telomere Genome Assembly of Coffea arabica (Mundo Novo) Provides Insights Into Its Secondary Metabolism","authors":"Yi Liu, Hang Zong, Yaowu Xing, Xi Jiao, Zhuoya Liu, Yusheng Niu, Zhiling Yang, Shimeng Liu, Yongqiang Wang, Haodong Zhao, Xianqing Chen, Zhenzhu Li, Xiao Wang, Jing Cai, Wen Wang, Zhongkai Wang","doi":"10.1111/1755-0998.70053","DOIUrl":"10.1111/1755-0998.70053","url":null,"abstract":"<p>Arabica coffee (<i>Coffea arabica</i>) dominates global coffee production, accounting for over 60% of the world's coffee trade. The Mundo Novo cultivar, predominantly grown in Yunnan, China, represents a significant germplasm resource. However, the absence of a high-quality reference genome has hindered comprehensive genetic research and in-depth investigation of secondary metabolic pathways in Arabica. In this study, we present the first near telomere-to-telomere (T2T) genome assembly of Arabica, achieved through the integration of PacBio HiFi, Oxford Nanopore ultra-long, and Hi-C sequencing technologies, representing the highest-quality Arabica genome to date. Phylogenetic analysis of N-methyltransferases (NMTs), the key enzymes responsible for caffeine biosynthesis, revealed their independent evolution across caffeine-producing clades including coffee, cacao, and tea. Furthermore, GO enrichment analysis of expanded gene families at the Arabica ancestral node, combined with fruit-specific transcriptomic profiling, revealed that glycosyltransferases likely play a critical role in the secondary metabolism of Arabica. Notably, functional characterisation demonstrated that a UGT (uridine diphosphate glycosyltransferase, UGT) from the UGT29 subfamily, which exhibited increased gene copy number in the Arabica subgenome C than its ancestor, can directly convert Rebaudioside A (Reb A) into Rebaudioside M (Reb M) through a single-step enzymatic glycosylation. 
This direct pathway represents a crucial advancement over conventional multi-UGTs biosynthetic routes of Reb M, which is a highly desirable sweetener whereas with limited natural abundance. Taken together, this study not only provides a valuable genomic resource for studying the unique secondary metabolic processes in <i>C. arabica</i> but also accelerates innovative research frontiers for the synthetic biological production of the valuable sweetener Reb M.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70053","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145172223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many studies have proposed various comparative genomic methods to probe the molecular basis for adaptive functional convergence between species, conventionally by detecting the convergence of amino acid states between orthologous protein sequences of these species or lineages. However, different amino acids with similar physicochemical properties at a site may contribute to the functional similarity of the protein. Hence, could the convergence of amino acid physicochemical properties, in addition to state convergence, also contribute to adaptive convergence of organismal functions? Here we grouped amino acids into physicochemically similar classes, and developed computational pipelines to detect the Convergence of Amino Acid Properties (CAAP, https://github.com/shanschen33/CAAP) by modifying previous state convergence detection methods. Investigating three organismal convergence cases including echolocating mammals, marine mammals and woody mangroves, we found genes with CAAP that likely contribute to the respective functional adaptation, supported by orthogonal evidence such as functional enrichment and positive selection analyses. Our findings in multiple cases corroborate the hypothesis that CAAP may underlie adaptive convergent evolution of organismal functions, emphasising the importance of considering sequence features more complex than amino acid states when studying adaptive sequence convergence.
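The distinction drawn above between convergence of amino acid states and convergence of physicochemical properties can be sketched in a few lines of Python. This is a toy illustration, not the CAAP pipeline: the five-class grouping below is a common textbook scheme chosen as an assumption, and the function `classify_site` and its input format (one ancestral/derived residue pair per focal lineage) are hypothetical.

```python
from typing import List, Tuple

# Assumed grouping of the 20 amino acids into physicochemical classes.
# The classes actually used by CAAP may differ.
PROPERTY_CLASSES = {
    "nonpolar": "GAVLIMP",
    "aromatic": "FWY",
    "polar": "STCNQ",
    "positive": "KRH",
    "negative": "DE",
}
AA_TO_CLASS = {aa: name for name, aas in PROPERTY_CLASSES.items() for aa in aas}


def classify_site(changes: List[Tuple[str, str]]) -> str:
    """Classify one alignment site given (ancestral, derived) residues
    for each focal lineage.

    Returns "state" if every lineage changed to the same amino acid,
    "property" if every lineage changed class and all derived residues
    fall in one shared class, and "none" otherwise.
    """
    # A lineage with no substitution cannot contribute to convergence.
    if any(anc == der for anc, der in changes):
        return "none"
    derived = {der for _, der in changes}
    if len(derived) == 1:
        return "state"
    derived_classes = {AA_TO_CLASS[der] for _, der in changes}
    class_changed = all(AA_TO_CLASS[anc] != AA_TO_CLASS[der] for anc, der in changes)
    if len(derived_classes) == 1 and class_changed:
        return "property"
    return "none"


print(classify_site([("A", "D"), ("G", "D")]))  # both lineages end in D
print(classify_site([("A", "D"), ("G", "E")]))  # D and E share the "negative" class
```

A site such as the second example would be missed by a method that only compares amino acid states, which is the gap the abstract says property-based detection is meant to address.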
{"title":"Detecting Convergence of Amino Acid Physicochemical Properties Underlying the Organismal Adaptive Convergent Evolution","authors":"Shanshan Chen, Zhengting Zou","doi":"10.1111/1755-0998.70052","DOIUrl":"10.1111/1755-0998.70052","url":null,"abstract":"<p>Many studies have proposed various comparative genomic methods to probe the molecular basis for adaptive functional convergence between species, conventionally by detecting the convergence of amino acid states between orthologous protein sequences of these species or lineages. However, different amino acids with similar physicochemical properties at a site may contribute to the functional similarity of the protein. Hence, could the convergence of amino acid physicochemical properties, in addition to state convergence, also contribute to adaptive convergence of organismal functions? Here we grouped amino acids into physicochemically similar classes, and developed computational pipelines to detect the Convergence of Amino Acid Properties (CAAP, https://github.com/shanschen33/CAAP) by modifying previous state convergence detection methods. Investigating three organismal convergence cases including echolocating mammals, marine mammals and woody mangroves, we found genes with CAAP that likely contribute to the respective functional adaptation, supported by orthogonal evidence such as functional enrichment and positive selection analyses. 
Our findings in multiple cases corroborate the hypothesis that CAAP may underlie adaptive convergent evolution of organismal functions, emphasising the importance of considering sequence features more complex than amino acid states when studying adaptive sequence convergence.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70052","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145147203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DNA technologies have many advantages for biomonitoring and biodiversity analyses, but these depend on the availability of relevant reference DNA barcodes. To be most useful, a DNA barcode should be linked to a taxonomic name, which can in turn be connected to ecological information. This linking can be achieved by DNA barcoding of taxonomically identified specimens. Museums are a promising source of such specimens, but the DNA in museum specimens is often degraded, necessitating carefully optimised DNA extraction methods. In this issue of Molecular Ecology Resources, Holmquist et al. (2025) present a DNA extraction protocol for museum insect specimens, using in-house formulated Solid Phase Reversible Immobilisation (SPRI) beads. The authors carried out several experiments with statistical evaluation to determine optimal DNA extraction parameters, before testing the protocol on a large and diverse pool of museum-held insect specimens. The result is a low-cost and effective DNA extraction protocol for diverse museum insect specimens.
Insects are vitally important components of Earth's biodiversity, but monitoring these communities is challenging due to the huge diversity of species that exist. DNA sequencing technologies enable efficient molecular characterisation of insect diversity, but the resulting molecular taxonomic units are typically disconnected from species or functional information (Meier et al. 2024). This makes ecological insights difficult to achieve for wide swathes of biodiversity. Reference DNA barcodes from taxonomically identified species can bridge this gap (Kress et al. 2015), but the process of taxonomically identifying insect specimens is very difficult due to a scarcity of suitable taxonomic expertise. New workflows that combine machine learning with mass DNA barcoding of trapped insect samples have the potential to resolve this challenge over time (Meier et al. 2024). On the other hand, it is important to consider existing resources such as museum collections as sources of reference DNA barcodes.
Museums often hold rich collections of biological specimens, usually with taxonomic identifications, accumulated over long periods of time (Figure 1). In theory, these collections represent a compelling source of DNA barcodes (Raxworthy and Smith 2021). However, the DNA in museum specimens is often degraded due to specimen age and suboptimal preservation, which makes it difficult to recover DNA barcode sequences from some specimens (Hebert et al. 2013). Assembly of multiple shorter amplicons can be effective in cases where DNA fragmentation makes PCR amplification of standard barcode regions unfeasible (D'Ercole et al. 2021; Prosser et al. 2016), but this is more complex and costly than conventional DNA barcode generation. Therefore, it is important to develop optimal methods of DNA extraction from museum insect collections to
{"title":"Optimising Extraction of DNA From Museum Insect Specimens","authors":"Andrew Dopheide, Thomas Buckley","doi":"10.1111/1755-0998.70048","DOIUrl":"10.1111/1755-0998.70048","url":null,"abstract":"<p>DNA technologies have many advantages for biomonitoring and biodiversity analyses, but these depend on the availability of relevant reference DNA barcodes. To be most useful, a DNA barcode should be linked to a taxonomic name, which can in turn be connected to ecological information. This linking can be achieved by DNA barcoding of taxonomically identified specimens. Museums are a promising source of such specimens, but the DNA in museum specimens is often degraded, necessitating carefully optimised DNA extraction methods. In this issue of Molecular Ecology Resources, Holmquist et al. (2025) present a DNA extraction protocol for museum insect specimens, using in-house formulated Solid Phase Reversible Immobilisation (SPRI) beads. The authors carried out several experiments with statistical evaluation to determine optimal DNA extraction parameters, before testing the protocol on a large and diverse pool of museum-held insect specimens. The result is a low-cost and effective DNA extraction protocol for diverse museum insect specimens.</p><p>Insects are vitally important components of Earth's biodiversity, but monitoring these communities is challenging due to the huge diversity of species that exist. DNA sequencing technologies enable efficient molecular characterisation of insect diversity, but the resulting molecular taxonomic units are typically disconnected from species or functional information (Meier et al. <span>2024</span>). This makes ecological insights difficult to achieve for wide swathes of biodiversity. Reference DNA barcodes from taxonomically identified species can bridge this gap (Kress et al. <span>2015</span>), but the process of taxonomically identifying insect specimens is very difficult due to a scarcity of suitable taxonomic expertise. 
New workflows that combine machine learning with mass DNA barcoding of trapped insect samples have the potential to resolve this challenge over time (Meier et al. <span>2024</span>). On the other hand, it is important to consider existing resources such as museum collections as sources of reference DNA barcodes.</p><p>Museums often hold rich collections of biological specimens, usually with taxonomic identifications, accumulated over long periods of time (Figure 1). In theory, these collections represent a compelling source of DNA barcodes (Raxworthy and Smith <span>2021</span>). The DNA in museum specimens is often degraded due to specimen age and suboptimal preservation, however; this makes it difficult to recover DNA barcode sequences from some specimens (Hebert et al. <span>2013</span>). Assembly of multiple shorter amplicons can be effective in cases where DNA fragmentation makes PCR amplification of standard barcode regions unfeasible (D'Ercole et al. <span>2021</span>; Prosser et al. <span>2016</span>), but this is more complex and costly than conventional DNA barcode generation. Therefore, it is important to develop optimal methods of DNA extraction from museum insect collections to ","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70048","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145172262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodiversity assessments are a critical part of ecological monitoring, food systems management, and many areas of research. Traditionally, recording which taxa are present in an area has been accomplished by resource-intensive surveys and the morphological identification of bulk samples by taxonomic experts. The advent of DNA metabarcoding has allowed many of these barriers to be circumvented by amplifying and sequencing taxon-specific DNA loci to create an inventory of what organisms are present in a sample. However, this method is limited to a small fraction of DNA present in a sample and is biased towards over- and under-representing certain organisms based on their genomic content. With continually decreasing sequencing and computational costs, metagenomic analysis of the entire DNA content of a sample aims to capture a more complete picture of an area's biodiversity. Callens et al. (2025) provide a direct comparison of metabarcoding and metagenomic analysis on morphologically identified macrobenthos bulk samples and detail a strategy for expanding the use of metagenomics in bulk sample characterisation. In their case study, metagenomics enabled the composition and biomass of the organisms in the samples to be reconstructed with greater accuracy. Contrary to common belief, this was achieved with a similar level of sequencing effort to that required for metabarcoding. Can this approach be generalised to any biodiversity inventory with the same success?
The DNA-based taxonomic annotation of environmental samples (eDNA) and bulk samples to characterise the biodiversity of an area has seen a rapid growth of implementation in the past decades. In addition to having a lower resource cost than surveys, DNA-based methods can detect species that are difficult to morphologically identify or observe in the wild. Previous comparisons between the DNA-based metabarcoding and metagenomic workflows have documented both largely overlapping taxonomic detections (Courtin et al. 2022) as well as minimally overlapping datasets with inverse alpha diversity patterns (Hollman et al. 2025).
Metabarcoding has been the most established and widely implemented of the two workflows, but metagenomics offers several advantages and challenges by comparison. By sequencing the entire DNA content of a sample with metagenomics, all organisms can potentially be identified and quantified instead of limiting the detection to a specific taxonomic group with metabarcoding primers. However, without amplification, DNA from taxa of interest is at risk of being swamped by organisms that have a higher total biomass or are preferentially captured in a sample, leading to false negatives (Zimmermann et al. 2023). In environmental samples, non-microbial DNA typically represents less than a third of the total DNA (Eisenhofer et al. 2024), and sometimes less than 10%.
By not focusing on taxon-specific DNA loci, metagenomics allows genome-wide regions to be studied, opening the door to a wide range of possible phylogenetic and functional analyses (Gelabert et al. 2021). However, owing to the size of eukaryotic genomes and the extent to which conserved regions are shared among taxa, only a small fraction of plant and metazoan genomes is taxonomically informative at the species level, leading many metagenomic studies to restrict taxonomic assignment to the genus level (e.g., Wang et al. 2021; Elliott et al. 2025). By contrast, metabarcoding primers can achieve species-level resolution for more than 50% of detected taxa (Garcés-Pastor et al. 2022). Uninformative DNA fragments, combined with the over-representation of bacterial DNA in environmental samples, mean that metagenomic workflows require more sequencing and computational resources, giving metagenomic studies a larger overall footprint. However, the resulting large datasets can be more readily re-analysed and repurposed for future studies.
In terms of quantification, Callens et al. (2025) report a stronger correlation between relative read abundance and biomass for metagenomics than for metabarcoding. However, the COI gene used for metabarcoding in this study is intended to amplify a broad diversity of taxa at the cost of many mismatched primer binding sites (Deagle et al. 2014). Alternatively, more taxonomically restricted barcode regions have been shown to yield stronger correlations between read counts and biomass (Elbrecht et al. 2016). Given the taxonomic diversity and high endogenous DNA content of macrobenthos samples, Callens et al. (2025) demonstrate that metagenomics is the more suitable workflow in this setting, contingent on a complete and equally representative database. Although metagenomics avoids the biasing effects of the many PCR amplification cycles required for metabarcoding, it cannot be considered an entirely unbiased method, since various factors such as guanine-cytosine content are known to affect an organism's final DNA read count (Browne et al. 2020).
Bulk samples contain high concentrations of fresh endogenous DNA that can be detected by metagenomics at low sequencing depth (Callens et al. 2025), whereas environmental DNA samples are dominated by microorganisms, have greater DNA complexity, and require orders of magnitude more sequencing. Partly because of the complexity of metagenomic datasets, false-positive taxon identifications are a common risk, and many tools report a baseline rate and recommend filtering by a percentage of the total read count (Pedersen et al. 2016). Callens et al. (2025) calculated this threshold to be 0.2% of the dataset and note that, even with a high percentage of fresh endogenous DNA, the entire reference database appeared in every sample at read counts below this percentage. That study was conducted with a small database of 26 organisms, whereas bulk samples, and especially environmental samples, can contain orders of magnitude more diversity.
One of the greatest obstacles to metagenomic analysis is the lack of high-quality reference material, with much diversity left unrepresented. Genome skimming, or low-coverage whole-genome sequencing, offers an efficient way to expand reference databases (Alsos et al. 2020; Lavergne et al. 2025). Previous studies that included even partially assembled versions of such data showed a substantial increase in the number of DNA reads that could be taxonomically annotated (Wang et al. 2021). To make full use of the information contained in genome skims, Callens et al. (2025) applied a k-mer-based approach with unassembled genome skims as reference material, demonstrating that even at 1× coverage most reads from that species can be classified. Scaling up the use of unassembled genome skims as a reference database is computationally challenging with programs such as kraken2 (Wood et al. 2019), but it can be achieved with probabilistic data structures (Elliott et al. 2025).
Ultimately, neither the metagenomic nor the metabarcoding approach is categorically superior to the other for all analyses. As always, the choice between the two depends heavily on the research question, available funding, and sample composition and origin. Understanding the strengths and limitations of both workflows is essential for interpreting their results. Callens et al. (2025) demonstrate the utility of metagenomics, compared with metabarcoding, when applied to biodiversity assessment of bulk samples. Expanding reference databases with low-coverage genome sequencing, alongside developing computational tools to manage this large volume of data, will continue to increase the value of metagenomics in the future. As scientists, however, it is important to remember that a measurement of reality is not reality itself. We must understand the limitations of our tools and choose the most appropriate one for each task, since the tool that is most effective for one task may be the least effective for another.
E.C. conceived and wrote the manuscript. The authors declare no conflicts of interest.
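The false-positive filter discussed above, which discards taxa supported by less than a fixed percentage of the total read count, can be expressed as a short Python sketch. The 0.2% default mirrors the threshold attributed to Callens et al. (2025) in the text; the function name `filter_taxa`, the input format, and the taxon names and counts are hypothetical and do not come from any particular classifier's output.

```python
from typing import Dict


def filter_taxa(read_counts: Dict[str, int], min_fraction: float = 0.002) -> Dict[str, int]:
    """Keep only taxa whose share of all classified reads in a sample
    is at least min_fraction (0.002 corresponds to 0.2%)."""
    total = sum(read_counts.values())
    if total == 0:
        # No classified reads, so nothing can pass the filter.
        return {}
    return {taxon: n for taxon, n in read_counts.items() if n / total >= min_fraction}


# Hypothetical per-taxon read counts for one sample (10,000 reads in total).
sample = {"taxon_a": 9000, "taxon_b": 990, "taxon_c": 10}
print(filter_taxa(sample))  # taxon_c holds 0.1% of reads and is dropped
```

As the text cautions, a threshold derived from a 26-organism reference database may not transfer to more diverse samples, so `min_fraction` would need to be recalibrated for each study.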
{"title":"Can Amplicon Sequencing Be Replaced by Metagenomics for Biodiversity Inventories?","authors":"Lucas Elliott, Eric Coissac","doi":"10.1111/1755-0998.70047","DOIUrl":"10.1111/1755-0998.70047","url":null,"abstract":"<p>Biodiversity assessments are a critical part of ecological monitoring, food systems management, and many areas of research. Traditionally, recording which taxa are present in an area has been accomplished by resource-intensive surveys and the morphological identification of bulk samples by taxonomic experts. The advent of DNA metabarcoding has allowed many of these barriers to be circumvented by amplifying and sequencing taxon-specific DNA loci to create an inventory of what organisms are present in a sample. However, this method is limited to a small fraction of DNA present in a sample and is biased towards over- and under-representing certain organisms based on their genomic content. With continually decreasing sequencing and computational costs, metagenomic analysis of the entire DNA content of a sample aims to capture a more complete picture of an area's biodiversity. Callens et al. (<span>2025</span>) provide a direct comparison of metabarcoding and metagenomic analysis on morphologically identified macrobenthos bulk samples and detail a strategy for expanding the use of metagenomics in bulk sample characterisation. In their case study, metagenomics enabled the composition and biomass of the organisms in the samples to be reconstructed with greater accuracy. Contrary to common belief, this was achieved with a similar level of sequencing effort to that required for metabarcoding. Can this approach be generalised to any biodiversity inventory with the same success?</p><p>The DNA-based taxonomic annotation of environmental samples (eDNA) and bulk samples to characterise the biodiversity of an area has seen a rapid growth of implementation in the past decades. 
In addition to having a lower resource cost than surveys, DNA-based methods can detect species that are difficult to morphologically identify or observe in the wild. Previous comparisons between the DNA-based metabarcoding and metagenomic workflows have documented both largely overlapping taxonomic detections (Courtin et al. <span>2022</span>) as well as minimally overlapping datasets with inverse alpha diversity patterns (Hollman et al. <span>2025</span>).</p><p>Metabarcoding has been the most established and widely implemented of the two workflows, but metagenomics offers several advantages and challenges by comparison. By sequencing the entire DNA content of a sample with metagenomics, all organisms can potentially be identified and quantified instead of limiting the detection to a specific taxonomic group with metabarcoding primers. However, without amplification, DNA from taxa of interest is at risk of being swamped by organisms with a higher total biomass or being preferentially captured in a sample, leading to false negatives (Zimmermann et al. <span>2023</span>). In environmental samples, non-microbial DNA typically represents less than a third of the total DNA (Eisenhofer et al. <span>2024</span>), and sometimes less than 10%.</p><p>By not focusing on taxon-specific DNA loci, m","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 8","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70047","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145147159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}