Molecular Ecology Resources: Latest Articles
Ultraconserved Elements and Machine Learning Classifiers Enable Robust Phylogenetics and Taxonomy in Model and Non-Model Nematodes
IF 5.5 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-10-08 DOI: 10.1111/1755-0998.70046
Laura Villegas, Lucy Jimenez, Joëlle van der Sprong, Oleksandr Holovachov, Ann-Marie Waldvogel, Philipp H. Schiffer

Nematodes are among the most diverse animals, yet only around 28,000 of an estimated one million species have been morphologically described. Their small size, morphological simplicity, and cryptic diversity complicate phylogenetic analyses. Traditional morphological and single-locus molecular approaches often lack resolution for both recent and ancient divergences. To address these limitations, we developed the first ultraconserved elements (UCEs) probe sets for two nematode families: Panagrolaimidae, a group of non-model organisms with limited genomic resources when compared to model taxa, and Rhabditidae, which includes the model species Caenorhabditis elegans. Our probe sets targeted 1612 loci for Panagrolaimidae and 100,397 for Rhabditidae. In vitro testing recovered up to 1457 loci in Panagrolaimidae, supporting robust phylogenetic reconstruction. Results were largely consistent with previous analyses, except for one strain reclassified as Neocephalobus halophilus BSS8. Using machine learning, we determined the minimum number of loci needed for accurate genus-level classification. For Rhabditidae, XGBoost achieved high accuracy with just 46 loci. For Panagrolaimidae, 39 loci were most informative. Our UCE-based approach offers a scalable and cost-effective framework for phylogenomics, enhancing taxonomic resolution and evolutionary inference in nematodes. It is well suited for biodiversity assessments and shallow, field-based sequencing, expanding research possibilities across this ecologically important phylum.

Citations: 0
Wet Lab Protocols Matter: Choice of DNA Extraction and Library Preparation Protocols Bias Ancient Oral Microbiome Recovery
IF 5.5 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-10-06 DOI: 10.1111/1755-0998.70054
Sterling L. Wright, Muslih Abdul-Aziz, Grace N. Blaha, Christine K. Ta, Abigail Gancz, Iyunoluwa J. Ademola-Popoola, Anna Szécsényi-Nagy, Paul C. Sereno, Laura S. Weyrich

Ancient DNA (aDNA) analysis of archaeological dental calculus has provided a wealth of insights into ancient health, demography and lifestyles. However, the workflow for ancient metagenomics is still evolving, raising concerns about reproducibility. Few systematic investigations have examined how DNA extraction methods and library preparation protocols influence ancient oral microbiome recovery, despite evidence from modern populations suggesting that they do. This leaves a gap in our understanding of how wet-lab protocols impact aDNA recovery from dental calculus. In this study, we apply two DNA extraction and two library preparation methods in the aDNA field on dental calculus samples from Hungary and Niger. Samples from each context have similar chronological ages, but differences in their levels of aDNA preservation are notable, providing additional insights into how the efficacy of wet-lab protocols is impacted by sample preservation. Several metrics were employed to assess intra- and inter-sample variability, such as DNA fragment length recovery, GC content, clonality, endogenous content, DNA deamination and microbial composition. Our findings indicate that both DNA extraction and library preparation protocols can considerably impact ancient DNA recovery from archaeological dental calculus. Furthermore, no single protocol consistently outperformed the others across all assessments, and the effectiveness of specific protocol combinations depended on the preservation of the sample. These findings highlight the challenges of meta-analyses and underscore the need to account for technical variability. Lastly, our study raises the question of whether the field should strive to standardise methods for comparability or optimise protocols based on sample preservation and specific research objectives.
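
Some of the read-level metrics named in the abstract can be illustrated with a small sketch. The Python below is not the authors' pipeline. The function names are hypothetical, and clonality is assumed here to mean the fraction of reads that are exact duplicates of an earlier read, which is only one possible working definition.

```python
from math import nan


def gc_content(sequence):
    """Return the fraction of called bases (A, C, G, T) that are G or C."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return nan  # undefined when no base was called
    return sum(base in "GC" for base in called) / len(called)


def summarise_reads(reads):
    """Summarise a list of read sequences (hypothetical helper).

    Assumption: clonality = 1 - (unique reads / total reads).
    """
    if not reads:
        raise ValueError("reads must not be empty")
    total = len(reads)
    return {
        "mean_fragment_length": sum(len(read) for read in reads) / total,
        "gc_content": gc_content("".join(reads)),  # pooled over all reads
        "clonality": 1 - len(set(reads)) / total,
    }


example = summarise_reads(["ACGT", "GGCC", "ACGT", "AT"])
print(example)
```

Comparing such summaries across extraction and library preparation combinations, sample by sample, is the kind of intra- and inter-sample comparison the abstract describes.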

Citations: 0
Saprotrophic Arachnopeziza Species as New Resources to Study the Obligate Biotrophic Lifestyle of Powdery Mildew Fungi
IF 5.5 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-10-03 DOI: 10.1111/1755-0998.70045
Anne Loos, Ella Doykova, Jiangzhao Qian, Florian Kümmel, Heba Ibrahim, Levente Kiss, Ralph Panstruga, Stefan Kusch

Obligate biotrophic plant pathogens like the powdery mildew fungi commit to a closely dependent relationship with their plant hosts and have lost the ability to grow and reproduce independently. Thus, at present, these organisms are not amenable to in vitro cultivation, which is a prerequisite for effective genetic modification and functional molecular studies. Saprotrophic fungi of the family Arachnopezizaceae are the closest known extant relatives of the powdery mildew fungi and may hold great potential for studying genetic components of their obligate biotrophic lifestyle. Here, we established telomere-to-telomere genome assemblies for two representatives of this family, Arachnopeziza aurata and A. aurelia. Both species harbour haploid genomes that are composed of 16 chromosomes at a genome size of 43.1 and 46.3 million base pairs, respectively, which, in contrast to most powdery mildew genomes that are transposon-enriched, show a repeat content below 5% and signs of repeat-induced point mutation. Both species could be grown in liquid culture and on standard solid media and were sensitive to common fungicides such as hygromycin and fenhexamid. We successfully expressed a red fluorescent protein and hygromycin resistance in A. aurata following polyethylene glycol-mediated protoplast transformation, demonstrating that Arachnopeziza species are amenable to genetic alterations, which may be expanded to include gene replacement, gene modification, and gene complementation in the future. With this work, we established a potential model system that promises to sidestep the need for genetic modification of powdery mildew fungi by using Arachnopeziza species as a proxy to uncover the molecular functions of powdery mildew proteins.

Citations: 0
Assessing Genotype Imputation Methods for Low-Coverage Sequencing Data in Populations With Differing Relatedness and Inbreeding Levels
IF 5.5 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-09-29 DOI: 10.1111/1755-0998.70049
Tram Vi, Katarina C. Stuart, Hui Zhen Tan, Audald Lloret-Villas, Anna W. Santure

Low-coverage sequencing (LCS) followed by genotype imputation has become a cost-efficient approach for obtaining whole-genome SNPs. Several imputation methods for LCS data have been developed over the last decade. However, comparisons of their accuracy in inferring missing genotypes and their effectiveness for downstream analysis such as population genetics have not been comprehensively studied. In the present study, we assessed the imputation performance of five different tools: GLIMPSE2, GeneImp, QUILT2, STITCH and Beagle5.4, using populations simulated by SLiM4 that represent different levels of genetic relatedness and inbreeding. Imputation accuracy was calculated at the level of variant, haplotype and sample. The effectiveness of using imputed genotypes in recovering genetic structure, relatedness, inbreeding coefficients and demographic history was subsequently evaluated. The imputation accuracy of different methods was further tested in a real population of 283 hihi (stitchbird) samples. Our results suggest a high accuracy of all the tested methods on populations with high levels of genetic relatedness. However, in populations with low relatedness, the imputation accuracy differed across different tools and impacted the results of some downstream analyses. The simulation and imputation pipeline presented here can help determine the most suitable imputation method for different population scenarios.
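
As a rough illustration of how imputation accuracy might be scored, the sketch below computes two commonly used measures on genotypes coded as alternate-allele counts (0, 1, 2): exact concordance and the squared Pearson correlation between true and imputed dosages. These function names are hypothetical, and the abstract does not state which accuracy statistics the authors used, so this is an assumption rather than a description of their pipeline.

```python
def concordance(truth, imputed):
    """Fraction of genotype calls that match exactly."""
    if len(truth) != len(imputed) or not truth:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    return sum(t == i for t, i in zip(truth, imputed)) / len(truth)


def dosage_r2(truth, imputed):
    """Squared Pearson correlation between true and imputed dosages."""
    if len(truth) != len(imputed) or not truth:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    n = len(truth)
    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either vector is monomorphic
    return cov * cov / (var_t * var_i)


# Toy data: one variant genotyped in six samples (values are illustrative only).
true_genotypes = [0, 1, 2, 1, 0, 2]
imputed_genotypes = [0, 1, 2, 2, 0, 2]
print(concordance(true_genotypes, imputed_genotypes))
print(dosage_r2(true_genotypes, imputed_genotypes))
```

Passing one variant's genotypes across samples gives a per-variant score, while passing one sample's genotypes across variants gives a per-sample score, mirroring the variant and sample levels mentioned in the abstract.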

Citations: 0
PopCluster Improves Accessibility, Speed and Accuracy of Available Genotypic Clustering Software
IF 5.5 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-09-26 DOI: 10.1111/1755-0998.70050
Richard Ian Bailey
PopCluster (Wang 2024) represents a significant advancement in population structure analysis software, addressing key computational and methodological challenges that have limited the application of clustering methods to modern genomic datasets. The software, developed by Wang (2024), implements novel likelihood-based algorithms that substantially improve both speed and accuracy compared to existing methods like STRUCTURE and ADMIXTURE. Its most notable features include memory-efficient handling of millions of markers and individuals through 2-bit encoding and distributed computing via MPI, sophisticated treatment of unbalanced sampling through a scaling scheme, and the ability to handle both biallelic and multiallelic markers within a unified framework. PopCluster demonstrates particular strengths when analysing datasets with many assumed populations, weak differentiation between clusters, or highly unbalanced sample sizes, situations where current methods often fail. The software's multi-platform availability, integrated GUI for Windows users, and built-in simulation module further enhance its utility for researchers. As genomic datasets continue to grow in size and complexity, PopCluster provides essential capabilities for revealing fine-scale population structure that would otherwise remain hidden. I discuss the software's innovations in the context of current challenges in molecular ecology and highlight its potential applications in conservation genetics, domestication studies, and understanding complex admixture patterns.

Since its inception, a major focus of population genetics has been on identifying and explaining population structure: the non-random distribution of genetic variation among individuals and populations. A variety of mechanisms can lead sexually reproducing populations to form two or more distinct multi-locus genotypic clusters, which may then evolve and adapt independently, leading to further divergence and even speciation, but may also admix and exchange genetic material. Indeed, Mallet (1995) suggested that the maintenance of distinct genotypic clusters in sympatry should be used as a formal definition of species delimitation.

Since the seminal methodological developments of Pritchard et al. (2000) in creating the software Structure, the identification of genotypic clusters and admixture among them from multi-locus sequence data has become central to a variety of disciplines within the broad framework of molecular ecology. Examples include domestication studies (Matsuoka et al. 2002), human population genetics (1000 Genomes Project Consortium 2015; Allentoft et al. 2024), conservation genetics (Miller et al. 2012), and speciation research (Friedrich et al. 2023). The importance of clustering methods continues to grow as large whole genome datasets become available, allowing highly detailed recovery of clusters and admixture. Historical phylogenomic approaches that reconstruct the temporal pattern of divergence and subsequent admixture (e.g., TreeMix; Pickrell and Pritchard 2012) are becoming increasingly common, but the original concept of identifying contemporary clusters remains highly important, not least because of its ease of use and interpretation.

With the continuing growth of large genomic datasets, increasing speed and computational efficiency without sacrificing accuracy has become a priority in developing new genotypic clustering software. Significant progress has been made in this direction, including ADMIXTURE (Alexander et al. 2009) and sNMF (Frichot et al. 2014), and the recent addition of PopCluster, developed by Wang (2024), further improves the speed, accuracy and accessibility of genotypic clustering and admixture analysis. A major focus of PopCluster is the efficient use of available memory on local computers and distributed clusters, allowing whole genome datasets to be analysed on a laptop and up to millions of loci from millions of individuals to be analysed on a high-performance cluster. Wang (2024) shows that PopCluster can handle larger datasets than ADMIXTURE, one of the most popular current alternatives, and is faster in most cases. Another focus is multi-platform use: the software runs on Windows, Mac and Linux. The GUI available on Windows adds user-friendliness for users with less experience of building coding pipelines. A further feature that I personally find useful is the file conversion tool, which can, for example, convert VCF to the Structure-style file format.

Major problems arise for clustering software when the sample size per cluster (usually unknown in advance) is small or unbalanced, when the assumed K (number of clusters) is large, or when differentiation between clusters is low. PopCluster focuses in particular on improved estimation in these situations. As shown in Figure 1 (Wang 2024), PopCluster substantially outperforms all other software under extremely challenging conditions.

However, not everything can be fully automated, and users still bear some responsibility for choosing the most appropriate model settings. Wang introduces a "scaling" scheme that allows users to specify in advance how unbalanced their sample is in terms of the number of individuals per cluster. While choosing the correct scaling can improve estimation, it is usually not known beforehand, so users must apply common sense in deciding whether their chosen scaling value produces reasonable results. It is still necessary to choose K in advance, to perform multiple runs per K to deal with stochastic model fitting and multimodal likelihood surfaces, and to run multiple values of K to compare model fit statistically and determine the appropriate number of clusters. This process can be automated and, with PopCluster, each run is fast, but for large datasets it can still add up to substantial run time.

I would like to add a technical point that is not restricted to PopCluster. The original Structure software identified clusters by searching for Hardy-Weinberg and linkage equilibrium, whereas more recent software does not include such explicit population genetic requirements. As Wang emphasises, this means that loci do not need to be unlinked, so whole genome data can be used. In fact, in many cases adding more loci increases the chance of identifying true fine-scale population structure. LD pruning remains a common step in many analysis pipelines, but it is unnecessary from a statistical perspective and, given the improvements in computational efficiency, is often also unnecessary for reducing data to a manageable size.

There has not yet been a direct comparison between PopCluster and another recent fast clustering software, Neural ADMIXTURE (Dominguez Mantes et al. 2023). However, both offer clear computational improvements over ADMIXTURE. PopCluster is fast, memory efficient, multi-platform, highly accurate and user friendly, making it a welcome addition to the molecular ecology software toolkit.

The author declares no conflicts of interest.
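
The memory saving from 2-bit encoding mentioned above can be sketched in a few lines of Python. The text does not describe PopCluster's internal layout, so the coding used here (0, 1, 2 for the alternate-allele count at a biallelic marker, 3 for missing) and the function names are assumptions made purely for illustration.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes four per byte, two bits each.

    Assumed coding: 0, 1, 2 = alternate-allele count, 3 = missing.
    """
    packed = bytearray((len(genotypes) + 3) // 4)
    for index, code in enumerate(genotypes):
        if code not in (0, 1, 2, 3):
            raise ValueError(f"genotype code out of range: {code!r}")
        # Place each code in its own 2-bit slot within the byte.
        packed[index // 4] |= code << (2 * (index % 4))
    return bytes(packed)


def unpack_genotypes(packed, count):
    """Recover the first `count` genotype codes from packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


codes = [0, 1, 2, 3, 2, 0]
packed = pack_genotypes(codes)
print(len(codes), "genotypes stored in", len(packed), "bytes")
print(unpack_genotypes(packed, len(codes)) == codes)
```

Under this scheme one genotype occupies a quarter of a byte, compared with at least one byte per genotype in a plain character or integer array. Multiallelic markers would need a different representation.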
Citations: 0
Counting the Invisible: New Tools to Estimate the Number of Contributors From Sequence-Based Microsatellite Genotyping of Environmental DNA Samples
IF 5.5 Zone 1 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-09-26 DOI: 10.1111/1755-0998.70051
Olivier Lepais, Ivan Paz-Vinas
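The study of intraspecific genetic variation in environmental DNA samples has recently gained momentum following its demonstration as an effective method to study population-level processes (Andres et al. 2023). Although allele frequencies can be inferred from the distribution of allele sequence coverage within a sample, the number of detected alleles can be used to estimate the number of contributors (NOC), a long-standing issue in forensic science. This development enables the estimation of the absolute abundance of a species, opening up new possibilities for population monitoring and ecological and evolutionary studies. Although the approach is promising, no dedicated tool existed to implement it in a straightforward way. In this issue of Molecular Ecology Resources, Liggan et al. (2025) make a welcome contribution to the field by introducing two new R packages that facilitate the estimation of multi-locus allelic diversity and of the NOC from the sequencing of microsatellites of mixed samples obtained from environmental DNA. The Amplicomsat R package determines the observed allele count (based on sequence length and sequence identity) from sequence-based microsatellite genotyping (Figure 1A). The GenotypeQuant R package estimates the NOC given the number of observed alleles within mixed samples and allele frequencies of a reference population (Figure 1B). The authors conducted extensive testing of the developed method using simulation and empirical work in the laboratory and in the field, providing a convincing demonstration of its strengths and limitations, but also useful guidelines for future applications to other biological models or to address a broad range of scientific inquiries. Importantly, these advances can help support ongoing global biodiversity monitoring efforts.

The results reported by Liggan et al. (2025) are promising because the method performed well with easy-to-gather molecular data. Using 11 microsatellites and 40 individuals as references, it revealed a total of 177 alleles differing by their size or 297 microhaplotypes based on allele sequence identity (averaging 16 and 27 alleles per locus, respectively), with satisfactory accuracy with up to 20 contributors (Liggan et al. 2025). This was the case for the bull kelp gametophyte study (Figure 2), where empirical validation was conducted on eight contributors, and field sample estimates ranged from one to 12 contributors. However, more complex mixtures (> 50 contributors) as well as missing genotypes due to degraded DNA in environmental samples were more challenging, calling for caution when applying the method in these specific cases. Improving PCR efficiency or library preparation can help generate more data from difficult samples, as suggested by the authors. It is worth noting, however, that Liggan et al. (2025) were able to estimate the NOC from field samples with reasonable confidence even with a high sequencing failure rate (50% missing genotypes in field samples). This illustrates how the method can be applied successfully to real-world scenarios, and it joins a previous case study that made significant progress in the field (Andres et al. 2021). In the authors' case study, the additional information provided by allele sequence identity (compared with allele size) was crucial for recovering the expected correlation between the NOC and the sampled surface area. This empirical result illustrates the wealth of information gained by considering all polymorphisms within sequenced amplicons, encoded as microhaplotypes. In a recent application that targeted 74 nuclear loci using GT-seq with bioinformatic analysis of SNPs, Shi et al. (2025) identified 252 unique microhaplotypes (an average of 3.4 alleles per locus) in 565 Chinook salmon. This carefully curated dataset provided sufficient power to resolve mixtures of up to 10 individuals, with minimal room to accommodate the high rate of missing genotypes in degraded DNA from digestive tracts.

The number of observed alleles is particularly important for accurate inference. Integrating highly polymorphic variants such as microsatellites is therefore ideal for maximising the number of alleles within a short DNA fragment. Such compact markers are also more likely to be amplified by PCR when environmental DNA is degraded. Notably, Liggan et al. (2025) sampled specific microhabitats where their study organism was known to occur, which increased the likelihood of capturing DNA from the target species. Applying the approach to highly diluted DNA dispersed in the environment may be more challenging. The simulations and empirical validation conducted by Liggan et al. (2025) clearly show that, for more complex DNA mixtures or less polymorphic markers, improvements can be achieved by increasing the number of markers, the number of individuals in the reference sample, or the number of detected alleles. The latter solution could involve sequencing longer markers, which would increase the cumulative number of detected polymorphisms and thus the total number of observed alleles, as shown in humans, where hundreds of macrohaplotypes were predicted for 8 kb markers encompassing multiple microsatellites (Ge et al. 2021). As the number of observable alleles increases, more power is gained to resolve complex DNA mixtures without suffering from locus saturation (Andres et al. 2023). The GenotypeQuant R package therefore represents a welcome improvement over previous implementations, because it tolerates large numbers of alleles while offering greater computational efficiency for handling complex mixtures and high polymorphism.

Another way forward is to reduce genotyping errors in order to detect rare variants (Andres et al. 2021), which can be achieved with emerging digital sequencing (Andersson et al. 2024). By tagging each original DNA molecule with a unique molecular index at an early protocol step, digital sequencing determines a consensus sequence for each original DNA strand and greatly improves the distinction between rare variants and errors introduced during PCR and sequencing (Carlson et al. 2015). Although many methods implement digital sequencing, probably only a few are suited to degraded, low-concentration eDNA, which calls for further specific testing and development. Hybridisation capture sequencing (Ai et al. 2025) has been used successfully to estimate the abundance of two goby species by focusing on substitutions in enriched mitochondrial DNA, and may be more efficient than PCR-based methods on degraded DNA.

Even without further technical improvements, microhaplotype data can now be readily processed with the tools developed by Liggan et al. (2025) to estimate the number of individuals releasing gametes into the environment during mating. This opens up new possibilities for understanding complex ecological processes, for example studying plant pollination through insect-carried pollen (Kämper et al. 2025) or wind-dispersed pollen captured by passive air samplers (Lin et al. 2025). Such new data would provide novel information about plant reproductive landscapes. As the authors note, their approach extends the capacity of eDNA to monitor populations of species with complex life histories, or of species that are too microscopic, elusive, or rare to be monitored with traditional direct sampling methods.

Moreover, the advances made by Liggan et al. (2025) are significant for supporting global biodiversity monitoring efforts. Methods that simultaneously determine community-level species diversity and intraspecific diversity across multiple species remain a much sought-after goal for evolutionary and conservation biologists. Although many challenges remain, such as accurately estimating allele frequencies or summary statistics like heterozygosity from eDNA samples, the methodological advances made by Liggan et al. (2025) and others (Andres et al. 2021) are paving the way towards this goal. Because allelic variation is a key feature in evolutionary biology and conservation (Allendorf et al. 2024), observed allele counts from species-specific microsatellites sequenced from environmental samples provide highly valuable information from a conservation perspective. First, the number of observed alleles can serve as a proxy for allelic richness, one of the six Essential Biodiversity Variables for monitoring genetic composition (Hoban et al. 2022). In addition, observed allele counts can reveal alleles that are rare or private to particular populations, informing on their genetic distinctiveness (Kalinowski 2004). Finally, multispecies observed allele counts can be incorporated into systematic conservation planning tools to identify priority areas for conserving intraspecific genetic diversity (Paz-Vinas et al. 2018). The NOC and the individual densities derived from it can help estimate census size (Nc), a key metric for population monitoring. These estimates can help calculate the United Nations Convention on Biological Diversity Kunming-Montreal …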
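
The idea of estimating the NOC from the number of observed alleles and the allele frequencies of a reference population can be sketched in a few lines. This is only an illustration under stated assumptions, not the algorithm implemented in GenotypeQuant: contributors are assumed to be haploid and drawn independently from the reference population, and the function names and allele frequencies are hypothetical.

```python
def expected_alleles(freqs, n):
    """Expected number of distinct alleles at one locus among n haploid
    contributors drawn independently from the reference frequencies."""
    return sum(1.0 - (1.0 - p) ** n for p in freqs)


def estimate_noc(observed_counts, locus_freqs, max_n=100):
    """Return the n in 1..max_n whose expected allele counts are closest
    (least squares, summed over loci) to the observed allele counts."""
    best_n, best_err = 1, float("inf")
    for n in range(1, max_n + 1):
        err = sum(
            (obs - expected_alleles(freqs, n)) ** 2
            for obs, freqs in zip(observed_counts, locus_freqs)
        )
        if err < best_err:
            best_n, best_err = n, err
    return best_n


# Hypothetical reference allele frequencies for two loci (each sums to 1).
reference = [
    [0.4, 0.3, 0.2, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
observed = [3, 4]  # distinct alleles detected per locus in one mixed sample
print(estimate_noc(observed, reference))
```

With few alleles per locus the expected count saturates quickly as contributors are added, which is one way to see why highly polymorphic markers matter for resolving larger mixtures.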
Citations: 0
A Near Telomere-To-Telomere Genome Assembly of Coffea arabica (Mundo Novo) Provides Insights Into Its Secondary Metabolism
IF 5.5 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-09-26 DOI: 10.1111/1755-0998.70053
Yi Liu, Hang Zong, Yaowu Xing, Xi Jiao, Zhuoya Liu, Yusheng Niu, Zhiling Yang, Shimeng Liu, Yongqiang Wang, Haodong Zhao, Xianqing Chen, Zhenzhu Li, Xiao Wang, Jing Cai, Wen Wang, Zhongkai Wang

Arabica coffee (Coffea arabica) dominates global coffee production, accounting for over 60% of the world's coffee trade. The Mundo Novo cultivar, predominantly grown in Yunnan, China, represents a significant germplasm resource. However, the absence of a high-quality reference genome has hindered comprehensive genetic research and in-depth investigation of secondary metabolic pathways in Arabica. In this study, we present the first near telomere-to-telomere (T2T) genome assembly of Arabica, achieved through the integration of PacBio HiFi, Oxford Nanopore ultra-long, and Hi-C sequencing technologies, representing the highest-quality Arabica genome to date. Phylogenetic analysis of N-methyltransferases (NMTs), the key enzymes responsible for caffeine biosynthesis, revealed their independent evolution across caffeine-producing clades including coffee, cacao, and tea. Furthermore, GO enrichment analysis of expanded gene families at the Arabica ancestral node, combined with fruit-specific transcriptomic profiling, revealed that glycosyltransferases likely play a critical role in the secondary metabolism of Arabica. Notably, functional characterisation demonstrated that a uridine diphosphate glycosyltransferase (UGT) from the UGT29 subfamily, which has a higher gene copy number in Arabica subgenome C than in its ancestor, can directly convert Rebaudioside A (Reb A) into Rebaudioside M (Reb M) through a single-step enzymatic glycosylation. This direct pathway represents a crucial advance over conventional multi-UGT biosynthetic routes to Reb M, a highly desirable sweetener with limited natural abundance. Taken together, this study not only provides a valuable genomic resource for studying the unique secondary metabolic processes in C. arabica but also advances research on the synthetic biological production of the valuable sweetener Reb M.

Citations: 0
Detecting Convergence of Amino Acid Physicochemical Properties Underlying the Organismal Adaptive Convergent Evolution
IF 5.5 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-09-26 DOI: 10.1111/1755-0998.70052
Shanshan Chen, Zhengting Zou

Many studies have proposed various comparative genomic methods to probe the molecular basis for adaptive functional convergence between species, conventionally by detecting the convergence of amino acid states between orthologous protein sequences of these species or lineages. However, different amino acids with similar physicochemical properties at a site may contribute to the functional similarity of the protein. Hence, could the convergence of amino acid physicochemical properties, in addition to state convergence, also contribute to adaptive convergence of organismal functions? Here we grouped amino acids into physicochemically similar classes, and developed computational pipelines to detect the Convergence of Amino Acid Properties (CAAP, https://github.com/shanschen33/CAAP) by modifying previous state convergence detection methods. Investigating three organismal convergence cases including echolocating mammals, marine mammals and woody mangroves, we found genes with CAAP that likely contribute to the respective functional adaptation, supported by orthogonal evidence such as functional enrichment and positive selection analyses. Our findings in multiple cases corroborate the hypothesis that CAAP may underlie adaptive convergent evolution of organismal functions, emphasising the importance of considering sequence features more complex than amino acid states when studying adaptive sequence convergence.
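
The distinction between state convergence and property convergence at a single alignment site can be sketched as follows. This is a minimal illustration, not the CAAP pipeline itself: the six-class grouping of amino acids, the function name, and the rule used to call convergence are all assumptions made for the example, and the authors' actual classes and statistical tests may differ.

```python
# Hypothetical grouping of the 20 amino acids into physicochemical classes.
CLASSES = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQC",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
AA_CLASS = {aa: name for name, aas in CLASSES.items() for aa in aas}


def classify_site(anc1, der1, anc2, der2):
    """Classify one site given the ancestral and derived residues of two lineages.

    Returns "state" if both lineages changed to the same amino acid,
    "property" if they changed to different amino acids of the same class
    (a class that neither ancestor belonged to), and None otherwise.
    """
    if der1 == anc1 or der2 == anc2:
        return None  # at least one lineage did not change at this site
    if der1 == der2:
        return "state"
    same_class = AA_CLASS[der1] == AA_CLASS[der2]
    changed_class = (AA_CLASS[anc1] != AA_CLASS[der1]
                     and AA_CLASS[anc2] != AA_CLASS[der2])
    return "property" if same_class and changed_class else None


print(classify_site("S", "L", "T", "L"))  # both lineages reach leucine
print(classify_site("S", "L", "T", "I"))  # different residues, both aliphatic
```

A site of the second kind would be missed by a method that only compares amino acid states, which is the gap the abstract describes.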

Citations: 0
Optimising Extraction of DNA From Museum Insect Specimens
IF 5.5 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-09-26 DOI: 10.1111/1755-0998.70048
Andrew Dopheide, Thomas Buckley

DNA technologies have many advantages for biomonitoring and biodiversity analyses, but these depend on the availability of relevant reference DNA barcodes. To be most useful, a DNA barcode should be linked to a taxonomic name, which can in turn be connected to ecological information. This linking can be achieved by DNA barcoding of taxonomically identified specimens. Museums are a promising source of such specimens, but the DNA in museum specimens is often degraded, necessitating carefully optimised DNA extraction methods. In this issue of Molecular Ecology Resources, Holmquist et al. (2025) present a DNA extraction protocol for museum insect specimens, using in-house formulated Solid Phase Reversible Immobilisation (SPRI) beads. The authors carried out several experiments with statistical evaluation to determine optimal DNA extraction parameters, before testing the protocol on a large and diverse pool of museum-held insect specimens. The result is a low-cost and effective DNA extraction protocol for diverse museum insect specimens.

Insects are vitally important components of Earth's biodiversity, but monitoring these communities is challenging due to the huge diversity of species that exist. DNA sequencing technologies enable efficient molecular characterisation of insect diversity, but the resulting molecular taxonomic units are typically disconnected from species or functional information (Meier et al. 2024). This makes ecological insights difficult to achieve for wide swathes of biodiversity. Reference DNA barcodes from taxonomically identified species can bridge this gap (Kress et al. 2015), but the process of taxonomically identifying insect specimens is very difficult due to a scarcity of suitable taxonomic expertise. New workflows that combine machine learning with mass DNA barcoding of trapped insect samples have the potential to resolve this challenge over time (Meier et al. 2024). On the other hand, it is important to consider existing resources such as museum collections as sources of reference DNA barcodes.

Museums often hold rich collections of biological specimens, usually with taxonomic identifications, accumulated over long periods of time (Figure 1). In theory, these collections represent a compelling source of DNA barcodes (Raxworthy and Smith 2021). However, the DNA in museum specimens is often degraded due to specimen age and suboptimal preservation, which makes it difficult to recover DNA barcode sequences from some specimens (Hebert et al. 2013). Assembly of multiple shorter amplicons can be effective in cases where DNA fragmentation makes PCR amplification of standard barcode regions unfeasible (D'Ercole et al. 2021; Prosser et al. 2016), but this is more complex and costly than conventional DNA barcode generation. Therefore, it is important to develop optimal methods of DNA extraction from museum insect collections to

Citations: 0
Can Amplicon Sequencing Be Replaced by Metagenomics for Biodiversity Inventories?
IF 5.5 Zone 1, Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-09-25 DOI: 10.1111/1755-0998.70047
Lucas Elliott, Eric Coissac
Biodiversity assessments are a critical part of ecological monitoring, food systems management, and many areas of research. Traditionally, recording which taxa are present in an area has been accomplished by resource-intensive surveys and the morphological identification of bulk samples by taxonomic experts. The advent of DNA metabarcoding has allowed many of these barriers to be circumvented by amplifying and sequencing taxon-specific DNA loci to create an inventory of what organisms are present in a sample. However, this method is limited to a small fraction of DNA present in a sample and is biased towards over- and under-representing certain organisms based on their genomic content. With continually decreasing sequencing and computational costs, metagenomic analysis of the entire DNA content of a sample aims to capture a more complete picture of an area's biodiversity. Callens et al. (2025) provide a direct comparison of metabarcoding and metagenomic analysis on morphologically identified macrobenthos bulk samples and detail a strategy for expanding the use of metagenomics in bulk sample characterisation. In their case study, metagenomics enabled the composition and biomass of the organisms in the samples to be reconstructed with greater accuracy. Contrary to common belief, this was achieved with a similar level of sequencing effort to that required for metabarcoding. Can this approach be generalised to any biodiversity inventory with the same success?

The use of DNA-based taxonomic annotation of environmental samples (eDNA) and bulk samples to characterise the biodiversity of an area has grown rapidly over the past decades. In addition to having a lower resource cost than surveys, DNA-based methods can detect species that are difficult to morphologically identify or observe in the wild. Previous comparisons between the DNA-based metabarcoding and metagenomic workflows have documented both largely overlapping taxonomic detections (Courtin et al. 2022) as well as minimally overlapping datasets with inverse alpha diversity patterns (Hollman et al. 2025).

Metabarcoding has been the most established and widely implemented of the two workflows, but metagenomics offers several advantages and challenges by comparison. By sequencing the entire DNA content of a sample with metagenomics, all organisms can potentially be identified and quantified instead of limiting the detection to a specific taxonomic group with metabarcoding primers. However, without amplification, DNA from taxa of interest is at risk of being swamped by organisms that have a higher total biomass or are preferentially captured in a sample, leading to false negatives (Zimmermann et al. 2023). In environmental samples, non-microbial DNA typically represents less than a third of the total DNA (Eisenhofer et al. 2024), and sometimes less than 10%.

By not focusing on taxon-specific DNA loci, metagenomics allows genome-wide regions to be investigated, opening the door to a multitude of possible phylogenetic and functional analyses (Gelabert et al. 2021). However, because of the size of eukaryotic genomes and the extent to which conserved regions are shared among taxa, only a small fraction of plant and metazoan genomes is taxonomically informative at the species level, leading many metagenomic studies to restrict taxonomic assignment to the genus level (e.g., Wang et al. 2021; Elliott et al. 2025). By contrast, metabarcoding primers can achieve species-level resolution for more than 50% of detected taxa (Garcés-Pastor et al. 2022). Because uninformative DNA fragments, together with the over-representation of bacterial DNA in environmental samples, give metagenomic studies a larger overall footprint, metagenomic workflows require more sequencing and computational resources. However, the resulting large datasets can more easily be reanalysed and repurposed for future studies.

Regarding quantification, Callens et al. (2025) report a stronger correlation between relative read abundance and biomass for metagenomics than for metabarcoding. However, the COI gene used for metabarcoding in this study is intended to amplify a broad diversity of taxa at the cost of many mismatched primer binding sites (Deagle et al. 2014). Alternatively, more taxonomically restricted barcode regions have been shown to yield stronger correlations between read counts and biomass (Elbrecht et al. 2016). Given the taxonomic diversity and high endogenous DNA content of macrobenthos samples, Callens et al. (2025) demonstrate that metagenomics is the more suitable workflow in this case, conditional on a complete and equally representative database. Although metagenomics avoids the biasing effects of the many PCR amplification cycles required for metabarcoding, it cannot be considered a fully unbiased method, because various factors such as guanine-cytosine content are known to affect an organism's final DNA read count (Browne et al. 2020).

Bulk samples contain high concentrations of fresh endogenous DNA that metagenomics can detect at low sequencing depth (Callens et al. 2025), whereas environmental DNA samples are dominated by microorganisms and have higher DNA complexity, requiring orders of magnitude more sequencing. Partly because of the complexity of metagenomic datasets, false-positive taxon identifications are a common risk, and many tools report a baseline rate and recommend filtering by a percentage of the total read count (Pedersen et al. 2016). Callens et al. (2025) calculated this threshold to be 0.2% of the dataset and noted that, even with a high percentage of fresh endogenous DNA, the entire reference database appeared in every sample at read counts below that percentage. This study was conducted with a small database of 26 species, whereas bulk samples, and environmental samples in particular, can contain orders of magnitude more diversity.

One of the greatest obstacles to metagenomic analysis is the lack of high-quality reference material, in which much diversity is not represented. Genome skimming, or low-coverage whole-genome sequencing, offers an efficient way to expand reference databases (Alsos et al. 2020; Lavergne et al. 2025). Previous studies that included even partially assembled versions of such data showed a substantial increase in the number of DNA reads that could be taxonomically annotated (Wang et al. 2021). To make full use of the information contained in genome skims, Callens et al. (2025) applied a k-mer-based method with unassembled genome skims as reference material, demonstrating that most reads of a species can be classified even at 1× coverage. Scaling up the use of unassembled genome skims as reference databases is computationally challenging with programs such as kraken2 (Wood et al. 2019), but can be achieved using probabilistic data structures (Elliott et al. 2025).

Ultimately, neither the metagenomic nor the metabarcoding approach is definitively superior to the other for all analyses. As always, the choice between the two depends heavily on the research question, available funding, and sample composition and origin. Understanding the strengths and limitations of both workflows is essential for interpreting their results. Callens et al. (2025) demonstrate the utility of metagenomics, compared with metabarcoding, when applied to biodiversity assessments of bulk samples. Expanding reference databases with low-coverage genome sequencing, while developing computational tools to manage this large volume of data, will continue to increase the value of metagenomics in the future. As scientists, however, it is important to remember that a measurement of reality is not reality itself. We must understand the limitations of our tools and choose the most appropriate one for each task, because the tool that is most effective for one task may be the least effective for another.

E.C. conceived and wrote the manuscript.

The authors declare no conflicts of interest.
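
The filter on the percentage of total read count described above can be sketched as follows. This is a minimal sketch, not the pipeline used by Callens et al. (2025): the function name, taxon labels, and read counts are hypothetical, and only the 0.2% default comes from the text.

```python
def filter_by_read_share(read_counts, min_share=0.002):
    """Keep only taxa whose share of all assigned reads reaches min_share.

    read_counts maps a taxon label to its number of assigned reads;
    min_share=0.002 corresponds to the 0.2% threshold quoted in the text.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {
        taxon: count
        for taxon, count in read_counts.items()
        if count / total >= min_share
    }


# Hypothetical per-sample read counts after taxonomic assignment.
sample = {"taxon_A": 9000, "taxon_B": 990, "taxon_C": 10}
print(filter_by_read_share(sample))  # taxon_C (0.1% of reads) is dropped
```

A relative threshold of this kind scales with sequencing depth, but it also removes genuinely rare taxa, so the cut-off is a trade-off between false positives and false negatives.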
Can Amplicon Sequencing Be Replaced by Metagenomics for Biodiversity Inventories?
IF 5.5 Zone 1 Biology Pub Date : 2025-09-25 DOI: 10.1111/1755-0998.70047
Lucas Elliott, Eric Coissac
Molecular Ecology Resources, volume 25, issue 8. Journal Article. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70047

Biodiversity assessments are a critical part of ecological monitoring, food systems management, and many areas of research. Traditionally, recording which taxa are present in an area has been accomplished by resource-intensive surveys and the morphological identification of bulk samples by taxonomic experts. The advent of DNA metabarcoding has allowed many of these barriers to be circumvented by amplifying and sequencing taxon-specific DNA loci to create an inventory of what organisms are present in a sample. However, this method is limited to a small fraction of DNA present in a sample and is biased towards over- and under-representing certain organisms based on their genomic content. With continually decreasing sequencing and computational costs, metagenomic analysis of the entire DNA content of a sample aims to capture a more complete picture of an area's biodiversity. Callens et al. (2025) provide a direct comparison of metabarcoding and metagenomic analysis on morphologically identified macrobenthos bulk samples and detail a strategy for expanding the use of metagenomics in bulk sample characterisation. In their case study, metagenomics enabled the composition and biomass of the organisms in the samples to be reconstructed with greater accuracy. Contrary to common belief, this was achieved with a similar level of sequencing effort to that required for metabarcoding. Can this approach be generalised to any biodiversity inventory with the same success?

The DNA-based taxonomic annotation of environmental samples (eDNA) and bulk samples to characterise the biodiversity of an area has seen a rapid growth of implementation in the past decades. In addition to having a lower resource cost than surveys, DNA-based methods can detect species that are difficult to morphologically identify or observe in the wild. Previous comparisons between the DNA-based metabarcoding and metagenomic workflows have documented both largely overlapping taxonomic detections (Courtin et al. 2022) as well as minimally overlapping datasets with inverse alpha diversity patterns (Hollman et al. 2025).

Metabarcoding has been the most established and widely implemented of the two workflows, but metagenomics offers several advantages and challenges by comparison. By sequencing the entire DNA content of a sample with metagenomics, all organisms can potentially be identified and quantified instead of limiting the detection to a specific taxonomic group with metabarcoding primers. However, without amplification, DNA from taxa of interest is at risk of being swamped by organisms with a higher total biomass or being preferentially captured in a sample, leading to false negatives (Zimmermann et al. 2023). In environmental samples, non-microbial DNA typically represents less than a third of the total DNA (Eisenhofer et al. 2024), and sometimes less than 10%.

By not focusing on taxon-specific DNA loci, m
Citations: 0