Molecular Ecology Resources: Latest Articles, Page 6

PopCluster Improves Accessibility, Speed and Accuracy of Available Genotypic Clustering Software

IF 5.5 · Zone 1 (Biology) · Q1 Biochemistry & Molecular Biology

Molecular Ecology Resources

Pub Date : 2025-09-26 DOI: 10.1111/1755-0998.70050

Richard Ian Bailey

PopCluster (Wang 2024) represents a significant advancement in population structure analysis software, addressing key computational and methodological challenges that have limited the application of clustering methods to modern genomic datasets. The software implements novel likelihood-based algorithms that substantially improve both speed and accuracy compared to existing methods like STRUCTURE and ADMIXTURE. Its most notable features include memory-efficient handling of millions of markers and individuals through 2-bit encoding and distributed computing via MPI, sophisticated treatment of unbalanced sampling through a scaling scheme, and the ability to handle both biallelic and multiallelic markers within a unified framework. PopCluster demonstrates particular strengths when analysing datasets with many assumed populations, weak differentiation between clusters, or highly unbalanced sample sizes, situations where current methods often fail. The software's multi-platform availability, integrated GUI for Windows users, and built-in simulation module further enhance its utility for researchers. As genomic datasets continue to grow in size and complexity, PopCluster provides essential capabilities for revealing fine-scale population structure that would otherwise remain hidden. I discuss the software's innovations in the context of current challenges in molecular ecology and highlight its potential applications in conservation genetics, domestication studies, and understanding complex admixture patterns.

Since its inception, a major focus of population genetics has been on identifying and explaining population structure: the non-random distribution of genetic variation among individuals and populations. A variety of mechanisms can lead sexually reproducing populations to form two or more distinct multi-locus genotypic clusters, which may then evolve and adapt independently, leading to further divergence and even speciation, but may also admix and exchange genetic material. Indeed, Mallet (1995) suggested that the maintenance of distinct genotypic clusters in sympatry should be used as a formal definition of species delimitation.

Since the seminal methodological developments of Pritchard et al. (2000) in creating the software Structure, the identification of genotypic clusters and admixture among them from multi-locus sequence data has become central to a variety of disciplines within the broad framework of molecular ecology. Examples include domestication studies (Matsuoka et al. 2002), human population genetics (1000 Genomes Project Consortium 2015; Allentoft et al. 2024), conservation genetics (Miller et al. 2012), and speciation research (Friedrich et al. 2023). The importance of clustering methods continues to grow as large whole genome datasets become available, allowing highly detailed recovery of clusters and admixture. Historical phylogenomic methods that reconstruct the temporal pattern of divergence and subsequent admixture (e.g., TreeMix; Pickrell and Pritchard 2012) are becoming increasingly common, but the original concept of identifying contemporary clusters remains highly important, not least because of its ease of use and interpretation.

As large genomic datasets continue to accumulate, improving speed and computational efficiency without sacrificing accuracy has become a priority in developing new genotypic clustering software. Major progress has been made in this direction, including ADMIXTURE (Alexander et al. 2009) and sNMF (Frichot et al. 2014), and the recent addition of PopCluster, developed by Wang (2024), further improves the speed, accuracy and accessibility of genotypic clustering and admixture analysis. A major focus of PopCluster is the efficient use of available memory on local computers and distributed clusters, allowing whole genome datasets to be analysed on a laptop, and up to millions of loci from millions of individuals to be analysed on a high-performance cluster. Wang (2024) shows that PopCluster can handle larger datasets than ADMIXTURE, one of the most popular current alternatives, and is faster in most cases. Another focus is multi-platform use: the software runs on Windows, Mac and Linux. The GUI available on Windows adds user-friendliness for those with less experience in building coded pipelines. Another feature that I personally find useful is the file conversion tool, which can, for example, convert VCF to Structure-style file formats.

Clustering software runs into major problems when the sample size per cluster (usually not known in advance) is small or unbalanced, when the assumed K (number of clusters) is large, or when differentiation between clusters is low. PopCluster pays particular attention to improving estimates in these situations. As Figure 1 (Wang 2024) shows, PopCluster significantly outperforms all other software under extreme conditions. However, not everything can be fully automated, and users still bear some responsibility for choosing the most appropriate model settings. Wang introduces a 'scaling' scheme that allows users to specify in advance how unbalanced their sample is in terms of the number of individuals per cluster. Although choosing the correct scaling can improve estimates, it is usually not known beforehand. Users must therefore take a common-sense approach to deciding whether their chosen scaling value produces sensible results. It also remains necessary to choose K in advance, to run each K multiple times to deal with stochastic model fitting and multimodal likelihood surfaces, and to run multiple values of K to compare model fit statistically and determine the appropriate number of clusters. This process can be automated, and with PopCluster each run is fast, but for large datasets it can still lead to substantial run times.

I would like to add a technical point that is not restricted to PopCluster. The original Structure software identifies clusters by searching for Hardy-Weinberg and linkage equilibrium, whereas more recent software does not include such explicit population genetic requirements. As Wang emphasises, this means that loci do not need to be unlinked, so whole genome data can be used. Indeed, in many cases adding more loci increases the likelihood of identifying real fine-scale population structure. LD pruning remains a common step in many analysis pipelines, but it is unnecessary from a statistical point of view and, given the improvements in computational efficiency, usually also unnecessary for reducing the data to a manageable size.

There has not yet been a direct comparison between PopCluster and another recent fast clustering software, Neural ADMIXTURE (Dominguez Mantes et al. 2023). However, both offer clear computational improvements over ADMIXTURE. PopCluster is fast, memory-efficient, multi-platform, highly accurate and user-friendly, making it a welcome addition to the molecular ecology software toolbox.

The author declares no conflicts of interest.
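The text above credits part of PopCluster's memory efficiency to 2-bit encoding. As a rough illustration of the general idea only, the sketch below packs four biallelic genotypes into each byte. The code values, the function names `pack_genotypes` and `unpack_genotypes`, and the bit order are assumptions made for this example; PopCluster's actual in-memory layout is not described in the source and may well differ.

```python
# Hypothetical 2-bit genotype codes; PopCluster's real layout may differ.
HOM_REF, HET, HOM_ALT, MISSING = 0, 1, 2, 3


def pack_genotypes(genotypes):
    """Pack a sequence of 2-bit codes (0..3) into bytes, four genotypes per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code {g} does not fit in 2 bits")
        # Genotype i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]
```

With this kind of packing, one million biallelic genotypes fit in 250,000 bytes rather than the 1,000,000 bytes needed at one byte per genotype, which is why schemes of this sort matter once datasets reach millions of markers and individuals.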


Citations: 0

Counting the Invisible: New Tools to Estimate the Number of Contributors From Sequence-Based Microsatellite Genotyping of Environmental DNA Samples

IF 5.5 · Zone 1 (Biology) · Q1 Biochemistry & Molecular Biology

Molecular Ecology Resources

Pub Date : 2025-09-26 DOI: 10.1111/1755-0998.70051

Olivier Lepais, Ivan Paz-Vinas

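The study of intraspecific genetic variation in environmental DNA samples has recently gained momentum following its demonstration as an effective method to study population-level processes (Andres et al. 2023). While allele frequencies can be inferred from the distribution of allele sequence coverage within a sample, the number of detected alleles can be used to estimate the number of contributors (NOC), a long-standing issue in forensic science. This development enables the estimation of the absolute abundance of a species, opening up new possibilities for population monitoring and ecological and evolutionary studies. Although the approach is promising, no dedicated tools existed to implement it in a straightforward way. In this issue of Molecular Ecology Resources, Liggan et al. (2025) make a welcome contribution to the field by introducing two new R packages that facilitate the estimation of multi-locus allelic diversity and of the NOC from the sequencing of microsatellites of mixed samples obtained from environmental DNA. The Amplicomsat R package determines the observed allele count (based on sequence length and sequence identity) from sequence-based microsatellite genotyping (Figure 1A). The GenotypeQuant R package estimates the NOC given the number of observed alleles within mixed samples and allele frequencies of a reference population (Figure 1B). The authors conducted extensive testing of the developed method using simulation and empirical work in the laboratory and in the field, providing a convincing demonstration of its strengths and limitations, as well as useful guidelines for future applications to other biological models or to address a broad range of scientific inquiries. Importantly, these advances can help support ongoing global biodiversity monitoring efforts.

The results reported by Liggan et al. (2025) are promising because the method performed well with easy-to-gather molecular data. Using 11 microsatellites and 40 individuals as references, it revealed a total of 177 alleles differing by their size or 297 microhaplotypes based on allele sequence identity (averaging 16 and 27 alleles per locus, respectively), with satisfactory accuracy with up to 20 contributors (Liggan et al. 2025). This was the case for the bull kelp gametophyte study (Figure 2), where empirical validation was conducted on eight contributors, and field sample estimates ranged from one to 12 contributors. However, more complex mixtures (> 50 contributors) as well as missing genotypes due to degraded DNA in environmental samples were more challenging, calling for caution when applying the method in these specific cases. Improving PCR efficiency or library preparation can help generate more data from difficult samples, as suggested by the authors. It is worth noting, however, that Liggan et al. (2025) were able to estimate the NOC from field samples with reasonable confidence even when sequencing failure rates were high (50% missing genotypes in field samples). This illustrates how the method can be applied successfully in real-world scenarios, and it adds to previous case studies that made significant advances in the field (Andres et al. 2021).

In the authors' case study, the additional information provided by allele sequence identity (compared with allele size) was essential for recovering the expected correlation between the NOC and the sampled surface area. This empirical result illustrates the wealth of information gained by considering all polymorphisms within sequenced amplicons, coded as microhaplotypes. In a recent application targeting 74 nuclear loci using GT-seq with bioinformatic analysis of SNPs, Shi et al. (2025) identified 252 unique microhaplotypes (3.4 alleles per locus on average) in 565 Chinook salmon. This carefully curated dataset provided enough power to resolve mixtures of up to 10 individuals, with minimal room to accommodate the high rate of missing genotypes in degraded DNA from digestive tracts. The number of observed alleles is particularly important for accurate inference. Integrating highly polymorphic variants, such as microsatellites, is therefore ideal for maximising the number of alleles within a short DNA fragment. Such compact markers are also more likely to be amplified by PCR when environmental DNA is degraded.

Notably, Liggan et al. (2025) sampled specific microhabitats where their study organism was known to occur, which increased the chance of capturing DNA from the target species. Applying the approach to samples of highly diluted DNA dispersed in the environment may be more challenging. The simulations and empirical validation conducted by Liggan et al. (2025) clearly show that cases involving more complex DNA mixtures or less polymorphic markers can be improved by increasing the number of markers, the number of individuals in the reference sample, or the number of detected alleles. The last option may involve sequencing longer markers, which would increase the cumulative number of detected polymorphisms and thus the total number of observed alleles, as shown in humans, where hundreds of macrohaplotypes are predicted for 8 kb markers spanning multiple microsatellites (Ge et al. 2021). As the number of observable alleles increases, more power becomes available to resolve complex DNA mixtures without being affected by locus saturation (Andres et al. 2023). The GenotypeQuant R package therefore represents a welcome improvement over previous implementations, because it tolerates a large number of alleles while offering greater computational efficiency for handling complex mixtures and high polymorphism.

Another route to progress is to reduce genotyping errors in order to detect rare variants (Andres et al. 2021), which could be achieved using emerging digital sequencing (Andersson et al. 2024). By tagging each original DNA molecule with a unique molecular index at an early step of the protocol, digital sequencing can determine a consensus sequence for each original DNA strand and greatly improves the distinction between rare variants and errors introduced during PCR and sequencing (Carlson et al. 2015). Although there are many ways to implement digital sequencing, probably only a few are suitable for degraded, low-concentration eDNA, which calls for further specific testing and development. Hybridisation capture sequencing (Ai et al. 2025) has been used successfully to estimate the abundance of two goby species by focusing on substitutions in enriched mitochondrial DNA, and it may be more efficient than PCR-based methods for degraded DNA.

Even without further technical improvements, microhaplotype data can now readily be processed with the tools developed by Liggan et al. (2025) to estimate the number of individuals releasing gametes into the environment during mating. This opens up new possibilities for understanding complex ecological processes, for example studying plant pollination through insect-borne pollen (Kämper et al. 2025) or wind-borne pollen captured by passive air samplers (Lin et al. 2025). Such data will provide novel information about plant reproductive landscapes. As the authors point out, their approach enhances the capacity of eDNA to monitor populations of species with complex life histories, or of species that are too microscopic, elusive or rare to be monitored with traditional direct sampling methods.

In addition, the advances made by Liggan et al. (2025) are important for supporting global biodiversity monitoring efforts. Methods that simultaneously determine community-level species diversity and intraspecific diversity across multiple species remain a holy grail for evolutionary and conservation biologists. Although many challenges remain, such as accurately estimating allele frequencies or summary statistics like heterozygosity from eDNA samples, the methodological advances of Liggan et al. (2025) and others (Andres et al. 2021) are paving the way towards this goal. Because allelic variation is a key feature in evolutionary biology and conservation (Allendorf et al. 2024), observed allele counts from species-specific microsatellites sequenced from environmental samples provide highly valuable information from a conservation perspective. First, the number of observed alleles can serve as a proxy for allelic richness, one of the six essential biodiversity variables for monitoring genetic composition (Hoban et al. 2022). Second, observed allele counts can reveal alleles that are rare or private to particular populations, giving insight into their genetic distinctiveness (Kalinowski 2004). Finally, multi-species observed allele counts can be incorporated into systematic conservation planning tools to identify priority areas for conserving intraspecific genetic diversity (Paz-Vinas et al. 2018). The NOC, and the individual densities derived from it, can help estimate census size (Nc), a key metric for population monitoring. These estimates can help calculate the United Nations Convention on Biological Diversity's Kunming-Montreal ...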
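To make the link between observed allele counts and the number of contributors more concrete, here is a minimal sketch of two simple estimators. It is not the algorithm implemented in GenotypeQuant, whose internals are not described in the text above. The function names, the `ploidy` and `max_noc` parameters, and the assumption that contributors are random draws from the reference allele frequencies are all hypothetical choices made for illustration.

```python
def expected_allele_count(freqs_by_locus, n_contributors, ploidy=1):
    """Expected number of distinct alleles seen across all loci.

    An allele with frequency p is absent from all n * ploidy sampled gene
    copies with probability (1 - p) ** (n * ploidy), assuming random sampling.
    """
    copies = n_contributors * ploidy
    return sum(1.0 - (1.0 - p) ** copies
               for freqs in freqs_by_locus for p in freqs)


def estimate_noc(observed_alleles, freqs_by_locus, max_noc=100, ploidy=1):
    """Contributor count whose expected allele count best matches the observation."""
    return min(range(1, max_noc + 1),
               key=lambda n: abs(expected_allele_count(freqs_by_locus, n, ploidy)
                                 - observed_alleles))


def minimum_noc(allele_counts_by_locus, ploidy=1):
    """Lower bound: each contributor carries at most `ploidy` alleles per locus."""
    return max(-(-k // ploidy) for k in allele_counts_by_locus)
```

The expected allele count flattens out as contributors are added, because common alleles are soon all observed. This is one way to see why highly polymorphic markers and larger reference panels help when mixtures become complex.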


Citations: 0

A Near Telomere-To-Telomere Genome Assembly of Coffea arabica (Mundo Novo) Provides Insights Into Its Secondary Metabolism

IF 5.5 · Zone 1 (Biology) · Q1 Biochemistry & Molecular Biology

Molecular Ecology Resources

Pub Date : 2025-09-26 DOI: 10.1111/1755-0998.70053

Yi Liu, Hang Zong, Yaowu Xing, Xi Jiao, Zhuoya Liu, Yusheng Niu, Zhiling Yang, Shimeng Liu, Yongqiang Wang, Haodong Zhao, Xianqing Chen, Zhenzhu Li, Xiao Wang, Jing Cai, Wen Wang, Zhongkai Wang

Arabica coffee (Coffea arabica) dominates global coffee production, accounting for over 60% of the world's coffee trade. The Mundo Novo cultivar, predominantly grown in Yunnan, China, represents a significant germplasm resource. However, the absence of a high-quality reference genome has hindered comprehensive genetic research and in-depth investigation of secondary metabolic pathways in Arabica. In this study, we present the first near telomere-to-telomere (T2T) genome assembly of Arabica, achieved through the integration of PacBio HiFi, Oxford Nanopore ultra-long, and Hi-C sequencing technologies, representing the highest-quality Arabica genome to date. Phylogenetic analysis of N-methyltransferases (NMTs), the key enzymes responsible for caffeine biosynthesis, revealed their independent evolution across caffeine-producing clades including coffee, cacao, and tea. Furthermore, GO enrichment analysis of expanded gene families at the Arabica ancestral node, combined with fruit-specific transcriptomic profiling, revealed that glycosyltransferases likely play a critical role in the secondary metabolism of Arabica. Notably, functional characterisation demonstrated that a uridine diphosphate glycosyltransferase (UGT) from the UGT29 subfamily, which has a higher gene copy number in Arabica subgenome C than in its ancestor, can directly convert Rebaudioside A (Reb A) into Rebaudioside M (Reb M) through a single-step enzymatic glycosylation. This direct pathway represents a crucial advancement over conventional multi-UGT biosynthetic routes to Reb M, a highly desirable sweetener with limited natural abundance. Taken together, this study not only provides a valuable genomic resource for studying the unique secondary metabolic processes in C. arabica but also opens new research frontiers for the synthetic biological production of the valuable sweetener Reb M.
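The abstract mentions GO enrichment analysis of expanded gene families without saying which test was used. A common choice for this kind of analysis is a one-sided hypergeometric test, sketched below as an assumption rather than as the authors' method. The function name `enrichment_p_value` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # comb() returns 0 when k > n, so impossible combinations drop out.
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total
```

In a setting like this one, `population` would be all annotated genes in the genome, `sample` the genes in expanded families, and `hits` the expanded genes carrying a given term such as a glycosyltransferase activity. In practice the resulting p-values would also need correction for testing many GO terms.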



Citations: 0

Detecting Convergence of Amino Acid Physicochemical Properties Underlying the Organismal Adaptive Convergent Evolution

IF 5.5 · Zone 1 (Biology) · Q1 Biochemistry & Molecular Biology

Molecular Ecology Resources

Pub Date : 2025-09-26 DOI: 10.1111/1755-0998.70052

Shanshan Chen, Zhengting Zou

Many studies have proposed various comparative genomic methods to probe the molecular basis for adaptive functional convergence between species, conventionally by detecting the convergence of amino acid states between orthologous protein sequences of these species or lineages. However, different amino acids with similar physicochemical properties at a site may contribute to the functional similarity of the protein. Hence, could the convergence of amino acid physicochemical properties, in addition to state convergence, also contribute to adaptive convergence of organismal functions? Here we grouped amino acids into physicochemically similar classes, and developed computational pipelines to detect the Convergence of Amino Acid Properties (CAAP, https://github.com/shanschen33/CAAP) by modifying previous state convergence detection methods. Investigating three organismal convergence cases including echolocating mammals, marine mammals and woody mangroves, we found genes with CAAP that likely contribute to the respective functional adaptation, supported by orthogonal evidence such as functional enrichment and positive selection analyses. Our findings in multiple cases corroborate the hypothesis that CAAP may underlie adaptive convergent evolution of organismal functions, emphasising the importance of considering sequence features more complex than amino acid states when studying adaptive sequence convergence.
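The difference between state convergence and property convergence can be illustrated with a small site-level check. This is only a sketch: the class grouping below is one plausible scheme chosen for illustration, and the function names are hypothetical. The CAAP pipelines themselves work on phylogenies with inferred ancestral states, and their exact classes and statistics should be taken from the linked repository.

```python
# Illustrative physicochemical classes; the grouping used by CAAP may differ.
CLASSES = {
    "nonpolar": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
AA_TO_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def is_property_convergent(ancestral, derived):
    """True when every focal lineage changed class and all ended in one class.

    `ancestral` and `derived` are equal-length strings holding, per focal
    lineage, the inferred ancestral residue and the present-day residue
    at a single alignment site.
    """
    if len(ancestral) != len(derived) or len(derived) < 2:
        raise ValueError("need matching residues for at least two lineages")
    anc = [AA_TO_CLASS[a] for a in ancestral]
    der = [AA_TO_CLASS[d] for d in derived]
    changed = all(a != d for a, d in zip(anc, der))
    return changed and len(set(der)) == 1


def is_state_convergent(ancestral, derived):
    """Stricter classical test: every lineage changed to the same amino acid."""
    changed = all(a != d for a, d in zip(ancestral, derived))
    return changed and len(set(derived)) == 1
```

For example, two lineages that both start from alanine and end at aspartate and glutamate respectively pass the property test but fail the state test, which is the kind of site a state-only method would miss.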



Citations: 0

Optimising Extraction of DNA From Museum Insect Specimens

IF 5.5 · Zone 1 (Biology) · Q1 Biochemistry & Molecular Biology

Molecular Ecology Resources

Pub Date : 2025-09-26 DOI: 10.1111/1755-0998.70048

Andrew Dopheide, Thomas Buckley

DNA technologies have many advantages for biomonitoring and biodiversity analyses, but these depend on the availability of relevant reference DNA barcodes. To be most useful, a DNA barcode should be linked to a taxonomic name, which can in turn be connected to ecological information. This linking can be achieved by DNA barcoding of taxonomically identified specimens. Museums are a promising source of such specimens, but the DNA in museum specimens is often degraded, necessitating carefully optimised DNA extraction methods. In this issue of Molecular Ecology Resources, Holmquist et al. (2025) present a DNA extraction protocol for museum insect specimens, using in-house formulated Solid Phase Reversible Immobilisation (SPRI) beads. The authors carried out several experiments with statistical evaluation to determine optimal DNA extraction parameters, before testing the protocol on a large and diverse pool of museum-held insect specimens. The result is a low-cost and effective DNA extraction protocol for diverse museum insect specimens.

Insects are vitally important components of Earth's biodiversity, but monitoring these communities is challenging due to the huge diversity of species that exist. DNA sequencing technologies enable efficient molecular characterisation of insect diversity, but the resulting molecular taxonomic units are typically disconnected from species or functional information (Meier et al. 2024). This makes ecological insights difficult to achieve for wide swathes of biodiversity. Reference DNA barcodes from taxonomically identified species can bridge this gap (Kress et al. 2015), but the process of taxonomically identifying insect specimens is very difficult due to a scarcity of suitable taxonomic expertise. New workflows that combine machine learning with mass DNA barcoding of trapped insect samples have the potential to resolve this challenge over time (Meier et al. 2024). On the other hand, it is important to consider existing resources such as museum collections as sources of reference DNA barcodes.

Museums often hold rich collections of biological specimens, usually with taxonomic identifications, accumulated over long periods of time (Figure 1). In theory, these collections represent a compelling source of DNA barcodes (Raxworthy and Smith 2021). The DNA in museum specimens is often degraded due to specimen age and suboptimal preservation, however; this makes it difficult to recover DNA barcode sequences from some specimens (Hebert et al. 2013). Assembly of multiple shorter amplicons can be effective in cases where DNA fragmentation makes PCR amplification of standard barcode regions unfeasible (D'Ercole et al. 2021; Prosser et al. 2016), but this is more complex and costly than conventional DNA barcode generation. Therefore, it is important to develop optimal methods of DNA extraction from museum insect collections to

Citations: 0

Can Amplicon Sequencing Be Replaced by Metagenomics for Biodiversity Inventories?

IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular Ecology Resources

Pub Date : 2025-09-25 DOI: 10.1111/1755-0998.70047

Lucas Elliott, Eric Coissac

Biodiversity assessments are a critical part of ecological monitoring, food systems management, and many areas of research. Traditionally, recording which taxa are present in an area has been accomplished by resource-intensive surveys and the morphological identification of bulk samples by taxonomic experts. The advent of DNA metabarcoding has allowed many of these barriers to be circumvented by amplifying and sequencing taxon-specific DNA loci to create an inventory of what organisms are present in a sample. However, this method is limited to a small fraction of DNA present in a sample and is biased towards over- and under-representing certain organisms based on their genomic content. With continually decreasing sequencing and computational costs, metagenomic analysis of the entire DNA content of a sample aims to capture a more complete picture of an area's biodiversity. Callens et al. (2025) provide a direct comparison of metabarcoding and metagenomic analysis on morphologically identified macrobenthos bulk samples and detail a strategy for expanding the use of metagenomics in bulk sample characterisation. In their case study, metagenomics enabled the composition and biomass of the organisms in the samples to be reconstructed with greater accuracy. Contrary to common belief, this was achieved with a similar level of sequencing effort to that required for metabarcoding. Can this approach be generalised to any biodiversity inventory with the same success?

The DNA-based taxonomic annotation of environmental samples (eDNA) and bulk samples to characterise the biodiversity of an area has seen a rapid growth of implementation in the past decades. In addition to having a lower resource cost than surveys, DNA-based methods can detect species that are difficult to morphologically identify or observe in the wild. Previous comparisons between the DNA-based metabarcoding and metagenomic workflows have documented both largely overlapping taxonomic detections (Courtin et al. 2022) as well as minimally overlapping datasets with inverse alpha diversity patterns (Hollman et al. 2025).

Metabarcoding has been the most established and widely implemented of the two workflows, but metagenomics offers several advantages and challenges by comparison. By sequencing the entire DNA content of a sample with metagenomics, all organisms can potentially be identified and quantified instead of limiting the detection to a specific taxonomic group with metabarcoding primers. However, without amplification, DNA from taxa of interest is at risk of being swamped by organisms with a higher total biomass or being preferentially captured in a sample, leading to false negatives (Zimmermann et al. 2023). In environmental samples, non-microbial DNA typically represents less than a third of the total DNA (Eisenhofer et al. 2024), and sometimes less than 10%.

By not focusing on taxon-specific DNA loci, metagenomics allows genome-wide regions to be studied, opening the door to a large number of possible phylogenetic and functional analyses (Gelabert et al. 2021). However, because of the size of eukaryotic genomes and the extent to which conserved regions are shared among taxa, only a small fraction of plant and metazoan genomes is taxonomically informative at the species level, leading many metagenomic studies to restrict taxonomic assignment to the genus level (e.g., Wang et al. 2021; Elliott et al. 2025). In contrast, metabarcoding primers can achieve species-level resolution for more than 50% of detected taxa (Garcés-Pastor et al. 2022). Non-informative DNA fragments, coupled with the over-representation of bacterial DNA in environmental samples, mean that metagenomic workflows require increased sequencing and computational resources, giving metagenomic studies a larger overall footprint. However, the resulting large datasets can more readily be reanalysed and repurposed for future studies.

In terms of quantification, Callens et al. (2025) report a stronger correlation between relative read abundance and biomass for metagenomics than for metabarcoding. However, the COI marker used for metabarcoding in this study is designed to amplify a broad diversity of taxa at the cost of many mismatched primer binding sites (Deagle et al. 2014). Alternatively, more taxonomically restricted barcode regions have been shown to produce stronger correlations between read counts and biomass (Elbrecht et al. 2016). Given the taxonomic diversity and high endogenous DNA content of macrobenthos bulk samples, Callens et al. (2025) demonstrate that metagenomics is the more suitable workflow in this context, contingent on a complete and equally representative database. Although metagenomics avoids the biasing effects of the many PCR amplification cycles required for metabarcoding, it cannot be considered an entirely unbiased method, because various factors such as guanine-cytosine content are known to affect the final DNA read counts of an organism (Browne et al. 2020).

Bulk samples contain high concentrations of fresh endogenous DNA that can be detected by metagenomics at low sequencing depth (Callens et al. 2025), whereas environmental DNA samples are dominated by microorganisms, have higher DNA complexity, and require orders of magnitude more sequencing. Partly because of the complexity of metagenomic datasets, false-positive taxon identifications are a common risk, with many tools reporting a baseline rate and recommending filtering by a percentage of total read counts (Pedersen et al. 2016). Callens et al. (2025) calculated this threshold to be 0.2% for their dataset and noted that, even with a high percentage of fresh endogenous DNA, the entire reference database appeared in every sample at read counts below that percentage. This study was conducted with a small database of 26 organisms, whereas bulk samples, and especially environmental samples, can contain orders of magnitude more diversity.

One of the greatest obstacles to metagenomic analysis is the lack of high-quality reference material, in which much diversity is not represented. Genome skimming, or low-coverage whole-genome sequencing, offers an efficient way to expand reference databases (Alsos et al. 2020; Lavergne et al. 2025). Previous studies that included even partially assembled versions of such data showed a substantial increase in the number of DNA reads that could be taxonomically annotated (Wang et al. 2021). To make full use of the information contained in genome skims, Callens et al. (2025) used a k-mer-based approach with unassembled genome skims as reference material, demonstrating that even at 1× coverage, most reads from that species could be classified. Scaling up the use of unassembled genome skims as reference databases is computationally challenging with programs such as Kraken2 (Wood et al. 2019), but can be achieved using probabilistic data structures (Elliott et al. 2025).

Ultimately, neither the metagenomic nor the metabarcoding approach is categorically superior for all analyses. As always, the choice between the two depends heavily on the research question, available funding, and sample composition and origin. Understanding the strengths and limitations of both workflows is essential for interpreting their results. Callens et al. (2025) demonstrate the utility of metagenomics, compared with metabarcoding, when applied to biodiversity assessment of bulk samples. Expanding reference databases through low-coverage genome sequencing, together with developing computational tools to manage this large volume of data, will continue to expand the value of metagenomics in the future. However, as scientists, it is important to remember that a measurement of reality is not reality itself. We must understand the limitations of our tools and choose the most appropriate one for each task, because the tool that is most effective for one task may be the least effective for another.

E.C. conceived and wrote the manuscript. The authors declare no conflicts of interest.
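The read-count filtering described in this commentary (discarding taxa that fall below a percentage of total reads, with 0.2% reported for the Callens et al. dataset) can be illustrated with a minimal sketch. This is not the code used in that study: the function name `filter_taxa_by_read_fraction`, the dictionary input format and the example counts are all hypothetical, chosen only to show how such a threshold behaves.

```python
def filter_taxa_by_read_fraction(read_counts, min_fraction=0.002):
    """Keep taxa whose share of all assigned reads is at least min_fraction.

    read_counts: dict mapping taxon name -> number of reads assigned to it.
    min_fraction: 0.002 corresponds to the 0.2% threshold mentioned above
                  (used here as an illustrative default, not a recommendation).
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {
        taxon: n
        for taxon, n in read_counts.items()
        if n / total >= min_fraction
    }


# Hypothetical counts for one sample (10,000 assigned reads in total).
counts = {"taxon_A": 9000, "taxon_B": 985, "taxon_C": 15}
kept = filter_taxa_by_read_fraction(counts)
# taxon_C holds 15 / 10000 = 0.15% of reads, below 0.2%, so it is dropped.
```

A fixed fraction is simple to apply, but as the text notes, an appropriate value probably depends on the dataset and the size of the reference database, so it would need to be re-estimated rather than reused blindly.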

Citations: 0

Rapid DNA/eDNA-Based ID Tools for Improved Chondrichthyan Monitoring and Management

IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular Ecology Resources

Pub Date : 2025-09-19 DOI: 10.1111/1755-0998.70044

Marcela Maki Alvarenga, Ingrid Vasconcellos Bunholi, Aisni Mayumi C. L. Adachi, Marcelo Merten Cruz, Leonardo Manir Feitosa, Eduarda Valério de Jesus, Maria Eduarda Leda Lopes, Cintia Povill, Daniela Souza, Yan Torres, Antonio Mateo Solé-Cava, Rodrigo Rodrigues Domingues, Patricia Charvet, Vanessa Paes da Cruz

Rapid DNA/eDNA-based ID tools, which detect specific genetic patterns without requiring sequencing, are essential for biodiversity and wildlife trade monitoring, particularly for species of conservation concern. However, the practical application of these methods remains limited by the availability of standardised protocols, accessibility of resources, and coverage across diverse taxa. This challenge is especially pronounced for Chondrichthyes, a group heavily overexploited due to fishing and illegal trade, and with data scarcity for conservation assessments. Despite their ecological and economic importance, many species lack reference sequences in databases, as well as other molecular data and tools, hindering the development of molecular tools for species identification and trade regulation. This review synthesises the current state of rapid DNA/eDNA-based ID tools for the detection of chondrichthyan species, including established and emerging methods. It also compiles available taxon-specific primers to facilitate efficient species identification and recommends the most suitable methods. We identify key gaps in taxonomic and geographic coverage, emphasising the need for further research to expand these tools to under-represented species and regions. Additionally, we highlight the importance of integrating genetic approaches into enforcement frameworks to enhance conservation strategies and regulatory compliance. By providing an accessible reference for time- and cost-effective genetic monitoring, this work will support evidence-based decision-making and improve the practical application of rapid DNA/eDNA-based ID tools in the conservation and management of Chondrichthyes species worldwide.

Citations: 0

BOLDistilled: Automated Construction of Comprehensive but Compact DNA Barcode Reference Libraries

IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular Ecology Resources

Pub Date : 2025-09-14 DOI: 10.1111/1755-0998.70043

S. W. J. Prosser, R. M. Floyd, K. A. Thompson, S. K. Monckton, P. D. N. Hebert

Advances in DNA sequencing technology have stimulated the rapid uptake of protocols—such as eDNA analysis and metabarcoding—that infer the species composition of environmental samples from DNA sequences. DNA barcode reference libraries play a critical role in the interpretation of sequences gathered through such protocols, but many of these libraries lack a taxonomic consensus, include redundant records, do not support end-user analytical pipelines, and are not permanently archived. Furthermore, because DNA sequencers are outpacing Moore's Law and reference libraries are growing, the computational power required to assign sequences to source taxa is rapidly increasing. This paper introduces an algorithmic approach to construct DNA barcode reference libraries that addresses these issues. Hosted online, ‘BOLDistilled’ libraries are comprehensive but compact, because the algorithm distills genetic variation into a minimal set of records. We provide a BOLDistilled library for the barcode region of the cytochrome c oxidase 1 gene (COI) based on data in the Barcode of Life Data System (BOLD). It contains 1.7 M records versus the 15.7 M in the complete library, a compression that reduced the time required for sequence analysis of metabarcoded samples by ≥ 98% with no reduction in the accuracy of taxonomic placements. BOLDistilled libraries will be updated regularly, with current and previous versions available at https://boldsystems.org/data/boldistilled. By providing access to persistent, comprehensive, and high-quality reference data, these libraries strengthen the capacity of DNA-based identification systems to advance biodiversity science.
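To put the reported library sizes in perspective, the short sketch below works through the arithmetic and shows the simplest possible form of record reduction. It is an illustration only. The abstract does not describe how the BOLDistilled algorithm distills genetic variation, so `collapse_identical` (a hypothetical helper that merely drops exact duplicate sequences within a taxon) should not be read as the authors' method, and the example records are invented.

```python
def reduction_percent(full_size, distilled_size):
    """Percentage of records removed when going from full_size to distilled_size."""
    return 100.0 * (1 - distilled_size / full_size)


def collapse_identical(records):
    """Keep one record per distinct (taxon, sequence) pair, in first-seen order.

    records: iterable of (taxon, sequence) tuples.
    Hypothetical stand-in for a real distillation step: it removes exact
    duplicates only and does not model sequence similarity at all.
    """
    seen = set()
    kept = []
    for taxon, seq in records:
        key = (taxon, seq.upper())
        if key not in seen:
            seen.add(key)
            kept.append((taxon, seq))
    return kept


# Library sizes quoted in the abstract: 15.7 M records distilled to 1.7 M.
pct = round(reduction_percent(15_700_000, 1_700_000), 1)  # 89.2

# Invented toy records: two exact duplicates for species_1.
toy = [("species_1", "ACGT"), ("species_1", "acgt"), ("species_2", "ACGT")]
distilled = collapse_identical(toy)
```

On the quoted figures the library shrinks by roughly 89% (about a 9-fold reduction in record count), which gives a sense of scale for the reported drop of at least 98% in analysis time.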

Citations: 0

MITE Annotation and Landscape in 207 Plant Genomes Reveal Their Evolutionary Dynamics and Functional Roles

IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular Ecology Resources

Pub Date : 2025-09-09 DOI: 10.1111/1755-0998.70041

Jie Gao, Long-Long Yang, Yi-Ran Wang, Yue-Yan Zhao, Yu Shi, Shuai-Jie Wei, Ning Chen, Yu-Lan Zhang, Wu-Jun Gao, Shu-Fen Li

Miniature inverted-repeat transposable elements (MITEs) are short, non-autonomous class II transposable elements prevalent in eukaryotic genomes, contributing to various genomic and genic functions in plants. However, research on MITEs mainly targets a few species, limiting a comprehensive understanding and systematic comparison of MITEs in plants. Here, we developed a highly sensitive MITE annotation pipeline with a low false positive rate and applied it to 207 high-quality plant genomes. We found over a 20,000-fold variation in MITE copy numbers among species. The Mutator superfamily accounted for 41.5% of MITEs, whereas the Tc1/Mariner and PIF/Harbinger superfamilies expanded rapidly in monocots, particularly in Poaceae. Insertion time analysis revealed a general pattern of a single amplification wave, with initial insertions occurring around 30 million years ago (Mya) and peaking at 0–9 Mya. In addition, some species exhibited evidence of another ancient, slower expansion phase. In three representative families, we identified many more species-specific MITE loci than shared MITE loci, underscoring MITEs' significant role in genome diversity. Phylogenomic analyses indicate that MITEs accumulated gradually and specifically during speciation, primarily through recent insertions rather than the retention of ancient elements. MITEs preferentially insert near genes and are often associated with enhanced gene expression. Furthermore, we identified 985 MITE-derived miRNAs from 392 families across 56 species, mainly from Mutator, Tc1/Mariner, and PIF/Harbinger, targeting a variety of gene functions. This study enhances our understanding of the evolution and functional roles of MITEs in plants and provides a basis for exploring their function in further research.

Citations: 0

Robust, Open-Source and Automation-Friendly DNA Extraction Protocol for Hologenomic Research

IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular Ecology Resources

Pub Date : 2025-09-09 DOI: 10.1111/1755-0998.70042

Jonas G. Lauritsen, Christian Carøe, Nanna Gaun, Garazi Martin-Bideguren, Aoife Leonard, Raphael Eisenhofer, Iñaki Odriozola, M. Thomas P. Gilbert, Ostaizka Aizpurua, Antton Alberdi, Carlotta Pietroni

Global efforts to standardise methodologies benefit greatly from open-source procedures that enable the generation of comparable data. Here, we present a modular, high-throughput nucleic acid extraction protocol standardised within the Earth Hologenome Initiative to generate both genomic and microbial metagenomic data from faecal samples of vertebrates. The procedure enables the purification of RNA and DNA either in separate fractions (DREX1) or together as total nucleic acids (DREX2). We demonstrate the effectiveness of both variants across faecal samples from amphibians, reptiles and mammals, with reduced performance observed on bird guano. Despite some variation in laboratory performance metrics, both DREX1 and DREX2 yielded highly similar microbial community profiles, as well as comparable depth and breadth of host genome coverage. Benchmarking against a commercial kit widely used in microbiome research showed comparable recovery of host genomic data and microbial community complexity. Our open-source method offers a robust, cost-effective, scalable and automation-friendly nucleic acid extraction procedure to generate high-quality hologenomic data across vertebrate taxa. The method enhances research comparability and reproducibility by providing standardised, high-throughput, open-access protocols with fully transparent reagents. It is designed to integrate into automated pipelines, and its modular structure also supports continuous development and improvement.

Molecular Ecology Resources, volume 25, issue 8. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70042

Citations: 0