Simple approaches for evaluation of OTU quality based on dissimilarity arrays
M. Cros, Jean-Marc Frigerio, N. Peyrard, Alain Franc
Metabarcoding and Metagenomics. Published 2024-03-14. DOI: 10.3897/mbmg.8.108649 (https://doi.org/10.3897/mbmg.8.108649)
Abstract
An accurate and complete taxonomic description of the diversity present in an environmental sample is currently out of reach. Metabarcoding is used instead, on the expectation that operational taxonomic units (OTUs) form a category relevant for molecular-based biodiversity inventories. However, artefacts can arise at different stages of OTU production and may affect ecological conclusions. We propose to evaluate the quality of the OTUs in a sample by characterising how far each OTU’s dissimilarity array deviates from that of an ideal OTU, in which all sequences are at distances smaller than the barcoding gap. We consider two deviations: composed OTUs, which result from the artificial merging of several OTUs, and noisy OTUs, which contain some sequences that are only loosely associated with the core sequence of the OTU and that do not form a compact subgroup. We propose a simple, automatic two-step method that first categorises each OTU of a sample as composed or single and then identifies, amongst the single OTUs, those with noise. The associated code is available at https://forgemia.inra.fr/alain.franc/otu_shape. We applied the method to 32 samples of diatoms from Arcachon Bay (France) that represent contrasting environmental conditions, and obtained good agreement with expert categorisation of the OTUs. We suggest that single OTUs without noise can be used as such in further ecological studies, whereas composed OTUs should be post-treated with classical clustering or community detection tools. The quality of single OTUs with noise remains to be tested through supplementary studies on a diversity of organisms.
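To make the two-step categorisation concrete, the following is a minimal Python sketch of one way such a decision could be organised. It is not the authors' implementation (see the otu_shape repository for that). The connected-component criterion, the function names, the `min_group` parameter, and the gap value of 0.03 are all assumptions introduced here for illustration, and the toy dissimilarity arrays are synthetic.

```python
from collections import deque


def threshold_components(dist, gap):
    """Connected components of the graph that links two sequences
    whenever their dissimilarity is below `gap` (an assumed criterion)."""
    n = len(dist)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        comp = []
        while queue:  # breadth-first search from `start`
            i = queue.popleft()
            comp.append(i)
            for j in range(n):
                if not seen[j] and dist[i][j] < gap:
                    seen[j] = True
                    queue.append(j)
        components.append(sorted(comp))
    # Largest component first
    return sorted(components, key=len, reverse=True)


def classify_otu(dist, gap, min_group=3):
    """Hypothetical two-step rule.
    Step 1: an OTU with at least two compact subgroups is 'composed'.
    Step 2: a single OTU with sequences outside its core has noise."""
    if not dist:
        raise ValueError("empty dissimilarity array")
    comps = threshold_components(dist, gap)
    compact = [c for c in comps if len(c) >= min_group]
    if len(compact) >= 2:
        return "composed"
    if len(comps) > 1:
        return "single with noise"
    return "single without noise"


def make_dist(groups, within=0.01, between=0.10):
    """Synthetic dissimilarity array; groups[i] is the label of sequence i."""
    n = len(groups)
    return [
        [0.0 if i == j else (within if groups[i] == groups[j] else between)
         for j in range(n)]
        for i in range(n)
    ]


GAP = 0.03  # assumed barcoding gap, for illustration only
print(classify_otu(make_dist([0, 0, 0, 0]), GAP))        # single without noise
print(classify_otu(make_dist([0, 0, 0, 1, 1, 1]), GAP))  # composed
print(classify_otu(make_dist([0, 0, 0, 0, 1]), GAP))     # single with noise
```

Note that linking sequences through connected components is looser than the ideal case described in the abstract, where all sequences are at distances smaller than the barcoding gap. A stricter variant could instead test whether the maximum entry of the array is below the gap.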