Audrey Bourret, Claude Nozères, Eric Parent, Geneviève J. Parent
Metabarcoding and Metagenomics. Journal Article, published 2023-02-23. DOI: 10.3897/mbmg.7.98539 (https://doi.org/10.3897/mbmg.7.98539)
Citations: 1
Maximizing the reliability and the number of species assignments in metabarcoding studies using a curated regional library and a public repository
Biodiversity assessments relying on DNA have increased rapidly over the last decade. However, the reliability of taxonomic assignments in metabarcoding studies is variable and depends on both the reference database and the assignment method used. Species-level assignments are usually considered reliable when made with regional libraries but unreliable when made with public repositories. In this study, we aimed to test this assumption for metazoan species detected in the Gulf of St. Lawrence in the Northwest Atlantic. We first created a regional library (GSL-rl) by data mining COI barcode sequences from BOLD, and included a reliability ranking system for species assignments. We then 1) estimated the accuracy and precision of the public repository NCBI-nt for species assignments using sequences from the regional library and 2) compared the detection and reliability of species assignments for a metabarcoding dataset using either NCBI-nt or the regional library together with popular assignment methods. With NCBI-nt and sequences from the regional library, the BLAST-LCA (least common ancestor) method was the most precise method for species assignments, but accuracy was higher with the BLAST-TopHit method (>80% over all taxa, between 70% and 90% amongst taxonomic groups). With the metabarcoding dataset, the reliability of species assignments was greater using GSL-rl than using NCBI-nt. However, we also observed that the total number of reliable species assignments could be maximized by using both GSL-rl and NCBI-nt, each with a different optimized assignment method. A two-step approach for species assignments, i.e., using a regional library and then a public repository, could therefore improve both the reliability and the number of species detected in metabarcoding studies.
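The contrast between the two assignment methods named in the abstract, and the idea of a two-step assignment, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the `Hit` structure, the 97% identity threshold, the three-rank lineage, the toy hits, and the choice of LCA for the regional library with TopHit as the fallback for the public repository. The abstract does not say which method was optimized for which database.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Lineage = Tuple[str, ...]  # ranks ordered broad to narrow, e.g. (family, genus, species)


@dataclass(frozen=True)
class Hit:
    """One BLAST-style hit (hypothetical structure): percent identity and lineage."""
    identity: float
    lineage: Lineage


def top_hit(hits: List[Hit], min_identity: float = 97.0) -> Optional[Lineage]:
    """TopHit: keep only the single best-scoring hit, if it passes the threshold."""
    if not hits:
        return None
    best = max(hits, key=lambda h: h.identity)
    return best.lineage if best.identity >= min_identity else None


def lca(hits: List[Hit], min_identity: float = 97.0) -> Optional[Lineage]:
    """LCA: keep the ranks shared by every hit that passes the threshold."""
    kept = [h.lineage for h in hits if h.identity >= min_identity]
    if not kept:
        return None
    shared = []
    for ranks in zip(*kept):  # walk the ranks from broad to narrow
        if len(set(ranks)) != 1:
            break  # hits disagree at this rank, so stop above it
        shared.append(ranks[0])
    return tuple(shared)


def two_step(regional: List[Hit], public: List[Hit], n_ranks: int = 3):
    """Assumed two-step rule: accept a species-level LCA from the regional
    library first; otherwise fall back to the public repository's top hit."""
    first = lca(regional)
    if first is not None and len(first) == n_ranks:
        return ("regional", first)
    second = top_hit(public)
    if second is not None:
        return ("public", second)
    return ("unassigned", ())


# Toy hits: two congeneric species both match the query closely.
hits = [
    Hit(99.5, ("Gadidae", "Gadus", "Gadus morhua")),
    Hit(99.0, ("Gadidae", "Gadus", "Gadus macrocephalus")),
]
print(top_hit(hits))  # species-level call taken from the best hit alone
print(lca(hits))      # stops at genus because the two hits disagree on species
```

In this toy case TopHit commits to a species while LCA stops at the genus, which is the kind of trade-off between the number of species-level calls and their reliability that the abstract describes.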