Assessing and monitoring genetic diversity is vital for understanding the ecology and evolution of natural populations but is often challenging in animal and plant species because tissue sampling is technically and physically demanding. Although environmental DNA (eDNA) metabarcoding is a promising alternative to traditional population genetic monitoring based on biological samples, its practical application remains challenging because spurious sequences persist in the amplicon data, even after processing with existing sequence filtering and denoising (error correction) methods. Here we developed a novel amplicon filtering approach that can effectively eliminate such spurious amplicon sequence variants (ASVs) in eDNA metabarcoding data. A simple simulation of eDNA metabarcoding processes was performed to understand the read count (abundance) distributions of true ASVs and their polymerase chain reaction (PCR)-generated artefacts (i.e., false-positive ASVs). Based on the simulation results, the approach was developed to estimate the abundance distributions of true and false-positive ASVs using Gaussian mixture models and to determine a statistically based threshold between them. The approach was implemented as an R package, gmmDenoise, and evaluated using single-species metabarcoding datasets in which all or some true ASVs (i.e., haplotypes) were known. Example analyses using community (multi-species) metabarcoding datasets were also performed to demonstrate how gmmDenoise can be used to derive reliable intraspecific diversity estimates and population genetic inferences from noisy amplicon sequencing data. The gmmDenoise package is freely available on GitHub (https://github.com/YSKoseki/gmmDenoise).
gmmDenoise: A New Method and R Package for High-Confidence Sequence Variant Filtering in Environmental DNA Amplicon Analysis. Yusuke Koseki, Hirohiko Takeshima, Ryuji Yoneda, Kaito Katayanagi, Gen Ito, Hiroki Yamanaka. Molecular Ecology Resources 25(8), published 2025-08-04. DOI: 10.1111/1755-0998.70023. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70023
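The mixture-model thresholding described in the gmmDenoise abstract can be illustrated with a minimal sketch. It is not the gmmDenoise API and is written in Python rather than R. It assumes simulated read counts, a two-component mixture fitted to log10 read counts, and a cutoff at the point where both components are equally probable. The function names and the cutoff rule are hypothetical choices for illustration; the package's own threshold rule may differ.

```python
import math
import random


def log_weighted_density(x, weight, mean, sd):
    """Return log(weight * Normal(x | mean, sd))."""
    z = (x - mean) / sd
    return math.log(weight) - math.log(sd) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


def fit_two_component_gmm(values, n_iter=100, min_sd=1e-3):
    """Fit a 1-D, two-component Gaussian mixture by EM.

    Returns (weights, means, sds); component 0 starts as the low-valued group.
    """
    n = len(values)
    # Initialise by splitting the data at the midpoint of its range.
    midpoint = (min(values) + max(values)) / 2.0
    groups = [[v for v in values if v <= midpoint],
              [v for v in values if v > midpoint]]
    if not groups[0] or not groups[1]:
        raise ValueError("need at least two distinct values")
    weights, means, sds = [], [], []
    for group in groups:
        m = sum(group) / len(group)
        var = sum((v - m) ** 2 for v in group) / len(group)
        weights.append(len(group) / n)
        means.append(m)
        sds.append(max(math.sqrt(var), min_sd))
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to component 1.
        resp1 = []
        for v in values:
            l0 = log_weighted_density(v, weights[0], means[0], sds[0])
            l1 = log_weighted_density(v, weights[1], means[1], sds[1])
            top = max(l0, l1)
            e0, e1 = math.exp(l0 - top), math.exp(l1 - top)
            resp1.append(e1 / (e0 + e1))
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k in (0, 1):
            resp = [p if k == 1 else 1.0 - p for p in resp1]
            nk = max(sum(resp), 1e-12)
            means[k] = sum(r * v for r, v in zip(resp, values)) / nk
            var = sum(r * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), min_sd)
            weights[k] = nk / n
    return weights, means, sds


def posterior_threshold(weights, means, sds, n_steps=60):
    """Bisect between the two means for the equal-posterior point.

    Assumes means[0] < means[1]. This cutoff rule is an illustrative
    assumption, not the rule used by gmmDenoise.
    """
    lo, hi = means[0], means[1]
    for _ in range(n_steps):
        mid = (lo + hi) / 2.0
        diff = (log_weighted_density(mid, weights[1], means[1], sds[1])
                - log_weighted_density(mid, weights[0], means[0], sds[0]))
        if diff < 0:
            lo = mid  # low-abundance component still more probable
        else:
            hi = mid
    return (lo + hi) / 2.0


# Simulated data (an assumption): many low-abundance artefacts, few true ASVs.
random.seed(42)
false_counts = [max(1, round(10 ** random.gauss(1.0, 0.3))) for _ in range(200)]
true_counts = [max(1, round(10 ** random.gauss(4.0, 0.3))) for _ in range(12)]
counts = false_counts + true_counts

log_counts = [math.log10(c) for c in counts]
weights, means, sds = fit_two_component_gmm(log_counts)
threshold_reads = 10 ** posterior_threshold(weights, means, sds)
kept = [c for c in counts if c > threshold_reads]
print(f"threshold: {threshold_reads:.0f} reads; kept {len(kept)} of {len(counts)} ASVs")
```

Working on log10 read counts keeps the two abundance groups roughly bell-shaped in this toy setting. Real data may need more than two components, and the choice of cutoff rule is a separate decision.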
Brian J. Johnson, Melissa C. Graham, Elina Panahi, Carla Julia S. P. Vieira, Nisa Suraj Nath, Paul Mason, Jamie Gleadhill, Darran Thomas, Michael B. Onn, Martin A. Shivas, Damien Shearman, Jonathan M. Darbro, Gregor J. Devine
Next-generation sequencing (NGS) has the potential to transform mosquito-borne disease surveillance but remains under-utilised. This study introduces a comprehensive multi-loci metabarcoding-based MX (molecular xenomonitoring) approach to mosquito and arbovirus surveillance, enabling parallel identification of mosquito vectors, circulating arboviruses, and vertebrate hosts from bulk mosquito collections. The feasibility of this approach was demonstrated through its application to a large set (n = 110) of bulk field collections. This set was complemented by a number (n = 28) of single-species mosquito pools that had previously been screened for viruses using quantitative reverse transcription PCR (RT-qPCR) and metatranscriptomics. Universal alphavirus and flavivirus primer sets were used to screen for arboviruses in the resulting metabarcoding library. Viral amplicons were then indexed and combined with mosquito-specific (ITS2), universal invertebrate (COI), and vertebrate (Cyt b) barcode amplicons prior to sequencing. This approach confirmed the presence of all previously identified mosquito species, as well as those commonly misidentified morphologically, and enabled a degree of quantification regarding their relative physical abundance in each collection. Additionally, the developed approach identified a diverse vertebrate host community (18 species), demonstrating its potential for defining host preferences and, in tandem with the viral screens and associated vector data, understanding disease transmission pathways. Importantly, metabarcoding detected a diversity of regionally prevalent arboviruses and insect-specific viruses, with all three viral diagnostics demonstrating a similar sensitivity and specificity in detecting Ross River virus and Barmah Forest virus, Australia's most common arboviruses. In summary, multi-loci metabarcoding is an affordable and efficient MX tool that enables complete mosquito-borne disease surveillance.
An All-in-One Metabarcoding Approach to Mosquito and Arbovirus Xenosurveillance. Molecular Ecology Resources 25(8), published 2025-08-04. DOI: 10.1111/1755-0998.70022. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70022