Duong Vu, Michel de Vries, Bert Gerrits van den Ende, Jos Houbraken, R. Henrik Nilsson, Balázs Brankovics, Magarita Hernández-Restrepo, Johannes Z. Groenewald, Pedro W. Crous, Ferry Hagen, Wieland Meyer, Gerard J. M. Verkley, Marizeth Groenewald
Yeast identification is essential in fields ranging from microbiology and biotechnology to food science and medicine. While DNA barcoding has become the standard for identifying cultured strains, environmental DNA (eDNA) metabarcoding has revolutionised microbial community profiling, providing deeper insights into yeast communities across diverse ecosystems. A major challenge in DNA (meta)barcoding remains the limited availability of high-quality reference sequences, which are critical for accurate species identification and comprehensive taxonomic profiling of both environmental and clinical samples. To address this gap, the Westerdijk Fungal Biodiversity Institute (WI) launched a DNA barcoding initiative in 2006 to generate high-quality, often type-derived ITS and LSU barcodes for all ~100,000 fungal strains preserved in the CBS culture collection, including approximately 15,000 yeasts. Building on the yeast barcode dataset released in 2016, we now present an expanded set of 2856 ITS and 3815 LSU sequences, representing 911 and 1137 yeast species, respectively. Notably, 27%–29% of these sequences are derived from ex-type cultures. Using both newly generated and previously published barcodes, we assess the taxonomic resolution of commonly used yeast metabarcoding markers (ITS, ITS1, ITS2 and LSU) and propose marker-specific similarity cutoffs for different yeast taxonomic groups. These results provide actionable guidance for marker selection and improve the interpretation of metabarcoding data. We further demonstrate the impact of well-curated reference databases with up-to-date taxonomy by reanalyzing Human Microbiome Project data, revealing how diet and environment shape the gut mycobiota.
Advancing Yeast Identification Using High-Throughput DNA Barcode Data From a Curated Culture Collection. Molecular Ecology Resources 26(1). Published 2025-11-26. DOI: 10.1111/1755-0998.70082. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70082
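To illustrate how a marker-specific similarity cutoff might be applied during identification, here is a minimal Python sketch. It assumes the query and reference sequences are already aligned to equal length. The function names, the toy sequences, and the cutoff values in `CUTOFFS` are hypothetical placeholders, not values or methods reported in the paper.

```python
# Hypothetical cutoffs (percent identity) per marker; placeholders only,
# NOT the cutoffs proposed in the paper.
CUTOFFS = {"ITS": 98.5, "LSU": 99.5}


def percent_identity(query: str, reference: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where both sequences carry a gap.
    columns = [
        (q, r)
        for q, r in zip(query.upper(), reference.upper())
        if not (q == "-" and r == "-")
    ]
    if not columns:
        return 0.0
    # A column counts as a match only if both bases are identical non-gaps.
    matches = sum(1 for q, r in columns if q == r and q != "-")
    return 100.0 * matches / len(columns)


def assign_species(query: str, references: dict, marker: str):
    """Return (species, identity) for the best hit, or (None, identity)
    when the best hit falls below the cutoff for this marker."""
    best_name, best_identity = None, -1.0
    for name, sequence in references.items():
        identity = percent_identity(query, sequence)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_identity >= CUTOFFS[marker]:
        return best_name, best_identity
    return None, best_identity


if __name__ == "__main__":
    toy_references = {"species_a": "ACGTACGTAC", "species_b": "ACGTTCGTAC"}
    print(assign_species("ACGTACGTAC", toy_references, "ITS"))
    print(assign_species("ACGTACGTAA", toy_references, "ITS"))
```

Keeping the cutoffs in a per-marker table reflects the abstract's point that a single threshold does not suit every marker or taxonomic group. A real pipeline would take identities from an alignment or search tool rather than from pre-aligned strings.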
David Navarro, Emily K. Latch, Anaïs K. Tallon, Caitlin N. Ott-Conn, Randy W. DeYoung, Daniel P. Walsh, Peter T. Euclide, Chandika R. G., Wes A. Larson, Arun S. Seetharam, Andrew J. Severin, James M. Reecy, Zhi-Liang Hu, Jay R. Cantrell, Michelle Carstensen, Joe N. Caudell, Charlie H. Killmaster, Mitch L. Lockwood, William T. McKinley, Andrew S. Norton, Krysten L. Schuler, Daniel J. Storm, Jason A. Sumners, W. David Walter, Julie A. Blanchong
White-tailed deer (Odocoileus virginianus) are the most abundant and widespread cervid in North America. Genetic data are used as a tool to monitor populations and make management decisions for this game species. However, the development and use of genomic tools that can generate a set of markers suitable for longitudinal genomic data collection, whether for management purposes or to study the demographic and evolutionary processes of widely distributed species, have been challenging. This is mainly due to the cost required to fully implement and interpret the data produced. Here, we generated whole genome resequencing data for 44 free-ranging deer from three regions in their central and eastern North American range and identified over 89 million single nucleotide polymorphisms (SNPs). We used a subset of these SNPs to develop two nested SNP tools, a high-density array (702,183 SNPs) and a medium-density array (72,723 SNPs), to support deer and chronic wasting disease (CWD) management and research. SNPs were selected to ensure an even distribution across scaffolds of the reference genome and to include SNPs associated with CWD susceptibility. Using genotyping results for 469 deer from 15 states in the US and Mexico generated by the high-density array and 1335 deer from 18 states generated by the medium-density array, we assessed genotyping success across different populations and explored population structure. These genomic tools offer a standard set of markers that will enable researchers and managers to address important questions related to white-tailed deer and CWD management. Our SNP arrays also offer the opportunity to examine aspects of white-tailed deer ecology and evolutionary history that were previously difficult to address.
Development of High-Throughput Genomic Resources to Inform White-Tailed Deer Population and Disease Management. Molecular Ecology Resources 26(1). Published 2025-11-26. DOI: 10.1111/1755-0998.70085. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70085
Rebekah A. Mohn, Mira Garner, Paul S. Manos, Andrew L. Hipp
The increasing numbers of published reference genomes and affordability of whole genome resequencing have enabled multispecies population genomic and phylogenomic studies on non-model organisms, but they raise a new question for comparative genomics: what reference genome and mapping method combination results in the most data with the least bias? We mapped short-read resequencing data from seven eastern North American white oak (Quercus sect. Quercus) and two related samples to four Quercus reference genomes (Q. alba, Q. lobata, Q. mongolica, and Q. rubra) which represent different degrees of phylogenetic relatedness to the samples. We used three different mapping methods: a global (Bowtie 2 --end-to-end) and two local (Bowtie 2 --local and BWA-MEM) alignment approaches. For the twelve resulting datasets, we analysed read mapping accuracy and efficiency, missing data, heterozygosity, and inferred phylogenies to evaluate the impact of reference genome and read-mapping method. We found that the genetic distance of the reference genome to the samples and mapping method together impacted heterozygosity and phylogenetic tree estimation. There were two notable effects. First, when using a global alignment method (Bowtie 2 --end-to-end), estimated heterozygosity negligibly decreased with increased genetic distance between the reference and sample. Second, the most distantly related reference genome had significantly reduced base pair recovery and resulted in under- or overestimating heterozygosity depending on the method, and a more unbalanced phylogeny. We conclude that using a closely related but not conspecific reference is ideal to minimise reference bias and using Bowtie 2 --end-to-end minimises mismapping, resulting in the most accurate variant calls.
Variant Calling in the Goldilocks Zone: How Reference Genome Choice and Read Mapping Stringency Impact Heterozygosity Estimates and Phylogenetic Analyses. Molecular Ecology Resources 26(1). Published 2025-11-20. DOI: 10.1111/1755-0998.70079. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12631958/pdf/
More than 25 years ago, Aitchison showed that the logratio principal component analysis of multiple samples of a biallelic polymorphism can reveal the Hardy–Weinberg law. However, compositional data analysis, that is, the logratio approach, has so far had little impact in population genetics. This article extends Aitchison's work to multiallelic polymorphisms, showing how the Hardy–Weinberg law manifests itself in a logratio-based statistical analysis with larger genotypic compositions. Excellent visualisations of equilibrium and disequilibrium are achieved by using compositional biplots based on allele and genotype frequencies taken across multiple populations. Some fundamental relationships between allelic and genotypic compositions are derived, and the close relationships between the logratio principal component analysis of allelic and genotypic compositions and the corresponding compositional biplots are established. Simulations and practical genetic data analysis are used to explore the implications of Hardy–Weinberg equilibrium for the logratio principal component analysis of genotypic compositions. A general multiallelic compositional measure for disequilibrium is presented and shown to relate to the classical inbreeding coefficient. The proposed compositional analysis is illustrated with biallelic glyoxalase genotypes and with two multiallelic loci from the 1000 Genomes project, the forensic microsatellite D2S441 and the ABO locus. For the latter, a haplotype-based approach is used and generates predictions of the three-allele ABO genotypes for the individuals of the expanded 1000 Genomes project.
A Logratio Approach to the Analysis of Autosomal Genotype Frequencies Across Multiple Samples. Jan Graffelman. Molecular Ecology Resources 26(1). Published 2025-11-18. DOI: 10.1111/1755-0998.70072. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12625807/pdf/
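As background for the biallelic case, the Python sketch below works through two textbook identities. It is not the multiallelic measure developed in the article, and the function names are hypothetical. Under Hardy–Weinberg equilibrium the genotype frequencies are p², 2pq and q², so the logratio ln(f_AB² / (f_AA · f_BB)) equals ln 4 whatever the allele frequency. The classical inbreeding coefficient can be computed as f = 1 − f_AB / (2pq).

```python
import math


def genotype_freqs(p: float, f: float = 0.0):
    """Biallelic genotype frequencies (AA, AB, BB) for allele frequency p
    and inbreeding coefficient f; f = 0 gives Hardy-Weinberg proportions."""
    q = 1.0 - p
    return (p * p + f * p * q, 2.0 * p * q * (1.0 - f), q * q + f * p * q)


def hw_logratio(f_aa: float, f_ab: float, f_bb: float) -> float:
    """ln(f_AB^2 / (f_AA * f_BB)); equals ln 4 under Hardy-Weinberg."""
    return math.log(f_ab ** 2 / (f_aa * f_bb))


def inbreeding_coefficient(f_aa: float, f_ab: float, f_bb: float) -> float:
    """Classical inbreeding coefficient: 1 - observed / expected heterozygosity."""
    p = f_aa + f_ab / 2.0            # frequency of allele A
    expected_het = 2.0 * p * (1.0 - p)
    return 1.0 - f_ab / expected_het


if __name__ == "__main__":
    # Populations in equilibrium share the same logratio, ln 4.
    for p in (0.1, 0.3, 0.5):
        print(p, hw_logratio(*genotype_freqs(p)), math.log(4))
    # A population with heterozygote deficit (f = 0.2) departs from ln 4.
    deficit = genotype_freqs(0.3, f=0.2)
    print(hw_logratio(*deficit), inbreeding_coefficient(*deficit))
```

Because the equilibrium logratio is constant across populations, samples in equilibrium vary only along the remaining logratio directions. This is one informal way to see why a logratio principal component analysis can expose the Hardy–Weinberg law.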
Ultraconserved elements (UCEs) have become a powerful tool for phylogenomics, but probe sets optimized for one lineage often perform inconsistently when applied in others. Here, we designed and tested new UCE probe sets derived from both genome and transcriptome data of an understudied molluscan class, Polyplacophora (chitons). We identified 5730 UCEs from available chiton genomes and transcriptomes, and designed a set of 19,080 probes. These probes showed an average efficiency of 55% in the genome and 20% in transcriptomes, significantly outperforming available molluscan probe sets. A coalescence-based phylogenetic tree built from in silico extractions of UCEs from transcriptome and genome data successfully resolved chiton phylogeny at the species level. Shorter flanking regions performed best. Where genome and transcriptome data were available for the same species, they did not always resolve as sister taxa in non-optimized methods; instead, genome- and transcriptome-derived sequences tended to form separate clades. This offers a caution for combining data harvested from published datasets. Quantifying phylogenetic signal at individual UCE loci demonstrates that the dataset retains topological stability across a range of filtering stringencies. This resource provides a foundation for integrating new genomic and transcriptomic datasets and has the potential to enable targeted sequencing of historical museum specimens. More broadly, our study highlights the importance of tailored probe design for phylogenomic studies in understudied lineages and the challenges of combining diverse molecular data types.
Evaluating Probe Design for Phylogenomics Across Taxonomic Scales: First Steps for Applying Ultraconserved Elements in an Understudied Class (Mollusca: Polyplacophora). Zeyuan Chen, Katarzyna Vončina, Julia D. Sigwart. Molecular Ecology Resources 26(1). Published 2025-11-13. DOI: 10.1111/1755-0998.70076. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70076
Rodney T. Richardson, Grace Avalos, Cameron J. Garland, Regina Trott, Olivia Hager, Mark J. Hepner, Clayton Raines, Karen Goodell
Terrestrial environmental DNA (eDNA) techniques have been proposed as a means of sensitive, non-lethal pollinator monitoring. To date, however, no studies have provided evidence that eDNA methods can achieve detection sensitivity on par with traditional pollinator surveys. Using a large-scale dataset of eDNA and corresponding net surveys, we show that eDNA methods enable sensitive, species-level characterisation of whole bumble bee communities, including rare and critically endangered species such as the rusty patched bumble bee (RPBB; Bombus affinis). All species present in netting surveys were detected within eDNA surveys, apart from two rare species in the socially parasitic subgenus Psithyrus (cuckoo bumble bees). Further, for rare non-parasitic species, eDNA methods exhibited similar sensitivity relative to traditional netting. Compared with flower eDNA samples, sequenced leaf surface eDNA samples resulted in significantly lower rates of Bombus detection, and these detections were likely attributable to high rates of background eDNA on environmental surfaces, perhaps due to airborne eDNA or eDNA movement during rainfall events. Lastly, we found that eDNA-based frequency of detection across replicate surveys was strongly associated with net-based measures of abundance across site visits. We conclude that the COI-based metabarcoding method we present is cost-effective and highly scalable for quantitative characterisation of at-risk bumble bee communities, providing a new approach for improving our understanding of species habitat associations.
Sensitive Environmental DNA Methods for Low-Risk Surveillance of At-Risk Bumble Bees. Molecular Ecology Resources 26(1). Published 2025-11-13. DOI: 10.1111/1755-0998.70073. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12624520/pdf/
Chyi Yin Gwee, Laura Tassoni, Zlatozar Boev, Teresa Tomek, Zbigniew M. Bochenski, Sahra Talamo, Jochen B. W. Wolf
Ancient DNA (aDNA) analysis remains challenging due to low endogenous DNA content of degraded samples. Hybridisation-based in-solution enrichment has emerged as an effective tool for targeting genomic regions, enhancing endogenous DNA yield while minimising overall sequencing effort. Despite their widespread use, the performance of different probe kits in capture efficiency remains insufficiently understood, particularly in nonhuman model organisms. In this study, we examined the performance of two commercially available custom probe systems, the RNA-based myBaits and DNA-based Twist, in enriching endogenous aDNA (0.9%–90.1%) extracted from crow bones collected from the early to late Holocene (100–14,000 years ago). The target regions included a panel of 104 K genome-wide single nucleotide polymorphisms (SNPs) identified from modern populations of the Corvus corone species complex. Both custom probe kits substantially improved fold enrichment and target site detection rates compared with shotgun sequencing. Between the two kits, myBaits consistently achieved higher capture efficiency. In contrast, Twist retained a greater proportion of endogenous DNA, but most of this originated from off-target regions, resulting in lower target efficiency under our experimental conditions. Twist demonstrated higher coverage in regions with extreme GC content, highlighting its utility for applications targeting GC-rich genomic regions. These findings provide insights into the performance of commercially available DNA enrichment methods and help guide study design.
Performance of Two Custom Probe Kits for In-Solution Enrichment of Ancient Avian DNA. Molecular Ecology Resources 26(1). Published 2025-11-10. DOI: 10.1111/1755-0998.70071. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12627907/pdf/
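The abstract compares kits by fold enrichment and on-target efficiency without spelling out the formulas. The Python sketch below uses one common convention, assumed here rather than taken from the paper: fold enrichment is the on-target read fraction after capture divided by the on-target read fraction in shotgun data from the same library. The function names and read counts are hypothetical.

```python
def on_target_fraction(on_target_reads: int, total_reads: int) -> float:
    """Fraction of reads that overlap the target regions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return on_target_reads / total_reads


def fold_enrichment(
    captured_on_target: int,
    captured_total: int,
    shotgun_on_target: int,
    shotgun_total: int,
) -> float:
    """Assumed definition: on-target fraction after capture divided by
    the on-target fraction in shotgun sequencing of the same library."""
    shotgun_fraction = on_target_fraction(shotgun_on_target, shotgun_total)
    if shotgun_fraction == 0:
        raise ValueError("no on-target shotgun reads; fold enrichment undefined")
    captured_fraction = on_target_fraction(captured_on_target, captured_total)
    return captured_fraction / shotgun_fraction


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(fold_enrichment(5_000, 100_000, 50, 100_000))
```

Under this definition a kit can retain many endogenous reads yet score a low fold enrichment if most of them fall off-target. That is the pattern the abstract describes for Twist under the authors' experimental conditions.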
Sequencing via target capture has been used to great effect in phylogenetic studies of organisms such as insects, arachnids and vertebrates. However, other taxa have received limited genomic attention despite their diversity and the intensity of research on such groups. Here, we describe generalised probe sets targeting ultraconserved elements (UCEs) for members of the crustacean orders Amphipoda and Isopoda in the superorder Peracarida. These sets employ ~10,000–100,000 probes targeting up to 10,000 loci. In silico analyses of these probe sets recovered an average of 5087 loci, while an average of 4633 was retained post-filtering. Phylogenetic analysis of these datasets resulted in well-supported trees that align with previously reconstructed relationships among the taxa selected while also providing resolution of previously uncertain nodes. Following the in silico analysis, an in vitro analysis targeting several amphipod and isopod families was conducted. This analysis extracted up to 4864 unique loci from the taxa sequenced, with an average of 1897 loci among all taxa. This represents an order-of-magnitude increase over previously published sets, which were only able to recover < 250 UCEs among peracarid taxa. Phylogenetic analyses of the data generated in vitro resulted in well-supported trees that were resolved at both shallow and deep taxonomic levels. Both analyses demonstrate the utility of these probe sets for phylogenomic research within the Peracarida. Additional attention to members of the superorder using target enrichment will undoubtedly assist in resolving poorly understood aspects of their evolutionary history and expand current knowledge of this group.
{"title":"Brooding Phylogenomics: Target-Capture Probe Sets for the Analysis of Ultraconserved Elements in the Peracarida","authors":"Andrew G. Cannizzaro, David J. Berg","doi":"10.1111/1755-0998.70078","DOIUrl":"10.1111/1755-0998.70078","url":null,"abstract":"<p>Sequencing via target capture has been used to great effect in phylogenetic studies of organisms such as insects, arachnids and vertebrates. However, other taxa have received limited genomic attention despite their diversity and the intensity of research on such groups. Here, we describe generalised probe sets targeting ultraconserved elements (UCEs) for members of the crustacean orders Amphipoda and Isopoda in the superorder Peracarida. These sets employ ~10,000–100,000 probes targeting up to 10,000 loci. In silico analyses of these probe sets recovered an average of 5087 loci, while an average of 4633 was retained post-filtering. Phylogenetic analysis of these datasets resulted in well-supported trees that align with previously reconstructed relationships among the taxa selected while also providing resolution of previously uncertain nodes. Following the in silico analysis, an in vitro analysis targeting several amphipod and isopod families was conducted. This analysis extracted up to 4864 unique loci from the taxa sequenced, with an average of 1897 loci among all taxa. This represents an order-of-magnitude increase versus previously published sets, which were only able to recover < 250 UCEs among peracarid taxa. Phylogenetic analyses of the data generated in vitro resulted in well-supported trees that were resolved at both shallow and deep taxonomic levels. Both analyses demonstrate the utility of these probe sets for phylogenomic research within the Peracarida. 
Additional attention to members of the superorder using target enrichment will doubtlessly assist in resolving poorly understood aspects of their evolutionary history and expand current knowledge of this group.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"26 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12627908/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145480222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Juan Viruel, Calum J. Sweeney, Rachel Day, Kaitalin White, Wayne Dawson, Bradley Myer, Kelvin Floyd, Marcella Corcoran, Carey Kelting, Sally Poncet, Félix Forest, Colin Clubbe, Rosemary J. Newton
Climate change and invasive species are leading drivers of biodiversity loss, with island ecosystems being especially vulnerable. South Georgia, a remote sub-Antarctic island, is 170 km long with approximately 30,000 ha of vegetated coastal areas, as snow and ice dominate the inland regions. Human activities on the island have historically introduced non-native species, resulting in 41 introduced vascular plant species compared with only 24 native ones. To address this imbalance, the South Georgia Non-Native Plant Management Strategy was implemented (2016–2020) to control non-native plant populations. We assessed emergent seedlings from South Georgia soil samples and wind-dispersed seeds to determine which species persist in the soil seed bank and contribute to dispersal. Using a molecular barcoding approach, we evaluated traditional markers (rbcL and matK) and optimized a high-throughput Angiosperms353 sequencing pipeline for accurate seedling identification. We generated a reference library covering all native and non-native species and applied this to 1,498 emergent seedlings and 737 trapped seeds. Molecular barcoding identified 21 species, including 10 non-natives and 11 natives. Strikingly, 84% of emergent seedlings were non-native, with Class III invasive species (Cerastium fontanum, Poa annua, Taraxacum officinale) dominating across most sites and in all wind traps. By contrast, Class I and II species occurred rarely and only at a few sites, indicating that management efforts have substantially reduced their spread, though viable seeds persist in the soil. These findings highlight both the continued threat from persistent seed banks of dominant invaders and the value of molecular barcoding for long-term monitoring. Our approach provides a framework for biosecurity and restoration management in South Georgia and other vulnerable ecosystems under climate change pressures.
{"title":"Phylogenomic Barcoding of Soil Seed Bank–Persistent and Wind-Dispersed Non-Native Plant Species in South Georgia","authors":"Juan Viruel, Calum J. Sweeney, Rachel Day, Kaitalin White, Wayne Dawson, Bradley Myer, Kelvin Floyd, Marcella Corcoran, Carey Kelting, Sally Poncet, Félix Forest, Colin Clubbe, Rosemary J. Newton","doi":"10.1111/1755-0998.70068","DOIUrl":"10.1111/1755-0998.70068","url":null,"abstract":"<p>Climate change and invasive species are leading drivers of biodiversity loss, with island ecosystems being especially vulnerable. South Georgia, a remote sub-Antarctic island, is 170 km long with approximately 30,000 ha of vegetated coastal areas, as snow and ice dominate the inland regions. Human activities on the island have historically introduced non-native species, resulting in 41 introduced vascular plant species compared with only 24 native ones. To address this imbalance, the South Georgia Non-Native Plant Management Strategy was implemented (2016–2020) to control non-native plant populations. We assessed emergent seedlings from South Georgia soil samples and wind-dispersed seeds to determine which species persist in the soil seed bank and contribute to dispersal. Using a molecular barcoding approach, we evaluated traditional markers (<i>rbc</i>L and <i>mat</i>K) and optimized a high-throughput Angiosperms353 sequencing pipeline for accurate seedling identification. We generated a reference library covering all native and non-native species and applied this to 1,498 emergent seedlings and 737 trapped seeds. Molecular barcoding identified 21 species, including 10 non-natives and 11 natives. Strikingly, 84% of emergent seedlings were non-native, with Class III invasive species (<i>Cerastium fontanum</i>, <i>Poa annua</i>, <i>Taraxacum officinale</i>) dominating across most sites and in all wind traps. 
By contrast, Class I and II species occurred rarely and only at a few sites, indicating that management efforts have substantially reduced their spread, though viable seeds persist in the soil. These findings highlight both the continued threat from persistent seed banks of dominant invaders and the value of molecular barcoding for long-term monitoring. Our approach provides a framework for biosecurity and restoration management in South Georgia and other vulnerable ecosystems under climate change pressures.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"26 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12627909/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145470302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clarisse Lemonnier, Benjamin Alric, Isabelle Domaizon, Frédéric Rimet
Studying the taxonomic composition of phytoplankton has been revolutionised by the emergence of metabarcoding approaches, which theoretically provide access to all phytoplankton diversity. However, metabarcoding has its limitations, including biases related to primer efficiency in covering all phytoplanktonic taxonomic groups, biases introduced during amplification, and biases due to incomplete reference databases. To assess the importance of these biases, we compared the composition of phytoplankton in two large peri-alpine lakes using two approaches: shotgun sequencing, and metabarcoding with primers targeting hypervariable regions of the chloroplastic 16S and 23S rRNA genes, designed to cover the full taxonomic diversity of phytoplankton. To compare the two methods directly, we extracted reads originating from the full sequences of these rRNA genes in the shotgun sequencing data and used the same reference database for taxonomic assignment. The results show that the relative abundances of dominant groups of phytoplankton, including Cyanobacteria, Cryptophyta, Haptophyta, and Bacillariophyta, are consistent between the two approaches, validating the primers used for metabarcoding analysis. However, two phyla, Chlorophyta and non-diatom Ochrophyta, showed greater divergence in their relative abundance, due to under-amplification or lack of amplification of certain taxonomic groups in metabarcoding. This is likely due to the high diversity of these groups, not yet covered by the reference databases, as well as the possible presence of introns in their chloroplastic ribosomal genes. These limitations are expected to be overcome with increasing reference database completeness and the use of long-read metabarcoding. Overall, our study confirms the relevance of using chloroplastic primers for assessing the phytoplankton composition of lakes.
{"title":"Comparison of Metabarcoding and Shotgun Sequencing Confirms the Relevance of Chloroplastic rRNA Genes to Assess Community Structure of Lake Phytoplankton","authors":"Clarisse Lemonnier, Benjamin Alric, Isabelle Domaizon, Frédéric Rimet","doi":"10.1111/1755-0998.70077","DOIUrl":"10.1111/1755-0998.70077","url":null,"abstract":"<p>Studying the taxonomic composition of phytoplankton has been revolutionised by the emergence of metabarcoding approaches, which theoretically provide access to all phytoplankton diversity. However, metabarcoding has its limitations, including biases related to primer efficiency in covering all phytoplanktonic taxonomic groups, biases introduced during amplification, and biases due to incomplete reference databases. To assess the importance of these biases, we compared the composition of phytoplankton in two large peri-alpine lakes using two approaches: shotgun sequencing, and metabarcoding with primers targeting hypervariable regions of the chloroplastic 16S and 23S rRNA genes, designed to cover the full taxonomic diversity of phytoplankton. To compare the two methods directly, we extracted reads originating from the full sequences of these rRNA genes in the shotgun sequencing data and used the same reference database for taxonomic assignment. The results show that the relative abundances of dominant groups of phytoplankton, including <i>Cyanobacteria</i>, <i>Cryptophyta</i>, <i>Haptophyta</i>, and <i>Bacillariophyta</i>, are consistent between the two approaches, validating the primers used for metabarcoding analysis. However, two phyla, <i>Chlorophyta</i> and non-diatom <i>Ochrophyta</i>, showed greater divergence in their relative abundance, due to under-amplification or lack of amplification of certain taxonomic groups in metabarcoding. This is likely due to the high diversity of these groups, not yet covered by the reference databases, as well as the possible presence of introns in their chloroplastic ribosomal genes. 
These limitations are expected to be overcome with increasing reference database completeness and the use of long-read metabarcoding. Overall, our study confirms the relevance of using chloroplastic primers for assessing the phytoplankton composition of lakes.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"26 1","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12627912/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145470271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}