Xinyi Yu, Zizhen Tang, Zhen Zhang, Yongxiu Song, Hao He, Yi Shi, Jiaqing Hou, Yan Yu
With the growing accessibility of low-cost genome skimming, large-scale recovery of target genes in non-model species has become feasible, providing strong data for phylogenomic and evolutionary studies. However, existing assembly tools, including Read2Tree, HybPiper, and GeneMiner, still suffer from computational difficulty and assembly errors when processing genome skimming data, especially for low-level taxa that lack closely related genome references. Here, we present GeneMiner2, an updated version of GeneMiner, as an efficient and automated gene assembly and phylogenetic tool to overcome these limitations. Compared with the previous version, GeneMiner2 introduces three key optimizations: a two-level hash table that speeds up k-mer filtering; fine-grained read selection with strand-orientation and structural-anomaly detection to handle complex heterozygous regions; and adaptive k-mer selection in the de Bruijn assembler. Together, these enable accurate assembly of target genes without the need for closely related references. Using simulated datasets with low coverage and divergent references, we validated the robustness and improved accuracy of GeneMiner2 in the assembly of single-copy genes. Moreover, using genome skimming data from the subfamily Apioideae (Apiaceae), GeneMiner2 outperformed Read2Tree and HybPiper in accuracy, completeness, and speed. GeneMiner2 integrates functions including assembly quality control, paralog detection, tree reconstruction, and divergence time calibration, offering a robust and efficient solution for phylogenetic inference using genomic-level data. GeneMiner2 runs cross-platform: it provides a user-friendly graphical interface for desktop users, and its high-performance command-line interface is specifically optimised for high-throughput analyses on Linux servers and computing clusters. Installation instructions, detailed documentation, and source code are available on GitHub (https://github.com/sculab/GeneMiner2).
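To make the k-mer filtering step concrete, the following is a minimal sketch of reference-guided read filtering in Python. It is not GeneMiner2's code: it uses a plain single-level dictionary rather than the two-level hash table described above, it ignores reverse-complement matches, and the function names, the `min_hits` threshold, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, in upper case."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each reference k-mer to the set of gene names containing it."""
    index = defaultdict(set)
    for gene, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(gene)
    return index


def filter_reads(reads, index, k, min_hits=2):
    """Assign each read to every gene it shares at least min_hits k-mers with.

    min_hits is a hypothetical threshold chosen for this sketch.
    """
    assigned = defaultdict(list)
    for read in reads:
        hits = defaultdict(int)
        for kmer in kmers(read, k):
            # .get avoids creating empty entries in the defaultdict index
            for gene in index.get(kmer, ()):
                hits[gene] += 1
        for gene, n in hits.items():
            if n >= min_hits:
                assigned[gene].append(read)
    return dict(assigned)


# Toy data (invented for illustration only)
references = {"geneA": "ACGTACGTGGCCTTAA", "geneB": "TTTTGGGGCCCCAAAA"}
reads = ["ACGTACGTGG", "GGGGCCCCAA", "CATCATCATC"]

index = build_index(references, k=5)
result = filter_reads(reads, index, k=5)
print(result)
```

Reads that share no k-mers with any reference, such as the third toy read, are dropped before assembly, which is what keeps the downstream de Bruijn graph small.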
GeneMiner2: Accurate and Automated Recovery of Genes From Genome Skimming Data. Molecular Ecology Resources 26(2). Published 2026-02-18. DOI: 10.1111/1755-0998.70111. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12914762/pdf/
Madison E. Patch, Graham B. Goodman, Sara B. Weinstein
Molecular techniques such as DNA metabarcoding are increasingly used to characterise parasite communities. However, relatively few studies have examined how sample processing methods influence detection rates. We used faecal samples collected from free-roaming horses to evaluate how parasite density and pre-processing methods influenced quantification of parasite richness, community composition, and detection of different taxa. Methods that concentrated parasites substantially increased parasite detection, especially at the low infection levels often seen in wildlife. Specifically, we found that DNA extracts from larval coproculture and a newly developed egg concentration approach detected approximately twice as many species and genera as extracts made directly from faecal matter. Although parasite richness was consistently lower in these faecal subsamples, overall parasite communities were still similar between pre-processing methods. Ultimately, the optimal method depends on research constraints and goals. Working with parasite larvae is more time-intensive but lower cost, as larval parasites can be extracted using a lysis buffer approach that performs similarly to commercial extraction kits. DNA extraction from faecal subsamples misses rare and common taxa but minimises field processing. Our novel egg concentration method offers a compromise between relatively rapid processing and high sensitivity.
Scoop That Poop: Optimising Faecal Sample Pre-Processing for Parasite Metabarcoding. Molecular Ecology Resources 26(2). Published 2026-02-17. DOI: 10.1111/1755-0998.70112. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12911221/pdf/
Damian Käch, Miguel Loera-Sánchez, Beat Reidy, Bruno Studer, Roland Kölliker
Permanent grasslands are predominantly composed of allogamous plant species that exhibit high levels of plant genetic diversity (PGD) within their populations. Grasslands with high PGD are more resilient to environmental stress and constitute valuable reservoirs of genetic resources for plant breeding. Therefore, monitoring PGD is the basis for detecting changes in PGD and for intervening accordingly. However, PGD monitoring is often neglected in biodiversity reports due to difficulties in taking representative samples and in using standardised and affordable indicators of PGD. Here we successfully applied two common approaches, multispecies amplicon sequencing (MSAS) and genotyping-by-sequencing (GBS), to assess PGD of agronomically relevant grassland species. Using MSAS, we were able to taxonomically distinguish five species (Dactylis glomerata L., Festuca pratensis Huds., Lolium perenne L., Trifolium pratense L. and T. repens L.) from multispecies samples and differentiate accessions within species, with fixation index (FST) values ranging from 0.014 for T. repens to 0.089 for L. perenne. In an extended L. perenne sample set containing mixtures of two cultivars at different ratios, samples containing both cultivars at 50% each separated from the corresponding cultivars, in line with this ratio, using both MSAS and GBS. Furthermore, GBS enabled separation of samples containing two cultivars at a 75:25 ratio from the corresponding cultivars and the 50:50-ratio samples. These results indicate complementary applications of the two approaches in PGD monitoring. While we anticipate that MSAS with its cost-effectiveness could be applied to large-scale PGD monitoring, GBS with its lower detection limit could be applied to studies where cultivar composition shifts are of interest.
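For readers unfamiliar with the fixation index, the sketch below computes Wright's FST = (HT - HS) / HT for a single biallelic locus from per-population allele frequencies. The abstract does not say which FST estimator the authors used, so this textbook definition, the function names, and the example frequencies are assumptions for illustration, not a reproduction of the study's analysis.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(freqs, weights=None):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs   -- allele frequency of one allele in each population
    weights -- relative population weights (equal weights if omitted)
    """
    n = len(freqs)
    if weights is None:
        weights = [1.0 / n] * n
    # Mean allele frequency across populations
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    # H_S: mean within-population expected heterozygosity
    h_s = sum(w * expected_heterozygosity(p) for w, p in zip(weights, freqs))
    # H_T: expected heterozygosity of the pooled population
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        # F_ST is undefined when the locus is fixed overall;
        # this sketch returns 0.0 as a convention.
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two accessions
print(fst_biallelic([0.4, 0.6]))
```

Values near 0, like those reported above for T. repens, indicate that most of the genetic variation lies within accessions rather than between them.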
Genetic Diversity Unveiled: Cost-Effective Methods for Grassland Species. Molecular Ecology Resources 26(2). Published 2026-02-17. DOI: 10.1111/1755-0998.70108. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12914157/pdf/