Judit Burgaya, Bamu F Damaris, Jenny Fiebig, Marco Galardini
microGWAS: a computational pipeline to perform large-scale bacterial genome-wide association studies.
Microbial Genomics, volume 11, issue 2. Published 2025-02-01. DOI: 10.1099/mgen.0.001349 (https://doi.org/10.1099/mgen.0.001349)
Abstract
Identifying genetic variants associated with bacterial phenotypes, such as virulence, host preference and antimicrobial resistance, has great potential to improve our understanding of the mechanisms behind these traits. The availability of large collections of bacterial genomes has made genome-wide association studies (GWAS) a common approach for this purpose. However, the need to employ multiple software tools for data pre- and postprocessing limits the application of these methods to experienced bioinformaticians. To address this issue, we have developed a pipeline that performs bacterial GWAS from a set of assemblies and annotations, with multiple phenotypes as targets. The associations are run using five sets of genetic variants: unitigs, gene presence/absence, rare variants (i.e. gene burden test), gene-cluster-specific k-mers and all unitigs jointly. All variants passing the association threshold are further annotated to identify overrepresented biological processes and pathways. The results can be augmented by generating a phylogenetic tree and by predicting the presence of antimicrobial resistance and virulence-associated genes. We tested the microGWAS pipeline on a previously reported dataset on Escherichia coli virulence, successfully identifying the causal variants and providing further interpretation of the association results. The microGWAS pipeline integrates state-of-the-art tools for bacterial GWAS into a single, user-friendly and reproducible pipeline, making these analyses accessible to a wider range of researchers. The pipeline, together with its documentation, can be accessed at https://github.com/microbial-pangenomes-lab/microGWAS.
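The abstract mentions two generic steps: keeping variants that pass an association threshold, and testing for overrepresented biological processes. The following is a minimal illustrative sketch of both ideas. It assumes a Bonferroni-corrected cutoff and a one-sided hypergeometric test; these choices, and all function and variable names, are assumptions made for illustration and are not taken from the microGWAS code or documentation.

```python
from __future__ import annotations

from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff under a Bonferroni correction (assumed here)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_variants(pvalues: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return the names of variants whose p-value is below the corrected cutoff."""
    cutoff = bonferroni_threshold(alpha, len(pvalues))
    return sorted(name for name, p in pvalues.items() if p < cutoff)


def overrepresentation_pvalue(
    hits_in_set: int, n_hits: int, set_size: int, n_background: int
) -> float:
    """One-sided hypergeometric tail probability P(X >= hits_in_set).

    n_background: number of genes in the background (hypothetical gene universe)
    set_size:     number of background genes annotated to the process or pathway
    n_hits:       number of genes carrying associated variants
    hits_in_set:  number of associated genes that fall in the annotated set
    """
    upper = min(n_hits, set_size)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(set_size, k) * comb(n_background - set_size, n_hits - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical p-values for four variants, for illustration only.
    example = {"unitig_1": 0.001, "unitig_2": 0.02, "gene_A": 0.5, "gene_B": 0.0124}
    print(significant_variants(example))
    print(overrepresentation_pvalue(hits_in_set=3, n_hits=3, set_size=3, n_background=10))
```

In practice, bacterial GWAS tools often derive the threshold from the number of unique variant patterns rather than the raw number of tests, and enrichment tools usually correct for testing many pathways; both refinements are omitted from this sketch.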
About the journal:
Microbial Genomics (MGen) is a fully open access, mandatory open data and peer-reviewed journal publishing high-profile original research on archaea, bacteria, microbial eukaryotes and viruses.