Rapid, reference-free identification of bacterial pathogen transmission using optimized split k-mer analysis.
Christopher H Connor, Charlie K Higgs, Kristy Horan, Jason C Kwong, M Lindsay Grayson, Benjamin P Howden, Torsten Seemann, Claire L Gorrie, Norelle L Sherry
Microbial Genomics 11(3), published 2025-03-01. DOI: 10.1099/mgen.0.001347 (https://doi.org/10.1099/mgen.0.001347)
Abstract
Infections caused by multidrug-resistant organisms (MDROs) are difficult to treat, often life-threatening and place a burden on the healthcare system. Minimizing the transmission of MDROs in hospitals is a global priority, and genomics is proving to be a powerful tool for identifying such transmission. To optimize the utility of genomics for prospective infection control surveillance, results must be available in real time, reproducible and simple to communicate to clinicians. Traditional reference-based approaches suffer from several limitations for prospective genomic surveillance. Whilst reference-free or pairwise genome comparisons avoid some of these limitations, they can be computationally intensive and time-consuming. Split k-mer analysis (SKA) offers a viable alternative, facilitating rapid reference-free pairwise comparisons of genomic data, but the optimum SKA parameters for the detection of transmission have not been determined. Additionally, the accuracy of SKA-based inferences has not been measured, nor has it been established whether modified quality control parameters are required. Here, we explore the performance of 60 SKA parameter combinations across 50 simulations to quantify the false-negative and false-positive SNP proportions for Escherichia coli, Enterococcus faecium, Klebsiella pneumoniae and Staphylococcus aureus. Using the optimum parameter combination, we explore concordance between SKA, multilocus sequence typing (MLST), core genome MLST (cgMLST) and Snippy in a real-world dataset. Lastly, we investigate whether simulated plasmid gain or loss could impact SNP detection with SKA. This work identifies the use of SKA with sequencing reads, a k-mer length of 19 and a minor allele frequency filter of 0.01 as optimal for MDRO transmission detection. Whilst SNP detection with SKA (when used with sequencing reads) undercalls SNPs compared to Snippy, it is significantly faster, especially with larger datasets. SKA has excellent concordance with MLST and cgMLST and is not impacted by simulated plasmid movement. We propose that the use of SKA for the detection of bacterial pathogen transmission is superior to traditional methodologies, capable of providing results in a much shorter timeframe.
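For readers unfamiliar with the split k-mer idea, the following is a toy sketch of how a reference-free pairwise SNP count can work in principle. It is not the SKA implementation and is not taken from the paper: the function names and the `flank` parameter are hypothetical, reverse complements are ignored, and no minor allele frequency filter is applied. How `flank` maps onto the k-mer length of 19 reported in the abstract is not assumed here.

```python
from collections import defaultdict


def split_kmers(seq, flank):
    """Map each (left flank, right flank) pair to the set of middle bases seen.

    Hypothetical helper: a split k-mer is modelled as two flanks of
    length `flank` surrounding a single middle base.
    """
    table = defaultdict(set)
    width = 2 * flank + 1
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        left, middle, right = window[:flank], window[flank], window[flank + 1:]
        table[(left, right)].add(middle)
    return table


def pairwise_snps(seq_a, seq_b, flank):
    """Count flank pairs shared by both sequences whose middle base differs."""
    a = split_kmers(seq_a, flank)
    b = split_kmers(seq_b, flank)
    snps = 0
    for key in a.keys() & b.keys():
        # Skip flank pairs that are ambiguous (several middle bases) in either sample.
        if len(a[key]) == 1 and len(b[key]) == 1 and a[key] != b[key]:
            snps += 1
    return snps


# Two short sequences differing by a single substitution (G -> T at index 7).
sample_a = "TTGACCAGGCTAAGC"
sample_b = "TTGACCATGCTAAGC"
print(pairwise_snps(sample_a, sample_b, flank=3))
```

Because the comparison only uses flank pairs present in both samples, no reference genome is needed. A substitution is only detected when both of its flanks are intact in both samples, which gives one intuition for why a split k-mer approach could report fewer SNPs than a reference-based caller.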
About the journal:
Microbial Genomics (MGen) is a fully open access, mandatory open data and peer-reviewed journal publishing high-profile original research on archaea, bacteria, microbial eukaryotes and viruses.