Rapid, reference-free identification of bacterial pathogen transmission using optimized split k-mer analysis.

Microbial Genomics | Pub Date: 2025-03-01 | DOI: 10.1099/mgen.0.001347 | IF 4 | Zone 2 (Biology) | JCR Q1 (Genetics & Heredity)
Christopher H Connor, Charlie K Higgs, Kristy Horan, Jason C Kwong, M Lindsay Grayson, Benjamin P Howden, Torsten Seemann, Claire L Gorrie, Norelle L Sherry
Microbial Genomics, volume 11, issue 3. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11936374/pdf/

Abstract

Infections caused by multidrug-resistant organisms (MDROs) are difficult to treat, often life-threatening, and place a burden on the healthcare system. Minimizing the transmission of MDROs in hospitals is a global priority, and genomics is proving to be a powerful tool for identifying that transmission. To optimize the utility of genomics for prospective infection control surveillance, results must be available in real time, reproducible and simple to communicate to clinicians. Traditional reference-based approaches suffer from several limitations for prospective genomic surveillance. Whilst reference-free or pairwise genome comparisons avoid some of these limitations, they can be computationally intensive and time-consuming. Split k-mer analysis (SKA) offers a viable alternative, facilitating rapid reference-free pairwise comparisons of genomic data, but the optimum SKA parameters for the detection of transmission have not been determined. Additionally, neither the accuracy of SKA-based inferences nor the need for modified quality control parameters has been assessed. Here, we explore the performance of 60 SKA parameter combinations across 50 simulations to quantify the false-negative and false-positive SNP proportions for Escherichia coli, Enterococcus faecium, Klebsiella pneumoniae and Staphylococcus aureus. Using the optimum parameter combination, we explore concordance between SKA, multilocus sequence typing (MLST), core genome MLST (cgMLST) and Snippy in a real-world dataset. Lastly, we investigate whether simulated plasmid gain or loss could impact SNP detection with SKA. This work finds that SKA with sequencing reads, a k-mer length of 19 and a minor allele frequency filter of 0.01 is optimal for MDRO transmission detection. Whilst SKA (when used with sequencing reads) undercalls SNPs compared to Snippy, it is significantly faster, especially with larger datasets. SKA has excellent concordance with MLST and cgMLST and is not impacted by simulated plasmid movement. We propose that SKA is superior to traditional methodologies for detecting bacterial pathogen transmission and can provide results in a much shorter timeframe.
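
To make the split k-mer idea in the abstract concrete, here is a minimal Python sketch of a reference-free pairwise comparison. It is an illustration under stated assumptions, not SKA's actual implementation: the function names (split_kmers, pairwise_snps) and the flank parameter are hypothetical, only the forward strand is considered, and the handling of sequencing reads, coverage cut-offs and the minor allele frequency filter is omitted. How the k-mer length of 19 reported above maps onto a flank length is not assumed here. Each window is split into a left flank, a middle base and a right flank; a SNP is counted wherever two sequences share both flanks but differ in the middle base.

```python
from collections import defaultdict


def split_kmers(seq, flank):
    """Map each (left flank, right flank) pair to the set of middle bases seen.

    `flank` is a hypothetical parameter: the number of bases on each side
    of the middle base, so every window spans 2 * flank + 1 bases.
    """
    table = defaultdict(set)
    width = 2 * flank + 1
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        left = window[:flank]
        middle = window[flank]
        right = window[flank + 1:]
        table[(left, right)].add(middle)
    return table


def pairwise_snps(seq_a, seq_b, flank):
    """Return (snp_count, shared_split_kmers) for two sequences.

    Only split k-mers whose flanks occur in both sequences are compared.
    A SNP is counted when each sequence has exactly one middle base for
    that flank pair and the two bases differ; ambiguous flank pairs
    (more than one middle base) are skipped in this sketch.
    """
    table_a = split_kmers(seq_a, flank)
    table_b = split_kmers(seq_b, flank)
    shared = table_a.keys() & table_b.keys()
    snps = 0
    for key in shared:
        bases_a, bases_b = table_a[key], table_b[key]
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            snps += 1
    return snps, len(shared)


if __name__ == "__main__":
    # Toy sequences: isolate_b differs from isolate_a by one substitution.
    isolate_a = "ACGTTGCATGGCTAAGCTTC"
    isolate_b = isolate_a[:10] + "A" + isolate_a[11:]
    print(pairwise_snps(isolate_a, isolate_b, flank=3))  # (1, 8)
```

In this toy case the single substitution is recovered as one SNP. The windows in which the substitution falls inside a flank no longer match between the two sequences, so they drop out of the shared set instead of being compared. That behaviour is one reason a split k-mer comparison can miss variants that lie close together.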

Source journal: Microbial Genomics (Medicine, Epidemiology)
CiteScore: 6.60
Self-citation rate: 2.60%
Articles published: 153
Review time: 12 weeks
About the journal: Microbial Genomics (MGen) is a fully open access, mandatory open data and peer-reviewed journal publishing high-profile original research on archaea, bacteria, microbial eukaryotes and viruses.