Demixer: a probabilistic generative model to delineate different strains of a microbial species in a mixed infection sample.

Brintha Vp, Manikandan Narayanan
Journal: Bioinformatics (Oxford, England). Publication date: 2025-03-29. DOI: 10.1093/bioinformatics/btaf139. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12011361/pdf/

Abstract

Motivation: Multi-drug resistant or hetero-resistant tuberculosis (TB) hinders the successful treatment of TB. Hetero-resistant TB occurs when multiple strains of the TB-causing bacterium with varying degrees of drug susceptibility are present in an individual. Existing studies predicting the proportion and identity of strains in a mixed infection sample rely on a reference database of known strains. A main challenge then is to identify de novo strains not present in the reference database, while quantifying the proportion of known strains.

Results: We present Demixer, a probabilistic generative model that uses a combination of reference-based and reference-free techniques to delineate mixed infection strains in whole genome sequencing (WGS) data. Demixer extends a topic model widely used in text mining to represent known mutations and discover novel ones. Parallelization and other heuristics enabled Demixer to process large datasets like CRyPTIC (Comprehensive Resistance Prediction for Tuberculosis: an International Consortium). In both synthetic and experimental benchmark datasets, our proposed method precisely detected the identity (e.g. 91.67% accuracy on the experimental in vitro dataset) as well as the proportions of the mixed strains. In real-world applications, Demixer revealed novel high-confidence mixed infections (101 out of 1963 Malawi samples analysed), and provided new insights into the global frequency of mixed infection (2% at the most stringent threshold in the CRyPTIC dataset) and its significant association with drug resistance. Our approach is generalizable and hence applicable to any bacterial and viral WGS data.
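
To make the topic-model analogy concrete, the sketch below treats a sample as a "document", variant alleles as "words", and strains as "topics", and estimates strain proportions with a plain EM loop while the strain profiles are held fixed. This is an illustrative simplification only, not Demixer's actual algorithm: the function name `estimate_proportions`, the toy profiles, and the read counts are all hypothetical, and the reference-free discovery of novel strains described above is not modelled here.

```python
import numpy as np


def estimate_proportions(counts, strain_profiles, n_iter=200, tol=1e-9):
    """Estimate mixture weights of known strains in one sample via EM.

    counts          : length-V array of read counts per variant allele ("word").
    strain_profiles : K x V array; each row is a probability distribution
                      over alleles for one known strain ("topic").
    Assumes at least one observed allele is supported by some strain.
    """
    counts = np.asarray(counts, dtype=float)
    profiles = np.asarray(strain_profiles, dtype=float)
    n_strains = profiles.shape[0]
    weights = np.full(n_strains, 1.0 / n_strains)  # uniform start

    for _ in range(n_iter):
        # E-step: responsibility of each strain for each allele.
        joint = weights[:, None] * profiles
        denom = joint.sum(axis=0, keepdims=True)
        resp = np.divide(joint, denom, out=np.zeros_like(joint), where=denom > 0)

        # M-step: reweight strains by the reads attributed to them.
        new_weights = (resp * counts).sum(axis=1)
        new_weights /= new_weights.sum()

        converged = np.abs(new_weights - weights).max() < tol
        weights = new_weights
        if converged:
            break
    return weights


if __name__ == "__main__":
    # Hypothetical toy data: two strains with non-overlapping marker alleles.
    toy_profiles = [[0.5, 0.5, 0.0, 0.0],
                    [0.0, 0.0, 0.5, 0.5]]
    toy_counts = [35, 35, 15, 15]
    print(estimate_proportions(toy_counts, toy_profiles))  # roughly 0.7 and 0.3
```

Holding the profiles fixed corresponds only to the reference-based half of the problem; a full topic model would also update the profiles themselves, which is where previously unseen strains could emerge.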

Availability and implementation: All code relevant to Demixer is available at https://github.com/BIRDSgroup/Demixer.

Supplementary information: The Supplementary Information PDF file (containing supplementary methods, algorithms, tables, and figures) and other supplementary data files are available at this link: https://drive.google.com/drive/folders/1P_OX_MbZ6QFN9Amyl2eGMBr1ySY6yNWu?usp=drive_link. Supplementary data, code, and VCF files (in vitro, synthetic, and real-world datasets) are also archived at Zenodo (doi: 10.5281/zenodo.15074330).