Exploration of the genetic landscape of bacterial dsDNA viruses reveals an ANI gap amid extensive mosaicism.

IF 4.6, Zone 2 (Biology), Q1 MICROBIOLOGY, mSystems. Pub Date: 2025-02-18. Epub Date: 2025-01-29. DOI: 10.1128/msystems.01661-24
Wanangwa Ndovie, Jan Havránek, Jade Leconte, Janusz Koszucki, Leonid Chindelevitch, Evelien M Adriaenssens, Rafal J Mostowy
mSystems, article e0166124. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834439/pdf/
Citation count: 0

Abstract

Average nucleotide identity (ANI) is a widely used metric to estimate genetic relatedness, especially in microbial species delineation. While ANI calculation has been well optimized for bacteria and closely related viral genomes, accurate estimation of ANI below 80%, particularly in large reference data sets, has been challenging due to a lack of accurate and scalable methods. To bridge this gap, we introduce MANIAC, an efficient computational pipeline optimized for estimating ANI and alignment fraction (AF) in viral genomes with divergence around ANI of 70%. Using a rigorous simulation framework, we demonstrate MANIAC's accuracy and scalability compared to existing approaches, even on data sets of hundreds of thousands of viral genomes. Applying MANIAC to a curated data set of complete bacterial dsDNA viruses revealed a multimodal ANI distribution with a distinct gap around 80%, akin to the bacterial ANI gap (~90%) but shifted, likely due to viral-specific evolutionary processes such as recombination dynamics and mosaicism. We then evaluated ANI and AF as predictors of genus-level taxonomy using a logistic regression model. We found that this model has strong predictive power (PR-AUC = 0.981), but that it works much better for virulent (PR-AUC = 0.997) than for temperate (PR-AUC = 0.847) bacterial viruses. This highlights the complexity of taxonomic classification in temperate phages, known for their extensive mosaicism, and cautions against over-reliance on ANI in such cases. MANIAC can be accessed at https://github.com/bioinf-mcb/MANIAC.

Importance

We introduce a novel computational pipeline called MANIAC, designed to accurately assess average nucleotide identity (ANI) and alignment fraction (AF) between diverse viral genomes and scalable to data sets of over 100k genomes. Using computer simulations and real data analyses, we show that MANIAC can accurately estimate genetic relatedness between pairs of viral genomes of around 60%-70% ANI. We applied MANIAC to investigate the question of ANI discontinuity in bacterial dsDNA viruses, finding evidence for an ANI gap akin to the one seen in bacteria but around ANI of 80%. We then assessed the ability of ANI and AF to predict taxonomic genus boundaries, finding strong predictive power in virulent but not in temperate phages. Our results suggest that bacterial dsDNA viruses may exhibit an ANI threshold (on average around 80%) above which recombination helps maintain population cohesiveness, as previously argued in bacteria.
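
The two quantities named in the abstract, ANI and AF, can be illustrated with a minimal sketch. It assumes one common definition: ANI as the length-weighted mean identity over aligned fragments, and AF as the aligned length divided by the query genome length. This is not taken from the MANIAC code base, and MANIAC's own definitions and filtering steps may differ. The `Hit` record, the `ani_and_af` function, and the assumption that hits do not overlap are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Hit:
    """One local alignment of a query fragment against a reference genome.

    Hypothetical record for illustration; not MANIAC's data model.
    """
    aligned_length: int  # number of aligned query bases
    identity: float      # fraction of identical bases in the alignment (0..1)


def ani_and_af(hits: Iterable[Hit], query_length: int) -> Tuple[float, float]:
    """Return (ANI, AF) for one query/reference pair.

    Assumes the hits do not overlap on the query, so that their lengths
    can simply be summed.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    hits = list(hits)
    aligned = sum(h.aligned_length for h in hits)
    if aligned == 0:
        # Nothing aligned: report zero identity and zero coverage.
        return 0.0, 0.0
    # Length-weighted mean identity over the aligned fragments.
    ani = sum(h.identity * h.aligned_length for h in hits) / aligned
    # Fraction of the query genome covered by alignments.
    af = aligned / query_length
    return ani, af


# Three non-overlapping hits on a 4,000 bp query genome.
example_hits = [Hit(1000, 0.80), Hit(500, 0.70), Hit(500, 0.60)]
ani, af = ani_and_af(example_hits, query_length=4000)
```

In this example 2,000 of the 4,000 query bases are aligned, so AF is 0.5 and ANI is about 0.725. Reporting the two values together matters for mosaic genomes, where a short shared region can have high identity while covering only a small fraction of the genome.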
