HiFiBGC: an ensemble approach for improved biosynthetic gene cluster detection in PacBio HiFi-read metagenomes.

IF 3.5 · Zone 2 (Biology) · JCR Q2 (Biotechnology & Applied Microbiology) · BMC Genomics · Pub Date: 2024-11-16 · DOI: 10.1186/s12864-024-10950-7
Amit Yadav, Srikrishna Subramanian
BMC Genomics 25(1):1096 (2024). Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569603/pdf/
Citations: 0

Abstract

HiFiBGC: an ensemble approach for improved biosynthetic gene cluster detection in PacBio HiFi-read metagenomes.

Background: Microbes produce diverse bioactive natural products with applications in fields such as medicine and agriculture. These natural products are encoded in microbial genomes by physically clustered genes known as biosynthetic gene clusters (BGCs). Advances in genome and metagenome sequencing have enabled high-throughput identification of BGCs, a promising avenue for natural product discovery. Mining BGCs from (meta)genomes with in silico tools has opened access to a vast diversity of potentially novel natural products. However, a fundamental limitation has been the difficulty of assembling complete BGCs, especially from complex metagenomes. Because short-read technologies yield fragmented assemblies, they struggle to recover complete BGCs, such as the long and repetitive nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) clusters. Recent advances in long-read sequencing, such as the High Fidelity (HiFi) technology from PacBio, have eased this limitation and can help retrieve BGCs that are both accurate and complete from metagenomes. Existing BGC identification approaches therefore need to be improved to make better use of HiFi data.

Results: Here, we present HiFiBGC, a command-line workflow for identifying BGCs in PacBio HiFi metagenomes. HiFiBGC leverages an ensemble of assemblies from three HiFi-tailored metagenome assemblers together with the reads not represented in these assemblies. Based on our analyses of four HiFi metagenomic datasets from four different environments, we show that HiFiBGC identifies, on average, 78% more BGCs than the top-performing single-assembler-based method. This increase is due to HiFiBGC's ensemble assembly approach, which improves recovery by 25%, and to the inclusion of mostly fragmented BGCs identified in the unmapped reads.

Conclusions: HiFiBGC is a computational workflow for identifying BGCs in long-read HiFi metagenomes, implemented mainly in Python with the Snakemake workflow manager. HiFiBGC is available on GitHub at https://github.com/ay-amityadav/HiFiBGC under the MIT license. The code for the figures and analyses presented in the manuscript is available at https://github.com/ay-amityadav/HiFiBGC_analyses.
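
The ensemble idea described in the Results can be illustrated with a small sketch. This is not HiFiBGC's code: it only shows, under stated assumptions, how BGC predictions from several assemblies and from the unmapped reads might be pooled and compared against the best single assembler. The `BGC` record, its `family` dereplication key, the `pool_bgcs` and `percent_gain` helpers, and the toy data are all hypothetical and chosen for illustration; the abstract does not say how HiFiBGC dereplicates clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BGC:
    """A predicted biosynthetic gene cluster (hypothetical record layout)."""
    family: str     # illustrative dereplication key (assumption)
    source: str     # assembly, or the unmapped reads, it was found in
    complete: bool  # False if the cluster is fragmented


def pool_bgcs(predictions: dict[str, list[BGC]]) -> list[BGC]:
    """Merge predictions from all sources, keeping one BGC per family.

    A complete representative replaces a fragmented one of the same family.
    """
    best: dict[str, BGC] = {}
    for bgcs in predictions.values():
        for bgc in bgcs:
            current = best.get(bgc.family)
            if current is None or (bgc.complete and not current.complete):
                best[bgc.family] = bgc
    return list(best.values())


def percent_gain(ensemble_count: int, baseline_count: int) -> float:
    """Relative increase of the ensemble over a single-assembler baseline."""
    return 100.0 * (ensemble_count - baseline_count) / baseline_count


# Toy data, invented for this example (not results from the paper).
predictions = {
    "assembler_a": [BGC("f1", "assembler_a", True), BGC("f2", "assembler_a", False)],
    "assembler_b": [BGC("f2", "assembler_b", True), BGC("f3", "assembler_b", True)],
    "assembler_c": [BGC("f1", "assembler_c", True)],
    "unmapped_reads": [BGC("f4", "unmapped_reads", False)],
}

pooled = pool_bgcs(predictions)
# Baseline: the single assembler that recovers the most BGCs on its own.
best_single = max(len(v) for k, v in predictions.items() if k != "unmapped_reads")
gain = percent_gain(len(pooled), best_single)
print(len(pooled), best_single, gain)  # prints: 4 2 100.0
```

In this toy case the pooled set has four families against two for the best single assembler, a 100% gain. The 78% figure in the abstract is the same kind of relative increase, averaged over the four datasets.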
