Fairy: fast approximate coverage for multi-sample metagenomic binning.

Microbiome (IF 13.8; Zone 1, Biology; JCR Q1, Microbiology). Pub Date: 2024-08-14. DOI: 10.1186/s40168-024-01861-6
Jim Shaw, Yun William Yu
Microbiome 12(1):151. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323348/pdf/
Citation count: 0

Abstract

Background: Metagenomic binning, the clustering of assembled contigs that belong to the same genome, is a crucial step for recovering metagenome-assembled genomes (MAGs). Contigs are linked by exploiting signatures that are consistent along a genome, such as read coverage patterns. Using coverage from multiple samples leads to higher-quality MAGs; however, standard pipelines compute coverage through all-to-all read alignments across the samples, which becomes a key computational bottleneck.
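To make the all-to-all cost concrete, the following is a minimal sketch that enumerates the alignment jobs a standard multi-sample pipeline would run. It assumes one assembly per sample and that each assembly receives reads from every sample; the function name `alignment_jobs` and the sample labels are hypothetical and do not come from the paper.

```python
from itertools import product


def alignment_jobs(samples):
    """List every (assembly, read set) pair needed for all-to-all coverage.

    Assumption: each sample has its own assembly, and coverage for that
    assembly is computed from the reads of every sample, itself included.
    """
    return list(product(samples, repeat=2))


if __name__ == "__main__":
    samples = ["S1", "S2", "S3", "S4"]  # hypothetical sample labels
    jobs = alignment_jobs(samples)
    # Multi-sample binning needs len(samples) ** 2 alignments;
    # single-sample binning needs only len(samples).
    print(f"multi-sample jobs: {len(jobs)}, single-sample jobs: {len(samples)}")
```

Under these assumptions the number of alignments grows quadratically with the number of samples, which is why the coverage step comes to dominate the runtime.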

Results: We present fairy (https://github.com/bluenote-1577/fairy), an approximate coverage calculation method for metagenomic binning. Fairy is a fast k-mer-based alignment-free method. For multi-sample binning, fairy can be >250× faster than read alignment and accurate enough for binning. Fairy is compatible with several existing binners on host and non-host-associated datasets. Using MetaBAT2, fairy recovers 98.5% of MAGs with >50% completeness and <5% contamination relative to alignment with BWA. Notably, multi-sample binning with fairy is always better than single-sample binning using BWA (>1.5× more >50% complete MAGs on average) while still being faster. For a public sediment metagenome project, we demonstrate that multi-sample binning recovers higher quality Asgard archaea MAGs than single-sample binning and that fairy's results are indistinguishable from read alignment.
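The general idea of k-mer-based, alignment-free coverage can be illustrated with a toy sketch. This is not fairy's implementation: it counts exact k-mers in each sample's reads and reports, for each contig, the median count of the contig's k-mers. The function names, the tiny `k`, and the example sequences are all assumptions made for illustration, and reverse complements and sequencing errors are ignored.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across a sample's reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def approx_coverage(contig, read_counts, k):
    """Approximate coverage as the median read count of the contig's k-mers."""
    values = [read_counts.get(km, 0) for km in kmers(contig, k)]
    return median(values) if values else 0


def coverage_table(contigs, samples, k=5):
    """Build a {sample: {contig: coverage}} table without any alignment."""
    table = {}
    for sample_name, reads in samples.items():
        counts = count_read_kmers(reads, k)
        table[sample_name] = {
            contig_id: approx_coverage(seq, counts, k)
            for contig_id, seq in contigs.items()
        }
    return table


if __name__ == "__main__":
    # Hypothetical toy data: two contigs, two samples.
    contigs = {"c1": "ACGTTGCAAGCT", "c2": "TGCATCGGATAC"}
    samples = {
        "A": ["ACGTTGCAAGCT"] * 3,  # three reads covering c1
        "B": ["TGCATCGGATAC"],      # one read covering c2
    }
    for sample_name, row in coverage_table(contigs, samples).items():
        print(sample_name, row)
```

Each sample's reads are scanned once, and the resulting per-sample coverage columns are the kind of input a coverage-based binner consumes. A practical tool would use much longer k-mers, handle both strands, and subsample k-mers for speed.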

Conclusions: Fairy is a new tool for approximately and quickly calculating multi-sample coverage for binning, resolving a computational bottleneck for metagenomics. Video Abstract.

Source journal: Microbiome (Microbiology)
CiteScore: 21.90
Self-citation rate: 2.60%
Articles published: 198
Review time: 4 weeks
About the journal: Microbiome is a journal that focuses on studies of microbiomes in humans, animals, plants, and the environment. It covers both natural and manipulated microbiomes, such as those in agriculture. The journal is interested in research that uses meta-omics approaches or novel bioinformatics tools and emphasizes the community/host interaction and structure-function relationship within the microbiome. Studies that go beyond descriptive omics surveys and include experimental or theoretical approaches will be considered for publication. The journal also encourages research that establishes cause-and-effect relationships and supports proposed microbiome functions. However, studies of individual microbial isolates or species that do not explore their impact on the host or on the complex microbiome structures and functions will not be considered for publication. Microbiome is indexed in BIOSIS, Current Contents, DOAJ, Embase, MEDLINE, PubMed, PubMed Central, and Science Citation Index Expanded.