Ulisses Rocha, Jonas Coelho Kasmanas, Rodolfo Toscan, Danilo S Sanches, Stefania Magnusdottir, Joao Pedro Saraiva
{"title":"对 69 个微生物群落的模拟表明,测序深度和假阳性是原核生物元基因组组装基因组恢复偏差的主要驱动因素。","authors":"Ulisses Rocha, Jonas Coelho Kasmanas, Rodolfo Toscan, Danilo S Sanches, Stefania Magnusdottir, Joao Pedro Saraiva","doi":"10.1371/journal.pcbi.1012530","DOIUrl":null,"url":null,"abstract":"<p><p>We hypothesize that sample species abundance, sequencing depth, and taxonomic relatedness influence the recovery of metagenome-assembled genomes (MAGs). To test this hypothesis, we assessed MAG recovery in three in silico microbial communities composed of 42 species with the same richness but different sample species abundance, sequencing depth, and taxonomic distribution profiles using three different pipelines for MAG recovery. The pipeline developed by Parks and colleagues (8K) generated the highest number of MAGs and the lowest number of true positives per community profile. The pipeline by Karst and colleagues (DT) showed the most accurate results (~ 92%), outperforming the 8K and Multi-Metagenome pipeline (MM) developed by Albertsen and collaborators. Sequencing depth influenced the accurate recovery of genomes when using the 8K and MM, even with contrasting patterns: the MM pipeline recovered more MAGs found in the original communities when employing sequencing depths up to 60 million reads, while the 8K recovered more true positives in communities sequenced above 60 million reads. DT showed the best species recovery from the same genus, even though close-related species have a low recovery rate in all pipelines. Our results highlight that more bins do not translate to the actual community composition and that sequencing depth plays a role in MAG recovery and increased community resolution. Even low MAG recovery error rates can significantly impact biological inferences. Our data indicates that the scientific community should curate their findings from MAG recovery, especially when asserting novel species or metabolic traits.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":3.8000,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11530072/pdf/","citationCount":"0","resultStr":"{\"title\":\"Simulation of 69 microbial communities indicates sequencing depth and false positives are major drivers of bias in prokaryotic metagenome-assembled genome recovery.\",\"authors\":\"Ulisses Rocha, Jonas Coelho Kasmanas, Rodolfo Toscan, Danilo S Sanches, Stefania Magnusdottir, Joao Pedro Saraiva\",\"doi\":\"10.1371/journal.pcbi.1012530\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We hypothesize that sample species abundance, sequencing depth, and taxonomic relatedness influence the recovery of metagenome-assembled genomes (MAGs). To test this hypothesis, we assessed MAG recovery in three in silico microbial communities composed of 42 species with the same richness but different sample species abundance, sequencing depth, and taxonomic distribution profiles using three different pipelines for MAG recovery. The pipeline developed by Parks and colleagues (8K) generated the highest number of MAGs and the lowest number of true positives per community profile. The pipeline by Karst and colleagues (DT) showed the most accurate results (~ 92%), outperforming the 8K and Multi-Metagenome pipeline (MM) developed by Albertsen and collaborators. Sequencing depth influenced the accurate recovery of genomes when using the 8K and MM, even with contrasting patterns: the MM pipeline recovered more MAGs found in the original communities when employing sequencing depths up to 60 million reads, while the 8K recovered more true positives in communities sequenced above 60 million reads. DT showed the best species recovery from the same genus, even though close-related species have a low recovery rate in all pipelines. Our results highlight that more bins do not translate to the actual community composition and that sequencing depth plays a role in MAG recovery and increased community resolution. Even low MAG recovery error rates can significantly impact biological inferences. Our data indicates that the scientific community should curate their findings from MAG recovery, especially when asserting novel species or metabolic traits.</p>\",\"PeriodicalId\":20241,\"journal\":{\"name\":\"PLoS Computational Biology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-10-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11530072/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS Computational Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pcbi.1012530\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/10/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1371/journal.pcbi.1012530","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/10/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Simulation of 69 microbial communities indicates sequencing depth and false positives are major drivers of bias in prokaryotic metagenome-assembled genome recovery.
We hypothesize that sample species abundance, sequencing depth, and taxonomic relatedness influence the recovery of metagenome-assembled genomes (MAGs). To test this hypothesis, we assessed MAG recovery in three in silico microbial communities composed of 42 species with the same richness but different sample species abundance, sequencing depth, and taxonomic distribution profiles using three different pipelines for MAG recovery. The pipeline developed by Parks and colleagues (8K) generated the highest number of MAGs and the lowest number of true positives per community profile. The pipeline by Karst and colleagues (DT) showed the most accurate results (~ 92%), outperforming the 8K and Multi-Metagenome pipeline (MM) developed by Albertsen and collaborators. Sequencing depth influenced the accurate recovery of genomes when using the 8K and MM, even with contrasting patterns: the MM pipeline recovered more MAGs found in the original communities when employing sequencing depths up to 60 million reads, while the 8K recovered more true positives in communities sequenced above 60 million reads. DT showed the best species recovery from the same genus, even though close-related species have a low recovery rate in all pipelines. Our results highlight that more bins do not translate to the actual community composition and that sequencing depth plays a role in MAG recovery and increased community resolution. Even low MAG recovery error rates can significantly impact biological inferences. Our data indicates that the scientific community should curate their findings from MAG recovery, especially when asserting novel species or metabolic traits.
期刊介绍:
PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery.
Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines.
Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights.
Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology.
Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.