Annika Jochheim, Florian A Jochheim, Alexandra Kolodyazhnaya, Étienne Morice, Martin Steinegger, Johannes Söding
Strain-resolved de-novo metagenomic assembly of viral genomes and microbial 16S rRNAs
Microbiome 12(1), 187. Published 2024-10-01. DOI: https://doi.org/10.1186/s40168-024-01904-y. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11443906/pdf/
Citations: 0
Abstract
Background: Metagenomics is a powerful approach to study environmental and human-associated microbial communities and, in particular, the role of viruses in shaping them. Viral genomes are challenging to assemble from metagenomic samples because high mutation rates make them genomically diverse. In standard de Bruijn graph assemblers, this diversity leads to complex k-mer assembly graphs with a plethora of loops and bulges that are challenging to resolve into strains or haplotypes, because variants that lie more than the k-mer size apart cannot be phased. In contrast, overlap assemblers can phase variants as long as they are covered by a single read.
Results: Here, we present PenguiN, software for strain-resolved assembly of viral DNA and RNA genomes and bacterial 16S rRNA from shotgun metagenomics. Its exhaustive detection of all read overlaps in linear time, combined with a Bayesian model to select strain-resolved extensions, allows it to assemble longer viral contigs, less fragmented genomes, and more strains than existing assembly tools, on both real and simulated datasets. We show a 3- to 40-fold increase in complete viral genomes and a 6-fold increase in bacterial 16S rRNA genes.
Conclusion: PenguiN is the first overlap-based assembler for viral genome and 16S rRNA assembly from large and complex metagenomic datasets, which we hope will facilitate studying the key roles of viruses in microbial communities.
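The phasing argument in the Background can be illustrated with a toy example. The sketch below is not PenguiN's algorithm. It only shows why k-mers alone lose linkage between distant variants while longer reads keep it. All names (`make_haplotype`, `pooled`, `K`, `READ_LEN`) and the sequences are invented for illustration. The backbone uses only G and T so that the two variant sites are the only places where A or C can occur.

```python
def substrings(seq: str, n: int) -> set:
    """Return the set of all length-n substrings of seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def pooled(seqs, n: int) -> set:
    """Pool the length-n substrings of several sequences into one set."""
    out = set()
    for s in seqs:
        out |= substrings(s, n)
    return out


def make_haplotype(allele1: str, allele2: str) -> str:
    """Hypothetical 28-base genome with two variant sites 11 bases apart."""
    return "GTTGGTGT" + allele1 + "TGGTTGTGGT" + allele2 + "GGTGTTGG"


K = 5          # k-mer size, smaller than the distance between the sites
READ_LEN = 16  # read length, large enough to cover both sites at once

# The two strains actually present in the sample.
true_strains = [make_haplotype("A", "A"), make_haplotype("C", "C")]
# Two chimeric haplotypes that are NOT present in the sample.
chimeras = [make_haplotype("A", "C"), make_haplotype("C", "A")]

# No k-mer spans both sites, so both pairs yield the same k-mer set:
# a graph built from k-mers alone cannot tell the pairs apart.
kmers_identical = pooled(true_strains, K) == pooled(chimeras, K)

# Some reads span both sites, so the read sets differ and the linkage
# between the two alleles is preserved.
reads_identical = pooled(true_strains, READ_LEN) == pooled(chimeras, READ_LEN)

print("k-mer sets identical:", kmers_identical)
print("read sets identical:", reads_identical)
```

In this toy setting the k-mer sets of the true strains and of the chimeras coincide, whereas the sets of 16-base reads do not. This is the sense in which a read that covers both variants can phase them.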
About the journal:
Microbiome is a journal that focuses on studies of microbiomes in humans, animals, plants, and the environment. It covers both natural and manipulated microbiomes, such as those in agriculture. The journal is interested in research that uses meta-omics approaches or novel bioinformatics tools and emphasizes the community/host interaction and structure-function relationship within the microbiome. Studies that go beyond descriptive omics surveys and include experimental or theoretical approaches will be considered for publication. The journal also encourages research that establishes cause-and-effect relationships and supports proposed microbiome functions. However, studies of individual microbial isolates or species that do not explore their impact on the host or on the complex microbiome structures and functions will not be considered for publication. Microbiome is indexed in BIOSIS, Current Contents, DOAJ, Embase, MEDLINE, PubMed, PubMed Central, and Science Citation Index Expanded.