Prophage-DB: A comprehensive database to explore diversity, distribution, and ecology of prophages
Etan Dieppa-Colón, Cody Martin, Karthik Anantharaman
bioRxiv, 2024-07-16. DOI: https://doi.org/10.1101/2024.07.11.603044
Citations: 0
Abstract
Background: Viruses that infect prokaryotes (phages) constitute the most abundant group of biological agents and play pivotal roles in microbial systems. They are known to impact microbial community dynamics, microbial ecology, and evolution. However, the diversity, host range, and infection dynamics of phages, and the effects of bacteriophage infection on host cell metabolism, remain extremely underexplored. Phages are classified as virulent or temperate based on their life cycles. Temperate phages adopt the lysogenic mode of infection, in which the phage genome integrates into the host cell genome, forming a prophage. Prophages enable viral genome replication without host cell lysis and often contribute novel and beneficial traits to the host genome. Current phage research predominantly focuses on lytic phages, leaving a significant gap in knowledge regarding prophages, including their biology, diversity, and ecological roles.

Results: Here we develop and describe Prophage-DB, a database of prophages, their proteins, and associated metadata that will serve as a resource for viral genomics and microbial ecology. To create the database, we identified and characterized prophages from genomes in three of the largest publicly available databases. We applied several state-of-the-art tools in our pipeline to annotate these viruses, cluster and taxonomically classify them, and detect their respective auxiliary metabolic genes. In total, we identified and characterized over 350,000 prophages and 35,000 auxiliary metabolic genes. Based on statistical results, our prophage database is highly representative, and it contains prophages from a diverse set of archaeal and bacterial hosts that show a wide environmental distribution.

Conclusion: Prophages are particularly overlooked in viral ecology and merit increased attention due to their vital implications for microbiomes and their hosts. Here, we created Prophage-DB to advance our understanding of prophages in microbiomes through a comprehensive characterization of prophages in publicly available genomes. We propose that Prophage-DB will serve as a valuable resource for advancing phage research, offering insights into viral taxonomy, host relationships, auxiliary metabolic genes, and environmental distribution.