Guillermo Rangel-Pineros, Andrew Millard, Slawomir Michniewski, David Scanlan, Kimmo Sirén, Alejandro Reyes, Bent Petersen, Martha R J Clokie, Thomas Sicheritz-Pontén
PHAGE (New Rochelle, N.Y.), pp. 194-203. Published 2021-12-01 (Epub 2021-12-16). DOI: 10.1089/phage.2021.0008 (https://doi.org/10.1089/phage.2021.0008)
From Trees to Clouds: PhageClouds for Fast Comparison of ∼640,000 Phage Genomic Sequences and Host-Centric Visualization Using Genomic Network Graphs.
Background: Fast and computationally efficient strategies are required to explore genomic relationships within an increasingly large and diverse phage sequence space. Here, we present PhageClouds, a novel approach using a graph database of phage genomic sequences and their intergenomic distances to explore the phage genomic sequence space.

Methods: A total of 640,000 phage genomic sequences were retrieved from a variety of databases and public virome assemblies. Intergenomic distances were calculated with dashing, an alignment-free method suitable for handling massive data sets. These data were used to build a Neo4j® graph database.

Results: PhageClouds supported the search of related phages among all complete phage genomes from GenBank for a single query phage in just 10 s. Moreover, PhageClouds expanded the number of closely related phage sequences detected for both finished and draft phage genomes, in comparison with searches exclusively targeting phage entries from GenBank.

Conclusions: PhageClouds is a novel resource that will facilitate the analysis of phage genomic sequences and the characterization of assembled phage genomes.
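
The approach described in the Methods (alignment-free intergenomic distances stored as edges of a genome graph, then queried for the neighbours of one phage) can be illustrated with a toy sketch. This is not the PhageClouds implementation. As assumptions for illustration only, exact k-mer sets stand in for the sketches that dashing computes, 1 minus the Jaccard index stands in for whatever distance PhageClouds stores, a plain Python dictionary stands in for the Neo4j graph database, and the function names, sequences, `k`, and distance cutoff are all hypothetical.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of all k-mers in a sequence (exact, no sketching)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(genomes, k=4, max_distance=0.5):
    """Build an undirected genome graph as a dict of dicts.

    Nodes are genome names. An edge is stored only when the assumed
    distance (1 - Jaccard) is at most max_distance, so distant genomes
    remain unconnected.
    """
    kmer_sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    graph = {name: {} for name in genomes}
    for a, b in combinations(kmer_sets, 2):
        distance = 1.0 - jaccard(kmer_sets[a], kmer_sets[b])
        if distance <= max_distance:
            graph[a][b] = distance
            graph[b][a] = distance
    return graph


def related(graph, query, max_distance):
    """Return (name, distance) neighbours of query, closest first."""
    hits = [(n, d) for n, d in graph[query].items() if d <= max_distance]
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequences, far shorter than any real phage genome.
toy_genomes = {
    "phage_A": "ACGTACGTGGCCAATT",
    "phage_B": "ACGTACGTGGCCAATA",  # differs from phage_A at the last base
    "phage_C": "TTTTTTTTTTTTTTTT",  # shares no 4-mers with the others
}

toy_graph = build_graph(toy_genomes)
for name, distance in related(toy_graph, "phage_A", max_distance=0.5):
    print(f"{name}\t{distance:.3f}")
```

Storing only edges below a cutoff keeps the graph sparse, which is one plausible reason a neighbour lookup for a single query can stay fast as the collection grows. The all-pairs loop here is quadratic and would not scale to hundreds of thousands of genomes without sketching and a real graph store.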