Hilde Vinje, Trygve Almøy, Kristian Hovde Liland, Lars Snipen
Background: The 16S rRNA is by far the most common genomic marker used for prokaryotic classification, and has been used extensively in metagenomic studies over recent years. Along the 16S gene there are regions with more or less variation across the kingdom of bacteria. Nine variable regions have been identified, flanked by more conserved parts of the sequence. It has been stated that the discriminatory power of the 16S marker lies in these variable regions. In the present study we wanted to examine this more closely, and used a supervised learning method to search systematically for sites that contribute to correct classification at either the phylum or genus level.
Results: When classifying phyla the site selection algorithm located 50 discriminative sites. These were scattered over most of the alignments and only around half of them were located in the variable regions. The selected sites did, however, have an entropy significantly larger than expected, meaning they are sites of large variation. We found that the discriminative sites typically have a large entropy compared to their closest neighbours along the alignments. When classifying genera the site selection algorithm needed around 80% of the sites in the 16S gene before the classification error reached a minimum. This means that all variation, in both variable and conserved regions, is needed in order to separate genera.
Conclusions: Our findings do not support the statement that the discriminative power of the 16S gene is located only in the variable regions. Variable regions are important, but just as many discriminative sites are found in the more conserved parts. The discriminative power is typically found in sites of large variation located inside shorter regions of higher conservation.
A systematic search for discriminating sites in the 16S ribosomal RNA gene. Microbial Informatics and Experimentation 4 (1): 2. Published 2014-01-27. https://doi.org/10.1186/2042-5783-4-2
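The entropy comparison described in the Results can be illustrated with a minimal sketch. It assumes Shannon entropy in bits per alignment column, treats a gap as just another symbol, and compares each site with the mean of its neighbours inside a small window. The function names, the window size and the toy alignment are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def site_entropies(alignment):
    """Entropy of every column in a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


def local_contrast(entropies, window=2):
    """Entropy of each site minus the mean entropy of its neighbours.

    `window` is the number of neighbouring sites considered on each side
    (an assumed value, chosen only for illustration).
    """
    contrasts = []
    for i, h in enumerate(entropies):
        lo = max(0, i - window)
        hi = min(len(entropies), i + window + 1)
        neighbours = entropies[lo:i] + entropies[i + 1:hi]
        if not neighbours:  # single-column alignment: nothing to compare with
            contrasts.append(0.0)
        else:
            contrasts.append(h - sum(neighbours) / len(neighbours))
    return contrasts


# Toy alignment: only the third column varies.
toy_alignment = ["ACGTA", "ACGTA", "ACTTA", "ACATA"]
entropies = site_entropies(toy_alignment)
contrasts = local_contrast(entropies)
```

A site with a large positive contrast is a variable position surrounded by conserved ones, which is the pattern the abstract reports for discriminative sites.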
Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads.
Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions. Kerensa McElroy, Torsten Thomas, Fabio Luciani. Microbial Informatics and Experimentation 4 (1): 1. Published 2014-01-15. https://doi.org/10.1186/2042-5783-4-1
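To make the problem of detecting low frequency variants in the presence of sequencing error concrete, here is a generic baseline sketch. It is not a method taken from the review: it assumes a single uniform per-base error rate and asks whether the number of reads carrying an alternative base at one position is larger than errors alone would plausibly produce, using a one-sided binomial tail. The error rate, the significance threshold and the function names are hypothetical, and corrections for multiple testing or base quality are deliberately left out.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the tail."""
    tail = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, tail)


def looks_like_variant(alt_reads, depth, error_rate=0.01, alpha=1e-3):
    """True if `alt_reads` out of `depth` is unlikely under errors alone.

    `error_rate` and `alpha` are assumed values for illustration only.
    """
    return binom_upper_tail(alt_reads, depth, error_rate) < alpha


# At 1000x depth and a 1% error rate, about 10 erroneous reads are expected.
call_30 = looks_like_variant(30, 1000)  # well above the expected error count
call_12 = looks_like_variant(12, 1000)  # within the range errors can explain
```

Working in log space keeps the calculation stable at the high depths typical of deep sequencing, where binomial coefficients would otherwise overflow a float.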
Susan Joseph, Sumyya Hariri, Naqash Masood, Stephen Forsythe
Background: The Cronobacter genus is composed of seven species and can cause infections in all age groups. Of particular concern is C. sakazakii, as this species is strongly associated with severe and often fatal cases of necrotizing enterocolitis and meningitis in neonates and infants. Whole genome sequencing has revealed that the nanAKT gene cluster required for the utilisation of exogenous sialic acid is unique to the C. sakazakii species (ESA_03609-13). Sialic acid is found in breast milk, infant formula, intestinal mucin, and gangliosides in the brain, so its metabolism by C. sakazakii is of particular interest and could be an important virulence factor. To date, no laboratory studies demonstrating the growth of C. sakazakii on sialic acid have been published, nor have there been reports of sialidase activity. Phylogenetic analysis of the nan genes is of interest to determine whether the genes have been acquired by horizontal gene transfer.
Results: Phylogenetic analysis of 19 Cronobacter strains from 7 recognised species revealed that the nanAKTR genes formed a unique cluster, separate from other Enterobacteriaceae such as E. coli K1 and Citrobacter koseri, which are also associated with neonatal meningitis. The gene organisation was similar to that of Edwardsiella tarda in that the nanE gene (N-acetylmannosamine-6-phosphate-2-epimerase) was not located within the nanATK cluster. Laboratory studies confirmed that only C. sakazakii, and not the other six Cronobacter species, was able to use sialic acid as a carbon source for growth. Although the ganglioside GM1 was also used as a carbon source, no candidate sialidase genes were found in the genome; instead, the substrate degradation is probably due to β-galactosidase activity.
Conclusions: Given the relatively recent evolution of both C. sakazakii (15-23 million years ago) and sialic acid synthesis in vertebrates, sialic acid utilization may be an example of co-evolution by one species of the Cronobacter genus with the mammalian host. This has possibly resulted in additional virulence factors contributing to severe life-threatening infections in neonates due to the utilization of sialic acid from breast milk, infant formula, milk (oligosaccharides), mucins lining the intestinal wall, and even gangliosides in the brain after passing through the blood-brain barrier.
Sialic acid utilization by Cronobacter sakazakii. Microbial Informatics and Experimentation 3 (1): 3. Published 2013-05-24. https://doi.org/10.1186/2042-5783-3-3
High throughput sequencing is now fast and cheap enough to be considered part of the toolbox for investigating bacteria, and there are thousands of bacterial genome sequences available for comparison in the public domain. Bacterial genome analysis is increasingly being performed by diverse groups in research, clinical and public health labs alike, who are interested in a wide array of topics related to bacterial genetics and evolution. Examples include outbreak analysis and the study of pathogenicity and antimicrobial resistance. In this beginner's guide, we aim to provide an entry point for individuals with a biology background who want to perform their own bioinformatics analysis of bacterial genome data, to enable them to answer their own research questions. We assume readers will be familiar with genetics and the basic nature of sequence data, but do not assume any computer programming skills. The main topics covered are assembly, ordering of contigs, annotation, genome comparison and extracting common typing information. Each section includes worked examples using publicly available E. coli data and free software tools, all of which can be performed on a desktop computer.
Beginner's guide to comparative bacterial genome analysis using next-generation sequence data. David J Edwards, Kathryn E Holt. Microbial Informatics and Experimentation 3 (1): 2. Published 2013-04-10. https://doi.org/10.1186/2042-5783-3-2
Clelia Peano, Alessandro Pietrelli, Clarissa Consolandi, Elio Rossi, Luca Petiti, Letizia Tagliabue, Gianluca De Bellis, Paolo Landini
Background: Next generation sequencing (NGS) technologies have revolutionized gene expression studies and functional genomics analysis. However, further improvement of RNA sequencing protocols is still desirable, in order to reduce NGS costs and to increase its accuracy. In bacteria, a major problem in RNA sequencing is the abundance of ribosomal RNA (rRNA), which accounts for 95-98% of total RNA and can therefore hinder sufficient coverage of mRNA, the main focus of transcriptomic studies. Thus, efficient removal of rRNA is necessary to achieve optimal coverage, good detection sensitivity and reliable results. An additional challenge is presented by microorganisms with GC-rich genomes, in which rRNA removal is less efficient.
Results: In this work, we tested two commercial kits for rRNA removal, either alone or in combination, on Burkholderia thailandensis. This bacterium, chosen as representative of the important Burkholderia genus, which includes both pathogenic and environmental bacteria, has a rather large (6.72 Mb) and GC-rich (67.7%) genome. Each enriched mRNA sample was sequenced in duplicate in a paired-end Illumina GAIIx run, yielding between 10 and 40 million reads. We show that combined treatment with both kits allows an mRNA enrichment of more than 238-fold, enabling the sequencing of almost all (more than 90%) B. thailandensis transcripts from fewer than 10 million reads, without introducing any bias in mRNA relative abundance, thus preserving the differential expression profile.
Conclusions: The mRNA enrichment protocol presented in this work leads to an increase in detection sensitivity of up to 770% compared to total RNA; such increased sensitivity allows for a corresponding reduction in the number of sequencing reads necessary for complete whole-transcriptome expression profiling. We conclude that the combined MICROBExpress/Ovation rRNA removal method could be suitable for RNA sequencing of whole transcriptomes of microorganisms with high GC content and complex genomes, while at the same time allowing an important reduction in sequencing costs.
An efficient rRNA removal method for RNA sequencing in GC-rich bacteria. Microbial Informatics and Experimentation, page 1. Published 2013-01-07. https://doi.org/10.1186/2042-5783-3-1
María Maximina Bertha Moreno-Altamirano, Iris Selene Paredes-González, Clara Espitia, Mauricio Santiago-Maldonado, Rogelio Hernández-Pando, Francisco Javier Sánchez-García
Background: M. tuberculosis infection either induces or inhibits host cell death, depending on the bacterial strain and the cell microenvironment. There is evidence suggesting a role for mitochondria in these processes. On the other hand, it has been shown that several bacterial proteins are able to target mitochondria, playing a critical role in bacterial pathogenesis and modulation of cell death. However, mycobacteria-derived proteins able to target host cell mitochondria are less studied.
Results: We carried out an in silico search for proteins likely to be secreted by mycobacteria that could target host cell mitochondria. The analysis used the available genomic sequences of the common laboratory virulent reference strain Mycobacterium tuberculosis H37Rv, the avirulent strain H37Ra, the clinical isolate CDC1551, and M. bovis BCG Pasteur strain 1173P2, together with suitable bioinformatic tools (MitoProt II, PSORT II, and SignalP). It showed that at least 19 M. tuberculosis proteins could possibly target host cell mitochondria. We experimentally tested this bioinformatic prediction on four M. tuberculosis recombinant proteins chosen from this list of 19 proteins (p27, PE_PGRS1, PE_PGRS33, and MT_1866). Confocal microscopy analyses showed that the p27 and PE_PGRS33 proteins colocalize with mitochondria.
Conclusions: Based on the bioinformatic analysis of whole M. tuberculosis genome sequences, we propose that at least 19 out of 4,246 predicted M. tuberculosis proteins would be able to target host cell mitochondria and, in turn, control mitochondrial physiology. Interestingly, this list of 19 proteins includes five members of a mycobacteria-specific family of proteins (PE/PE_PGRS) thought to be virulence factors, and p27, a well-known virulence factor. The p27 and PE_PGRS33 proteins were shown experimentally to target mitochondria in J774 cells. Our results suggest a link between mitochondrial targeting of M. tuberculosis proteins and virulence.
Bioinformatic identification of Mycobacterium tuberculosis proteins likely to target host cell mitochondria: virulence factors? Microbial Informatics and Experimentation 2 (1): 9. Published 2012-12-22. https://doi.org/10.1186/2042-5783-2-9
Lars Snipen, Trudy M Wassenaar, Eric Altermann, Jonathan Olson, Sophia Kathariou, Karin Lagesen, Monica Takamiya, Susanne Knøchel, David W Ussery, Richard J Meinersmann
Background: The thermophilic Campylobacter jejuni and Campylobacter coli are considered weakly clonal populations where incongruences between genetic markers are assumed to be due to random horizontal transfer of genomic DNA. In order to investigate the population genetics structure we extracted a set of 1180 core gene families (CGF) from 27 sequenced genomes of C. jejuni and C. coli. We adopted a principal component analysis (PCA) on the normalized evolutionary distances in order to reveal any patterns in the evolutionary signals contained within the various CGFs.
Results: The analysis indicates that the conserved genes in Campylobacter show at least two, possibly five, distinct patterns of evolutionary signals, seen as clusters in the score-space of our PCA. The dominant underlying factor separating the core genes is the ability to distinguish C. jejuni from C. coli. The genes in the clusters outside the main gene group have a strong tendency to be chromosomal neighbors, which is natural if they share a common evolutionary history. Also, the most distinct cluster outside the main group is enriched with genes under positive selection and displays larger than average recombination rates.
Conclusions: The Campylobacter genomes investigated here show that subsets of conserved genes differ from each other in a more systematic way than expected by random horizontal transfer, which is consistent with differences in selection pressure acting on different genes. These findings are indications of a population of bacteria characterized by genomes with a mixture of evolutionary patterns.
Analysis of evolutionary patterns of genes in Campylobacter jejuni and C. coli. Microbial Informatics and Experimentation 2 (1): 8. Published 2012-08-28. https://doi.org/10.1186/2042-5783-2-8
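The idea of running a PCA on per-gene evolutionary distances can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: each core gene family is represented by a vector of pairwise distances between genomes, each vector is scaled to mean 1 (one possible normalization, since the abstract does not say which was used), and only the first principal component is extracted, by power iteration. All names and the toy distances are hypothetical.

```python
import math


def normalize_rows(rows):
    """Scale each gene family's distance vector to mean 1 (assumed normalization)."""
    return [[d / (sum(row) / len(row)) for d in row] for row in rows]


def center_columns(rows):
    """Subtract the column mean from every entry."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_component(rows, iterations=200):
    """Leading principal axis of centred data via power iteration on X^T X."""
    # Start from the longest data row. A uniform start vector would fail here,
    # because rows scaled to mean 1 and then centred all sum to zero.
    v = max(rows, key=lambda r: sum(x * x for x in r))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return [0.0] * len(v)
    v = [x / norm for x in v]
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in rows]
        new = [sum(s * row[j] for s, row in zip(scores, rows))
               for j in range(len(v))]
        norm = math.sqrt(sum(c * c for c in new))
        if norm == 0:
            break
        v = [c / norm for c in new]
    return v


def pc1_scores(distance_vectors):
    """Score of every gene family on the first principal component."""
    rows = center_columns(normalize_rows(distance_vectors))
    axis = first_component(rows)
    return [sum(x * w for x, w in zip(row, axis)) for row in rows]


# Toy data for three genomes (distances d12, d13, d23). The first four gene
# families separate genome 3 from genomes 1 and 2; the last two do not.
toy = [
    [0.10, 0.90, 0.90],
    [0.20, 1.80, 1.80],
    [0.10, 1.00, 0.90],
    [0.15, 0.90, 1.00],
    [0.50, 0.50, 0.50],
    [1.00, 1.10, 0.90],
]
scores = pc1_scores(toy)
```

In this toy example the two kinds of gene family fall on opposite sides of zero along the first component, which mirrors the clusters in score-space that the abstract describes. The sign of a principal component is arbitrary, so only the separation is meaningful.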
Background: The genus Mycobacterium comprises different species, among them the most contagious and infectious bacteria. The members of the complex Mycobacterium tuberculosis are the most virulent microorganisms that have killed human and other mammals since millennia. Additionally, with the many different mycobacterial sequences available, there is a crucial need for the visualization and the simplification of their data. In this present study, we aim to highlight a comparative genome, proteome and phylogeny analysis between twenty-one mycobacterial (Tuberculosis and non tuberculosis) strains using a set of computational and bioinformatics tools (Pan and Core genome plotting, BLAST matrix and phylogeny analysis).
Results: The pan- and core-genome plots showed that fewer than 1250 Mycobacterium gene families are conserved across all species, out of a total of about 20,000 gene families in the pan-genome of the twenty-one mycobacterial genomes. The BLAST matrix showed high similarity among the species of the Mycobacterium tuberculosis complex and less conservation with other slow-growing pathogenic mycobacteria. Phylogenetic analysis based on both protein conservation and rRNA clearly resolves the known relationships between slow-growing mycobacteria.
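The pan- and core-genome counts reported above can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the function name `pan_core_curve` and the toy input are hypothetical and are not taken from the study, whose actual tooling is not described here.

```python
def pan_core_curve(genomes):
    """Accumulate pan- and core-genome sizes as genomes are added in order.

    genomes: list of (name, set_of_gene_family_ids) pairs (hypothetical input).
    Returns a list of (name, pan_size, core_size) tuples.
    """
    pan = set()    # families seen in at least one genome so far
    core = None    # families seen in every genome so far
    curve = []
    for name, families in genomes:
        pan |= families
        core = set(families) if core is None else core & families
        curve.append((name, len(pan), len(core)))
    return curve


# Toy example with three made-up genomes.
toy = [("A", {1, 2, 3}), ("B", {2, 3, 4}), ("C", {3, 5})]
for name, pan_size, core_size in pan_core_curve(toy):
    print(name, pan_size, core_size)
```

The pan-genome can only grow and the core genome can only shrink as genomes are added, which is the pattern such plots typically display.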
Conclusion: Mycobacteria include important pathogens of humans and animals, and the Mycobacterium tuberculosis complex is the leading cause of death in humankind. Comparative genome analysis could provide new insight for better controlling and preventing these diseases.
{"title":"Computational genomics-proteomics and Phylogeny analysis of twenty one mycobacterial genomes (Tuberculosis & non Tuberculosis strains).","authors":"Fathiah Zakham, Othmane Aouane, David Ussery, Abdelaziz Benjouad, Moulay Mustapha Ennaji","doi":"10.1186/2042-5783-2-7","DOIUrl":"https://doi.org/10.1186/2042-5783-2-7","url":null,"abstract":"<p><strong>Unlabelled: </strong></p><p><strong>Background: </strong>The genus Mycobacterium comprises different species, among them the most contagious and infectious bacteria. The members of the complex Mycobacterium tuberculosis are the most virulent microorganisms that have killed human and other mammals since millennia. Additionally, with the many different mycobacterial sequences available, there is a crucial need for the visualization and the simplification of their data. In this present study, we aim to highlight a comparative genome, proteome and phylogeny analysis between twenty-one mycobacterial (Tuberculosis and non tuberculosis) strains using a set of computational and bioinformatics tools (Pan and Core genome plotting, BLAST matrix and phylogeny analysis).</p><p><strong>Results: </strong>Considerably the result of pan and core genome Plotting demonstrated that less than 1250 Mycobacterium gene families are conserved across all species, and a total set of about 20,000 gene families within the Mycobacterium pan-genome of twenty one mycobacterial genomes.Viewing the BLAST matrix a high similarity was found among the species of the complex Mycobacterium tuberculosis and less conservation is found with other slow growing pathogenic mycobacteria.Phylogeny analysis based on both protein conservation, as well as rRNA clearly resolve known relationships between slow growing mycobacteria.</p><p><strong>Conclusion: </strong>Mycobacteria include important pathogenic species for human and animals and the Mycobacterium tuberculosis complex is the most cause of death of the humankind. 
The comparative genome analysis could provide a new insight for better controlling and preventing these diseases.</p>","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"2 1","pages":"7"},"PeriodicalIF":0.0,"publicationDate":"2012-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-2-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30866006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: The stop codons TAA, TAG, and TGA do not terminate protein synthesis with equal efficiency. These variations could allow many genes to be regulated. Many similar nucleotide trimers are found on the second and third reading-frames of a gene; they are called premature stop codons (PSC). Like stop codons, the PSC in bacterial genomes are highly biased in both quantity and quality across genes. Phylogenetically related species often share a similar PSC profile. We want to know whether the selective forces that influence stop codon and PSC usage biases in a genome are related. We also wish to know how strongly these trimers in a genome are related to the natural history of the bacterium. Knowing these relations may provide better knowledge of the phylogeny of bacteria.
Results: A 16S rRNA alignment tree of 19 well-studied α-, β- and γ-Proteobacteria type species is used as the standard reference for bacterial phylogeny. The genomes of sixty-one bacteria, belonging to the α-, β- and γ-Proteobacteria subphyla, are used for this study. The stop codons and PSC are collectively termed "Translation Stop Signals" (TSS). A gene is represented by nine scalars corresponding to the counts of TAA, TAG, and TGA on each of the three reading-frames of that gene. "Translation Stop Signals Ratio" (TSSR) is the ratio between the TSS counts. Four types of TSSR are investigated. The TSSR-1, TSSR-2 and TSSR-3 are each a 3-scalar series corresponding respectively to the average ratio of TAA:TAG:TGA on the first, second, and third reading-frames of all genes in a genome. The Genomic-TSSR is a 9-scalar series representing the ratio of distribution of all TSS on the three reading-frames of all genes in a genome. Results show that grouping bacteria by their similarities in TSSR-1, TSSR-2, or TSSR-3 values could only partially resolve the phylogeny of the species. However, grouping bacteria by their Genomic-TSSR values resulted in clusters identical to the bacterial clusters of the reference tree. Unlike the 16S rRNA method, the Genomic-TSSR tree is also able to separate closely related species/strains at high resolution. Species and strains separated by the Genomic-TSSR grouping method are often in good agreement with those classified by other taxonomic methods. Correspondence analysis of individual genes shows that most genes in a bacterial genome share a similar TSSR value. However, within a chromosome, the Genic-TSSR values of genes near the replication origin region (Ori) are more similar to each other than those of genes near the terminus region (Ter).
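The nine-scalar gene representation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: genes are given as plain coding-strand strings, and the Genomic-TSSR is taken to be the nine pooled counts normalised to sum to one, which is one plausible reading of "ratio of distribution" and not necessarily the authors' exact definition. The function names are hypothetical.

```python
STOP_TRIMERS = ("TAA", "TAG", "TGA")


def count_tss(gene):
    """Count TAA, TAG and TGA on each of the three reading-frames of one gene.

    Returns a list of three dicts, one per frame, mapping trimer -> count.
    """
    gene = gene.upper()
    counts = [dict.fromkeys(STOP_TRIMERS, 0) for _ in range(3)]
    for frame in range(3):
        # Step through non-overlapping trimers starting at this frame offset.
        for i in range(frame, len(gene) - 2, 3):
            trimer = gene[i:i + 3]
            if trimer in counts[frame]:
                counts[frame][trimer] += 1
    return counts


def genomic_tssr(genes):
    """Pool the nine counts over all genes and normalise them (assumed definition)."""
    totals = [0] * 9
    for gene in genes:
        per_frame = count_tss(gene)
        for frame in range(3):
            for k, trimer in enumerate(STOP_TRIMERS):
                totals[3 * frame + k] += per_frame[frame][trimer]
    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0] * 9
    return [t / grand_total for t in totals]


# Toy sequence: one TAA on frame 1 and one TGA on frame 2.
print(count_tss("ATGTAAATGA"))
print(genomic_tssr(["ATGTAAATGA"]))
```

Bacteria could then be grouped by comparing these nine-element vectors with any standard distance measure; the clustering method used in the study is not specified in the abstract.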
Conclusion: The translation stop signals on the three reading-frames of the genes on a bacterial genome are interrelated, possibly due to frequent off-frame recombination facilitated by translational-associated recombination (TSR). However, TSR may not occur randomly in a bacte
{"title":"Bacterial phylogenetic tree construction based on genomic translation stop signals.","authors":"Lijing Xu, Jimmy Kuo, Jong-Kang Liu, Tit-Yee Wong","doi":"10.1186/2042-5783-2-6","DOIUrl":"https://doi.org/10.1186/2042-5783-2-6","url":null,"abstract":"<p><strong>Background: </strong>The efficiencies of the stop codons TAA, TAG, and TGA in protein synthesis termination are not the same. These variations could allow many genes to be regulated. There are many similar nucleotide trimers found on the second and third reading-frames of a gene. They are called premature stop codons (PSC). Like stop codons, the PSC in bacterial genomes are also highly bias in terms of their quantities and qualities on the genes. Phylogenetically related species often share a similar PSC profile. We want to know whether the selective forces that influence the stop codons and the PSC usage biases in a genome are related. We also wish to know how strong these trimers in a genome are related to the natural history of the bacterium. Knowing these relations may provide better knowledge in the phylogeny of bacteria</p><p><strong>Results: </strong>A 16SrRNA-alignment tree of 19 well-studied α-, β- and γ-Proteobacteria Type species is used as standard reference for bacterial phylogeny. The genomes of sixty-one bacteria, belonging to the α-, β- and γ-Proteobacteria subphyla, are used for this study. The stop codons and PSC are collectively termed \"Translation Stop Signals\" (TSS). A gene is represented by nine scalars corresponding to the numbers of counts of TAA, TAG, and TGA on each of the three reading-frames of that gene. \"Translation Stop Signals Ratio\" (TSSR) is the ratio between the TSS counts. Four types of TSSR are investigated. The TSSR-1, TSSR-2 and TSSR-3 are each a 3-scalar series corresponding respectively to the average ratio of TAA: TAG: TGA on the first, second, and third reading-frames of all genes in a genome. 
The Genomic-TSSR is a 9-scalar series representing the ratio of distribution of all TSS on the three reading-frames of all genes in a genome. Results show that bacteria grouped by their similarities based on TSSR-1, TSSR-2, or TSSR-3 values could only partially resolve the phylogeny of the species. However, grouping bacteria based on thier Genomic-TSSR values resulted in clusters of bacteria identical to those bacterial clusters of the reference tree. Unlike the 16SrRNA method, the Genomic-TSSR tree is also able to separate closely related species/strains at high resolution. Species and strains separated by the Genomic-TSSR grouping method are often in good agreement with those classified by other taxonomic methods. Correspondence analysis of individual genes shows that most genes in a bacterial genome share a similar TSSR value. However, within a chromosome, the Genic-TSSR values of genes near the replication origin region (Ori) are more similar to each other than those genes near the terminus region (Ter).</p><p><strong>Conclusion: </strong>The translation stop signals on the three reading-frames of the genes on a bacterial genome are interrelated, possibly due to frequent off-frame recombination facilitated by translational-associated recombination (TSR). However, TSR may not occur randomly in a bacte","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"2 1","pages":"6"},"PeriodicalIF":0.0,"publicationDate":"2012-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-2-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30658207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Camilla Sekse, Jon Bohlin, Eystein Skjerve, Gerd E Vegarud
Background: We wanted to compare growth differences between 13 Escherichia coli strains exposed to various concentrations of the growth inhibitor lactoferrin in two different types of broth (Syncase and Luria-Bertani (LB)). To carry this out, we present a simple statistical procedure that separates microbial growth curves that differ only through natural random perturbations from growth curves whose differences are more likely caused by biology. Bacterial growth was determined using optical density (OD) data recorded for triplicates at 620 nm for 18 hours for each strain. Each resulting growth curve was divided into three equally spaced intervals. We propose a procedure using linear spline regression with two knots to compute the slopes of each interval in the bacterial growth curves. These slopes are subsequently used to estimate a 95% confidence interval based on an appropriate statistical distribution. Slopes outside the confidence interval were considered significantly different from slopes within. We also demonstrate the use of related, but more advanced, methods known collectively as generalized additive models (GAMs) to model growth. In addition to impressive curve fitting capabilities with corresponding confidence intervals, GAMs allow for the computation of derivatives, i.e. growth rate estimation, with respect to each time point.
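The slope computation can be illustrated with a minimal pure-Python sketch of a linear spline with two knots. It assumes the knots sit at the interval boundaries (6 and 12 hours for an 18-hour curve) and fits by ordinary least squares through the normal equations; the function name `spline_slopes` and the synthetic data are hypothetical, and the confidence-interval step of the procedure is not reproduced.

```python
def spline_slopes(times, values, knots):
    """Fit y = b0 + b1*t + b2*max(t-k1, 0) + b3*max(t-k2, 0) by least squares.

    Returns the slopes of the three intervals: b1, b1+b2, b1+b2+b3.
    """
    k1, k2 = knots
    rows = [[1.0, t, max(t - k1, 0.0), max(t - k2, 0.0)] for t in times]
    n = 4
    # Augmented normal equations [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, values))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    b = [a[i][n] / a[i][i] for i in range(n)]
    return [b[1], b[1] + b[2], b[1] + b[2] + b[3]]


# Synthetic, noise-free curve: slow start, fast middle phase, levelling off.
hours = [float(t) for t in range(19)]
od = [0.05 + 0.01 * t + 0.09 * max(t - 6, 0) - 0.08 * max(t - 12, 0) for t in hours]
print(spline_slopes(hours, od, knots=(6.0, 12.0)))
```

On real triplicate OD readings the three fitted slopes per curve would then feed into the confidence-interval comparison described above.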
Results: The results from our proposed procedure agreed well with the observed data. The results indicated that there were substantial growth differences between the E. coli strains. Most strains exhibited improved growth in the nutrient-rich LB broth compared to Syncase. The inhibiting effect of lactoferrin varied between the different strains. The atypical enteropathogenic aEPEC-2 grew, on average, faster in both broths than the other strains tested, while the enteroinvasive strains EIEC-6 and EIEC-7 grew slower. The enterotoxigenic ETEC-5 strain exhibited exceptional growth in Syncase broth, but slower growth in LB broth.
Conclusions: Our results do not indicate clear growth differences between pathogroups or pathogenic versus non-pathogenic E. coli.
{"title":"Growth comparison of several Escherichia coli strains exposed to various concentrations of lactoferrin using linear spline regression.","authors":"Camilla Sekse, Jon Bohlin, Eystein Skjerve, Gerd E Vegarud","doi":"10.1186/2042-5783-2-5","DOIUrl":"https://doi.org/10.1186/2042-5783-2-5","url":null,"abstract":"<p><strong>Background: </strong>We wanted to compare growth differences between 13 Escherichia coli strains exposed to various concentrations of the growth inhibitor lactoferrin in two different types of broth (Syncase and Luria-Bertani (LB)). To carry this out, we present a simple statistical procedure that separates microbial growth curves that are due to natural random perturbations and growth curves that are more likely caused by biological differences.Bacterial growth was determined using optical density data (OD) recorded for triplicates at 620 nm for 18 hours for each strain. Each resulting growth curve was divided into three equally spaced intervals. We propose a procedure using linear spline regression with two knots to compute the slopes of each interval in the bacterial growth curves. These slopes are subsequently used to estimate a 95% confidence interval based on an appropriate statistical distribution. Slopes outside the confidence interval were considered as significantly different from slopes within. We also demonstrate the use of related, but more advanced methods known collectively as generalized additive models (GAMs) to model growth. In addition to impressive curve fitting capabilities with corresponding confidence intervals, GAM's allow for the computation of derivatives, i.e. growth rate estimation, with respect to each time point.</p><p><strong>Results: </strong>The results from our proposed procedure agreed well with the observed data. The results indicated that there were substantial growth differences between the E. coli strains. Most strains exhibited improved growth in the nutrient rich LB broth compared to Syncase. 
The inhibiting effect of lactoferrin varied between the different strains. The atypical enteropathogenic aEPEC-2 grew, on average, faster in both broths than the other strains tested while the enteroinvasive strains, EIEC-6 and EIEC-7 grew slower. The enterotoxigenic ETEC-5 strain, exhibited exceptional growth in Syncase broth, but slower growth in LB broth.</p><p><strong>Conclusions: </strong>Our results do not indicate clear growth differences between pathogroups or pathogenic versus non-pathogenic E. coli.</p>","PeriodicalId":18538,"journal":{"name":"Microbial Informatics and Experimentation","volume":"2 ","pages":"5"},"PeriodicalIF":0.0,"publicationDate":"2012-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2042-5783-2-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30620586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}