Whole genome sequencing has revolutionized biological sciences, and is leading to a paradigm shift in microbiology. As more microbial genomes are sequenced, and more bioinformatics tools are developed, it has become possible to predict the metabolism of an organism from genomic data. In contrast, predicting the pathogenic potential of parasitic microbes and their interactions with their hosts is still a challenge, especially as the definition of pathogenesis itself is still evolving. In this review, we introduce the subsystem-based technology for genome annotation and analysis, and we discuss some subsystem-based tools available in the National Microbial Pathogen Data Resource (NMPDR, http://www.nmpdr.org) and their potential application in comparative genomics and pathogenomics.
The whole genome sequence of most human bacterial pathogens is available and the advent of next-generation sequencing technologies will result in a large number of sequenced isolates per pathogenic species. The study of multiple genome sequences of a given bacterium provides insights into its evolution, pathogenic potential and diversity. The pathogen's pan-genome, defined as the sum of the core genome shared by all sequenced strains and the dispensable genome present only in a subset of the isolates, can be analyzed to assess the size and diversity of the gene repertoire that the species has access to. This information is then used to better inform the reverse vaccinology approach whereby vaccine candidates are identified and prioritized in silico based on genomic data. Bioinformatics integration of genome sequence data with functional genomics results and clinical meta-data is essential to maximize the use of this large amount of information to answer biologically relevant questions.
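The pan-genome definition above (core genome shared by all sequenced strains plus the dispensable genome found only in a subset) can be illustrated with simple set operations. The following is a minimal sketch, not the method of any particular tool: it assumes each strain has already been reduced to a set of ortholog-cluster identifiers, and the function name `partition_pan_genome`, the strain labels, and the gene identifiers are all hypothetical.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    strain_genes: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene content into pan-, core and dispensable genome.

    strain_genes maps a strain label to the set of gene (ortholog cluster)
    identifiers found in that strain. Identifiers are assumed to be
    comparable across strains, i.e. orthology has already been resolved.
    """
    if not strain_genes:
        return set(), set(), set()

    gene_sets = [set(genes) for genes in strain_genes.values()]

    # Pan-genome: every gene seen in at least one strain.
    pan = set().union(*gene_sets)
    # Core genome: genes present in all strains.
    core = set.intersection(*gene_sets)
    # Dispensable genome: genes present in only a subset of strains.
    dispensable = pan - core
    return pan, core, dispensable


# Hypothetical toy data: three isolates of one species.
strains = {
    "strain_A": {"gene1", "gene2", "gene3"},
    "strain_B": {"gene1", "gene2", "gene4"},
    "strain_C": {"gene1", "gene3", "gene5"},
}

pan, core, dispensable = partition_pan_genome(strains)
print("pan-genome size:", len(pan))
print("core genome:", sorted(core))
print("dispensable genome:", sorted(dispensable))
```

In practice the hard part is the orthology assignment that produces these identifiers; the set arithmetic only makes the definition concrete. Tracking how the pan-genome size grows as strains are added gives a rough view of the size of the gene repertoire the species has access to.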
Legionella pneumophila is the etiological agent of Legionnaires' disease and of the less acute disease Pontiac fever. It is a Gram-negative bacterium present in fresh and artificial water environments that replicates in protozoan hosts and is also found in biofilms. Replication within protozoa is essential for the survival of the bacterium. Recent years have seen a giant step forward in the genomics of L. pneumophila. The establishment and publication of the complete genome sequences of three clinical L. pneumophila isolates in 2004 and a fourth in 2007 have paved the way for major breakthroughs in understanding the biology of L. pneumophila in particular and Legionella in general. Sequence analysis identified several specific features of Legionella: (i) an extraordinary genetic diversity among the different isolates and (ii) the presence of an unexpectedly high number and variety of eukaryotic-like proteins, predicted to be involved in the exploitation of the host cellular processes by mimicking specific eukaryotic functions. In this chapter, we will first discuss the insights gained from genomics by highlighting the characteristic features and common traits of the four L. pneumophila genomes obtained through genome analysis and comparison, and then we will focus on the newest results obtained by functional analysis of different eukaryotic-like proteins and describe their involvement in the pathogenicity of L. pneumophila.
Current data from complete eukaryotic genomes indicate that ancestral gene duplications, followed by a mutational process called fractionation, generated profound and orderly changes in gene content. Most of these duplicated genes are removed. At least three hypotheses may explain the exceptional genes retained post-duplication: (1) Gain-of-Function; (2) Subfunctionalization, and (3) Balanced Gene Drive. Each is evaluated as an explanation for gene content data. Subfunctionalization, the most popular explanation, predicts no relationship at all between gene function and post-duplicate retention, and if there were particular sorts of 'subfunctionalizable' genes, these should be over-retained following any sort of duplication. Duplications may be local, segmental or whole genome. Gene content data from three plant genomes, reflecting three independent tetraploidies and many tandem duplications, are not explained by Subfunctionalization. Specifically, genes encoding transcription factors and ribosomal components are significantly over-retained following tetraploidy and under-retained among local duplicates. In addition, transcription factor families in Arabidopsis show a reciprocal relationship when retention is monitored after local duplication versus after tetraploidy; only Balanced Gene Drive predicts reciprocity. Vertebrates also retain genes nonrandomly following tetraploidies, but the data are preliminary. Removing subfunctionalization as the duplicate retention mechanism is of high theoretical importance. It clears the way for 'Mutationist' hypotheses that may help explain baffling adaptations and trends in eukaryotic evolution that have been largely ignored. This essay recognizes the potential evolutionary importance of saltatory chromosomal events that may change gene content - expand gene families - independent of allelic diversity.
Dioecious species are known in plants and, as in many animals, some have distinguishable sex chromosomes. Genetic maps have identified sex-determining regions in several plants, and mapped male-specific Y (MSY) regions of the chromosome in which crossing over and genetic recombination do not occur, allowing sequence divergence between the X and Y. Divergence values of the few X-Y gene pairs so far available show that recombination between different genes of Silene latifolia stopped at different times. Once recombination stops, MSY genome regions are predicted to accumulate repetitive sequences, including transposable elements, resulting in low gene density. This has been documented in papaya but not yet in other plants. Y-linked genes should also accumulate deleterious mutations, eventually being lost as dosage compensation evolves. The few available data suggest that many plant MSY genes are functional, perhaps because genes required for male gametophyte functions degenerate slowly. Detailed studies of sex-linked genes are needed to test for deleterious substitutions in Y genes, and to date the origins of plant sex chromosomes.
It has long been known that organismal complexity is poorly correlated with genome size and that tremendous variation in DNA content exists within many groups of organisms. This diversity has generated considerable interest in: (1) the identity and relative impact of sequences responsible for genome size variation, and (2) the suite of internal mechanisms and external evolutionary forces that collectively are responsible for the observed diversity. Genome size in any given taxon reflects the net effects of multiple mechanisms of DNA expansion and contraction, which by virtue of their complexity and temporal juxtaposition, may be challenging to tease apart into their constituent contributions. Here we review our current understanding of genome size variation in plants and the spectrum of mechanisms thought to be responsible for this variation. We present a synopsis of the insights into the mechanisms and pace of genome size change that are uniquely facilitated by a phylogenetic perspective, particularly among closely related species. We also highlight recent studies in diverse angiosperm groups where comparative genomic approaches have yielded general insights into the myriad mechanisms responsible for much of the observed genome size variation, most prominently the contribution of transposable elements (TEs). Finally, we draw attention to the possibility of divergence in the relative importance of different mechanisms of genome size evolution during cladogenesis.
Plant centromeres are generally composed of tandem arrays of simple repeats that are typical of a particular species, but that evolve rapidly. Centromere specific retroelements are also present. These arrays associate with a centromere specific variant of histone H3 that anchors the site of the kinetochore. Although such DNA arrays are typical of the centromere, the specification of centromere activity has an epigenetic component as shown by the fact that centromeres are formed in the absence of such repeats and that centromeres in dicentric chromosomes regularly undergo inactivation.
Whole genome duplications (WGD) have been a frequent occurrence during the evolution of angiosperms, providing all gene families the opportunity to grow and diversify. Most of this potential growth has not been realized, since each WGD has been followed by massive gene losses. The likelihood of survival of gene duplicates after a WGD has been shown to depend on their function, as is also the case for single gene duplications. These two modes of growth have different functional and evolutionary implications and have had a markedly divergent impact on the evolution of different gene families. Despite duplications, gene losses, and translocations, it is still possible in many cases to reconstruct the history of angiosperm genomic segments, sometimes back to the last ancestor of monocots and eudicots. This segmental phylogeny can in turn shed light on the evolution of the genes that form part of those segments. Position-based phylogeny can improve the resolution and correct artifacts created by phylogenies based on gene sequences, although a number of questions need to be resolved for its full potential to be fulfilled.