Plant, animal and protist genomes often contain endogenous viral elements (EVEs), which correspond to partial and sometimes entire viral genomes that have been captured in the genome of their host organism through a variety of integration mechanisms. While the number of sequenced eukaryotic genomes is rapidly increasing, the annotation and characterization of EVEs remains largely overlooked. EVEs that derive from members of the family Caulimoviridae are widespread across tracheophyte plants, and sometimes they occur in very high copy numbers. However, existing programs for annotating repetitive DNA elements in plant genomes are poor at identifying and then classifying these EVEs. Other than accurately annotating plant genomes, there is intrinsic value in a tool that could identify caulimovirid EVEs as they testify to recent or ancient host-virus interactions and provide valuable insights into virus evolution. In response to this research need, we have developed CAULIFINDER, an automated and sensitive annotation software package. CAULIFINDER consists of two complementary workflows, one to reconstruct, annotate and group caulimovirid EVEs in a given plant genome and the second to classify these genetic elements into officially recognized or tentative genera in the Caulimoviridae. We have benchmarked the CAULIFINDER package using the Vitis vinifera reference genome, which contains a rich assortment of caulimovirid EVEs that have previously been characterized using manual methods. The CAULIFINDER package is distributed in the form of a Docker image.
Background: LINE-1s, Alus and SVAs are the only retrotransposition competent elements in humans. Their mobilization followed by insertional mutagenesis is often linked to disease. Apart from these rare integration events, accumulation of retrotransposition intermediates in the cytoplasm is potentially pathogenic due to induction of inflammatory response pathways. Although the retrotransposition of LINE-1 and Alu retroelements has been studied in considerable detail, there are mixed observations about the localization of their RNAs.
Results: We undertook a comprehensive and unbiased approach to analyze retroelement RNA localization using common cell lines and publicly available datasets containing RNA-sequencing data from subcellular fractions. Using our customized analytic pipeline, we compared localization patterns of RNAs transcribed from retroelements and single-copy protein coding genes. Our results demonstrate a generalized characteristic pattern of retroelement RNA nuclear localization that is conserved across retroelement classes as well as evolutionarily young and ancient elements. Preferential nuclear enrichment of retroelement transcripts was consistently observed in cell lines, in vivo and across species. Moreover, retroelement RNA localization patterns were dynamic and subject to change during development, as seen in zebrafish embryos.
Conclusion: The pronounced nuclear localization of transcripts arising from ancient as well as de novo transcribed retroelements suggests that these transcripts are retained in the nucleus as opposed to being re-imported to the nucleus or degraded in the cytoplasm. This raises the possibility that there is adaptive value associated with this localization pattern to the host, the retroelements or possibly both.
Background: Despite the advent of Chromatin Immunoprecipitation Sequencing (ChIP-seq) having revolutionised our understanding of the mammalian genome's regulatory landscape, many challenges remain. In particular, because of their repetitive nature, the sequencing reads derived from transposable elements (TEs) pose a real bioinformatics challenge, to the point that standard analysis pipelines typically ignore reads whose genomic origin cannot be unambiguously ascertained.
Results: We show that discarding ambiguously mapping reads may lead to a systematic underestimation of the number of reads associated with young TE families/subfamilies. We also provide evidence suggesting that the strategy of randomly permuting the location of the read mappings (or the TEs) that is often used to compute the background for enrichment calculations at TE families/subfamilies can result in both false positive and negative enrichments. To address these problems, we present the Transposable Element Enrichment Estimator (T3E), a tool that makes use of ChIP-seq data to characterise the epigenetic profile of associated TE families/subfamilies. T3E weights the number of read mappings assigned to the individual TE copies of a family/subfamily by the overall number of genomic loci to which the corresponding reads map, and this is done at the single nucleotide level. In addition, T3E computes ChIP-seq enrichment relative to a background estimated based on the distribution of the read mappings in the input control DNA. We demonstrated the capabilities of T3E on 23 different ChIP-seq libraries. T3E identified enrichments that were consistent with previous studies. Furthermore, T3E detected context-specific enrichments that are likely to pinpoint unexplored TE families/subfamilies with individual TE copies that have been frequently exapted as cis-regulatory elements during the evolution of mammalian regulatory networks.
Conclusions: T3E is a novel open-source computational tool (available for use at: https://github.com/michelleapaz/T3E ) that overcomes some of the pitfalls associated with the analysis of ChIP-seq data arising from the repetitive mammalian genome and provides a framework to shed light on the epigenetics of entire TE families/subfamilies.
Background: Transposable elements (TEs) are selfish DNA sequences capable of moving and amplifying at the expense of host cells. Despite this, an increasing number of studies have revealed that TE proteins are important contributors to the emergence of novel host proteins through molecular domestication. We previously described seven transposase-derived domesticated genes from the PIF/Harbinger DNA family of TEs in Drosophila and a co-domestication. All PIF TEs known in plants and animals distinguish themselves from other DNA transposons by the presence of two genes. We hypothesize that there should often be co-domestications of the two genes from the same TE because the transposase (gene 1) has been described to be translocated to the nucleus by the MADF protein (gene 2). To provide support for this model of new gene origination, we investigated available insect species genomes for additional evidence of PIF TE domestication events and explored the co-domestication of the MADF protein from the same TE insertion.
Results: After the extensive insect species genomes exploration of hits to PIF transposases and analyses of their context and evolution, we present evidence of at least six independent PIF transposable elements proteins domestication events in insects: two co-domestications of both transposase and MADF proteins in Anopheles (Diptera), one transposase-only domestication event and one co-domestication in butterflies and moths (Lepidoptera), and two transposases-only domestication events in cockroaches (Blattodea). The predicted nuclear localization signals for many of those proteins and dicistronic transcription in some instances support the functional associations of co-domesticated transposase and MADF proteins.
Conclusions: Our results add to a co-domestication that we previously described in fruit fly genomes and support that new gene origination through domestication of a PIF transposase is frequently accompanied by the co-domestication of a cognate MADF protein in insects, potentially for regulatory functions. We propose a detailed model that predicts that PIF TE protein co-domestication should often occur from the same PIF TE insertion.