Plant, animal and protist genomes often contain endogenous viral elements (EVEs), which correspond to partial and sometimes entire viral genomes that have been captured in the genome of their host organism through a variety of integration mechanisms. While the number of sequenced eukaryotic genomes is rapidly increasing, the annotation and characterization of EVEs remains largely overlooked. EVEs that derive from members of the family Caulimoviridae are widespread across tracheophyte plants, and sometimes they occur in very high copy numbers. However, existing programs for annotating repetitive DNA elements in plant genomes are poor at identifying and then classifying these EVEs. Beyond enabling accurate annotation of plant genomes, there is intrinsic value in a tool that can identify caulimovirid EVEs, as they testify to recent or ancient host-virus interactions and provide valuable insights into virus evolution. In response to this research need, we have developed CAULIFINDER, an automated and sensitive annotation software package. CAULIFINDER consists of two complementary workflows, one to reconstruct, annotate and group caulimovirid EVEs in a given plant genome and the second to classify these genetic elements into officially recognized or tentative genera in the Caulimoviridae. We have benchmarked the CAULIFINDER package using the Vitis vinifera reference genome, which contains a rich assortment of caulimovirid EVEs that have previously been characterized using manual methods. The CAULIFINDER package is distributed in the form of a Docker image.
Background: LINE-1s, Alus and SVAs are the only retrotransposition-competent elements in humans. Their mobilization followed by insertional mutagenesis is often linked to disease. Apart from these rare integration events, accumulation of retrotransposition intermediates in the cytoplasm is potentially pathogenic due to induction of inflammatory response pathways. Although the retrotransposition of LINE-1 and Alu retroelements has been studied in considerable detail, there are mixed observations about the localization of their RNAs.
Results: We undertook a comprehensive and unbiased approach to analyze retroelement RNA localization using common cell lines and publicly available datasets containing RNA-sequencing data from subcellular fractions. Using our customized analytic pipeline, we compared localization patterns of RNAs transcribed from retroelements and single-copy protein coding genes. Our results demonstrate a generalized characteristic pattern of retroelement RNA nuclear localization that is conserved across retroelement classes as well as evolutionarily young and ancient elements. Preferential nuclear enrichment of retroelement transcripts was consistently observed in cell lines, in vivo and across species. Moreover, retroelement RNA localization patterns were dynamic and subject to change during development, as seen in zebrafish embryos.
Conclusion: The pronounced nuclear localization of transcripts arising from ancient as well as de novo transcribed retroelements suggests that these transcripts are retained in the nucleus as opposed to being re-imported to the nucleus or degraded in the cytoplasm. This raises the possibility that there is adaptive value associated with this localization pattern to the host, the retroelements or possibly both.
Background: Although the advent of Chromatin Immunoprecipitation Sequencing (ChIP-seq) has revolutionised our understanding of the mammalian genome's regulatory landscape, many challenges remain. In particular, because of their repetitive nature, the sequencing reads derived from transposable elements (TEs) pose a real bioinformatics challenge, to the point that standard analysis pipelines typically ignore reads whose genomic origin cannot be unambiguously ascertained.
Results: We show that discarding ambiguously mapping reads may lead to a systematic underestimation of the number of reads associated with young TE families/subfamilies. We also provide evidence suggesting that the strategy of randomly permuting the location of the read mappings (or the TEs) that is often used to compute the background for enrichment calculations at TE families/subfamilies can result in both false positive and negative enrichments. To address these problems, we present the Transposable Element Enrichment Estimator (T3E), a tool that makes use of ChIP-seq data to characterise the epigenetic profile of associated TE families/subfamilies. T3E weights the number of read mappings assigned to the individual TE copies of a family/subfamily by the overall number of genomic loci to which the corresponding reads map, and this is done at the single nucleotide level. In addition, T3E computes ChIP-seq enrichment relative to a background estimated based on the distribution of the read mappings in the input control DNA. We demonstrated the capabilities of T3E on 23 different ChIP-seq libraries. T3E identified enrichments that were consistent with previous studies. Furthermore, T3E detected context-specific enrichments that are likely to pinpoint unexplored TE families/subfamilies with individual TE copies that have been frequently exapted as cis-regulatory elements during the evolution of mammalian regulatory networks.
Conclusions: T3E is a novel open-source computational tool (available for use at: https://github.com/michelleapaz/T3E ) that overcomes some of the pitfalls associated with the analysis of ChIP-seq data arising from the repetitive mammalian genome and provides a framework to shed light on the epigenetics of entire TE families/subfamilies.
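The read-weighting idea described in the Results above can be illustrated with a minimal sketch. It is not the T3E implementation: the function names, the dictionary-based inputs, the per-locus (rather than single-nucleotide) resolution, and the pseudocount-smoothed log2 ratio against the input control are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def weighted_family_counts(read_mappings, locus_to_family):
    """Sum fractional read weights per TE family.

    read_mappings: dict mapping a read id to the list of genomic loci
        it aligns to (hypothetical input format).
    locus_to_family: dict mapping a locus to its TE family; loci that
        are missing from this dict are treated as non-TE sequence.
    """
    counts = defaultdict(float)
    for loci in read_mappings.values():
        if not loci:
            continue
        # A read mapping to n loci contributes 1/n to each of them.
        weight = 1.0 / len(loci)
        for locus in loci:
            family = locus_to_family.get(locus)
            if family is not None:
                counts[family] += weight
    return dict(counts)


def log2_enrichment(chip_count, input_count, chip_total, input_total,
                    pseudocount=1.0):
    """Log2 ratio of library-size-normalised ChIP and input counts.

    The pseudocount is an illustrative smoothing choice, not something
    stated in the abstract.
    """
    chip_rate = (chip_count + pseudocount) / chip_total
    input_rate = (input_count + pseudocount) / input_total
    return math.log2(chip_rate / input_rate)


if __name__ == "__main__":
    reads = {"r1": ["a"], "r2": ["a", "b"], "r3": ["c", "d", "e", "f"]}
    families = {"a": "L1", "b": "L1", "c": "Alu"}
    print(weighted_family_counts(reads, families))
```

In this toy example, a read that maps to four loci, only one of which belongs to an Alu copy, adds a quarter of a read to the Alu total instead of being discarded, which is the behaviour that avoids underestimating young, highly similar families.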
Background: Transposable elements (TEs) are selfish DNA sequences capable of moving and amplifying at the expense of host cells. Despite this, an increasing number of studies have revealed that TE proteins are important contributors to the emergence of novel host proteins through molecular domestication. We previously described seven transposase-derived domesticated genes from the PIF/Harbinger DNA family of TEs in Drosophila and a co-domestication. All PIF TEs known in plants and animals distinguish themselves from other DNA transposons by the presence of two genes. We hypothesize that there should often be co-domestications of the two genes from the same TE because the transposase (gene 1) has been described to be translocated to the nucleus by the MADF protein (gene 2). To provide support for this model of new gene origination, we investigated available insect species genomes for additional evidence of PIF TE domestication events and explored the co-domestication of the MADF protein from the same TE insertion.
Results: After an extensive search of insect genomes for hits to PIF transposases, and analyses of their genomic context and evolution, we present evidence of at least six independent domestication events of PIF transposable element proteins in insects: two co-domestications of both transposase and MADF proteins in Anopheles (Diptera), one transposase-only domestication event and one co-domestication in butterflies and moths (Lepidoptera), and two transposase-only domestication events in cockroaches (Blattodea). The predicted nuclear localization signals for many of those proteins and dicistronic transcription in some instances support the functional associations of co-domesticated transposase and MADF proteins.
Conclusions: Our results add to a co-domestication that we previously described in fruit fly genomes and support the view that new gene origination through domestication of a PIF transposase is frequently accompanied by the co-domestication of a cognate MADF protein in insects, potentially for regulatory functions. We propose a detailed model that predicts that PIF TE protein co-domestication should often occur from the same PIF TE insertion.
Retrotransposons are genetic elements with the ability to replicate in the genome using reverse transcriptase: they have been associated with the development of different biological structures, such as the Central Nervous System (CNS), and their high mutagenic potential has been linked to various diseases, including cancer and neurological disorders. Throughout evolution, Primates and Homo had to cope with infections from viruses and bacteria, and also with endogenous retroelements. Therefore, host genomes have evolved numerous methods to counteract the activity of endogenous and exogenous pathogens, and the APOBEC3 family of mutators is a prime example of a defensive mechanism in this context. In most Primates, there are seven members of the APOBEC3 family of deaminase proteins: among their functions, there is the ability to inhibit the mobilization of retrotransposons and the functionality of viruses. The evolution of the APOBEC3 proteins found in Primates is correlated with the expansion of two major families of retrotransposons, i.e. ERV and LINE-1. In this review, we will discuss how the rapid expansion of the APOBEC3 family is linked to the evolution of retrotransposons, highlighting the strong evolutionary arms race that characterized the history of APOBEC3s and endogenous retroelements in Primates. Moreover, the possible role of this relationship will be assessed in the context of embryonic development and brain-associated diseases.
Background: Transposable elements are ubiquitous and play a fundamental role in shaping genomes during evolution. Since excessive transposition can be mutagenic, mechanisms exist in the cells to keep these mobile elements under control. Although many cellular factors regulating the mobility of the retrovirus-like transposon Ty1 in Saccharomyces cerevisiae have been identified in genetic screens, only very few of them interact physically with Ty1 integrase (IN).
Results: Here, we perform a proteomic screen to establish the Ty1 IN interactome. Among the 265 potential interacting partners, we focus our study on the conserved CK2 kinase. We confirm the interaction between IN and CK2, demonstrate that IN is a substrate of CK2 in vitro and identify the modified residues. We find that Ty1 IN is phosphorylated in vivo and that these modifications are dependent in part on CK2. No significant change in Ty1 retromobility could be observed when we introduce phospho-ablative mutations that prevent IN phosphorylation by CK2 in vitro. However, the absence of CK2 holoenzyme results in a strong stimulation of Ty1 retrotransposition, characterized by an increase in Ty1 mRNA and protein levels and a high accumulation of cDNA.
Conclusion: Our study shows that Ty1 IN is phosphorylated, as observed for retroviral INs, and highlights an important role of CK2 in the regulation of Ty1 retrotransposition. In addition, the proteomic approach enabled the identification of many new Ty1 IN interacting partners, whose potential role in the control of Ty1 mobility will be interesting to study.
Background: Krüppel Associated Box-containing Zinc Finger Proteins (KRAB-ZFPs), representing the largest superfamily of transcription factors in mammals, are predicted to primarily target and repress transposable elements (TEs). It is challenging to dissect the distinct functions of these transcription regulators due to their sequence similarity and diversity, and also the complicated repetitiveness of their targeting TE sequences.
Results: Mouse KRAB-Zfps are mainly organized into clusters genome-wide. In this study, we revealed that the intra-cluster members had a close evolutionary relationship, and a similar preference for zinc finger (ZnF) usage. KRAB-Zfps were expressed in a cell type- or tissue type-specific manner and they tended to be actively transcribed together with other cluster members. Further sequence analyses showed that the linker sequences between ZnFs were conserved while also having distinct cluster specificity. Based on these unique characteristics of KRAB-Zfp clusters, sgRNAs were designed to edit cluster-specific linkers to abolish the functions of the targeted cluster(s). Using mouse embryonic stem cells (mESC) as a model, we screened and obtained a series of sgRNAs targeting various highly expressed KRAB-Zfp clusters. The effectiveness of the sgRNAs was verified in a reporter assay exclusively developed for multi-target sgRNAs and further confirmed by PCR-based analyses. Using mESC lines inducibly expressing Cas9 and these sgRNAs, we found that editing different KRAB-Zfp clusters resulted in transcriptional changes of distinct categories of TEs.
Conclusions: Collectively, the intrinsic sequence correlations of intra-cluster KRAB-Zfp members discovered in this study suggest that the conserved cluster-specific linkers played crucial roles in diversifying the tandem ZnF array and the related target specificity of KRAB-Zfps during the evolution of the clusters. On this basis, an effective CRISPR-Cas9-based approach targeting the linker sequences was developed and verified for rapidly editing KRAB-Zfp clusters to identify the regulatory correlation between the cluster members and their potential TE targets.