Background: Retrotransposons have been implicated as causes of Mendelian disease, but their role in autism spectrum disorder (ASD) has not been systematically defined, because they are only called with adequate sensitivity from whole genome sequencing (WGS) data and a large enough cohort for this analysis has only recently become available.
Results: We analyzed WGS data from a cohort of 2288 ASD families from the Simons Simplex Collection by establishing a scalable computational pipeline for retrotransposon insertion detection. We report 86,154 polymorphic retrotransposon insertions (more than 60% of them not previously reported) and 158 de novo retrotransposition events. The overall burden of de novo events was similar between ASD individuals and unaffected siblings, with 1 de novo insertion per 29, 117, and 206 births for Alu, L1, and SVA, respectively, and 1 de novo insertion per 21 births in total. However, ASD cases showed more de novo L1 insertions than expected in ASD genes. Additionally, we observed exonic insertions in loss-of-function intolerant genes, including a likely pathogenic exonic insertion in CSDE1, only in ASD individuals.
Conclusions: These findings suggest a modest, but important, impact of intronic and exonic retrotransposon insertions in ASD, show the importance of WGS for their analysis, and highlight the utility of specific bioinformatic tools for high-throughput detection of retrotransposon insertions.
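The per-family de novo rates reported in the Results above can be checked against the stated overall rate with a short calculation. The sketch below is illustrative only: it assumes that the per-family rates can simply be added to give the total number of insertions per birth, which is an assumption of this example and not a description of the authors' pipeline. The function and variable names are hypothetical.

```python
from fractions import Fraction

# Births per de novo insertion for each family, as reported in the abstract.
births_per_insertion = {"Alu": 29, "L1": 117, "SVA": 206}


def combined_births_per_insertion(per_family):
    """Add the per-family rates (insertions per birth) and invert the sum.

    Assumption of this sketch: the family rates are additive.
    """
    total_rate = sum(Fraction(1, births) for births in per_family.values())
    return 1 / total_rate


combined = combined_births_per_insertion(births_per_insertion)
print(f"roughly 1 de novo insertion per {float(combined):.1f} births")
```

Under that additivity assumption, the combined figure rounds to the 1 in 21 births given in the abstract.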
Background: The majority of structural variation in genomes is caused by insertions of transposable elements (TEs). In mammalian genomes, the main TE fraction is made up of autonomous and non-autonomous non-LTR retrotransposons commonly known as LINEs and SINEs (Long and Short Interspersed Nuclear Elements). Here we present one of the first population-level analyses of TE insertions in a non-model organism, the giraffe. Giraffes are ruminant artiodactyls, one of the few mammalian groups with genomes that are colonized by putatively active LINEs of two different clades of non-LTR retrotransposons, namely the LINE1 and RTE/BovB LINEs, as well as their associated SINEs. We analyzed insertions of both LINE types and their associated SINEs in three giraffe genome assemblies, as well as across a population-level sampling of 48 individuals covering all extant giraffe species.
Results: The comparative genome screen identified 139,525 recent LINE1 and RTE insertions in the sampled giraffe population. The analysis revealed a drastically reduced RTE activity in giraffes, whereas LINE1 is still actively propagating in the genomes of extant (sub)-species. In concert with the extremely low activity of the giraffe RTE, we also found that RTE-dependent SINEs, namely Bov-tA and Bov-A2, have been virtually immobile in the last 2 million years. Despite the high current activity of the giraffe LINE1, we did not find evidence for the presence of currently active LINE1-dependent SINEs. TE insertion heterozygosity rates differ among the different (sub)-species, likely due to divergent population histories.
Conclusions: The horizontally transferred RTE/BovB and its derived SINEs appear to be close to inactivation and subsequent extinction in the genomes of extant giraffe species. This is the first time that the decline of a TE family has been meticulously analyzed from a population genetics perspective. Our study shows how detailed information about past and present TE activity can be obtained by analyzing large-scale population-level genomic data sets.
Background: LINE-1 (Long Interspersed Nuclear Elements, L1) retrotransposons are the only autonomously active transposable elements in the human genome. The evolution of L1 retrotransposition rates and its implications for L1 dynamics are poorly understood. Retrotransposition rates are commonly measured in cell culture-based assays, but it is unclear how well these measurements provide insight into L1 population dynamics. This study applied comparative methods to estimate parameters for the evolution of retrotransposition rates, and infer L1 dynamics from these estimates.
Results: Our results show that the rates at which new L1s emerge in the human population correlate positively with cell culture-based retrotransposition activities, that there is an evolutionary trend towards lower retrotransposition activity, and that this evolutionary trend is not sufficient to counterbalance the increase in active L1s resulting from continuing retrotransposition.
Conclusions: Together, these findings support a model of the population-level L1 retrotransposition dynamics that is consistent with prior expectations and indicate the remaining gaps in the understanding of L1 dynamics in human genomes.
Background: Tn5253, a composite Integrative Conjugative Element (ICE) of Streptococcus pneumoniae carrying the tet(M) and cat resistance determinants, was found to (i) integrate at a specific 83-bp integration site (attB), (ii) produce circular forms joined by an 84-bp sequence (attTn), and (iii) restore the chromosomal integration site. The purpose of this study is to functionally characterize attB in S. pneumoniae strains with different genetic backgrounds and in other bacterial species, and to investigate the presence of the Tn5253 attB site in bacterial genomes.
Results: Analysis of representative Tn5253-carrying transconjugants obtained in S. pneumoniae strains with different genetic backgrounds and in other bacterial species, namely Streptococcus agalactiae, Streptococcus gordonii, Streptococcus pyogenes, and Enterococcus faecalis, showed that: (i) Tn5253 integrates in rbgA of S. pneumoniae and in orthologous rbgA genes of other bacterial species, (ii) integration always occurs downstream of an 11-bp sequence conserved among streptococcal and enterococcal hosts, (iii) the length of the attB site corresponds to the length of the duplication after Tn5253 integration, (iv) attB duplication restores the rbgA CDS, (v) Tn5253 produced circular forms containing the attTn site at a concentration ranging from 2.0 × 10⁻⁵ to 1.2 × 10⁻² copies per chromosome depending on bacterial species and strain, (vi) reconstitution of attB sites occurred at 3.7 × 10⁻⁵ to 1.7 × 10⁻² copies per chromosome. A database search of complete microbial genomes using Tn5253 attB as a probe showed that (i) thirteen attB variants were present in the 85 complete pneumococcal genomes, (ii) in 75 pneumococcal genomes (88.3 %), the attB site was 83 or 84 nucleotides in length, while in 10 (11.7 %) it was 41 nucleotides, (iii) in 19 other bacterial species attB was located in orthologous rbgA genes and its size ranged between 17 and 84 nucleotides, (iv) the 11-bp sequence, which corresponds to the last 11 nucleotides of attB sites, is conserved among the different bacterial species and can be considered the core of the Tn5253 integration site.
Conclusions: A functional characterization of the Tn5253 attB integration site combined with genome analysis contributed to elucidating the potential of Tn5253 horizontal gene transfer among different bacterial species.
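Because the conserved 11-bp core is described above as the last 11 nucleotides of each attB site, a candidate attB can be located by finding the core and taking the window of the expected length that ends with it. The following is a minimal sketch of that idea, not the database search used in the study. The core sequence `CORE`, the toy genome string, and the function name are hypothetical placeholders (the abstract does not give the actual core sequence), and only the forward strand is scanned.

```python
def candidate_attb_sites(genome, core, attb_length):
    """Return (start, end, sequence) for every window of attb_length
    nucleotides that ends with the conserved core (forward strand only)."""
    sites = []
    pos = genome.find(core)
    while pos != -1:
        end = pos + len(core)          # attB ends with the core's last base
        start = end - attb_length      # attB extends upstream of the core
        if start >= 0:
            sites.append((start, end, genome[start:end]))
        pos = genome.find(core, pos + 1)
    return sites


# Hypothetical 11-bp placeholder; NOT the real Tn5253 core sequence.
CORE = "ACGTTGCAAGT"
# Toy genome built only to exercise the function.
toy_genome = "TTTT" + "GATTACAGATTACA" + CORE + "CCCC"

# 17 nt is the shortest attB size reported in the abstract.
sites = candidate_attb_sites(toy_genome, CORE, attb_length=17)
for start, end, seq in sites:
    print(start, end, seq)
```

A real search would also need to scan the reverse complement and tolerate sequence variation outside the core, since the abstract reports attB sizes ranging from 17 to 84 nucleotides across species.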
Background: With the expansion of high-throughput sequencing, we now have access to a larger number of genome-wide studies analyzing the transposable element (TE) composition of a wide variety of organisms. However, genomic analyses often remain too limited in the number and diversity of species investigated to study in depth the dynamics and evolutionary success of the different types of TEs among metazoans. Therefore, we chose to investigate the use of transcriptomes to describe the diversity of TEs in phylogenetically related species by conducting the first comparative analysis of TEs in two groups of polychaetes and evaluating the diversity of TEs that might impact genomic evolution as a result of their mobility.
Results: We present a detailed analysis of TE distribution in transcriptomes extracted from 15 polychaetes depending on the number of reads used during assembly, and also compare these results with additional TE scans on associated low-coverage genomes. We then characterized the clades defined by 1021 LTR-retrotransposon families identified in 26 species. Clade richness was highly dependent on the superfamily considered. Copia elements appear rare and are equally distributed in only three clades, GalEa, Hydra and CoMol. Among the eight BEL/Pao clades identified in annelids, two small clades within the Sailor lineage are new to science. We characterized 17 Gypsy clades, of which only 4 are new; the C-clade largely dominates with a quarter of the families. Finally, the majority of species also expressed two distinct transcripts encoding PIWI proteins, known to be involved in the control of TE mobility.
Conclusions: This study shows that the use of transcriptomes assembled from 40 million reads was sufficient to access the diversity and proportion of transposable elements, compared with those obtained by low-coverage sequencing. Among LTR-retrotransposons, Gypsy elements were unequivocally dominant, but the results suggest that the number of Gypsy clades, although high, may be more limited than previously thought in metazoans. For BEL/Pao elements, the organization of clades within the Sailor lineage appears more difficult to establish clearly. The Copia elements remain rare and result from the evolutionarily consistent success of the same three clades.
The submandibular gland (SG) is a relatively simple organ formed by three cell types: acinar, myoepithelial, and an intricate network of duct-forming epithelial cells, that together fulfill several physiological functions, from assisting food digestion to acting as an immune barrier against pathogens. Successful SG organogenesis is the product of highly controlled and orchestrated genetic and transcriptional programs. Mounting evidence links Transposable Elements (TEs), originally thought to be selfish genetic elements, to different aspects of gene regulation in mammalian development and disease. To our knowledge, the role of TEs during murine SG organogenesis has not been studied. Using novel bioinformatic tools and publicly available RNA-Seq datasets, our results indicate that a significant number of genic and intergenic TEs are differentially expressed during SG development. Furthermore, changes in the expression of specific TEs correlated with those of genes involved in cellular division and differentiation, critical aspects of SG maturation. Altogether, we propose that TEs modulate gene networks that operate during SG development.
Transposable elements (TEs) significantly contribute to shaping the diversity of the human genome, and lines of evidence suggest TEs as one of the driving forces of human brain evolution. Existing computational approaches, including cross-species comparative genomics and population genetic modeling, can be adapted for the study of the role of TEs in evolution. In particular, diverse ancient and archaic human genome sequences are increasingly available, allowing reconstruction of past human migration events and holding the promise of identifying and tracking TEs among other evolutionarily important genetic variants at an unprecedented spatiotemporal resolution. However, highly degraded short DNA templates and other unique challenges presented by ancient human DNA call for major changes in current experimental and computational procedures to enable the identification of evolutionarily important TEs. Ancient human genomes are valuable resources for investigating TEs in the evolutionary context, and efforts to explore ancient human genomes will potentially provide a novel perspective on the genetic mechanism of human brain evolution and inspire a variety of technological and methodological advances. In this review, we summarize computational and experimental approaches that can be adapted to identify and validate evolutionarily important TEs, especially for human brain evolution. We also highlight strategies that leverage ancient genomic data and discuss unique challenges in ancient transposon genomics.
Background: The autonomous retroelement Long Interspersed Element-1 (LINE-1) mobilizes through a copy-and-paste mechanism using an RNA intermediate (retrotransposition). Throughout human evolution, around 500,000 LINE-1 sequences have accumulated in the genome. Most of these sequences belong to ancestral LINE-1 subfamilies, including L1PA2-L1PA7, and can no longer mobilize. Only a small fraction of LINE-1 sequences, approximately 80 to 100 copies belonging to the L1Hs subfamily, are complete and still capable of retrotransposition. Although LINE-1 is silenced in most cells, many questions remain regarding its dysregulation in cancer cells.
Results: Here, we optimized CRISPR-Cas9 gRNAs to specifically target the regulatory sequence of the L1Hs 5'UTR promoter. We identified three gRNAs that were more specific to L1Hs, with limited binding to older LINE-1 sequences (L1PA2-L1PA7). We also adapted the C-BERST method (dCas9-APEX2 Biotinylation at genomic Elements by Restricted Spatial Tagging) to identify LINE-1 transcriptional regulators in cancer cells. Our LINE-1 C-BERST screen revealed both known and novel LINE-1 transcriptional regulators, including CTCF, YY1 and DUSP1.
Conclusion: Our optimization and evaluation of gRNA specificity and our application of the C-BERST method create a tool for studying the regulatory mechanisms of LINE-1 in cancer. Further, we identified the dual specificity protein phosphatase DUSP1 as a novel regulator of LINE-1 transcription.
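One simple way to think about gRNA specificity between L1Hs and older subfamilies is to count mismatches between a guide and its best-matching site in each target sequence: a guide that matches the young sequence exactly but carries several mismatches against the older one is a better candidate. The sketch below illustrates only that mismatch-counting proxy. It is not the optimization procedure used in the study, and the guide and the two target sequences are invented placeholders, not real L1Hs or L1PA sequences.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(guide, target):
    """Fewest mismatches between the guide and any same-length window
    of the target (forward strand only, no PAM check)."""
    k = len(guide)
    if len(target) < k:
        raise ValueError("target is shorter than the guide")
    return min(hamming(guide, target[i:i + k])
               for i in range(len(target) - k + 1))


# Invented 20-nt placeholder guide; NOT a guide from the study.
guide = "GATTACAGGCTTAACCGTGA"
# Placeholder "young" target containing a perfect match to the guide.
young_like = "CC" + guide + "TT"
# Placeholder "older" target: same site with three substitutions.
older_like = "CC" + "CATTACAGGGTTAACCGTGT" + "TT"

print("young-like:", min_mismatches(guide, young_like))
print("older-like:", min_mismatches(guide, older_like))
```

A realistic specificity assessment would also consider PAM placement, the reverse strand, mismatch position within the guide, and the many genomic copies of each subfamily, none of which this sketch models.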