Pub Date: 2024-07-27 | DOI: 10.1186/s13100-024-00325-w
Weronika Mikina, Paweł Hałakuc, Rafał Milanowski
The widely accepted hypothesis postulates that the first spliceosomal introns originated from group II self-splicing introns. However, it is evident that not all spliceosomal introns in the nuclear genes of modern eukaryotes are inherited through vertical transfer of intronic sequences. Several phenomena contribute to the formation of new introns but their most common origin seems to be the insertion of transposable elements. Recent analyses have highlighted instances of mass gains of new introns from transposable elements. These events often coincide with an increase or change in the spliceosome's tolerance to splicing signals, including the acceptance of noncanonical borders. Widespread acquisitions of transposon-derived introns occur across diverse evolutionary lineages, indicating convergent processes. These events, though independent, likely require a similar set of conditions. These conditions include the presence of transposon elements with features enabling their removal at the RNA level as introns and/or the existence of a splicing mechanism capable of excising unusual sequences that would otherwise not be recognized as introns by standard splicing machinery. Herein we summarize those mechanisms across different eukaryotic lineages.
Title: Transposon-derived introns as an element shaping the structure of eukaryotic genomes | Journal: Mobile DNA | Periodical IF: 4.9
Pub Date: 2024-06-27 | DOI: 10.1186/s13100-024-00324-x
Fatemeh Moadab, Sepideh Sohrabi, Xiaoxing Wang, Rayan Najjar, Justina C Wolters, Hua Jiang, Wenyan Miao, Donna Romero, Dennis M Zaller, Megan Tran, Alison Bays, Martin S Taylor, Rosana Kapeller, John LaCava, Tomas Mustelin
Background: Systemic lupus erythematosus (SLE) is a chronic autoimmune disease with an unpredictable course of recurrent exacerbations alternating with more stable disease. SLE is characterized by broad immune activation and autoantibodies against double-stranded DNA and numerous proteins that exist in cells as aggregates with nucleic acids, such as Ro60, MOV10, and the L1 retrotransposon-encoded ORF1p.
Results: Here we report that these 3 proteins are co-expressed and co-localized in a subset of SLE granulocytes and are concentrated in cytosolic dots that also contain DNA:RNA heteroduplexes and the DNA sensor ZBP1, but not cGAS. The DNA:RNA heteroduplexes vanished from the neutrophils when they were treated with a selective inhibitor of the L1 reverse transcriptase. We also report that ORF1p granules escape neutrophils during the extrusion of neutrophil extracellular traps (NETs) and, to a lesser degree, from neutrophils dying by pyroptosis, but not apoptosis.
Conclusions: These results bring new insights into the composition of ORF1p granules in SLE neutrophils and may explain, in part, why proteins in these granules become targeted by autoantibodies in this disease.
Title: Subcellular location of L1 retrotransposon-encoded ORF1p, reverse transcription products, and DNA sensors in lupus granulocytes. | Journal: Mobile DNA | Periodical IF: 4.7 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11212426/pdf/
Pub Date: 2024-06-26 | DOI: 10.1186/s13100-024-00323-y
Pío Sierra, Richard Durbin
Background: Transposable Elements (TEs) are segments of DNA, typically a few hundred base pairs up to several tens of thousands of bases long, that have the ability to generate new copies of themselves in the genome. Most existing methods used to identify TEs in a newly sequenced genome are based on their repetitive character, together with detection based on homology and structural features. As new high-quality assemblies become more common, including the availability of multiple independent assemblies from the same species, an alternative strategy for identification of TE families becomes possible in which we focus on the polymorphism at insertion sites caused by TE mobility.
Results: We develop the idea of using the structural polymorphisms found in pangenomes to create a library of the TE families recently active in a species, or in a closely related group of species. We present a tool, pantera, that achieves this task, and illustrate its use both on species with well-curated libraries, and on new assemblies.
Conclusions: Our results show that pantera is sensitive and accurate, tending to correctly identify complete elements with precise boundaries, and is particularly well suited to detect larger, low copy number TEs that are often undetected with existing de novo methods.
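The idea of building a TE library from insertion-site polymorphisms can be illustrated with a minimal Python sketch. It is not pantera's actual algorithm: the input format (a list of `(locus, sequence)` pairs for polymorphic segments), the k-mer Jaccard similarity, the greedy clustering, and all thresholds are assumptions made purely for illustration. The sketch groups similar polymorphic segments and keeps groups seen at two or more distinct loci as candidate recently active families; it ignores reverse-complement matches for brevity.

```python
from typing import Dict, List, Set, Tuple

Segment = Tuple[str, str]  # (locus label, inserted sequence); hypothetical input format


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Return the set of k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_families(segments: List[Segment], min_sites: int = 2,
                       min_sim: float = 0.8, k: int = 8) -> List[List[Segment]]:
    """Greedily cluster polymorphic segments by k-mer similarity and keep
    clusters observed at >= min_sites distinct loci."""
    clusters: List[Dict] = []  # each: {"rep": k-mer set of first member, "members": [...]}
    for locus, seq in segments:
        ks = kmers(seq.upper(), k)
        for cluster in clusters:
            if jaccard(ks, cluster["rep"]) >= min_sim:
                cluster["members"].append((locus, seq))
                break
        else:
            clusters.append({"rep": ks, "members": [(locus, seq)]})
    return [c["members"] for c in clusters
            if len({locus for locus, _ in c["members"]}) >= min_sites]


# Toy data: the same element inserted at two loci, plus one unrelated segment.
te = "ACGTTGCAAGGCTTAACCGGTTAACGATCGATCGGATTACAGGCATGCAT"
other = "TTTTTGGGGGCCCCCAAAAATTTTTGGGGGCCCCCAAAAATTTTTGGGGG"
toy_segments = [("chr1:100", te), ("chr2:500", te), ("chr3:900", other)]
families = candidate_families(toy_segments)
```

A real pipeline would start from structural variants called on a pangenome graph or on pairwise assembly alignments and would use a proper sequence clustering method; the sketch only shows why a segment that is polymorphic at several independent sites is a natural TE candidate.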
Title: Identification of transposable element families from pangenome polymorphisms. | Journal: Mobile DNA | Periodical IF: 4.7 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11202377/pdf/
Pub Date: 2024-06-11 | DOI: 10.1186/s13100-024-00322-z
Chris J Frangieh, Max E Wilkinson, Daniel Strebinger, Jonathan Strecker, Michelle L Walsh, Guilhem Faure, Irina A Yushenova, Rhiannon K Macrae, Irina R Arkhipova, Feng Zhang
Eukaryotic retroelements are generally divided into two classes: long terminal repeat (LTR) retrotransposons and non-LTR retrotransposons. A third class of eukaryotic retroelement, the Penelope-like elements (PLEs), has been well-characterized bioinformatically, but relatively little is known about the transposition mechanism of these elements. PLEs share some features with the R2 retrotransposon from Bombyx mori, which uses a target-primed reverse transcription (TPRT) mechanism, but their distinct phylogeny suggests PLEs may utilize a novel mechanism of mobilization. Using protein purified from E. coli, we report unique in vitro properties of a PLE from the green anole (Anolis carolinensis), revealing mechanistic aspects not shared by other retrotransposons. We found that reverse transcription is initiated at two adjacent sites within the transposon RNA that is not homologous to the cleaved DNA, a feature that is reflected in the genomic "tail" signature shared between and unique to PLEs. Our results for the first active PLE in vitro provide a starting point for understanding PLE mobilization and biology.
Title: Internal initiation of reverse transcription in a Penelope-like retrotransposon. | Journal: Mobile DNA | Periodical IF: 4.9 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11167929/pdf/
Pub Date: 2024-05-10 | DOI: 10.1186/s13100-024-00321-0
Beverly Ann G Boyboy, Kenji Ichiyanagi
Background: Gene expression divergence between populations and between individuals can emerge from genetic variations within the genes and/or in the cis regulatory elements. Since epigenetic modifications regulate gene expression, it is conceivable that epigenetic variations in cis regulatory elements can also be a source of gene expression divergence.
Results: In this study, we compared histone acetylation (namely, H3K9ac) profiles in two mouse strains of different subspecies origin, C57BL/6J (B6) and MSM/Ms (MSM), as well as their F1 hybrids. This identified 319 regions of strain-specific acetylation, about half of which were observed between the alleles of F1 hybrids. While the allele-specific presence of the interferon regulatory factor 3 (IRF3) binding sequence was associated with allele-specific histone acetylation, we also revealed that B6-specific insertions of a short 3' fragment of LINE-1 (L1) retrotransposon occur within or proximal to MSM-specific acetylated regions. Furthermore, even in hyperacetylated domains, flanking regions of non-polymorphic 3' L1 fragments were hypoacetylated, suggesting a general activity of the 3' L1 fragment to induce hypoacetylation. Indeed, we confirmed the binding of the 3' region of L1 by three Krüppel-associated box domain-containing zinc finger proteins (KZFPs), which interact with histone deacetylases. These results suggest that even a short insertion of L1 would be excluded from gene- and acetylation-rich regions by natural selection. Finally, mRNA-seq analysis for F1 hybrids was carried out, which disclosed a link between allele-specific promoter/enhancer acetylation and gene expression.
Conclusions: This study identified a number of genetic changes that have altered histone acetylation levels during the evolution of mouse subspecies, some of which are associated with gene expression changes. Insertions of even a very short L1 fragment can decrease the acetylation level in their neighboring regions and have therefore been counter-selected in gene-rich regions, which may explain a long-standing mystery of the discrete genomic distribution of LINEs and SINEs.
Title: Insertion of short L1 sequences generates inter-strain histone acetylation differences in the mouse. | Journal: Mobile DNA | Periodical IF: 4.9 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11084082/pdf/
Pub Date: 2024-05-06 | DOI: 10.1186/s13100-024-00319-8
Valentina Peona, Jacopo Martelossi, Dareen Almojil, Julia Bocharkina, Ioana Brännström, Max Brown, Alice Cang, Tomàs Carrasco-Valenzuela, Jon DeVries, Meredith Doellman, Daniel Elsner, Pamela Espíndola-Hernández, Guillermo Friis Montoya, Bence Gaspar, Danijela Zagorski, Paweł Hałakuc, Beti Ivanovska, Christopher Laumer, Robert Lehmann, Ljudevit Luka Boštjančić, Rahia Mashoodh, Sofia Mazzoleni, Alice Mouton, Maria Anna Nilsson, Yifan Pei, Giacomo Potente, Panagiotis Provataris, José Ramón Pardos-Blas, Ravindra Raut, Tomasa Sbaffi, Florian Schwarz, Jessica Stapley, Lewis Stevens, Nusrat Sultana, Radka Symonova, Mohadeseh S Tahami, Alice Urzì, Heidi Yang, Abdullah Yusuf, Carlo Pecoraro, Alexander Suh
Background: The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year providing unprecedented resources for the study of genome evolution. Within this context, the significance of in-depth analyses of repetitive elements, transposable elements (TEs) in particular, is increasingly recognized in understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential due to the frequent incompleteness of automatically generated consensus sequences.
Results: Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries.
Conclusions: The collaborative manual curation of TEs from two tardigrade species, for which there were no TE libraries available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: A hidden treasure awaits discovery within non-model organisms.
Title: Teaching transposon classification as a means to crowd source the curation of repeat annotation - a tardigrade perspective. | Journal: Mobile DNA | Periodical IF: 4.9 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11071193/pdf/
Pub Date: 2024-05-04 | DOI: 10.1186/s13100-024-00320-1
Elena Fernández-Suárez, María González-del Pozo, Cristina Méndez-Vidal, Marta Martín-Sánchez, Marcela Mena, Belén de la Morena-Barrio, Javier Corral, Salud Borrego, Guillermo Antiñolo
Biallelic variants in EYS are the major cause of autosomal recessive retinitis pigmentosa (arRP) in certain populations, a clinically and genetically heterogeneous disease that may lead to legal blindness. EYS is one of the largest genes (~2 Mb) expressed in the retina, in which structural variants (SVs) represent a common cause of disease. However, their identification using short-read sequencing (SRS) is not always feasible. Here, we conducted targeted long-read sequencing (T-LRS) using adaptive sampling of EYS on the MinION sequencing platform (Oxford Nanopore Technologies) to definitively diagnose an arRP family, whose affected individuals (n = 3) carried the heterozygous pathogenic deletion of exons 32–33 in the EYS gene. As this was a recurrent variant identified in three additional families in our cohort, we also aimed to characterize the known deletion at the nucleotide level to assess a possible founder effect. T-LRS in family A unveiled a heterozygous AluYa5 insertion in the coding exon 43 of EYS (chr6(GRCh37):g.64430524_64430525ins352), which segregated with the disease in compound heterozygosity with the previously identified deletion. Visual inspection of previous SRS alignments using IGV revealed several reads containing soft-clipped bases, accompanied by a slight drop in coverage at the Alu insertion site. This prompted us to develop a simplified program using the grep command to investigate the recurrence of this variant in our cohort from SRS data. Moreover, LRS also allowed the characterization of the CNV as a ~56.4 kb deletion spanning exons 32–33 of EYS (chr6(GRCh37):g.64764235_64820592del). The results of further characterization by Sanger sequencing and linkage analysis in the four families were consistent with a founder variant. To our knowledge, this is the first report of a mobile element insertion into the coding sequence of EYS, as a likely cause of arRP in a family. 
Our study highlights the value of LRS technology in characterizing and identifying hidden pathogenic SVs, such as retrotransposon insertions, whose contribution to the etiopathogenesis of rare diseases may be underestimated.
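The soft-clipped-read signature described in the abstract can be screened for in short-read alignments with a few lines of Python. The sketch below is not the authors' grep-based program, which is not shown in this listing; it is a hypothetical equivalent that reads SAM-format text lines, parses each CIGAR string, and counts reads whose soft-clip boundary falls within a small window of a given site. The function names, the window size, and the minimum clip length are assumptions. The coordinate comes from the abstract; the chromosome label (`6` versus `chr6`) depends on the reference used and is also an assumption.

```python
import re
from typing import Iterable, List

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoints(pos: int, cigar: str, min_clip: int = 10) -> List[int]:
    """Return reference positions where a soft clip of >= min_clip bases starts or ends.

    pos is the 1-based leftmost aligned position (SAM POS field)."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # hard clips carry no bases
    if not core:
        return []
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    points = []
    if core[0][1] == "S" and core[0][0] >= min_clip:
        points.append(pos)                # clip precedes the first aligned base
    if len(core) > 1 and core[-1][1] == "S" and core[-1][0] >= min_clip:
        points.append(pos + ref_len - 1)  # clip follows the last aligned base
    return points


def count_clipped_reads(sam_lines: Iterable[str], chrom: str, site: int,
                        window: int = 5, min_clip: int = 10) -> int:
    """Count reads on chrom with a soft-clip breakpoint within window bp of site."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[2] != chrom or fields[5] == "*":
            continue
        points = clip_breakpoints(int(fields[3]), fields[5], min_clip)
        if any(abs(p - site) <= window for p in points):
            count += 1
    return count


# Toy SAM records (made up for illustration, not data from the study).
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\t6\t64430450\t60\t75M25S\t*\t0\t0\t*\t*",   # trailing clip ends at 64430524
    "r2\t0\t6\t64430525\t60\t30S70M\t*\t0\t0\t*\t*",   # leading clip starts at 64430525
    "r3\t0\t6\t64430000\t60\t100M\t*\t0\t0\t*\t*",     # no soft clip
    "r4\t0\t7\t64430450\t60\t75M25S\t*\t0\t0\t*\t*",   # different chromosome
]
n_supporting = count_clipped_reads(toy_sam, chrom="6", site=64430524)
```

In practice the input lines would come from something like `samtools view` restricted to the region of interest, and a convincing call would also check that the clipped bases match an Alu consensus or a poly(A) tail rather than relying on the clip position alone.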
Title: Long-read sequencing improves the genetic diagnosis of retinitis pigmentosa by identifying an Alu retrotransposon insertion in the EYS gene | Journal: Mobile DNA | Periodical IF: 4.9
Pub Date : 2024-04-16 DOI: 10.1186/s13100-024-00317-w
Anthony B. Garza, Emmanuelle Lerat, Hani Z. Girgis
Plant genomes include large numbers of transposable elements. One particular type of these elements is flanked by two Long Terminal Repeats (LTRs) and can translocate using RNA. Such elements are known as LTR-retrotransposons; they are the most abundant type of transposons in plant genomes. They have many important functions involving gene regulation and the rise of new genes and pseudogenes in response to severe stress. Additionally, LTR-retrotransposons have several applications in biotechnology. Due to the abundance and importance of LTR-retrotransposons, multiple computational tools have been developed for their detection. However, none of these tools takes advantage of the availability of related genomes; they process one chromosome at a time. Further, recently nested LTR-retrotransposons (multiple elements of the same family inserted into each other) cannot be annotated accurately, or cannot be annotated at all, by the currently available tools. Motivated to overcome these two limitations, we built Look4LTRs, which can annotate LTR-retrotransposons in multiple related genomes simultaneously and discover recently nested elements. The methodology of Look4LTRs depends on techniques imported from the signal-processing field, graph algorithms, and machine learning, with minimal use of alignment algorithms. Four plant genomes were used in developing Look4LTRs and eight plant genomes in evaluating it against three related tools. Look4LTRs is the fastest while maintaining F1 scores (the harmonic mean of recall and precision) better than or comparable to those obtained by the other tools. Our results demonstrate the added benefit of annotating LTR-retrotransposons in multiple related genomes simultaneously and the ability to discover recently nested elements. Expert manual examination of six elements not included in the ground truth revealed that three elements belong to known families and two elements are likely from new families. With respect to examining recently nested LTR-retrotransposons, three out of five were confirmed to be valid elements. Look4LTRs, with its speed, accuracy, and novel features, represents a true advancement in the annotation of LTR-retrotransposons, opening the door to many studies focused on understanding their functions in plants.
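The F1 score used above to compare the tools combines recall and precision into one number. As a small illustration of the definition (not of how Look4LTRs or its evaluation scripts compute it), the sketch below derives F1 from counts of true positives, false positives, and false negatives. The function name `f1_score` and the example counts are assumptions made for illustration.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2 * precision * recall / (precision + recall); 0.0 when undefined."""
    predicted = true_positives + false_positives  # elements the tool reported
    actual = true_positives + false_negatives     # elements in the ground truth
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 6 correct annotations, 2 spurious, 6 missed.
# precision = 0.75, recall = 0.5
print(round(f1_score(6, 2, 6), 3))  # 0.6
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a tool cannot reach a high F1 by maximizing recall at the expense of precision, or the reverse.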
{"title":"Look4LTRs: a Long terminal repeat retrotransposon detection tool capable of cross species studies and discovering recently nested repeats","authors":"Anthony B. Garza, Emmanuelle Lerat, Hani Z. Girgis","doi":"10.1186/s13100-024-00317-w","DOIUrl":"https://doi.org/10.1186/s13100-024-00317-w","url":null,"PeriodicalId":18854,"journal":{"name":"Mobile DNA","volume":null,"pages":null},"PeriodicalIF":4.9,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140570258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-11 DOI: 10.1186/s13100-024-00318-9
Nozhat T. Hassan, James D. Galbraith, David L. Adelson
Horizontal transfer of transposable elements (HTT) has been reported across many species, and the impact of such events on genome structure and function has been well described. However, few studies have focused on reptilian genomes, especially HTT events in Testudines (turtles). Here, while investigating the repetitive content of Malaclemys terrapin terrapin (Diamondback turtle), we found a high-similarity DNA transposon, annotated in RepBase as hAT-6_XT, shared with other turtle species, ray-finned fishes, and a frog. hAT-6_XT was notably absent in reptilian taxa closely related to turtles, such as crocodiles and birds. Successful invasion of DNA transposons into new genomes requires the conservation of specific residues in the encoded transposase; structural analysis identified these residues, indicating some retention of functional transposition activity. We document six recent independent HTT events of a DNA transposon in turtles, which are known to have a low genomic evolutionary rate and ancient repeats. Species and common names: Malaclemys terrapin terrapin (Diamondback turtle); Malaclemys terrapin pileata (Mississippi diamondback terrapin turtle); Trachemys scripta elegans (Red-eared slider turtle); Chrysemys picta bellii (Western painted turtle); Dermatemys mawii (Hickatee turtle); Sternotherus odoratus (Common musk turtle); Mesoclemmys tuberculata (Tuberculate Toad-headed turtle); Etheostoma spectabile (Orangethroat darter fish); Thalassophryne amazonica (Prehistoric monster fish); Scophthalmus maximus (Turbot fish); Syngnathus acus (Greater pipefish); Scleropages formosus (Asian Arowana fish); Xenopus tropicalis (Western clawed frog).
{"title":"Multiple horizontal transfer events of a DNA transposon into turtles, fishes, and a frog","authors":"Nozhat T. Hassan, James D. Galbraith, David L. Adelson","doi":"10.1186/s13100-024-00318-9","DOIUrl":"https://doi.org/10.1186/s13100-024-00318-9","url":null,"PeriodicalId":18854,"journal":{"name":"Mobile DNA","volume":null,"pages":null},"PeriodicalIF":4.9,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140570065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-03 DOI: 10.1186/s13100-024-00315-y
Michel Choudalakis, Pavel Bashtrykov, Albert Jeltsch
Repeat elements (REs) play important roles in cell function in health and disease. However, RE enrichment analysis in short-read high-throughput sequencing (HTS) data, such as ChIP-seq, is a challenging task. Here, we present RepEnTools, a software package for genome-wide RE enrichment analysis of ChIP-seq and similar chromatin pulldown experiments. Our analysis package bundles together various software with carefully chosen and validated settings to provide a complete solution for RE analysis, from raw input files to tabular and graphical outputs. RepEnTools implementations are easily accessible even with minimal IT skills (Galaxy/UNIX). To demonstrate the performance of RepEnTools, we analysed chromatin pulldown data for the human UHRF1 Tandem-Tudor domain (TTD) and discovered enrichment of TTD binding on young primate- and hominid-specific polymorphic repeats (SVA, L1PA1/L1HS) overlapping known enhancers and decorated with H3K4me1-K9me2/3 modifications. We corroborated these new bioinformatic findings experimentally using newly developed primate- and hominid-specific qPCR assays, which complement similar research tools. Finally, we analysed mouse UHRF1 ChIP-seq data with RepEnTools and showed that the endogenous mUHRF1 protein colocalizes with H3K4me1-H3K9me3 on promoters of REs which were silenced by UHRF1. These new data suggest a functional role for UHRF1 in silencing of REs that is mediated by TTD binding to the H3K4me1-K9me3 double mark and conserved in two mammalian species. RepEnTools improves on the previously available programmes for RE enrichment analysis in chromatin pulldown studies by leveraging new tools, enhancing accessibility and adding some key functions. RepEnTools can analyse RE enrichment rapidly, efficiently, and accurately, providing the community with an up-to-date, reliable and accessible tool for this important type of analysis.
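The abstract does not describe how RepEnTools scores enrichment, so the following is not its algorithm. It is only a generic sketch of what a per-family repeat enrichment value can look like: read counts per repeat family in the pulldown and in the input control are scaled to counts per million and compared as a log2 ratio. The function name `log2_enrichment`, the `pseudocount` parameter, and the family labels in the example are hypothetical.

```python
import math


def log2_enrichment(chip_counts, input_counts, pseudocount=1.0):
    """log2(ChIP CPM / input CPM) per repeat family, with a pseudocount for zeros."""
    chip_total = sum(chip_counts.values())
    input_total = sum(input_counts.values())
    if chip_total == 0 or input_total == 0:
        raise ValueError("both samples need at least one counted read")
    result = {}
    for family in sorted(set(chip_counts) | set(input_counts)):
        # Scale each sample by its own library size (counts per million).
        chip_cpm = (chip_counts.get(family, 0) + pseudocount) / chip_total * 1e6
        input_cpm = (input_counts.get(family, 0) + pseudocount) / input_total * 1e6
        result[family] = math.log2(chip_cpm / input_cpm)
    return result


# Hypothetical read counts for two repeat families (labels are placeholders).
chip = {"famA": 7, "famB": 1}
control = {"famA": 1, "famB": 7}
print(log2_enrichment(chip, control))  # {'famA': 2.0, 'famB': -2.0}
```

Positive values indicate families over-represented in the pulldown relative to input. A real analysis would also need replicate handling, multi-mapping read treatment, and a statistical test, none of which this sketch attempts.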
{"title":"RepEnTools: an automated repeat enrichment analysis package for ChIP-seq data reveals hUHRF1 Tandem-Tudor domain enrichment in young repeats","authors":"Michel Choudalakis, Pavel Bashtrykov, Albert Jeltsch","doi":"10.1186/s13100-024-00315-y","DOIUrl":"https://doi.org/10.1186/s13100-024-00315-y","url":null,"PeriodicalId":18854,"journal":{"name":"Mobile DNA","volume":null,"pages":null},"PeriodicalIF":4.9,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140570063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}