Detecting transposable elements in long read genomes using sTELLeR.
Kristine Bilgrav Saether, Jesper Eisfeldt
Bioinformatics (Oxford, England), published 2024-11-18. DOI: 10.1093/bioinformatics/btae686
Abstract
Motivation: Repeat elements, such as transposable elements (TEs), are highly repetitive DNA sequences that make up around 50% of the human genome. TEs such as Alu, SVA, HERV and L1 elements can cause disease by disrupting genes, causing frameshift mutations or altering splicing patterns. These elements are challenging to characterize with short-read genome sequencing (srGS) because of its limited read length and the repetitive nature of TEs. Long-read genome sequencing (lrGS) produces reads that can span entire TEs, allowing increased resolution across repetitive DNA sequences. lrGS therefore presents an opportunity for improved TE detection and analysis, not only in research but also for future clinical detection. When choosing an lrGS TE caller, factors such as runtime, CPU hours, sensitivity, precision and ease of integration into pipelines are crucial for efficient detection.
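The read-length argument can be made concrete with simple arithmetic. The Python sketch below is an illustration only and is not part of sTELLeR: it assumes that a read "spans" an insertion when it covers the whole inserted element plus a hypothetical minimum of 50 bp of unique flanking sequence on each side (`min_flank`). The element lengths (Alu about 300 bp, full-length L1 about 6 kb) and the 15 kb long-read length are approximate, illustrative inputs.

```python
def can_span_insertion(read_length: int, insertion_length: int, min_flank: int = 50) -> bool:
    """Return True if one read can cover an inserted element plus at least
    `min_flank` bp of unique reference sequence on each side.

    `min_flank` is a hypothetical anchoring threshold chosen for illustration.
    """
    return read_length >= insertion_length + 2 * min_flank


# Approximate element lengths: Alu ~300 bp, full-length L1 ~6,000 bp.
examples = {
    "150 bp short read vs Alu": (150, 300),
    "15 kb long read vs Alu": (15_000, 300),
    "150 bp short read vs full-length L1": (150, 6_000),
    "15 kb long read vs full-length L1": (15_000, 6_000),
}

for label, (read_len, ins_len) in examples.items():
    print(f"{label}: spans = {can_span_insertion(read_len, ins_len)}")
```

Under these assumptions a typical short read cannot even span an Alu, whereas a single long read can anchor a full-length L1 on both sides, which is why lrGS can resolve insertions that srGS must infer indirectly.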
Results: We therefore developed sTELLeR ((s) Transposable ELement in Long (e) Read) for accurate, fast and effective TE detection. In particular, sTELLeR exhibits higher precision and sensitivity for calling Alu elements than similar tools. The caller is 5 to 48 times as fast as competing callers and uses less than 2% of their CPU hours. The caller is haplotype-aware and outputs results in a VCF file, enabling compatibility with other variant callers and downstream analysis.
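To illustrate what VCF output and haplotype awareness offer downstream analysis, the Python sketch below parses a single VCF data line and reports which haplotype carries a phased insertion. The record is hypothetical: the `<INS:ME:ALU>` symbolic allele and the `SVTYPE`/`SVLEN` INFO keys follow the general VCF specification and are not taken from sTELLeR's documentation, so the tool's actual INFO and FORMAT fields may differ. The helper names `parse_vcf_record` and `haplotype_of_insertion` are illustrative only.

```python
def parse_vcf_record(line: str) -> dict:
    """Split one tab-separated VCF data line (with one sample column) into a dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, _ref, alt, _qual, filt, info = fields[:8]
    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True
    # FORMAT keys (column 9) align with the first sample's colon-separated values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return {"chrom": chrom, "pos": int(pos), "alt": alt, "filter": filt,
            "info": info_dict, "sample": sample}


def haplotype_of_insertion(genotype: str) -> str:
    """Report which haplotype(s) carry the ALT allele in a phased ('|') genotype."""
    if "|" not in genotype:
        return "unphased"
    hap1, hap2 = genotype.split("|")
    carriers = [name for name, allele in (("HP1", hap1), ("HP2", hap2))
                if allele not in ("0", ".")]
    return ",".join(carriers) if carriers else "none"


# Hypothetical record: a heterozygous Alu insertion phased to the second haplotype.
record = "chr1\t1000000\tTE1\tN\t<INS:ME:ALU>\t.\tPASS\tSVTYPE=INS;SVLEN=300\tGT\t0|1"
rec = parse_vcf_record(record)
print(rec["alt"], rec["info"]["SVTYPE"], haplotype_of_insertion(rec["sample"]["GT"]))
# prints "<INS:ME:ALU> INS HP2"
```

In a real workflow a dedicated VCF library such as pysam or cyvcf2 would be preferable to hand parsing; the sketch only shows why a standard, phased VCF record is straightforward for other tools to consume.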
Availability: sTELLeR is a Python-based tool and is available at https://github.com/kristinebilgrav/sTELLeR. Altogether, we show that sTELLeR is a fast, sensitive and precise TE caller that can easily be integrated into variant calling workflows.
Supplementary information: Supplementary data are available at Bioinformatics online.