Orthoptera-TElib: a library of Orthoptera transposable elements for TE annotation.

IF 4.7 | Zone 2, Biology | Q1, Genetics & Heredity | Mobile DNA | Pub Date: 2024-03-15 | DOI: 10.1186/s13100-024-00316-x
Xuanzeng Liu, Lina Zhao, Muhammad Majid, Yuan Huang

Abstract

Transposable elements (TEs) are a major component of eukaryotic genomes and are present in almost all eukaryotic organisms. TEs are highly dynamic between and within species, which significantly limits the general applicability of TE databases. Orthoptera is the only known group in the class Insecta with significantly enlarged genomes (0.93-21.48 Gb). When such large genomes are analyzed with the existing public TE databases, the efficiency of TE annotation is unsatisfactory. To address this limitation, the available TE resource libraries need to be updated continually as more insect genomes become publicly available, and an Orthoptera-specific library is needed. Here, we used the complete genome data of 12 Orthoptera species to annotate TEs de novo, then manually re-annotated the unclassified TEs to construct a non-redundant Orthoptera-specific TE library: Orthoptera-TElib. Orthoptera-TElib contains 24,021 TE entries, including the re-annotated results of 13,964 unknown TEs. TE entries in Orthoptera-TElib follow the same naming convention as RepeatMasker and Dfam, encoded in the three-level form "level1/level2-level3". Orthoptera-TElib can be used directly as an input reference database and is compatible with mainstream repetitive sequence analysis software such as RepeatMasker and dnaPipeTE. When TEs of Orthoptera species are analyzed, Orthoptera-TElib yields better TE annotation than Dfam and Repbase, whether low-coverage sequencing data or genome assembly data are used. The largest improvement is for Angaracris rhodopa, whose annotated TE content increased from 7.89% of the genome to 53.28%. Finally, Orthoptera-TElib is stored in SQLite3 for convenient data updates and user access.
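
To make the three-level naming concrete, the following is a minimal sketch of how a "level1/level2-level3" classification could be split into its parts. It assumes the RepeatMasker-style FASTA header layout "name#class", where the classification follows a "#". The entry name TE_00001, the function name parse_te_header, and the TEClass type are hypothetical and are not taken from the Orthoptera-TElib release.

```python
from typing import NamedTuple, Optional, Tuple


class TEClass(NamedTuple):
    """One classification in the 'level1/level2-level3' form."""
    level1: str
    level2: Optional[str]
    level3: Optional[str]


def parse_te_header(header: str) -> Tuple[str, TEClass]:
    """Split a header such as '>TE_00001#DNA/TcMar-Tc1' into name and levels.

    Assumption: the header follows the RepeatMasker-style 'name#class'
    layout; missing levels are returned as None.
    """
    # Drop the leading '>' and any free-text description after whitespace.
    parts = header.lstrip(">").split()
    token = parts[0] if parts else ""
    # The entry name precedes '#', the classification follows it.
    name, _, classification = token.partition("#")
    # level1 is separated from the rest by '/', level2 from level3 by '-'.
    level1, _, rest = classification.partition("/")
    level2, _, level3 = rest.partition("-")
    return name, TEClass(level1 or "Unknown", level2 or None, level3 or None)


if __name__ == "__main__":
    # Hypothetical entry names, used only for illustration.
    for h in (">TE_00001#DNA/TcMar-Tc1", ">TE_00002#LTR/Gypsy", ">TE_00003"):
        print(parse_te_header(h))
```

Entries with only two levels (for example "LTR/Gypsy") leave level3 empty, and headers without any classification fall back to "Unknown", which mirrors how unclassified TEs are usually labelled before manual re-annotation.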

Source journal
Mobile DNA (Genetics & Heredity)
CiteScore: 8.20
Self-citation rate: 6.10%
Articles published: 26
Review time: 11 weeks
Journal description: Mobile DNA is an online, peer-reviewed, open access journal that publishes articles providing novel insights into DNA rearrangements in all organisms, ranging from transposition and other types of recombination mechanisms to patterns and processes of mobile element and host genome evolution. In addition, the journal will consider articles on the utility of mobile genetic elements in biotechnological methods and protocols.