Comprehensive genome annotation of the model ciliate Tetrahymena thermophila by in-depth epigenetic and transcriptomic profiling

IF 13.1 · Zone 2 (Biology) · Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY · Nucleic Acids Research · Pub Date: 2024-12-10 · DOI: 10.1093/nar/gkae1177
Fei Ye, Xiao Chen, Yuan Li, Aili Ju, Yalan Sheng, Lili Duan, Jiachen Zhang, Zhe Zhang, Khaled A S Al-Rasheid, Naomi A Stover, Shan Gao
Citations: 0

Abstract

The ciliate Tetrahymena thermophila is a well-established unicellular model eukaryote that has contributed significantly to foundational biological discoveries. Despite its acknowledged importance, studies of Tetrahymena biology are hampered by inaccurate gene annotation, particularly the absence of untranslated regions (UTRs). To comprehensively annotate the Tetrahymena macronuclear genome, we collected extensive transcriptomic data spanning various cell stages. To determine transcript orientation and transcription start/end sites, we incorporated data on epigenetic marks enriched toward the 5′ end of gene bodies, including H3 lysine 4 tri-methylation (H3K4me3), the histone variant H2A.Z, nucleosome positioning and N6-methyldeoxyadenine (6mA). Cap-seq data were then used to validate the accuracy of the identified transcription start sites. Additionally, we integrated Nanopore direct RNA sequencing (DRS), strand-specific RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) data. Using a newly developed bioinformatic pipeline, coupled with manual curation and experimental validation, we substantially improved the current gene models, adding 2,481 new genes, updating 23,936 existing genes and incorporating 8,339 alternatively spliced isoforms. Furthermore, novel UTR information was annotated for 26,687 high-confidence genes. Intriguingly, 20% of protein-coding genes were found to have natural antisense transcripts with highly diverse alternative splicing, offering insights into transcriptional regulation. Our work enhances the utility of Tetrahymena as a robust genetic toolkit for advancing biological research and provides a promising framework for genome annotation in other eukaryotes.
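The abstract does not spell out how the pipeline turns 5′-enriched marks into strand and start-site calls. As a minimal sketch of that reasoning only, assuming each mark (e.g. H3K4me3, H2A.Z, 6mA) has already been binned along a candidate locus in ascending genomic coordinate, one could call the transcribed strand by comparing signal at the two ends of the locus, combine marks by majority vote, and then measure how many predicted TSSs fall near a Cap-seq peak summit. The function names, the 20% edge fraction, the 2-fold ratio and the 50 bp window are hypothetical choices for illustration, not the authors' parameters.

```python
from bisect import bisect_left
from typing import Mapping, Sequence


def infer_orientation(signal: Sequence[float], edge_fraction: float = 0.2,
                      min_ratio: float = 2.0) -> str:
    """Call '+', '-' or '?' from per-bin signal of one 5'-biased mark.

    Bins are assumed ordered by ascending genomic coordinate, so strong
    signal at the left end suggests a '+' gene and at the right end a '-'
    gene. Thresholds are illustrative assumptions.
    """
    n = len(signal)
    if n == 0:
        return "?"
    k = max(1, int(n * edge_fraction))   # number of bins per end
    left = sum(signal[:k])
    right = sum(signal[-k:])
    if left > 0 and left >= min_ratio * right:
        return "+"
    if right > 0 and right >= min_ratio * left:
        return "-"
    return "?"                           # no clear 5' bias


def vote_orientation(tracks: Mapping[str, Sequence[float]]) -> str:
    """Combine several marks by simple majority vote (hypothetical scheme)."""
    calls = [infer_orientation(t) for t in tracks.values()]
    plus, minus = calls.count("+"), calls.count("-")
    if plus > minus:
        return "+"
    if minus > plus:
        return "-"
    return "?"


def fraction_supported(tss_positions: Sequence[int], cap_peaks: Sequence[int],
                       window: int = 50) -> float:
    """Fraction of predicted TSSs within `window` bp of a Cap-seq summit.

    Assumes all positions are on the same chromosome and strand.
    """
    if not tss_positions:
        return 0.0
    peaks = sorted(cap_peaks)
    hits = 0
    for pos in tss_positions:
        i = bisect_left(peaks, pos - window)   # first summit >= pos - window
        if i < len(peaks) and peaks[i] <= pos + window:
            hits += 1
    return hits / len(tss_positions)
```

For example, `vote_orientation({"H3K4me3": [9, 8, 3, 1, 1, 0, 0, 0, 0, 1], "H2A.Z": [5, 5, 1, 1, 1, 1, 1, 1, 1, 1], "6mA": [1] * 10})` returns `"+"`: two marks show left-end enrichment and the flat 6mA track abstains. A real pipeline would also normalize tracks, handle overlapping and antisense loci, and weigh strand-specific RNA-seq and DRS evidence.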

Source journal: Nucleic Acids Research
Category: Biology / Biochemistry & Molecular Biology
CiteScore: 27.10
Self-citation rate: 4.70%
Articles published: 1057
Review time: 2 months
Journal description: Nucleic Acids Research (NAR) is a scientific journal that publishes research on nucleic acids and on the proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Each year, the first issue is dedicated to biological databases, and a July issue focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services, including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.
Latest articles in this journal:
High-throughput measurement and prediction of the i-motif DNA stability landscape.
AquIRE reveals the mechanisms of clinically induced RNA damage and the conservation and dynamics of glycoRNAs.
Anticodon-edited transfer RNAs (ACE-tRNAs) encoded as therapeutic nonviral minimal DNA vectors.
Dual-single-guide RNA strategy improves CRISPR-mediated homology-directed repair in Aspergillus.
Exploring the regulatory potential of RNA structures in 202 cyanobacterial genomes.