Comprehensive genome annotation of the model ciliate Tetrahymena thermophila by in-depth epigenetic and transcriptomic profiling

Nucleic Acids Research (IF 13.1; Zone 2, Biology; JCR Q1, Biochemistry & Molecular Biology) Pub Date: 2024-12-10 DOI: 10.1093/nar/gkae1177

Fei Ye, Xiao Chen, Yuan Li, Aili Ju, Yalan Sheng, Lili Duan, Jiachen Zhang, Zhe Zhang, Khaled A S Al-Rasheid, Naomi A Stover, Shan Gao



Abstract

The ciliate Tetrahymena thermophila is a well-established unicellular model eukaryote, contributing significantly to foundational biological discoveries. Despite its acknowledged importance, current studies on Tetrahymena biology face challenges due to gene annotation inaccuracy, particularly the notable absence of untranslated regions (UTRs). To comprehensively annotate the Tetrahymena macronuclear genome, we collected extensive transcriptomic data spanning various cell stages. To ascertain transcript orientation and transcription start/end sites, we incorporated data on epigenetic marks displaying enrichment towards the 5′ end of gene bodies, including H3 lysine 4 tri-methylation (H3K4me3), histone variant H2A.Z, nucleosome positioning and N6-methyldeoxyadenine (6mA). Cap-seq data was subsequently applied to validate the accuracy of identified transcription start sites. Additionally, we integrated Nanopore direct RNA sequencing (DRS), strand-specific RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) data. Using a newly developed bioinformatic pipeline, coupled with manual curation and experimental validation, our work yielded substantial improvements to the current gene models, including the addition of 2,481 new genes, updates to 23,936 existing genes, and the incorporation of 8,339 alternatively spliced isoforms. Furthermore, novel UTR information was annotated for 26,687 high-confidence genes. Intriguingly, 20% of protein-coding genes were identified to have natural antisense transcripts characterized by high diversity in alternative splicing, thus offering insights into understanding transcriptional regulation. Our work will enhance the utility of Tetrahymena as a robust genetic toolkit for advancing biological research, and provides a promising framework for genome annotation in other eukaryotes.
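The abstract notes that marks enriched towards the 5′ end of gene bodies (H3K4me3, H2A.Z, 6mA) were used to ascertain transcript orientation. The sketch below illustrates only the general idea and is not the authors' pipeline: the function name `infer_strand`, the binned-coverage input, and the `min_fold` threshold are all hypothetical choices made for this example.

```python
from statistics import mean


def infer_strand(signal, min_fold=1.5):
    """Guess transcript orientation from a 5'-biased mark profile.

    signal   : per-bin coverage of a 5'-enriched mark across one gene body,
               ordered by genomic coordinate (left to right). Hypothetical input.
    min_fold : assumed fold-enrichment threshold; not taken from the paper.

    Returns "+" if the mark is enriched on the left side, "-" if it is
    enriched on the right side, and "." if the profile is uninformative.
    """
    if len(signal) < 2:
        return "."
    half = len(signal) // 2
    left = mean(signal[:half])    # bins nearest the left gene boundary
    right = mean(signal[-half:])  # bins nearest the right gene boundary
    if left > 0 and left >= min_fold * right:
        return "+"  # 5' end on the left implies the forward strand
    if right > 0 and right >= min_fold * left:
        return "-"  # 5' end on the right implies the reverse strand
    return "."


if __name__ == "__main__":
    print(infer_strand([9, 8, 2, 1]))
    print(infer_strand([1, 2, 8, 9]))
    print(infer_strand([5, 5, 5, 5]))
```

A real workflow would presumably combine several marks with strand-specific RNA-seq and Nanopore DRS evidence, as the abstract describes, rather than rely on a single profile and a fixed threshold.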


Source journal

Nucleic Acids Research (Biology: Biochemistry and Molecular Biology)

CiteScore: 27.10

Self-citation rate: 4.70%

Articles published: 1057

Review time: 2 months

About the journal: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.