Importance of transcript variants in transcriptome analyses

bioRxiv Pub Date : 2024-07-16 DOI:10.1101/2024.07.11.603122
Kevin Vo, Ryan Mohamadi, Yashica Sharma, Amelia Mohamadi, Patrick E Fields, M. A. K. Rumi

Abstract

RNA sequencing (RNA-Seq) has become a widely adopted genome-wide technique for investigating gene expression patterns. However, conventional RNA-Seq analyses typically rely on gene expression (GE) values that aggregate all the transcripts produced by a gene under a single identifier, overlooking the complexity of transcript variants arising from different transcription start sites and alternative splicing events. In this study, we explored the implications of neglecting transcript variants in RNA-Seq analyses.

Among the 1334 transcription factor (TF) genes expressed in mouse embryonic stem (ES) or trophoblast stem (TS) cells, 652 were reported to be differentially expressed in TS cells based on GE values (365 upregulated and 287 downregulated; ≥2-fold, FDR p-value ≤0.05). Intriguingly, the 365 upregulated genes expressed 883 transcript variants, of which only 174 (<20%) were upregulated based on transcript expression (TE) values. The remaining 709 (>80%) variants were either downregulated or showed no significant change. Similarly, the 287 genes reported to be downregulated expressed 856 transcript variants, of which only 153 (<20%) were downregulated, while 703 (>82%) were upregulated or showed no significant change. Additionally, the 682 TF genes that did not change significantly between ES and TS cells (GE <2-fold change and/or FDR p-value >0.05) expressed 2215 transcript variants, of which 477 (>21%) were differentially expressed (276 upregulated and 201 downregulated; ≥2-fold, FDR p-value ≤0.05).

Notably, a gene does not express just one protein; rather, its transcript variants can encode multiple proteins with distinct functional domains, as well as non-coding regulatory RNAs. Our findings underscore the necessity of considering transcript variants in RNA-Seq analyses. Doing so may enable a more precise understanding of the functional and regulatory landscape of genes; ignoring the variants may result in erroneous interpretation.

Graphic Abstract

Differential expression of transcription factors (TFs) between mouse embryonic stem (ES) cells and trophoblast stem (TS) cells. The graphic illustrates the importance of including transcript variants in RNA-Seq analyses. Panel A represents the conventional differential gene expression analysis after RNA-Seq, in which all transcript reads are pooled under a single gene name. Panel B takes the analysis one step further by examining the transcript variants that were hidden under that gene name. Our results indicate that GE analysis alone mischaracterizes more than 80% of the transcript expression (TE). Without analyzing the reads of all transcript variants, we fail to uncover the functional importance of the variants and the regulation of their expression. Both GE and TE values are expressed as transcripts per million (TPM). Data analyses were performed using CLC Genomics Workbench.
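
The proportions quoted in the abstract, and the way a gene-level value can hide opposing changes among its variants, can be checked with a short sketch. The counts below are taken from the abstract. Everything else is an assumption made for illustration: the helper names `percent` and `classify`, the toy gene with two variants, and the convention that a gene-level TPM is the sum of its transcript TPMs (a common convention that may not match how CLC Genomics Workbench computes GE). The FDR filter used in the study is omitted, so `classify` applies only the 2-fold threshold.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


def classify(es_tpm: float, ts_tpm: float, min_fold: float = 2.0) -> str:
    """Label the TS-vs-ES change by fold change alone.

    Assumes both TPM values are positive; the FDR filter used in the
    study is deliberately left out of this toy version.
    """
    ratio = ts_tpm / es_tpm
    if ratio >= min_fold:
        return "up"
    if ratio <= 1.0 / min_fold:
        return "down"
    return "unchanged"


# Counts as reported in the abstract.
total_tf_genes = 1334
up_genes, down_genes = 365, 287
unchanged_genes = total_tf_genes - up_genes - down_genes

proportions = {
    "up genes: variants also up": percent(174, 883),
    "up genes: variants not up": percent(883 - 174, 883),
    "down genes: variants also down": percent(153, 856),
    "down genes: variants not down": percent(856 - 153, 856),
    "unchanged genes: variants differential": percent(276 + 201, 2215),
}

# Hypothetical gene with two variants: (TPM in ES, TPM in TS).
toy_variants = {"variant_1": (10.0, 40.0), "variant_2": (40.0, 10.0)}

# Assumed convention: gene-level TPM is the sum of its transcript TPMs.
toy_gene_es = sum(es for es, _ in toy_variants.values())
toy_gene_ts = sum(ts for _, ts in toy_variants.values())

gene_call = classify(toy_gene_es, toy_gene_ts)
variant_calls = {
    name: classify(es, ts) for name, (es, ts) in toy_variants.items()
}

if __name__ == "__main__":
    print("unchanged TF genes:", unchanged_genes)
    for label, value in proportions.items():
        print(f"{label}: {value:.1f}%")
    print("toy gene call:", gene_call)
    print("toy variant calls:", variant_calls)
```

In the toy case the gene-level totals are equal in ES and TS, so the gene is called unchanged even though one variant rises 4-fold and the other falls 4-fold. This is the kind of masking the abstract describes for the 682 genes with no significant GE change.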