Kevin Vo, Ryan Mohamadi, Yashica Sharma, Amelia Mohamadi, Patrick E Fields, M. A. K. Rumi
bioRxiv, 2024-07-16. DOI: 10.1101/2024.07.11.603122 (https://doi.org/10.1101/2024.07.11.603122)
Importance of transcript variants in transcriptome analyses
RNA sequencing (RNA-Seq) has become a widely adopted genome-wide technique for investigating gene expression patterns. However, conventional RNA-Seq analyses typically rely on gene expression (GE) values that aggregate all the transcripts produced by a gene under a single identifier, overlooking the complexity of transcript variants arising from different transcription start sites and alternative splicing events. In this study, we explored the implications of neglecting transcript variants in RNA-Seq analyses. Among the 1334 transcription factor (TF) genes expressed in mouse embryonic stem (ES) or trophoblast stem (TS) cells, 652 were differentially expressed in TS cells based on GE values (365 upregulated and 287 downregulated; ≥2-fold, FDR p-value ≤0.05). Intriguingly, the 365 upregulated genes expressed 883 transcript variants, of which only 174 (<20%) were upregulated based on transcript expression (TE) values. The remaining 709 (>80%) variants were either downregulated or showed no significant change. Similarly, the 287 downregulated genes expressed 856 transcript variants, of which only 153 (<20%) were downregulated, while 703 (>82%) were upregulated or showed no significant change. Additionally, the 682 TF genes that did not change significantly between ES and TS cells (GE <2-fold and/or FDR p-value >0.05) expressed 2215 transcript variants, of which 477 (>21%) were differentially expressed (276 upregulated and 201 downregulated; ≥2-fold, FDR p-value ≤0.05). Notably, a gene does not express just one protein; rather, its transcript variants can encode multiple proteins with distinct functional domains and can also include non-coding regulatory RNAs. Our findings underscore the critical necessity of considering transcript variants in RNA-Seq analyses. Doing so may enable a more precise understanding of the intricate functional and regulatory landscape of genes; ignoring the variants may result in an erroneous interpretation.

Graphic Abstract: Differential expression of transcription factors (TFs) between mouse embryonic stem (ES) cells and trophoblast stem (TS) cells. This graphic illustrates the importance of including transcript variants in RNA sequencing (RNA-Seq) analyses. Panel A represents the conventional differential gene expression analysis after RNA-Seq, in which all transcript reads are grouped under a single gene name. Panel B takes the analysis one step further by examining the transcript variants that were hidden under that gene name. Our results indicate that an analysis based exclusively on gene expression (GE) values mischaracterizes the expression of over 80% of the transcript variants (TE). Without analyzing the reads of each transcript variant, we fail to uncover the functional importance of the variants and the regulation of their expression. Both GE and TE values are expressed as transcripts per million (TPM). Data analyses were performed using CLC Genomics Workbench.
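
The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' CLC Genomics Workbench procedure. The first part only recomputes the concordance fractions from the counts reported in the abstract. The second part uses a hypothetical gene (`toy_variants`, with invented TPM values) and assumes that the gene-level TPM is the sum of its variants' TPMs; the helper names `concordance`, `fold_call`, and `gene_level` are likewise illustrative, and the FDR criterion is omitted because the toy data has no replicates.

```python
# Part 1: counts taken from the abstract.
# Each entry: (transcript variants expressed, variants agreeing with the gene-level call)
reported = {
    "GE upregulated (365 genes)": (883, 174),
    "GE downregulated (287 genes)": (856, 153),
}


def concordance(total, agreeing):
    """Return (percent agreeing, percent not agreeing) with the gene-level call."""
    pct = 100.0 * agreeing / total
    return pct, 100.0 - pct


for label, (total, agreeing) in reported.items():
    agree_pct, other_pct = concordance(total, agreeing)
    print(f"{label}: {agree_pct:.1f}% concordant, {other_pct:.1f}% not")


# Part 2: hypothetical example of how aggregation can mask variant behaviour.
def fold_call(es_tpm, ts_tpm, threshold=2.0):
    """Classify a TS-versus-ES change by fold change only (assumption: no FDR test)."""
    if es_tpm <= 0 or ts_tpm <= 0:
        return "n/a"  # fold change is undefined without expression in both cell types
    ratio = ts_tpm / es_tpm
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "unchanged"


def gene_level(variants):
    """Sum variant TPMs per cell type (assumption: GE is the sum of TE values)."""
    es_total = sum(es for es, _ in variants.values())
    ts_total = sum(ts for _, ts in variants.values())
    return es_total, ts_total


# Invented TPM values as (ES, TS) pairs; not data from the study.
toy_variants = {
    "variant_1": (10.0, 100.0),
    "variant_2": (20.0, 5.0),
    "variant_3": (15.0, 16.0),
}

gene_es, gene_ts = gene_level(toy_variants)
gene_call = fold_call(gene_es, gene_ts)
variant_calls = {name: fold_call(es, ts) for name, (es, ts) in toy_variants.items()}

print("gene-level call:", gene_call)
print("variant-level calls:", variant_calls)
```

In this toy case a single strongly induced variant drives the gene-level call, while the other two variants are downregulated or unchanged, which is the kind of discordance the abstract reports at scale.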