Normalization of gene counts affects principal components-based exploratory analysis of RNA-sequencing data

Biochimica et Biophysica Acta-Gene Regulatory Mechanisms (IF 2.6; CAS zone 3, Biology; JCR Q3, Biochemistry & Molecular Biology)
Published: 2024-08-16
DOI: 10.1016/j.bbagrm.2024.195058
Henk J. van Lingen, Maria Suarez-Diez, Edoardo Saccenti
Journal article. Volume 1867, Issue 4, Article 195058. Available at https://www.sciencedirect.com/science/article/pii/S1874939924000543

Abstract

Normalization of gene expression count data is an essential step in the analysis of RNA-sequencing data. Its statistical analysis has mostly been addressed in the context of differential expression analysis, that is, in the univariate setting. However, relationships among genes and samples are better explored and quantified using multivariate exploratory data analysis tools such as Principal Component Analysis (PCA). In this study we investigate how normalization impacts the PCA model and its interpretation, considering twelve widely used normalization methods applied to simulated and experimental data. Correlation patterns in the normalized data were explored using both summary statistics and Covariance Simultaneous Component Analysis. The impact of normalization on the PCA solution was assessed by examining the model complexity, the quality of sample clustering in the low-dimensional PCA space, and the gene ranking in the model fitted to normalized data. PCA models fitted to normalized data were interpreted in the context of gene pathway enrichment analysis. We found that although PCA score plots are often similar independently of the normalization used, the biological interpretation of the models can depend heavily on the normalization method applied.
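The pipeline the abstract evaluates (normalize counts, fit PCA, rank genes by their loadings) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it uses a hypothetical toy count matrix, only two common normalizations (total-count CPM scaling and a DESeq-style median-of-ratios size factor, each followed by log2(x + 1)) instead of the twelve compared in the study, and a power-iteration estimate of the first principal component instead of a full PCA routine. The function names `cpm_log`, `median_of_ratios_log`, `first_pc`, and `gene_ranking` are hypothetical.

```python
import math
from statistics import median

# Hypothetical toy data (not from the study): rows are samples, columns are genes.
COUNTS = [
    [100, 200, 50, 10, 400],
    [120, 180, 60, 12, 800],
    [300, 600, 150, 30, 1200],
    [90, 210, 40, 8, 300],
]


def cpm_log(counts):
    """Total-count (counts-per-million) scaling, then log2(x + 1)."""
    result = []
    for row in counts:
        library_size = sum(row)
        result.append([math.log2(c / library_size * 1e6 + 1) for c in row])
    return result


def median_of_ratios_log(counts):
    """DESeq-style median-of-ratios size factors, then log2(x + 1).

    Assumes all counts are positive so that per-gene geometric means exist.
    """
    n_samples, n_genes = len(counts), len(counts[0])
    geo_means = [
        math.exp(sum(math.log(row[g]) for row in counts) / n_samples)
        for g in range(n_genes)
    ]
    result = []
    for row in counts:
        # Size factor: median ratio of this sample's counts to the gene-wise geometric means.
        size_factor = median([row[g] / geo_means[g] for g in range(n_genes)])
        result.append([math.log2(c / size_factor + 1) for c in row])
    return result


def center_columns(matrix):
    """Subtract each gene's mean across samples (column centering)."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in matrix]


def covariance(centered):
    """Gene-by-gene sample covariance of a column-centered matrix."""
    n, p = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_pc(matrix, n_iter=1000):
    """PC1 loadings (unit vector) and sample scores via power iteration on the covariance."""
    centered = center_columns(matrix)
    cov = covariance(centered)
    p = len(cov)
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores: projection of each centered sample onto the PC1 loading vector.
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores


def gene_ranking(matrix):
    """Gene indices ordered by absolute PC1 loading, largest first."""
    loadings, _ = first_pc(matrix)
    return sorted(range(len(loadings)), key=lambda g: -abs(loadings[g]))


for name, normalize in [("CPM", cpm_log), ("median-of-ratios", median_of_ratios_log)]:
    print(name, gene_ranking(normalize(COUNTS)))
```

Comparing the two printed rankings is a miniature version of the question the study asks: whether the gene order on PC1 changes with the normalization even when the sample layout looks alike. No particular outcome is claimed for this toy matrix; a real analysis would use a full decomposition (for example an SVD) and all retained components.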

Source journal: Biochimica et Biophysica Acta-Gene Regulatory Mechanisms
CiteScore: 9.20
Self-citation rate: 2.10%
Articles published: 63
Review time: 44 days
About the journal: BBA Gene Regulatory Mechanisms includes reports that describe novel insights into mechanisms of transcriptional, post-transcriptional and translational gene regulation. Special emphasis is placed on papers that identify epigenetic mechanisms of gene regulation, including chromatin modification and remodeling. This section also encompasses mechanistic studies of regulatory proteins and protein complexes; regulatory or mechanistic aspects of RNA processing; regulation of expression by small RNAs; genomic analysis of gene expression patterns; and modeling of gene regulatory pathways. Papers describing gene promoters, enhancers, silencers or other regulatory DNA regions must incorporate significant functional studies.