Henk J. van Lingen, Maria Suarez-Diez, Edoardo Saccenti
Biochimica et Biophysica Acta-Gene Regulatory Mechanisms, volume 1867, issue 4, Article 195058. Published 2024-08-16.
DOI: 10.1016/j.bbagrm.2024.195058
URL: https://www.sciencedirect.com/science/article/pii/S1874939924000543
Normalization of gene counts affects principal components-based exploratory analysis of RNA-sequencing data
Normalization of gene expression count data is an essential step in the analysis of RNA-sequencing data. Its statistical analysis has been mostly addressed in the context of differential expression analysis, that is, in the univariate setting. However, relationships among genes and samples are better explored and quantified using multivariate exploratory data analysis tools like Principal Component Analysis (PCA). In this study we investigate how normalization impacts the PCA model and its interpretation, considering twelve different widely used normalization methods that were applied to simulated and experimental data. Correlation patterns in the normalized data were explored using both summary statistics and Covariance Simultaneous Component Analysis. The impact of normalization on the PCA solution was assessed by exploring the model complexity, the quality of sample clustering in the low-dimensional PCA space, and gene ranking in the model fit to normalized data. PCA models obtained after normalization were interpreted in the context of gene enrichment pathway analysis. We found that although PCA score plots are often similar regardless of the normalization used, biological interpretation of the models can depend heavily on the normalization method applied.
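As a rough illustration of the kind of comparison the abstract describes, the sketch below fits PCA to the same count matrix under two normalizations and compares which genes rank highest on the first component. It is not the authors' pipeline: the data are synthetic, the two normalizations (counts per million and log2 counts per million) are illustrative choices that the abstract does not confirm as being among the twelve studied, and the helper names `cpm`, `log_cpm`, `pca` and `top_genes` are hypothetical.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each sample (row) by its library size."""
    lib_size = counts.sum(axis=1, keepdims=True)
    return counts / lib_size * 1e6

def log_cpm(counts, pseudo=1.0):
    """log2 of CPM plus a pseudocount (the pseudocount is an assumption)."""
    return np.log2(cpm(counts) + pseudo)

def pca(data, n_components=2):
    """PCA via SVD of the column-centred samples x genes matrix."""
    centred = data - data.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = vt[:n_components].T                    # gene weights
    explained = s**2 / np.sum(s**2)                   # variance fractions
    return scores, loadings, explained[:n_components]

def top_genes(loadings, component=0, k=10):
    """Indices of the k genes with the largest absolute loading."""
    return np.argsort(-np.abs(loadings[:, component]))[:k]

# Synthetic toy data (an assumption): 12 samples in two groups, 200 genes,
# with the first 20 genes up-regulated four-fold in the second group.
rng = np.random.default_rng(0)
n_samples, n_genes = 12, 200
base = rng.gamma(shape=2.0, scale=50.0, size=n_genes)
group = np.repeat([0, 1], n_samples // 2)
fold = np.ones((2, n_genes))
fold[1, :20] = 4.0
depth = rng.uniform(0.5, 2.0, size=(n_samples, 1))    # sequencing depth
counts = rng.poisson(base * fold[group] * depth).astype(float)

results = {}
for name, normalize in {"CPM": cpm, "log-CPM": log_cpm}.items():
    scores, loadings, explained = pca(normalize(counts))
    results[name] = (scores, loadings, explained)
    print(name, "explained variance (PC1, PC2):", np.round(explained, 3))

# Compare the gene ranking on PC1 between the two normalizations.
shared = set(top_genes(results["CPM"][1])) & set(top_genes(results["log-CPM"][1]))
print("top-10 PC1 genes shared between normalizations:", len(shared))
```

Plotting the two sets of scores side by side and then comparing the top-ranked genes is one simple way to check whether similar-looking score plots hide different gene rankings, which is the situation the abstract warns about.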
About the journal:
BBA Gene Regulatory Mechanisms includes reports that describe novel insights into mechanisms of transcriptional, post-transcriptional and translational gene regulation. Special emphasis is placed on papers that identify epigenetic mechanisms of gene regulation, including chromatin modification and remodeling. This section also encompasses mechanistic studies of regulatory proteins and protein complexes; regulatory or mechanistic aspects of RNA processing; regulation of expression by small RNAs; genomic analysis of gene expression patterns; and modeling of gene regulatory pathways. Papers describing gene promoters, enhancers, silencers or other regulatory DNA regions must incorporate significant functional studies.