Principal Component Analysis: Standardisation
Richard G. Brereton
Journal of Chemometrics 39(1), published 2025-01-02. DOI: 10.1002/cem.3607
https://onlinelibrary.wiley.com/doi/10.1002/cem.3607
Citations: 0
Abstract
Standardisation of the columns of a matrix is a common transformation prior to PCA. It goes by different names, including autoscaling and normalisation. The latter term is confusing, as it is also used for a number of other transformations, so we advise against calling this normalisation.
As standardisation is about scaling and not statistical estimation, it is best to use the definition of the population standard deviation $$ s_j=\sqrt{\sum_{i=1}^{I}\left(x_{ij}-\overline{x}_j\right)^2/I} $$ rather than the sample standard deviation.
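The column standardisation described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the author's code: the function name `standardise` and the small matrix `X_demo` are invented here, and the numbers are not taken from the article's datasets. The key point is `ddof=0`, which divides by I (the population definition) rather than by I − 1.

```python
import numpy as np

def standardise(X):
    """Centre each column and divide by its population standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # ddof=0 divides by I (population definition); ddof=1 would divide by I - 1
    sd = X.std(axis=0, ddof=0)
    return (X - mean) / sd

# Hypothetical 4 x 2 matrix, invented for illustration only (not from the article)
X_demo = np.array([[1.0, 10.0],
                   [2.0, 30.0],
                   [3.0, 20.0],
                   [4.0, 40.0]])
Z_demo = standardise(X_demo)
```

After this transformation each column of `Z_demo` has mean 0 and population standard deviation 1. A column with zero variance would cause a division by zero, so such columns would need to be removed or handled separately before calling this sketch.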
We can now standardise each matrix. To save room, we calculate just one numerical value so that interested readers can check that they can reproduce the results from this article. The standardised value for Dataset 1 is $x_{83}$ = 0.566 (Sample H, variable $x_3$).
Hence, whether standardisation prior to PCA is a useful technique depends on the nature of the data and the problem in hand. In some cases, it can degrade patterns, whereas in other situations it can pull out important information.
Although standardisation can make a big difference to the appearance of PC plots in some cases, in others it makes little or no difference.
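One way to see how much standardisation matters for a given matrix is to run PCA both ways and compare the results. The sketch below is an assumption-laden illustration, not the article's procedure: it computes PCA through the singular value decomposition of the centred matrix, the helper `pca_scores` is a hypothetical name, and the data are synthetic random numbers in which one column has a far larger scale than the others.

```python
import numpy as np

def pca_scores(X, standardise=False, n_components=2):
    """Return PC scores and the fraction of variance explained per component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # column centring
    if standardise:
        Xc = Xc / X.std(axis=0, ddof=0)        # population standard deviation
    # SVD of the preprocessed matrix: scores are U * s
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Synthetic data (not from the article): column 0 is on a much larger scale
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3)) * np.array([100.0, 1.0, 1.0])

scores_raw, var_raw = pca_scores(X_demo)
scores_std, var_std = pca_scores(X_demo, standardise=True)
```

In this synthetic case the first PC of the unstandardised data is dominated by the large-scale column, whereas after standardisation the variance is shared more evenly among the components. Whether either picture is more informative depends, as the abstract says, on the data and the problem.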
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.