A comparative analysis of the principal component method and parallel analysis in working with official statistical data

Statistics in Transition (Q4, Mathematics) | Pub Date: 2023-02-24 | DOI: 10.59170/stattrans-2023-011
Halyna Holubova

Abstract

The rapid development of a digitized society generates large-scale flows of data. These data therefore need to be compressed in a way that keeps their content complete and informative. To achieve this, it is advisable to use the principal component method, whose main task is to reduce the dimension of a multidimensional space with minimal loss of information. The article describes the basic conceptual approaches to the definition of principal components and presents the methodological principles for selecting them. Among the many ways to select principal components, the simplest are to retain the first k components with the largest eigenvalues or to examine the percentage of total variance explained by each component. Many statistical packages use the Kaiser criterion for this purpose. However, this criterion does not account for the fact that even random data (noise) can yield components with eigenvalues greater than one, which means that redundant components may be selected. We conclude that the classical mechanisms for selecting principal components should be used with caution. Parallel analysis uses multiple data simulations to overcome the problem of random error. It assumes that the components of real data must have larger eigenvalues than the parallel components derived from simulated data with the same sample size and design, variance and number of variables. The eigenvalues were compared using two methods, the Kaiser criterion and Horn's parallel analysis, on several example data sets. The study shows that parallel analysis produces more valid results with actual data sets. We believe that the main advantage of parallel analysis is its ability to model the process of selecting the required number of principal components by determining the point at which they can no longer be distinguished from those generated by simulated noise.
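
The contrast between the two selection rules described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`correlation_eigenvalues`, `kaiser_count`, `parallel_analysis_count`), the use of standard normal noise for the simulated data, the 95th-percentile threshold, the number of simulations, and the synthetic data sets are all assumptions chosen for illustration. Comparing against the mean of the simulated eigenvalues is another common choice of threshold.

```python
import numpy as np


def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix of `data`, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_count(eigenvalues):
    """Kaiser criterion: count the components with eigenvalue > 1."""
    return int(np.sum(eigenvalues > 1.0))


def parallel_analysis_count(data, n_sims=200, percentile=95, seed=0):
    """Parallel analysis (illustrative variant, see assumptions above).

    Retains leading components whose observed eigenvalue exceeds the
    chosen percentile of eigenvalues from simulated noise data with the
    same number of observations and variables.
    """
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape
    observed = correlation_eigenvalues(data)

    simulated = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        noise = rng.standard_normal((n_obs, n_vars))
        simulated[i] = correlation_eigenvalues(noise)

    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    if exceeds.all():
        return n_vars
    # Index of the first component that fails the comparison
    return int(np.argmin(exceeds))


# Hypothetical data: 10 variables driven by 2 latent factors, plus noise
rng = np.random.default_rng(42)
factors = rng.standard_normal((500, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 0.8
loadings[1, 5:] = 0.8
structured = factors @ loadings + 0.6 * rng.standard_normal((500, 10))

# Hypothetical data: pure noise, no underlying structure at all
pure_noise = rng.standard_normal((100, 20))

for name, data in [("structured", structured), ("pure noise", pure_noise)]:
    eigenvalues = correlation_eigenvalues(data)
    print(name,
          "| Kaiser:", kaiser_count(eigenvalues),
          "| parallel analysis:", parallel_analysis_count(data))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, a pure-noise sample always produces at least one eigenvalue above one, so the Kaiser criterion always retains at least one component there. Parallel analysis instead measures each observed eigenvalue against what noise of the same shape would produce.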
Source journal
Statistics in Transition (Decision Sciences: Statistics, Probability and Uncertainty)
CiteScore: 1.00
Self-citation rate: 0.00%
Articles published: 0
Review time: 9 weeks
Journal description: Statistics in Transition (SiT) is an international journal published jointly by the Polish Statistical Association (PTS) and the Central Statistical Office of Poland (CSO/GUS), which sponsors this publication. Launched in 1993, it was issued twice a year until 2006; since then it has appeared, under a slightly changed title, Statistics in Transition new series, three times a year, and after 2013 as a regular quarterly journal. The journal provides a forum for the exchange of ideas and experience among members of the international community of statisticians, data producers and users, including researchers, teachers, policy makers and the general public. Its initially dominant focus on statistical issues pertinent to the transition from a centrally planned to a market-oriented economy has gradually been extended to embrace statistical problems related to the development and modernization of the system of public (official) statistics in general.