Pub Date: 2025-02-13. DOI: 10.1177/00131644241313447
Overestimation of Internal Consistency by Coefficient Omega in Data Giving Rise to a Centroid-Like Factor Solution.
Karl Schweizer, Tengfei Wang, Xuezhu Ren
Coefficient Omega, a measure of internal consistency, is investigated for its deviations from expected outcomes when applied to correlational patterns that produce variable-focused factor solutions in confirmatory factor analysis. In these solutions, the factor loadings on the factor of the one-factor measurement model closely correspond to the correlations of one manifest variable with the other manifest variables, as is the case in centroid solutions. It is demonstrated that in such a situation, a heterogeneous correlational pattern leads to an Omega estimate larger than the estimates obtained for similarly heterogeneous and for uniform patterns. A simulation study reveals that these deviations are restricted to datasets with small numbers of manifest variables and that the degree of heterogeneity determines the degree of deviation. We propose a method for identifying variable-focused factor solutions and describe how to deal with the resulting deviations.
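For orientation, coefficient Omega under a one-factor model is the squared sum of the factor loadings divided by that quantity plus the sum of the residual variances. A minimal NumPy sketch with hypothetical loadings (illustrative values, not taken from the article's simulation):

```python
import numpy as np

# Hypothetical standardized loadings of a one-factor model (illustrative only).
loadings = np.array([0.60, 0.55, 0.70, 0.50, 0.65])
residual_vars = 1.0 - loadings**2  # unique variances under standardized variables

# Omega: model-implied true-score variance of the unit-weighted sum score
# divided by its total variance under the one-factor model.
omega = loadings.sum()**2 / (loadings.sum()**2 + residual_vars.sum())
print(round(omega, 3))  # about 0.74 for these loadings
```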
{"title":"Overestimation of Internal Consistency by Coefficient Omega in Data Giving Rise to a Centroid-Like Factor Solution.","authors":"Karl Schweizer, Tengfei Wang, Xuezhu Ren","doi":"10.1177/00131644241313447","DOIUrl":"10.1177/00131644241313447","url":null,"abstract":"<p><p>Coefficient Omega measuring internal consistency is investigated for its deviations from expected outcomes when applied to correlational patterns that produce variable-focused factor solutions in confirmatory factor analysis. In these solutions, the factor loadings on the factor of the one-factor measurement model closely correspond to the correlations of one manifest variable with the other manifest variables, as is in centroid solutions. It is demonstrated that in such a situation, a heterogeneous correlational pattern leads to an Omega estimate larger than those for similarly heterogeneous and uniform patterns. A simulation study reveals that these deviations are restricted to datasets including small numbers of manifest variables and that the degree of heterogeneity determines the degree of deviation. We propose a method for identifying variable-focused factor solutions and how to deal with deviations.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644241313447"},"PeriodicalIF":2.1,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11826816/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143432505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-01 (Epub 2024-08-20). DOI: 10.1177/00131644241268073
Improving the Use of Parallel Analysis by Accounting for Sampling Variability of the Observed Correlation Matrix.
Yan Xia, Xinchang Zhou
Parallel analysis has been considered one of the most accurate methods for determining the number of factors in factor analysis. One major advantage of parallel analysis over traditional factor retention methods (e.g., Kaiser's rule) is that it addresses the sampling variability of eigenvalues obtained from the identity matrix, which represents the correlation matrix under a zero-factor model. This study argues that we should also address the sampling variability of eigenvalues obtained from the observed data, so that the results inform practitioners of the variability of the number of factors across random samples. Thus, this study proposes revising parallel analysis to report the proportion of random samples that suggest k factors (k = 0, 1, 2, . . .) rather than a single suggested number. Simulation results support the use of the proposed strategy, especially for research scenarios with limited sample sizes where sampling fluctuation is a concern.
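A minimal sketch of the general idea, assuming a nonparametric bootstrap of the observed data is used to capture sampling variability; the resampling scheme and the mean-eigenvalue comparison rule below are illustrative assumptions, not necessarily the article's exact procedure:

```python
import numpy as np

def pa_suggested_k(data, n_random=100, rng=None):
    """Classic parallel analysis: count leading eigenvalues of the observed
    correlation matrix that exceed the mean eigenvalues of random normal data."""
    rng = rng or np.random.default_rng(0)
    n, p = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand = np.mean([np.linalg.eigvalsh(
        np.corrcoef(rng.standard_normal((n, p)), rowvar=False))[::-1]
        for _ in range(n_random)], axis=0)
    k = 0
    while k < p and obs[k] > rand[k]:
        k += 1
    return k

def pa_proportions(data, n_boot=200, seed=1):
    """Proportion of bootstrap resamples of the observed data for which
    parallel analysis suggests each candidate number of factors k."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    ks = [pa_suggested_k(data[rng.integers(0, n, n)], rng=rng)
          for _ in range(n_boot)]
    return {k: ks.count(k) / n_boot for k in sorted(set(ks))}
```

Calling pa_proportions(X) on an n-by-p score matrix then reports, for each k, the share of resamples in which parallel analysis retained k factors, rather than a single number.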
{"title":"Improving the Use of Parallel Analysis by Accounting for Sampling Variability of the Observed Correlation Matrix.","authors":"Yan Xia, Xinchang Zhou","doi":"10.1177/00131644241268073","DOIUrl":"10.1177/00131644241268073","url":null,"abstract":"<p><p>Parallel analysis has been considered one of the most accurate methods for determining the number of factors in factor analysis. One major advantage of parallel analysis over traditional factor retention methods (e.g., Kaiser's rule) is that it addresses the sampling variability of eigenvalues obtained from the identity matrix, representing the correlation matrix for a zero-factor model. This study argues that we should also address the sampling variability of eigenvalues obtained from the observed data, such that the results would inform practitioners of the variability of the number of factors across random samples. Thus, this study proposes to revise the parallel analysis to provide the proportion of random samples that suggest <i>k</i> factors (<i>k</i> = 0, 1, 2, . . .) rather than a single suggested number. Simulation results support the use of the proposed strategy, especially for research scenarios with limited sample sizes where sampling fluctuation is concerning.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"114-133"},"PeriodicalIF":2.3,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11572087/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142675458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-31 (eCollection 2025-08-01). DOI: 10.1177/00131644241311877
Obtaining a Bayesian Estimate of Coefficient Alpha Using a Posterior Normal Distribution.
John Mart V DelosReyes, Miguel A Padilla
A new alternative for obtaining a Bayesian estimate of coefficient alpha through a posterior normal distribution is proposed and assessed, using percentile, normal-theory-based, and highest probability density credible intervals, in a simulation study. The results indicate that the proposed Bayesian method for estimating coefficient alpha has acceptable coverage probability performance across the majority of the investigated simulation conditions.
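The article's derivation of the posterior normal distribution is not given in the abstract; the sketch below only shows the sample coefficient alpha that such a posterior would be centered on, plus a generic normal-approximation credible interval in which the posterior mean and standard deviation are user-supplied placeholders:

```python
import numpy as np
from scipy.stats import norm

def coefficient_alpha(scores):
    """Sample coefficient alpha for an n x k matrix of item scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def normal_credible_interval(post_mean, post_sd, level=0.95):
    """Symmetric credible interval from an assumed normal posterior for alpha."""
    z = norm.ppf(0.5 + level / 2.0)
    return post_mean - z * post_sd, post_mean + z * post_sd
```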
{"title":"Obtaining a Bayesian Estimate of Coefficient Alpha Using a Posterior Normal Distribution.","authors":"John Mart V DelosReyes, Miguel A Padilla","doi":"10.1177/00131644241311877","DOIUrl":"10.1177/00131644241311877","url":null,"abstract":"<p><p>A new alternative to obtain a Bayesian estimate of coefficient alpha through a posterior normal distribution is proposed and assessed through percentile, normal-theory-based, and highest probability density credible intervals in a simulation study. The results indicate that the proposed Bayesian method to estimate coefficient alpha has acceptable coverage probability performance across the majority of investigated simulation conditions.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"829-852"},"PeriodicalIF":2.3,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11786261/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143079164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-30 (eCollection 2025-10-01). DOI: 10.1177/00131644241313212
Examining the Instructional Sensitivity of Constructed-Response Achievement Test Item Scores.
Anne Traynor, Cheng-Hsien Li, Shuqi Zhou
Inferences about student learning from large-scale achievement test scores are fundamental in education. For achievement test scores to provide useful information about student learning progress, differences in the content of instruction (i.e., the implemented curriculum) should affect test-takers' item responses. Existing research has begun to identify patterns in the content of instructionally sensitive multiple-choice achievement test items. To inform future test design decisions, this study identified instructionally (in)sensitive constructed-response achievement items, then characterized features of those items and their corresponding scoring rubrics. First, we used simulation to evaluate an item step difficulty difference index for constructed-response test items, derived from the generalized partial credit model. The statistical performance of the index was adequate, so we then applied it to data from 32 constructed-response eighth-grade science test items. We found that the instructional sensitivity (IS) index values varied appreciably across the category boundaries within an item as well as across items. Content analysis by master science teachers allowed us to identify general features of item score categories that show high, or negligible, IS.
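The step difficulty difference index itself is not spelled out in the abstract; for context, the generalized partial credit model from which the step difficulties are taken assigns category probabilities as in the following sketch (the parameter values are hypothetical and would normally come from a fitted model):

```python
import numpy as np

def gpcm_category_probs(theta, a, b):
    """Generalized partial credit model probabilities for categories 0..m.
    theta: examinee ability; a: item slope; b: step difficulties b_1..b_m."""
    # Cumulative sum of a*(theta - b_v), with the category-0 term fixed at 0.
    steps = np.concatenate(([0.0], a * (theta - np.asarray(b, dtype=float))))
    numerators = np.exp(np.cumsum(steps))
    return numerators / numerators.sum()

# Example: a 3-category item with hypothetical parameters.
print(gpcm_category_probs(theta=0.5, a=1.2, b=[-0.4, 0.8]))
```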
{"title":"Examining the Instructional Sensitivity of Constructed-Response Achievement Test Item Scores.","authors":"Anne Traynor, Cheng-Hsien Li, Shuqi Zhou","doi":"10.1177/00131644241313212","DOIUrl":"10.1177/00131644241313212","url":null,"abstract":"<p><p>Inferences about student learning from large-scale achievement test scores are fundamental in education. For achievement test scores to provide useful information about student learning progress, differences in the content of instruction (i.e., the implemented curriculum) should affect test-takers' item responses. Existing research has begun to identify patterns in the content of instructionally sensitive multiple-choice achievement test items. To inform future test design decisions, this study identified instructionally (in)sensitive constructed-response achievement items, then characterized features of those items and their corresponding scoring rubrics. First, we used simulation to evaluate an item step difficulty difference index for constructed-response test items, derived from the generalized partial credit model. The statistical performance of the index was adequate, so we then applied it to data from 32 constructed-response eighth-grade science test items. We found that the instructional sensitivity (IS) index values varied appreciably across the category boundaries within an item as well as across items. Content analysis by master science teachers allowed us to identify general features of item score categories that show high, or negligible, IS.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"1000-1031"},"PeriodicalIF":2.3,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783420/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143079163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-29. DOI: 10.1177/00131644241311851
The Impact of Attentiveness Interventions on Survey Data.
Christie M Fuller, Marcia J Simmering, Brian Waterwall, Elizabeth Ragland, Douglas P Twitchell, Alison Wall
Social and behavioral science researchers who use survey data are vigilant about data quality, with an increasing emphasis on avoiding common method variance (CMV) and insufficient effort responding (IER). Each of these errors can inflate or deflate substantive relationships, and there are both a priori and post hoc means to address them. Yet, little research has investigated how both IER and CMV are affected by the different procedural and statistical techniques used to address them. More specifically, if interventions to reduce IER are used, does this affect CMV in the data? In an experiment conducted both in and out of the laboratory, we investigate the impact of attentiveness interventions, such as a Factual Manipulation Check (FMC), on both IER and CMV in same-source survey data. In addition to typical IER measures, we also track whether respondents play the instructional video and record their mouse movements. The results show that while the interventions have some impact on the level of participant attentiveness, they do not appear to lead to differing levels of CMV.
{"title":"The Impact of Attentiveness Interventions on Survey Data.","authors":"Christie M Fuller, Marcia J Simmering, Brian Waterwall, Elizabeth Ragland, Douglas P Twitchell, Alison Wall","doi":"10.1177/00131644241311851","DOIUrl":"10.1177/00131644241311851","url":null,"abstract":"<p><p>Social and behavioral science researchers who use survey data are vigilant about data quality, with an increasing emphasis on avoiding common method variance (CMV) and insufficient effort responding (IER). Each of these errors can inflate and deflate substantive relationships, and there are both a priori and post hoc means to address them. Yet, little research has investigated how both IER and CMV are affected with the use of these different procedural or statistical techniques used to address them. More specifically, if interventions to reduce IER are used, does this affect CMV in data? In an experiment conducted both in and out of the laboratory, we investigate the impact of attentiveness interventions, such as a Factual Manipulation Check (FMC) on both IER and CMV in same-source survey data. In addition to typical IER measures, we also track whether respondents play the instructional video and their mouse movement. The results show that while interventions have some impact on the level of participant attentiveness, these interventions do not appear to lead to differing levels of CMV.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644241311851"},"PeriodicalIF":2.1,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11775934/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143064490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-23 (eCollection 2025-08-01). DOI: 10.1177/00131644241307560
"What If Applicants Fake Their Responses?": Modeling Faking and Response Styles in High-Stakes Assessments Using the Multidimensional Nominal Response Model.
Timo Seitz, Maik Spengler, Thorsten Meiser
Self-report personality tests used in high-stakes assessments hold the risk that test-takers engage in faking. In this article, we demonstrate an extension of the multidimensional nominal response model (MNRM) to account for the response bias of faking. The MNRM is a flexible item response theory (IRT) model that allows modeling response biases whose effect patterns vary between items. In a simulation, we found good parameter recovery of the model accounting for faking under different conditions as well as good performance of model selection criteria. Also, we modeled responses from N = 3,046 job applicants taking a personality test under real high-stakes conditions. We thereby specified item-specific effect patterns of faking by setting scoring weights to appropriate values that we collected in a pilot study. Results indicated that modeling faking significantly increased model fit over and above response styles and improved divergent validity, while the faking dimension exhibited relations to several covariates. Additionally, applying the model to a sample of job incumbents taking the test under low-stakes conditions, we found evidence that the model can effectively capture faking and adjust estimates of substantive trait scores for the assumed influence of faking. We end the article with a discussion of implications for psychological measurement in high-stakes assessment contexts.
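As a rough illustration of the modeling idea, one common MNRM parameterization gives each response category a linear predictor that combines the substantive trait and bias dimensions through fixed, item-specific scoring weights. The dimension labels and weight values below are hypothetical; the article's exact specification is not reproduced here.

```python
import numpy as np

def mnrm_category_probs(thetas, slopes, scoring_weights, intercepts):
    """Category probabilities in a multidimensional nominal response model.
    thetas: latent values, e.g., (substantive trait, faking);
    slopes: one discrimination per dimension;
    scoring_weights: categories x dimensions matrix of fixed weights;
    intercepts: one intercept per category."""
    z = scoring_weights @ (np.asarray(slopes) * np.asarray(thetas)) + intercepts
    ez = np.exp(z - z.max())        # numerically stabilized softmax
    return ez / ez.sum()

# Five-point item: trait weights are ordinal; faking weights load only on the
# socially desirable end of the scale (hypothetical values).
weights = np.array([[0., 1., 2., 3., 4.],     # trait dimension
                    [0., 0., 0., 1., 2.]]).T  # faking dimension
print(mnrm_category_probs(thetas=[0.3, 1.0], slopes=[1.0, 0.8],
                          scoring_weights=weights,
                          intercepts=np.zeros(5)))
```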
{"title":"\"What If Applicants Fake Their Responses?\": Modeling Faking and Response Styles in High-Stakes Assessments Using the Multidimensional Nominal Response Model.","authors":"Timo Seitz, Maik Spengler, Thorsten Meiser","doi":"10.1177/00131644241307560","DOIUrl":"10.1177/00131644241307560","url":null,"abstract":"<p><p>Self-report personality tests used in high-stakes assessments hold the risk that test-takers engage in faking. In this article, we demonstrate an extension of the multidimensional nominal response model (MNRM) to account for the response bias of faking. The MNRM is a flexible item response theory (IRT) model that allows modeling response biases whose effect patterns vary between items. In a simulation, we found good parameter recovery of the model accounting for faking under different conditions as well as good performance of model selection criteria. Also, we modeled responses from <i>N</i> = 3,046 job applicants taking a personality test under real high-stakes conditions. We thereby specified item-specific effect patterns of faking by setting scoring weights to appropriate values that we collected in a pilot study. Results indicated that modeling faking significantly increased model fit over and above response styles and improved divergent validity, while the faking dimension exhibited relations to several covariates. Additionally, applying the model to a sample of job incumbents taking the test under low-stakes conditions, we found evidence that the model can effectively capture faking and adjust estimates of substantive trait scores for the assumed influence of faking. We end the article with a discussion of implications for psychological measurement in high-stakes assessment contexts.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"747-782"},"PeriodicalIF":2.3,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755426/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143045425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-22 (eCollection 2025-08-01). DOI: 10.1177/00131644241308528
A Comparison of the Next Eigenvalue Sufficiency Test to Other Stopping Rules for the Number of Factors in Factor Analysis.
Pier-Olivier Caron
A plethora of techniques exist to determine the number of factors to retain in exploratory factor analysis. A recent and promising technique is the Next Eigenvalue Sufficiency Test (NEST), but it has not been systematically compared with well-established stopping rules. The present study proposes a simulation with synthetic factor structures to compare NEST, parallel analysis, the sequential χ² test, the Hull method, and the empirical Kaiser criterion. The structures were based on 24 variables containing one to eight factors, loadings ranged from .40 to .80, inter-factor correlations ranged from .00 to .30, and three sample sizes were used. In total, 360 scenarios were replicated 1,000 times. Performance was evaluated in terms of accuracy (correct identification of dimensionality) and bias (tendency to over- or underestimate dimensionality). Overall, NEST showed the best performance, especially in hard conditions where it had to detect small but meaningful factors. It had a tendency to underextract, but to a lesser extent than the other methods. The second-best method was parallel analysis, owing to its more liberal behavior in harder cases. The three other stopping rules had pitfalls: the sequential χ² test and the Hull method even in some easy conditions, and the empirical Kaiser criterion in hard conditions.
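For readers unfamiliar with NEST, the core idea is a sequential test: the (k+1)-th observed eigenvalue is compared with its reference distribution under a fitted k-factor model, and k is increased until the eigenvalue is no longer surprising. The sketch below is a rough, assumption-laden approximation of that idea (scikit-learn's FactorAnalysis is used as a stand-in for the null-model fit; this is not the reference implementation of NEST):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def nest_like(data, n_rep=200, alpha=0.05, max_k=8, seed=0):
    """Sequentially test whether eigenvalue k+1 exceeds what a k-factor
    model would produce; return the first k for which it does not."""
    rng = np.random.default_rng(seed)
    x = (data - data.mean(0)) / data.std(0, ddof=1)
    n, p = x.shape
    obs = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
    for k in range(max_k + 1):
        if k == 0:
            simulate = lambda: rng.standard_normal((n, p))
        else:
            fa = FactorAnalysis(n_components=k).fit(x)
            load, sd = fa.components_, np.sqrt(fa.noise_variance_)
            simulate = lambda: (rng.standard_normal((n, k)) @ load
                                + rng.standard_normal((n, p)) * sd)
        ref = [np.linalg.eigvalsh(np.corrcoef(simulate(), rowvar=False))[::-1][k]
               for _ in range(n_rep)]
        if obs[k] <= np.quantile(ref, 1.0 - alpha):
            return k
    return max_k
```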
{"title":"A Comparison of the Next Eigenvalue Sufficiency Test to Other Stopping Rules for the Number of Factors in Factor Analysis.","authors":"Pier-Olivier Caron","doi":"10.1177/00131644241308528","DOIUrl":"10.1177/00131644241308528","url":null,"abstract":"<p><p>A plethora of techniques exist to determine the number of factors to retain in exploratory factor analysis. A recent and promising technique is the Next Eigenvalue Sufficiency Test (NEST), but has not been systematically compared with well-established stopping rules. The present study proposes a simulation with synthetic factor structures to compare NEST, parallel analysis, sequential <math> <mrow> <msup><mrow><mi>χ</mi></mrow> <mrow><mn>2</mn></mrow> </msup> </mrow> </math> test, Hull method, and the empirical Kaiser criterion. The structures were based on 24 variables containing one to eight factors, loadings ranged from .40 to .80, inter-factor correlations ranged from .00 to .30, and three sample sizes were used. In total, 360 scenarios were replicated 1,000 times. Performance was evaluated in terms of accuracy (correct identification of dimensionality) and bias (tendency to over- or underestimate dimensionality). Overall, NEST showed the best overall performances, especially in hard conditions where it had to detect small but meaningful factors. It had a tendency to underextract, but to a lesser extent than other methods. The second best method was parallel analysis by being more liberal in harder cases. The three other stopping rules had pitfalls: sequential <math> <mrow> <msup><mrow><mi>χ</mi></mrow> <mrow><mn>2</mn></mrow> </msup> </mrow> </math> test and Hull method even in some easy conditions; the empirical Kaiser criterion in hard conditions.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"814-828"},"PeriodicalIF":2.3,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755425/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143045428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-14. DOI: 10.1177/00131644241302284
An Omega-Hierarchical Extension Index for Second-Order Constructs With Hierarchical Measuring Instruments.
Tenko Raykov, Christine DiStefano, Yusuf Ransome
An index extending the widely used omega-hierarchical coefficient is discussed; it can be used for evaluating the influence of a second-order factor on the interrelationships among the components of a hierarchical measuring instrument. The index is a useful and informative complement to the traditional omega-hierarchical measure of the overall scale score variance explained by that underlying construct. A point and interval estimation procedure is outlined for the described index; it is based on model reparameterization and is developed within the latent variable modeling framework. The method is readily applicable with popular software and is illustrated with examples.
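For context, the traditional omega-hierarchical coefficient that the index complements can be computed from a bifactor (or Schmid-Leiman-transformed second-order) solution as the general-factor true-score variance over the total scale variance. A minimal sketch with hypothetical loadings for one general and two group factors:

```python
import numpy as np

gen = np.array([0.60, 0.55, 0.50, 0.45, 0.50, 0.55])   # general-factor loadings
g1  = np.array([0.30, 0.35, 0.40, 0.00, 0.00, 0.00])   # group factor 1 loadings
g2  = np.array([0.00, 0.00, 0.00, 0.35, 0.30, 0.25])   # group factor 2 loadings
uniq = 1.0 - gen**2 - g1**2 - g2**2                     # unique variances

total_var = gen.sum()**2 + g1.sum()**2 + g2.sum()**2 + uniq.sum()
omega_total = (gen.sum()**2 + g1.sum()**2 + g2.sum()**2) / total_var
omega_hier = gen.sum()**2 / total_var   # scale variance due to the general factor
print(round(omega_hier, 3), round(omega_total, 3))
```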
{"title":"An Omega-Hierarchical Extension Index for Second-Order Constructs With Hierarchical Measuring Instruments.","authors":"Tenko Raykov, Christine DiStefano, Yusuf Ransome","doi":"10.1177/00131644241302284","DOIUrl":"10.1177/00131644241302284","url":null,"abstract":"<p><p>An index extending the widely used omega-hierarchical coefficient is discussed, which can be used for evaluating the influence of a second-order factor on the interrelationships among the components of a hierarchical measuring instrument. The index represents a useful and informative complement to the traditional omega-hierarchical measure of explained overall scale score variance by that underlying construct. A point and interval estimation procedure is outlined for the described index, which is based on model reparameterization and is developed within the latent variable modeling framework. The method is readily applicable with popular software and is illustrated with examples.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644241302284"},"PeriodicalIF":2.1,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11733867/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143002309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-04 (eCollection 2025-08-01). DOI: 10.1177/00131644241306680
Factor Retention in Exploratory Multidimensional Item Response Theory.
Changsheng Chen, Robbe D'hondt, Celine Vens, Wim Van den Noortgate
Multidimensional Item Response Theory (MIRT) is applied routinely in developing educational and psychological assessment tools, for instance, for exploring multidimensional structures of items using exploratory MIRT. A critical decision in exploratory MIRT analyses is the number of factors to retain. Unfortunately, the comparative properties of statistical methods and innovative Machine Learning (ML) methods for factor retention in exploratory MIRT analyses are still not clear. This study aims to fill this gap by comparing a selection of statistical and ML methods, including the Kaiser Criterion (KC), Empirical Kaiser Criterion (EKC), Parallel Analysis (PA), scree plot (OC and AF), Very Simple Structure (VSS; C1 and C2), Minimum Average Partial (MAP), Exploratory Graph Analysis (EGA), Random Forest (RF), Histogram-based Gradient Boosted Decision Trees (HistGBDT), eXtreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN). The comparison was performed using 720,000 dichotomous response data sets simulated under MIRT models, for various between-item and within-item structures and considering characteristics of large-scale assessments. The results show that MAP, RF, HistGBDT, XGBoost, and ANN tremendously outperform the other methods. Among them, HistGBDT generally performs better than the other methods. Furthermore, including the statistical methods' results as training features improves the ML methods' performance. The methods' correct-factoring proportions decrease with an increase in missingness or a decrease in sample size. KC, PA, EKC, and the scree plot (OC) tend to over-factor, while EGA, the scree plot (AF), and VSS (C1) tend to under-factor. We recommend that practitioners use both MAP and HistGBDT to determine the number of factors when applying exploratory MIRT.
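The article's training setup is not described in the abstract. As an assumption-laden illustration of how an ML factor-retention method of this kind can work, the sketch below trains scikit-learn's histogram-based gradient boosting classifier on eigenvalue profiles of data simulated from simple between-item structures (all settings are hypothetical; scikit-learn 1.0 or later is assumed):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)

def eigenvalue_profile(n_factors, n=500, p=24, loading=0.6):
    """Sorted correlation-matrix eigenvalues of data generated from a
    simple-structure linear factor model with the given dimensionality."""
    lam = np.zeros((p, n_factors))
    for j in range(p):
        lam[j, j % n_factors] = loading           # each item loads on one factor
    scores = rng.standard_normal((n, n_factors))
    x = scores @ lam.T + rng.standard_normal((n, p)) * np.sqrt(1 - loading**2)
    return np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]

# Training data: eigenvalue profiles labeled with the true number of factors.
X = np.array([eigenvalue_profile(k) for k in range(1, 9) for _ in range(50)])
y = np.repeat(np.arange(1, 9), 50)

clf = HistGradientBoostingClassifier().fit(X, y)
print(clf.predict(eigenvalue_profile(3).reshape(1, -1)))  # expected: [3]
```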
{"title":"Factor Retention in Exploratory Multidimensional Item Response Theory.","authors":"Changsheng Chen, Robbe D'hondt, Celine Vens, Wim Van den Noortgate","doi":"10.1177/00131644241306680","DOIUrl":"10.1177/00131644241306680","url":null,"abstract":"<p><p>Multidimensional Item Response Theory (MIRT) is applied routinely in developing educational and psychological assessment tools, for instance, for exploring multidimensional structures of items using exploratory MIRT. A critical decision in exploratory MIRT analyses is the number of factors to retain. Unfortunately, the comparative properties of statistical methods and innovative Machine Learning (ML) methods for factor retention in exploratory MIRT analyses are still not clear. This study aims to fill this gap by comparing a selection of statistical and ML methods, including Kaiser Criterion (KC), Empirical Kaiser Criterion (EKC), Parallel Analysis (PA), scree plot (OC and AF), Very Simple Structure (VSS; C1 and C2), Minimum Average Partial (MAP), Exploratory Graph Analysis (EGA), Random Forest (RF), Histogram-based Gradient Boosted Decision Trees (HistGBDT), eXtreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN). The comparison was performed using 720,000 dichotomous response data sets simulated by the MIRT, for various between-item and within-item structures and considering characteristics of large-scale assessments. The results show that MAP, RF, HistGBDT, XGBoost, and ANN tremendously outperform other methods. Among them, HistGBDT generally performs better than other methods. Furthermore, including statistical methods' results as training features improves ML methods' performance. The methods' correct-factoring proportions decrease with an increase in missingness or a decrease in sample size. KC, PA, EKC, and scree plot (OC) are over-factoring, while EGA, scree plot (AF), and VSS (C1) are under-factoring. We recommend that practitioners use both MAP and HistGBDT to determine the number of factors when applying exploratory MIRT.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"672-695"},"PeriodicalIF":2.3,"publicationDate":"2025-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11699551/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142931009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-03 (eCollection 2025-08-01). DOI: 10.1177/00131644241302721
Examination of ChatGPT's Performance as a Data Analysis Tool.
Duygu Koçak
This study examines the performance of ChatGPT, developed by OpenAI and widely used as an AI-based conversational tool, as a data analysis tool through exploratory factor analysis (EFA). To this end, simulated data were generated under various data conditions, including normal distribution, response category, sample size, test length, factor loading, and measurement models. The generated data were analyzed using ChatGPT-4o twice with a 1-week interval under the same prompt, and the results were compared with those obtained using R code. In data analysis, the Kaiser-Meyer-Olkin (KMO) value, total variance explained, and the number of factors estimated using the empirical Kaiser criterion, Hull method, and Kaiser-Guttman criterion, as well as factor loadings, were calculated. The findings obtained from ChatGPT at two different times were found to be consistent with those obtained using R. Overall, ChatGPT demonstrated good performance for steps that require only computational decisions without involving researcher judgment or theoretical evaluation (such as KMO, total variance explained, and factor loadings). However, for multidimensional structures, although the estimated number of factors was consistent across analyses, biases were observed, suggesting that researchers should exercise caution in such decisions.
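For reference, the overall KMO measure that both ChatGPT and R were asked to produce is defined from the item correlations and the partial (anti-image) correlations obtained from the inverse correlation matrix; a minimal NumPy sketch:

```python
import numpy as np

def overall_kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(data, rowvar=False)
    s = np.linalg.inv(r)
    d = np.sqrt(np.outer(np.diag(s), np.diag(s)))
    partial = -s / d                       # anti-image (partial) correlations
    off = ~np.eye(r.shape[0], dtype=bool)  # mask for off-diagonal elements
    r2, q2 = (r[off] ** 2).sum(), (partial[off] ** 2).sum()
    return r2 / (r2 + q2)
```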
{"title":"Examination of ChatGPT's Performance as a Data Analysis Tool.","authors":"Duygu Koçak","doi":"10.1177/00131644241302721","DOIUrl":"10.1177/00131644241302721","url":null,"abstract":"<p><p>This study examines the performance of ChatGPT, developed by OpenAI and widely used as an AI-based conversational tool, as a data analysis tool through exploratory factor analysis (EFA). To this end, simulated data were generated under various data conditions, including normal distribution, response category, sample size, test length, factor loading, and measurement models. The generated data were analyzed using ChatGPT-4o twice with a 1-week interval under the same prompt, and the results were compared with those obtained using R code. In data analysis, the Kaiser-Meyer-Olkin (KMO) value, total variance explained, and the number of factors estimated using the empirical Kaiser criterion, Hull method, and Kaiser-Guttman criterion, as well as factor loadings, were calculated. The findings obtained from ChatGPT at two different times were found to be consistent with those obtained using R. Overall, ChatGPT demonstrated good performance for steps that require only computational decisions without involving researcher judgment or theoretical evaluation (such as KMO, total variance explained, and factor loadings). However, for multidimensional structures, although the estimated number of factors was consistent across analyses, biases were observed, suggesting that researchers should exercise caution in such decisions.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"641-671"},"PeriodicalIF":2.3,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11696938/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142931005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}