On the Unreliability of Test-Retest Reliability.
Pub Date: 2025-11-26 | DOI: 10.1177/01466216251401213
Domenic Groh
The Test-Retest Coefficient (TRC) is a central metric of reliability in Classical Test Theory and modern psychological assessments. Originally developed by early 20th-century psychometricians, it relies on the assumptions of fixed (i.e., perfectly stable) true scores and independent error scores. However, these assumptions are rarely, if ever, tested, despite the fact that their violation can introduce significant biases. This article explores the foundations of these assumptions and examines the performance of the TRC under varying conditions, including different sample sizes, true score stability, and error score dependence. Using simulated data, results show that decreasing true score stability biases TRC estimates, leading to underestimations of reliability. Additionally, error score dependence can inflate TRC values, making unreliable measures appear reliable. More fundamentally, when these assumptions are violated, the TRC becomes underidentified, meaning that multiple, substantively different data-generating processes can yield the same coefficient, thus undermining its interpretability. These findings call into question the TRC's suitability for applied settings, especially when traits fluctuate over time or measurement conditions are uncontrolled. Alternative approaches are briefly discussed.
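The mechanics behind these results can be reproduced with a small simulation. The base-R sketch below is not the article's simulation code; the stability and error-dependence values are illustrative assumptions. It generates two measurement occasions with imperfectly stable true scores and optionally correlated errors, and shows how the resulting test-retest correlation departs from the nominal reliability.

```r
# Minimal sketch of how true-score instability and error dependence bias the TRC.
# Not the article's simulation code; parameter values are illustrative assumptions.
set.seed(1)
n <- 5000
stability <- 0.80   # correlation of true scores across occasions (1 = fixed true scores)
err_dep   <- 0.40   # correlation of error scores across occasions (0 = independent errors)
var_true  <- 1      # true-score variance
var_error <- 1      # error variance, so nominal reliability = 0.5

# Draw correlated true scores and correlated errors for the two occasions
true_scores <- MASS::mvrnorm(n, mu = c(0, 0),
                             Sigma = var_true  * matrix(c(1, stability, stability, 1), 2))
errors      <- MASS::mvrnorm(n, mu = c(0, 0),
                             Sigma = var_error * matrix(c(1, err_dep,   err_dep,   1), 2))

x1 <- true_scores[, 1] + errors[, 1]
x2 <- true_scores[, 2] + errors[, 2]

nominal_reliability <- var_true / (var_true + var_error)   # 0.50 by construction
trc <- cor(x1, x2)                                         # what a test-retest study reports
c(nominal = nominal_reliability, TRC = round(trc, 3))
# With stability < 1 the TRC is pulled below the nominal value; with err_dep > 0 it is
# pulled back up, so different (stability, err_dep) pairs can yield the same coefficient.
```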
{"title":"On the Unreliability of Test-Retest Reliability.","authors":"Domenic Groh","doi":"10.1177/01466216251401213","DOIUrl":"10.1177/01466216251401213","url":null,"abstract":"<p><p>The Test-Retest Coefficient (TRC) is a central metric of reliability in Classical Test Theory and modern psychological assessments. Originally developed by early 20th-century psychometricians, it relies on the assumptions of fixed (i.e., perfectly stable) true scores and independent error scores. However, these assumptions are rarely, if ever, tested, despite the fact that their violation can introduce significant biases. This article explores the foundations of these assumptions and examines the performance of the TRC under varying conditions, including different sample sizes, true score stability, and error score dependence. Using simulated data, results show that decreasing true score stability biases TRC estimates, leading to underestimations of reliability. Additionally, error score dependence can inflate TRC values, making unreliable measures appear reliable. More fundamentally, when these assumptions are violated, the TRC becomes underidentified, meaning that multiple, substantively different data-generating processes can yield the same coefficient, thus undermining its interpretability. These findings call into question the TRC's suitability for applied settings, especially when traits fluctuate over time or measurement conditions are uncontrolled. Alternative approaches are briefly discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251401213"},"PeriodicalIF":1.2,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657207/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145649801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anchor Detection Strategy in Moderated Non-Linear Factor Analysis for Differential Item Functioning (DIF).
Pub Date: 2025-11-24 | DOI: 10.1177/01466216251401206
Sooyong Lee, Suyoung Kim, Seung W Choi
Ensuring measurement invariance is crucial for fair psychological and educational assessments, particularly in detecting Differential Item Functioning (DIF). Moderated Non-linear Factor Analysis (MNLFA) provides a flexible framework for detecting DIF by modeling item parameters as functions of observed covariates. However, a significant challenge in MNLFA-based DIF detection is anchor item selection, as improperly chosen anchors can bias results. This study proposes a refined constrained-baseline anchor detection approach utilizing information criteria (IC) for model selection. The proposed three-step procedure sequentially identifies potential DIF items through the Bayesian Information Criterion (BIC) and Weighted Information Criterion (WIC), followed by DIF-free anchor items using the Akaike Information Criterion (AIC). The method's effectiveness in controlling Type I error rates while maintaining statistical power is evaluated through simulation studies and empirical data analysis. Comparisons with regularization approaches demonstrate the proposed method's accuracy and computational efficiency.
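The constrained-baseline screening logic can be illustrated with a deliberately simplified, self-contained toy in base R. A logistic regression with an observed trait proxy stands in for the moderated item model here; this is not MNLFA, and the WIC step of the proposed three-step procedure is not reproduced. Items whose covariate-moderated model is preferred by the information criterion become DIF candidates, and the remaining items become candidate anchors.

```r
# Self-contained toy illustration of constrained-baseline IC screening, using a
# logistic regression stand-in for a moderated item model (not MNLFA itself).
set.seed(2)
n     <- 1000
theta <- rnorm(n)                 # proxy for the latent trait (assumed known in this toy)
x     <- rbinom(n, 1, 0.5)        # observed covariate (e.g., a group indicator)
items <- paste0("item", 1:5)
resp  <- sapply(1:5, function(j) {
  dif <- if (j == 3) 0.8 * x else 0        # item 3 carries uniform DIF in this toy data
  rbinom(n, 1, plogis(1.2 * theta - 0.2 * j + dif))
})
colnames(resp) <- items

flag <- sapply(items, function(it) {
  baseline <- glm(resp[, it] ~ theta,     family = binomial)  # constrained: no covariate effect
  freed    <- glm(resp[, it] ~ theta + x, family = binomial)  # item parameters moderated by x
  BIC(freed) < BIC(baseline)      # flag as a DIF candidate if the IC prefers the freed model
})
list(candidate_dif = items[flag], anchors = items[!flag])
```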
{"title":"Anchor Detection Strategy in Moderated Non-Linear Factor Analysis for Differential Item Functioning (DIF).","authors":"Sooyong Lee, Suyoung Kim, Seung W Choi","doi":"10.1177/01466216251401206","DOIUrl":"https://doi.org/10.1177/01466216251401206","url":null,"abstract":"<p><p>Ensuring measurement invariance is crucial for fair psychological and educational assessments, particularly in detecting Differential Item Functioning (DIF). Moderated Non-linear Factor Analysis (MNLFA) provides a flexible framework for detecting DIF by modeling item parameters as functions of observed covariates. However, a significant challenge in MNLFA-based DIF detection is anchor item selection, as improperly chosen anchors can bias results. This study proposes a refined constrained-baseline anchor detection approach utilizing information criteria (IC) for model selection. The proposed three-step procedure sequentially identifies potential DIF items through the Bayesian Information Criterion (BIC) and Weighted Information Criterion (WIC), followed by DIF-free anchor items using the Akaike Information Criterion (AIC). The method's effectiveness in controlling Type I error rates while maintaining statistical power is evaluated through simulation studies and empirical data analysis. Comparisons with regularization approaches demonstrate the proposed method's accuracy and computational efficiency.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251401206"},"PeriodicalIF":1.2,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12643905/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145641338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distinguishing Between Models for Extreme and Midpoint Response Styles as Opposite Poles of a Single Dimension versus Two Separate Dimensions: A Simulation Study.
Pub Date: 2025-09-13 | DOI: 10.1177/01466216251379471
Martijn Schoenmakers, Maria Bolsinova, Jesper Tijmstra
Extreme and midpoint response styles have frequently been found to decrease the validity of Likert-type questionnaire results. Different approaches for modelling extreme and midpoint responding have been proposed in the literature, with some advocating for a unidimensional conceptualization of the response styles as opposite poles, and others modelling them as separate dimensions. How these response styles are modelled influences the estimation complexity, parameter estimates, and detection of and correction for response styles in IRT models. For these reasons, we examine if it is possible to empirically distinguish between extreme and midpoint responding as two separate dimensions versus two opposite sides of a single dimension. The various conceptualizations are modelled using the multidimensional nominal response model, with the AIC and BIC being used to distinguish between the competing models in a simulation study and an empirical example. Results indicate good performance of both information criteria given sufficient sample size, test length, and response style strength. The BIC outperformed the AIC in cases where no response styles were present, while the AIC outperformed the BIC in cases where multiple response style dimensions were present. Implications of the results for practice are discussed.
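A compact way to see what separates the two conceptualizations is to write out the category probabilities of the multidimensional nominal response model with fixed response-style scoring weights. The base-R sketch below uses illustrative weights and parameter values (assumptions for this example, not values from the study) to contrast a single bipolar extreme/midpoint dimension with two separate ERS and MRS dimensions.

```r
# Category probabilities under a multidimensional nominal response model (MNRM)
# for one 5-point item, comparing two illustrative response-style codings:
#  (a) one bipolar dimension with weights (1, 0, -1, 0, 1)
#  (b) two dimensions, ERS = (1, 0, 0, 0, 1) and MRS = (0, 0, 1, 0, 0).
# Weights and parameter values are illustrative assumptions, not the article's values.
mnrm_probs <- function(theta, scoring, slopes, intercepts) {
  # theta: latent scores (content + response-style dimensions)
  # scoring: categories x dimensions matrix of fixed scoring weights
  # slopes: one slope per dimension; intercepts: one per category
  z <- as.vector(scoring %*% (slopes * theta)) + intercepts
  exp(z) / sum(exp(z))
}

content <- c(-2, -1, 0, 1, 2)                  # content scoring for a 5-point item

# (a) single bipolar response-style dimension
S1 <- cbind(content, rs = c(1, 0, -1, 0, 1))
p_uni <- mnrm_probs(theta = c(content = 0.5, rs = 1.0),
                    scoring = S1, slopes = c(1, 1), intercepts = rep(0, 5))

# (b) separate ERS and MRS dimensions
S2 <- cbind(content, ers = c(1, 0, 0, 0, 1), mrs = c(0, 0, 1, 0, 0))
p_two <- mnrm_probs(theta = c(content = 0.5, ers = 1.0, mrs = -0.5),
                    scoring = S2, slopes = c(1, 1, 1), intercepts = rep(0, 5))

round(rbind(unidimensional = p_uni, two_dimensional = p_two), 3)
```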
{"title":"Distinguishing Between Models for Extreme and Midpoint Response Styles as Opposite Poles of a Single Dimension versus Two Separate Dimensions: A Simulation Study.","authors":"Martijn Schoenmakers, Maria Bolsinova, Jesper Tijmstra","doi":"10.1177/01466216251379471","DOIUrl":"10.1177/01466216251379471","url":null,"abstract":"<p><p>Extreme and midpoint response styles have frequently been found to decrease the validity of Likert-type questionnaire results. Different approaches for modelling extreme and midpoint responding have been proposed in the literature, with some advocating for a unidimensional conceptualization of the response styles as opposite poles, and others modelling them as separate dimensions. How these response styles are modelled influences the estimation complexity, parameter estimates, and detection of and correction for response styles in IRT models. For these reasons, we examine if it is possible to empirically distinguish between extreme and midpoint responding as two separate dimensions versus two opposite sides of a single dimension. The various conceptualizations are modelled using the multidimensional nominal response model, with the AIC and BIC being used to distinguish between the competing models in a simulation study and an empirical example. Results indicate good performance of both information criteria given sufficient sample size, test length, and response style strength. The BIC outperformed the AIC in cases where no response styles were present, while the AIC outperformed the BIC in cases where multiple response style dimensions were present. Implications of the results for practice are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251379471"},"PeriodicalIF":1.2,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12433433/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145070752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
fcirt: An R Package for Forced Choice Models in Item Response Theory.
Pub Date: 2025-09-10 | DOI: 10.1177/01466216251378771
Naidan Tu, Sean Joo, Philseok Lee, Stephen Stark
Multidimensional forced choice (MFC) formats have emerged as a promising alternative to traditional single statement Likert-type measures for assessing noncognitive traits while reducing response biases. As MFC formats become more widely used, there is a growing need for tools to support MFC analysis, which motivated the development of the fcirt package. The fcirt package estimates forced choice model parameters using Bayesian methods. It currently enables estimation of the Generalized Graded Unfolding Model (GGUM; Roberts et al., 2000)-based Multi-Unidimensional Pairwise Preference (MUPP) model using rstan, which implements the Hamiltonian Monte Carlo (HMC) sampling algorithm. fcirt also includes functions for computing item and test information functions to evaluate the quality of MFC assessments, as well as functions for Bayesian diagnostic plotting to assist with model evaluation and convergence assessment.
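For readers unfamiliar with the model being estimated, the GGUM-based MUPP choice probability can be written out directly. The base-R sketch below states the model equations only; it does not call fcirt, and the item parameters are illustrative assumptions.

```r
# Dichotomous GGUM item response function and the MUPP pairwise-preference rule
# (Roberts et al., 2000; Stark et al., 2005). A plain base-R statement of the model
# that fcirt estimates; the parameter values below are illustrative assumptions.
ggum_agree <- function(theta, alpha, delta, tau) {
  # P(agree with a single statement | theta) for the dichotomous GGUM (one threshold, M = 3)
  d <- theta - delta
  num <- exp(alpha * (1 * d - tau)) + exp(alpha * (2 * d - tau))
  num / (1 + exp(alpha * 3 * d) + num)
}

mupp_prefer_s <- function(theta_s, theta_t, par_s, par_t) {
  # P(statement s is preferred over statement t), assuming independent evaluations
  ps <- ggum_agree(theta_s, par_s[["alpha"]], par_s[["delta"]], par_s[["tau"]])
  pt <- ggum_agree(theta_t, par_t[["alpha"]], par_t[["delta"]], par_t[["tau"]])
  ps * (1 - pt) / (ps * (1 - pt) + (1 - ps) * pt)
}

# Illustrative parameters for two statements measuring different traits
par_s <- c(alpha = 1.2, delta = 0.5,  tau = -1.0)
par_t <- c(alpha = 0.9, delta = -0.3, tau = -0.8)
mupp_prefer_s(theta_s = 1.0, theta_t = -0.5, par_s = par_s, par_t = par_t)
```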
{"title":"<i>fcirt</i>: An R Package for Forced Choice Models in Item Response Theory.","authors":"Naidan Tu, Sean Joo, Philseok Lee, Stephen Stark","doi":"10.1177/01466216251378771","DOIUrl":"10.1177/01466216251378771","url":null,"abstract":"<p><p>Multidimensional forced choice (MFC) formats have emerged as a promising alternative to traditional single statement Likert-type measures for assessing noncognitive traits while reducing response biases. As MFC formats become more widely used, there is a growing need for tools to support MFC analysis, which motivated the development of the <i>fcirt</i> package. The <i>fcirt</i> package estimates forced choice model parameters using Bayesian methods. It currently enables estimation of the Generalized Graded Unfolding Model (GGUM; Roberts et al., 2000)-based Multi-Unidimensional Pairwise Preference (MUPP) model using <i>rstan</i>, which implements the Hamiltonian Monte Carlo (HMC) sampling algorithm. <i>fcirt</i> also includes functions for computing item and test information functions to evaluate the quality of MFC assessments, as well as functions for Bayesian diagnostic plotting to assist with model evaluation and convergence assessment.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251378771"},"PeriodicalIF":1.2,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12423082/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145065903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic Generation of Rule-Based Raven-Like Matrices in R: The matRiks Package.
Pub Date: 2025-09-02 | DOI: 10.1177/01466216251374826
Andrea Brancaccio, Ottavia M Epifania, Pasquale Anselmi, Debora de Chiusole
{"title":"Automatic Generation of Rule-Based Raven-Like Matrices in R: The matRiks Package.","authors":"Andrea Brancaccio, Ottavia M Epifania, Pasquale Anselmi, Debora de Chiusole","doi":"10.1177/01466216251374826","DOIUrl":"10.1177/01466216251374826","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251374826"},"PeriodicalIF":1.2,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405199/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145001708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CALMs: A Shiny Application for Comprehensive Analysis of Latent Means.
Pub Date: 2025-08-31 | DOI: 10.1177/01466216251371173
Kim Nimon, Julia Fulmore, Gregg Keiffer, Bryn Hammack-Brown
This article presents CALMs, a Shiny application for comprehensively comparing groups via latent means; the workflow includes the examination of group equivalency, propensity score analysis, measurement invariance analysis, and assessment of latent mean differences for equivalent groups with invariant data. Despite the importance of these techniques, their application can be complex and time-consuming, particularly for researchers not experienced in advanced statistical methods. The Shiny application CALMs makes this cumbersome process more accessible to a broader range of users and allows researchers to focus on the interpretation of results rather than the labor required for testing. The practical utility of the CALMs application is demonstrated using real-world data, highlighting the potential of the application to enhance the validity and reliability of group comparison studies in psychological research.
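CALMs itself is a point-and-click Shiny interface, but the invariance-and-latent-means portion of the workflow it automates can be sketched at the command line with lavaan. The snippet below is a generic illustration under assumed variable names (f, x1-x4, group) and simulated data; it is not code generated by or contained in CALMs.

```r
# Generic measurement invariance + latent mean workflow in lavaan; CALMs wraps
# steps like these (plus group equivalency and propensity score analysis) in a GUI.
# Variable names and the simulated data are illustrative assumptions.
library(lavaan)

set.seed(3)
dat <- data.frame(group = rep(c("A", "B"), each = 200))
f   <- rnorm(400, mean = ifelse(dat$group == "B", 0.3, 0))   # latent mean difference of 0.3
lam <- c(0.8, 0.7, 0.9, 0.6)
for (j in 1:4) dat[[paste0("x", j)]] <- lam[j] * f + rnorm(400, sd = 0.6)

model <- 'f =~ x1 + x2 + x3 + x4'

configural <- cfa(model, data = dat, group = "group")
metric     <- cfa(model, data = dat, group = "group", group.equal = "loadings")
scalar     <- cfa(model, data = dat, group = "group",
                  group.equal = c("loadings", "intercepts"))

lavTestLRT(configural, metric, scalar)   # compare nested invariance models

# With intercepts constrained equal, lavaan frees the latent mean in the second
# group (first group fixed at 0), so the latent mean difference appears here:
subset(parameterEstimates(scalar), lhs == "f" & op == "~1")
```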
{"title":"CALMs: A Shiny Application for Comprehensive Analysis of Latent Means.","authors":"Kim Nimon, Julia Fulmore, Gregg Keiffer, Bryn Hammack-Brown","doi":"10.1177/01466216251371173","DOIUrl":"https://doi.org/10.1177/01466216251371173","url":null,"abstract":"<p><p>This article presents a Shiny application CALMs for comprehensively comparing groups via latent means, which includes the examination of group equivalency, propensity score analysis, measurement invariance analysis, and assessment of latent mean differences of equivalent groups with invariant data. Despite the importance of these techniques, their application can be complex and time-consuming, particularly for researchers not experienced in advanced statistical methods. The Shiny application CALMs makes this cumbersome process more accessible to a broader range of users. In addition, it allows researchers to focus more on the interpretation aspect of the research rather than the labor required for testing. The practical utility of the CALMs application is demonstrated using real-world data, highlighting the potential of the application to enhance the validity and reliability of group comparison studies in psychological research.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251371173"},"PeriodicalIF":1.2,"publicationDate":"2025-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12399567/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144993935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
implicitMeasures: An R Package for Scoring the Implicit Association Test and the Single-Category Implicit Association Test.
Pub Date: 2025-08-25 | DOI: 10.1177/01466216251371532
Ottavia M Epifania, Pasquale Anselmi, Egidio Robusto
{"title":"implicitMeasures: An R Package for Scoring the Implicit Association Test and the Single-Category Implicit Association Test.","authors":"Ottavia M Epifania, Pasquale Anselmi, Egidio Robusto","doi":"10.1177/01466216251371532","DOIUrl":"https://doi.org/10.1177/01466216251371532","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251371532"},"PeriodicalIF":1.2,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12378261/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144974606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Deep Learning to Choose Optimal Smoothing Values for Equating.
Pub Date: 2025-08-23 | DOI: 10.1177/01466216251363244
Chunyan Liu, Zhongmin Cui
Test developers typically use alternate test forms to protect the integrity of test scores. Because test forms may differ in difficulty, scores on different test forms are adjusted through a psychometric procedure called equating. When conducting equating, psychometricians often apply smoothing methods to reduce the random equating error that results from sampling. When using the cubic spline postsmoothing method, they compare plots at different smoothing degrees and choose the optimal value. This manual process, however, could be automated with the help of deep learning, a machine learning technique commonly used for image classification. In this study, a convolutional neural network was trained on human-classified postsmoothing plots. The trained network was then used to choose optimal smoothing values for empirical testing data, and its choices were compared to human choices. The agreement rate between humans and the trained network was as high as 71%, suggesting the potential of deep learning for choosing optimal smoothing values for equating.
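As a rough illustration of the kind of classifier involved, the sketch below defines a small convolutional network in the keras R interface. It is not the architecture used in the study; x_train and y_train stand for preprocessed grayscale plot images and one-hot labels for the candidate smoothing values, and all layer sizes are assumptions.

```r
# Generic small CNN for classifying postsmoothing plots, using the keras R interface.
# NOT the architecture from the study; x_train (n x 128 x 128 x 1 array of plot images)
# and y_train (one-hot labels for candidate smoothing values) are assumed to exist.
library(keras)

n_smoothing_values <- 6   # number of candidate smoothing degrees compared (an assumption)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(128, 128, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = n_smoothing_values, activation = "softmax")

model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = "accuracy")

# x_train: images of cubic-spline postsmoothing plots; y_train: human-chosen degree
model %>% fit(x_train, y_train, epochs = 20, batch_size = 32, validation_split = 0.2)
```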
{"title":"Using Deep Learning to Choose Optimal Smoothing Values for Equating.","authors":"Chunyan Liu, Zhongmin Cui","doi":"10.1177/01466216251363244","DOIUrl":"https://doi.org/10.1177/01466216251363244","url":null,"abstract":"<p><p>Test developers typically use alternate test forms to protect the integrity of test scores. Because test forms may differ in difficulty, scores on different test forms are adjusted through a psychometrical procedure called equating. When conducting equating, psychometricians often apply smoothing methods to reduce random error of equating resulting from sampling. During the process, they compare plots of different smoothing degrees and choose the optimal value when using the cubic spline postsmoothing method. This manual process, however, could be automated with the help of deep learning-a machine learning technique commonly used for image classification. In this study, a convolutional neural network was trained using human-classified postsmoothing plots. The trained network was used to choose optimal smoothing values with empirical testing data, which were compared to human choices. The agreement rate between humans and the trained network was as large as 71%, suggesting the potential use of deep learning for choosing optimal smoothing values for equating.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251363244"},"PeriodicalIF":1.2,"publicationDate":"2025-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12374957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144974566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining Propensity Scores and Common Items for Test Score Equating.
Pub Date: 2025-07-30 | DOI: 10.1177/01466216251363240
Inga Laukaityte, Gabriel Wallin, Marie Wiberg
Ensuring that test scores are fair and comparable across different test forms and different test groups is a significant statistical challenge in educational testing. Methods to achieve score comparability, a process known as test score equating, often rely on including common test items or assuming that test taker groups are similar in key characteristics. This study explores a novel approach that combines propensity scores, based on test takers' background covariates, with information from common items using kernel smoothing techniques for binary-scored test items. An empirical analysis using data from a high-stakes college admissions test evaluates the standard errors and differences in adjusted test scores. A simulation study examines the impact of factors such as the number of test takers, the number of common items, and the correlation between covariates and test scores on the method's performance. The findings demonstrate that integrating propensity scores with common item information reduces standard errors and bias more effectively than using either source alone. This suggests that balancing the groups on the test takers' covariates enhances the fairness and accuracy of test score comparisons across different groups. The proposed method highlights the benefits of considering all the collected data to improve score comparability.
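A stripped-down version of the idea can be sketched in base R: estimate propensity scores from the background covariates, weight each group toward the combined population, kernel-smooth the weighted score distributions, and map scores equipercentile-style. The sketch below is only that simplified illustration; it omits the common-item information, it is not the estimator proposed in the article, and the data are simulated for the example.

```r
# Simplified illustration of propensity-score-weighted, kernel-smoothed equipercentile
# equating. NOT the estimator proposed in the article (common items are omitted);
# the data, score scale, and bandwidth are illustrative assumptions.
set.seed(4)
n <- 2000
grp    <- rbinom(n, 1, 0.5)                       # 0 = took form X, 1 = took form Y
covar  <- rnorm(n) + 0.4 * grp                    # background covariate related to group
scoreX <- round(pmin(pmax(20 + 4 * covar[grp == 0] + rnorm(sum(grp == 0), sd = 3), 0), 40))
scoreY <- round(pmin(pmax(22 + 4 * covar[grp == 1] + rnorm(sum(grp == 1), sd = 3), 0), 40))

# 1. Propensity scores from the background covariate(s)
ps <- fitted(glm(grp ~ covar, family = binomial))
wX <- 1 / (1 - ps[grp == 0])                      # inverse-probability weights toward
wY <- 1 / ps[grp == 1]                            # the combined target population

# 2. Weighted, Gaussian-kernel-smoothed score CDFs
kcdf <- function(q, scores, w, h = 1) {
  w <- w / sum(w)
  sapply(q, function(x) sum(w * pnorm(x, mean = scores, sd = h)))
}

# 3. Equipercentile mapping of form X scores onto the form Y scale
grid <- seq(0, 40, by = 0.01)
equate_x_to_y <- function(x) {
  p <- kcdf(x, scoreX, wX)
  grid[which.min(abs(kcdf(grid, scoreY, wY) - p))]
}
sapply(c(10, 20, 30), equate_x_to_y)              # equated values for a few raw scores
```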
{"title":"Combining Propensity Scores and Common Items for Test Score Equating.","authors":"Inga Laukaityte, Gabriel Wallin, Marie Wiberg","doi":"10.1177/01466216251363240","DOIUrl":"10.1177/01466216251363240","url":null,"abstract":"<p><p>Ensuring that test scores are fair and comparable across different test forms and different test groups is a significant statistical challenge in educational testing. Methods to achieve score comparability, a process known as test score equating, often rely on including common test items or assuming that test taker groups are similar in key characteristics. This study explores a novel approach that combines propensity scores, based on test takers' background covariates, with information from common items using kernel smoothing techniques for binary-scored test items. An empirical analysis using data from a high-stakes college admissions test evaluates the standard errors and differences in adjusted test scores. A simulation study examines the impact of factors such as the number of test takers, the number of common items, and the correlation between covariates and test scores on the method's performance. The findings demonstrate that integrating propensity scores with common item information reduces standard errors and bias more effectively than using either source alone. This suggests that balancing the groups on the test-takers' covariates enhance the fairness and accuracy of test score comparisons across different groups. The proposed method highlights the benefits of considering all the collected data to improve score comparability.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251363240"},"PeriodicalIF":1.2,"publicationDate":"2025-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12310624/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144776645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MCorrSeqPerm: Searching for the Maximum Statistically Significant System of Linear Correlations and its Application in Work Psychology.
Pub Date: 2025-07-21 | DOI: 10.1177/01466216251360562
Katarzyna Stapor, Grzegorz Kończak, Damian Grabowski, Marta Żywiołek-Szeja, Agata Chudzicka-Czupała
The paper addresses the problem of detecting a statistically significant subset among a set of considered relationships. The Pearson linear correlation coefficient calculated from a sample is used to quantify the strength of each relationship. Testing the significance of many relationships simultaneously raises the issue of multiple hypothesis testing: without proper error control, the probability of making a type I error is, in practice, much higher than the assumed significance level. The paper proposes an alternative approach, a new stepwise procedure (MCorrSeqPerm) that finds the maximum statistically significant system of linear correlations while keeping the error rate at the assumed level. The proposed procedure relies on a sequence of permutation tests. Its application to the analysis of relationships between stress experienced at work and job satisfaction is compared with Holm's classic method in terms of the number of significant correlations detected.
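The two building blocks involved, a permutation test for a single correlation and Holm's multiplicity correction, can be illustrated in a few lines of base R. The sketch below is a generic illustration on simulated data, not an implementation of the MCorrSeqPerm procedure itself.

```r
# Permutation test for a single Pearson correlation, plus Holm-adjusted p-values
# for a set of correlations. Generic building blocks only, not MCorrSeqPerm;
# the data are simulated for the example.
set.seed(5)
perm_cor_test <- function(x, y, n_perm = 2000) {
  r_obs  <- cor(x, y)
  r_perm <- replicate(n_perm, cor(x, sample(y)))   # break the pairing under H0
  mean(abs(r_perm) >= abs(r_obs))                  # two-sided permutation p-value
}

# Simulated example: 4 variables, only the first pair truly correlated
n <- 80
dat <- data.frame(a = rnorm(n))
dat$b <- 0.5 * dat$a + rnorm(n)
dat$c <- rnorm(n)
dat$d <- rnorm(n)

pairs_idx <- combn(names(dat), 2)
p_raw <- apply(pairs_idx, 2, function(v) perm_cor_test(dat[[v[1]]], dat[[v[2]]]))
names(p_raw) <- apply(pairs_idx, 2, paste, collapse = "-")

# Holm's classic correction, the comparison method used in the article
p_holm <- p.adjust(p_raw, method = "holm")
rbind(raw = round(p_raw, 3), holm = round(p_holm, 3))
```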
{"title":"MCorrSeqPerm: Searching for the Maximum Statistically Significant System of Linear Correlations and its Application in Work Psychology.","authors":"Katarzyna Stapor, Grzegorz Kończak, Damian Grabowski, Marta Żywiołek-Szeja, Agata Chudzicka-Czupała","doi":"10.1177/01466216251360562","DOIUrl":"10.1177/01466216251360562","url":null,"abstract":"<p><p>The paper addresses the problem of detecting a statistically significant subset of input considered relationships. The Pearson linear correlation coefficient calculated from a sample was used to determine the strength of a relationship. Simultaneous testing of the significance of many relationships is related to the issue of multiple hypothesis testing. In such a scenario, the probability of making a type I error without proper error control is, in practice, much higher than the assumed level of significance. The paper proposes an alternative approach: a new stepwise procedure (MCorrSeqPerm) allowing for finding the maximum statistically significant system of linear correlations keeping the error at the assumed level. The proposed procedure relies on a sequence of permutation tests. Its application in the analysis of relationships in the problem of examining stress experienced at work and job satisfaction was compared with Holm's classic method in detecting the number of significant correlations.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251360562"},"PeriodicalIF":1.0,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12279768/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144700124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}