
Latest Publications in Applied Psychological Measurement

On the Unreliability of Test-Retest Reliability.
IF 1.2 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-11-26 | DOI: 10.1177/01466216251401213
Domenic Groh

The Test-Retest Coefficient (TRC) is a central metric of reliability in Classical Test Theory and modern psychological assessments. Originally developed by early 20th-century psychometricians, it relies on the assumptions of fixed (i.e., perfectly stable) true scores and independent error scores. However, these assumptions are rarely, if ever, tested, despite the fact that their violation can introduce significant biases. This article explores the foundations of these assumptions and examines the performance of the TRC under varying conditions, including different sample sizes, true score stability, and error score dependence. Using simulated data, results show that decreasing true score stability biases TRC estimates, leading to underestimations of reliability. Additionally, error score dependence can inflate TRC values, making unreliable measures appear reliable. More fundamentally, when these assumptions are violated, the TRC becomes underidentified, meaning that multiple, substantively different data-generating processes can yield the same coefficient, thus undermining its interpretability. These findings call into question the TRC's suitability for applied settings, especially when traits fluctuate over time or measurement conditions are uncontrolled. Alternative approaches are briefly discussed.
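
The two assumption violations can be made concrete with a minimal base-R simulation; this is a sketch of the mechanism, not the paper's actual simulation design. Observed scores are true scores plus error, per-occasion reliability is fixed at .80, and the TRC is the correlation of observed scores across occasions:

```r
# Minimal sketch (not the paper's design): observed score X = T + E, with
# per-occasion reliability fixed at `rel`. The TRC is cor(X1, X2).
set.seed(1)
n <- 5000

trc <- function(stability, error_cor, rel = 0.8) {
  t1 <- rnorm(n)                                           # true scores at time 1
  t2 <- stability * t1 + sqrt(1 - stability^2) * rnorm(n)  # time 2: imperfect stability
  e_sd <- sqrt((1 - rel) / rel)                            # error SD giving reliability `rel`
  e1 <- rnorm(n, sd = e_sd)
  e2 <- error_cor * e1 + sqrt(1 - error_cor^2) * rnorm(n, sd = e_sd)  # dependent errors
  cor(t1 + e1, t2 + e2)                                    # observed test-retest correlation
}

trc(stability = 1.0, error_cor = 0.0)  # ~ .80: both assumptions hold
trc(stability = 0.7, error_cor = 0.0)  # ~ .56: instability masquerades as unreliability
trc(stability = 1.0, error_cor = 0.5)  # ~ .90: dependent errors inflate the TRC
```

The last two calls also illustrate the underidentification point: a stable trait with dependent errors and an unstable trait with independent errors can yield the same coefficient.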

Citations: 0
Anchor Detection Strategy in Moderated Non-Linear Factor Analysis for Differential Item Functioning (DIF).
IF 1.2 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-11-24 | DOI: 10.1177/01466216251401206
Sooyong Lee, Suyoung Kim, Seung W Choi

Ensuring measurement invariance is crucial for fair psychological and educational assessments, particularly in detecting Differential Item Functioning (DIF). Moderated Non-linear Factor Analysis (MNLFA) provides a flexible framework for detecting DIF by modeling item parameters as functions of observed covariates. However, a significant challenge in MNLFA-based DIF detection is anchor item selection, as improperly chosen anchors can bias results. This study proposes a refined constrained-baseline anchor detection approach utilizing information criteria (IC) for model selection. The proposed three-step procedure sequentially identifies potential DIF items through the Bayesian Information Criterion (BIC) and Weighted Information Criterion (WIC), followed by DIF-free anchor items using the Akaike Information Criterion (AIC). The method's effectiveness in controlling Type I error rates while maintaining statistical power is evaluated through simulation studies and empirical data analysis. Comparisons with regularization approaches demonstrate the proposed method's accuracy and computational efficiency.
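
A conceptual sketch of the constrained-baseline screen may help; `fit_mnlfa()` below is a hypothetical stand-in for whatever MNLFA engine is used (the abstract names none), and the WIC step is folded into the BIC screen for brevity:

```r
# Steps 1-2 sketch: screen each item against a fully constrained baseline and
# flag it when an information criterion prefers the model with DIF on that item.
# fit_mnlfa() is hypothetical; BIC() assumes the fit object supports it.
screen_dif <- function(resp, covariate, items) {
  base <- fit_mnlfa(resp, covariate, dif_items = NULL)   # baseline: no DIF anywhere
  flagged <- vapply(items, function(j) {
    free <- fit_mnlfa(resp, covariate, dif_items = j)    # moderate item j's parameters
    BIC(free) < BIC(base)                                # IC prefers the DIF model?
  }, logical(1))
  items[flagged]
}
# Step 3 would re-screen the unflagged items with AIC and keep those that
# survive as the DIF-free anchor set for the final model.
```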

Citations: 0
Distinguishing Between Models for Extreme and Midpoint Response Styles as Opposite Poles of a Single Dimension versus Two Separate Dimensions: A Simulation Study.
IF 1.2 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-09-13 | DOI: 10.1177/01466216251379471
Martijn Schoenmakers, Maria Bolsinova, Jesper Tijmstra

Extreme and midpoint response styles have frequently been found to decrease the validity of Likert-type questionnaire results. Different approaches for modelling extreme and midpoint responding have been proposed in the literature, with some advocating for a unidimensional conceptualization of the response styles as opposite poles, and others modelling them as separate dimensions. How these response styles are modelled influences the estimation complexity, parameter estimates, and detection of and correction for response styles in IRT models. For these reasons, we examine if it is possible to empirically distinguish between extreme and midpoint responding as two separate dimensions versus two opposite sides of a single dimension. The various conceptualizations are modelled using the multidimensional nominal response model, with the AIC and BIC being used to distinguish between the competing models in a simulation study and an empirical example. Results indicate good performance of both information criteria given sufficient sample size, test length, and response style strength. The BIC outperformed the AIC in cases where no response styles were present, while the AIC outperformed the BIC in cases where multiple response style dimensions were present. Implications of the results for practice are discussed.
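
The adjudication step can be sketched as an information-criterion comparison between the two conceptualizations; `fit_mnrm()` is a hypothetical estimator for the multidimensional nominal response model with the response-style scoring fixed by the chosen structure:

```r
# Hypothetical adjudication sketch: fit both conceptualizations, compare ICs.
m_bipolar <- fit_mnrm(resp, style = "bipolar")  # ERS and MRS as opposite poles of one dimension
m_twodim  <- fit_mnrm(resp, style = "two_dim")  # ERS and MRS as two separate dimensions

# Negative differences favor the bipolar model, positive the two-dimensional one;
# the simulation asks how often each criterion recovers the generating structure.
c(AIC = AIC(m_bipolar) - AIC(m_twodim),
  BIC = BIC(m_bipolar) - BIC(m_twodim))
```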

Citations: 0
fcirt: An R Package for Forced Choice Models in Item Response Theory.
IF 1.2 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-09-10 | DOI: 10.1177/01466216251378771
Naidan Tu, Sean Joo, Philseok Lee, Stephen Stark

Multidimensional forced choice (MFC) formats have emerged as a promising alternative to traditional single statement Likert-type measures for assessing noncognitive traits while reducing response biases. As MFC formats become more widely used, there is a growing need for tools to support MFC analysis, which motivated the development of the fcirt package. The fcirt package estimates forced choice model parameters using Bayesian methods. It currently enables estimation of the Generalized Graded Unfolding Model (GGUM; Roberts et al., 2000)-based Multi-Unidimensional Pairwise Preference (MUPP) model using rstan, which implements the Hamiltonian Monte Carlo (HMC) sampling algorithm. fcirt also includes functions for computing item and test information functions to evaluate the quality of MFC assessments, as well as functions for Bayesian diagnostic plotting to assist with model evaluation and convergence assessment.
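
A usage sketch follows; the argument names are assumptions made for illustration and should be checked against the package documentation rather than read as the verified fcirt API:

```r
# Illustrative only: argument names are assumptions, not the verified fcirt API.
library(fcirt)
fit <- fcirt(fcirt.Data = resp,   # forced-choice block responses (assumed name)
             pairmap    = pairs,  # statement-to-block pairing (assumed name)
             ind        = dims,   # trait measured by each statement (assumed name)
             model      = "MUPP") # GGUM-based MUPP, estimated via rstan/HMC
# Item/test information and Bayesian diagnostic plots are then available through
# the package's helper functions for evaluating the assessment and convergence.
```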

Citations: 0
Automatic Generation of Rule-Based Raven-Like Matrices in R: The matRiks Package.
IF 1.2 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-09-02 | DOI: 10.1177/01466216251374826
Andrea Brancaccio, Ottavia M Epifania, Pasquale Anselmi, Debora de Chiusole
{"title":"Automatic Generation of Rule-Based Raven-Like Matrices in R: The matRiks Package.","authors":"Andrea Brancaccio, Ottavia M Epifania, Pasquale Anselmi, Debora de Chiusole","doi":"10.1177/01466216251374826","DOIUrl":"10.1177/01466216251374826","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251374826"},"PeriodicalIF":1.2,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405199/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145001708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CALMs: A Shiny Application for Comprehensive Analysis of Latent Means.
IF 1.2 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-08-31 | DOI: 10.1177/01466216251371173
Kim Nimon, Julia Fulmore, Gregg Keiffer, Bryn Hammack-Brown

This article presents CALMs, a Shiny application for comprehensively comparing groups via latent means, which encompasses the examination of group equivalency, propensity score analysis, measurement invariance analysis, and assessment of latent mean differences between equivalent groups with invariant data. Despite the importance of these techniques, their application can be complex and time-consuming, particularly for researchers not experienced in advanced statistical methods. The Shiny application CALMs makes this cumbersome process accessible to a broader range of users. In addition, it allows researchers to focus more on the interpretation of results rather than the labor required for testing. The practical utility of the CALMs application is demonstrated using real-world data, highlighting its potential to enhance the validity and reliability of group comparison studies in psychological research.
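
CALMs' internals are not reproduced here, but the same invariance-then-latent-means workflow can be sketched in lavaan for reference; the data frame `dat`, grouping variable `cohort`, and items `i1`-`i4` are hypothetical:

```r
# Minimal lavaan sketch of the workflow CALMs automates (not the app's code).
library(lavaan)
model <- 'engagement =~ i1 + i2 + i3 + i4'

configural <- cfa(model, data = dat, group = "cohort")
metric     <- cfa(model, data = dat, group = "cohort",
                  group.equal = "loadings")
scalar     <- cfa(model, data = dat, group = "cohort",
                  group.equal = c("loadings", "intercepts"))
anova(configural, metric, scalar)  # invariance holds if fit does not degrade

# Under scalar invariance, lavaan frees the latent mean of the second group
# (the first is fixed at 0), so summary(scalar) reports the latent mean difference.
summary(scalar)
```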

Citations: 0
implicitMeasures: An R Package for Scoring the Implicit Association Test and the Single-Category Implicit Association Test.
IF 1.2 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-08-25 | DOI: 10.1177/01466216251371532
Ottavia M Epifania, Pasquale Anselmi, Egidio Robusto
{"title":"implicitMeasures: An R Package for Scoring the Implicit Association Test and the Single-Category Implicit Association Test.","authors":"Ottavia M Epifania, Pasquale Anselmi, Egidio Robusto","doi":"10.1177/01466216251371532","DOIUrl":"https://doi.org/10.1177/01466216251371532","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251371532"},"PeriodicalIF":1.2,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12378261/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144974606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Using Deep Learning to Choose Optimal Smoothing Values for Equating.
IF 1.2 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-08-23 | DOI: 10.1177/01466216251363244
Chunyan Liu, Zhongmin Cui

Test developers typically use alternate test forms to protect the integrity of test scores. Because test forms may differ in difficulty, scores on different test forms are adjusted through a psychometric procedure called equating. When conducting equating, psychometricians often apply smoothing methods to reduce random equating error resulting from sampling. During the process, they compare plots of different smoothing degrees and choose the optimal value when using the cubic spline postsmoothing method. This manual process, however, could be automated with the help of deep learning, a machine learning technique commonly used for image classification. In this study, a convolutional neural network was trained using human-classified postsmoothing plots. The trained network was used to choose optimal smoothing values with empirical testing data, which were compared to human choices. The agreement rate between humans and the trained network was as high as 71%, suggesting the potential of deep learning for choosing optimal smoothing values for equating.
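
As a rough illustration of the approach (not the study's trained network), a small binary classifier over plot images can be sketched with the keras R interface; the image size, layer sizes, and the `plot_images`/`human_labels` objects are assumptions:

```r
# Sketch of a CNN that labels a postsmoothing plot as acceptable or not;
# the architecture and input shape are illustrative assumptions.
library(keras)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(128, 128, 1)) %>%   # grayscale plot images
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")    # P(plot is acceptable)

model %>% compile(optimizer = "adam", loss = "binary_crossentropy",
                  metrics = "accuracy")
# model %>% fit(x = plot_images, y = human_labels, epochs = 10)
# Scoring the plots of all candidate smoothing values and picking the highest-
# probability one would then mimic the psychometrician's manual choice.
```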

Citations: 0
Combining Propensity Scores and Common Items for Test Score Equating.
IF 1.2 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-07-30 | DOI: 10.1177/01466216251363240
Inga Laukaityte, Gabriel Wallin, Marie Wiberg

Ensuring that test scores are fair and comparable across different test forms and different test groups is a significant statistical challenge in educational testing. Methods to achieve score comparability, a process known as test score equating, often rely on including common test items or on assuming that test-taker groups are similar in key characteristics. This study explores a novel approach that combines propensity scores, based on test takers' background covariates, with information from common items, using kernel smoothing techniques for binary-scored test items. An empirical analysis using data from a high-stakes college admissions test evaluates the standard errors and differences in adjusted test scores. A simulation study examines the impact of factors such as the number of test takers, the number of common items, and the correlation between covariates and test scores on the method's performance. The findings demonstrate that integrating propensity scores with common-item information reduces standard errors and bias more effectively than using either source alone. This suggests that balancing the groups on the test takers' covariates enhances the fairness and accuracy of test score comparisons across different groups. The proposed method highlights the benefits of considering all the collected data to improve score comparability.
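
The ingredients can be sketched in base R; this is not the authors' estimator: the `takers` data frame and its covariates are hypothetical, the score range is assumed to be 0-60, and the common-item contribution is omitted for brevity:

```r
# Sketch: propensity weights from covariates, weighted Gaussian-kernel score
# CDFs, then equipercentile equating of form X onto form Y.
is_x <- takers$form == "X"
ps   <- glm(is_x ~ age + gender + school,          # hypothetical covariates
            family = binomial, data = takers)$fitted
w    <- ifelse(is_x, 1 / ps, 1 / (1 - ps))         # inverse-propensity weights

wcdf <- function(scores, weights, h = 1) {         # weighted kernel-smoothed CDF
  function(x) sum(weights * pnorm(x, mean = scores, sd = h)) / sum(weights)
}
Fx <- wcdf(takers$score[is_x],  w[is_x])
Fy <- wcdf(takers$score[!is_x], w[!is_x])

equate_x <- function(x)                            # equipercentile: Fy^{-1}(Fx(x))
  uniroot(function(y) Fy(y) - Fx(x), lower = 0, upper = 60)$root
```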

Citations: 0
MCorrSeqPerm: Searching for the Maximum Statistically Significant System of Linear Correlations and its Application in Work Psychology.
IF 1.0 | Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-07-21 | DOI: 10.1177/01466216251360562
Katarzyna Stapor, Grzegorz Kończak, Damian Grabowski, Marta Żywiołek-Szeja, Agata Chudzicka-Czupała

The paper addresses the problem of detecting a statistically significant subset among a set of candidate relationships. The Pearson linear correlation coefficient calculated from a sample was used to determine the strength of a relationship. Simultaneous testing of the significance of many relationships raises the issue of multiple hypothesis testing: without proper error control, the probability of making a type I error is, in practice, much higher than the assumed level of significance. The paper proposes an alternative approach: a new stepwise procedure (MCorrSeqPerm) that finds the maximum statistically significant system of linear correlations while keeping the error at the assumed level. The proposed procedure relies on a sequence of permutation tests. Its application to the relationships between stress experienced at work and job satisfaction was compared with Holm's classic method in terms of the number of significant correlations detected.
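
A simplified sketch of the two ingredients being compared is given below; `dat` is a hypothetical data frame of the measured variables, and the full MCorrSeqPerm procedure applies such permutation tests sequentially, growing the correlation system while the error stays at the assumed level:

```r
# Permutation test of a single Pearson correlation: break the pairing by
# permuting one variable, keep the marginal distributions intact.
perm_cor_test <- function(x, y, B = 2000) {
  obs  <- cor(x, y)
  perm <- replicate(B, cor(x, sample(y)))
  mean(abs(perm) >= abs(obs))          # two-sided permutation p-value
}

# Holm baseline over all candidate pairs in `dat` (hypothetical data frame):
pairs  <- combn(names(dat), 2, simplify = FALSE)
p_raw  <- sapply(pairs, function(v) cor.test(dat[[v[1]]], dat[[v[2]]])$p.value)
p_holm <- p.adjust(p_raw, method = "holm")
sum(p_holm < 0.05)                     # number of significant correlations detected
```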

Citations: 0