
Latest Publications from Applied Psychological Measurement

Using Deep Learning to Choose Optimal Smoothing Values for Equating.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 PSYCHOLOGY, MATHEMATICAL · Pub Date: 2025-08-23 · DOI: 10.1177/01466216251363244
Chunyan Liu, Zhongmin Cui

Test developers typically use alternate test forms to protect the integrity of test scores. Because test forms may differ in difficulty, scores on different test forms are adjusted through a psychometric procedure called equating. When conducting equating, psychometricians often apply smoothing methods to reduce the random equating error that results from sampling. With the cubic spline postsmoothing method, they compare plots produced under different smoothing degrees and choose the optimal value. This manual process, however, could be automated with the help of deep learning, a machine learning technique commonly used for image classification. In this study, a convolutional neural network was trained on human-classified postsmoothing plots. The trained network was then used to choose optimal smoothing values for empirical testing data, and its choices were compared to human choices. The agreement rate between humans and the trained network was as high as 71%, suggesting the potential of deep learning for choosing optimal smoothing values in equating.
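
The plot-classification setup described above can be sketched with a small convolutional network. This is a minimal illustration assuming PyTorch; the architecture, the 64x64 image size, and the set of four candidate smoothing classes are placeholders, not the network the authors trained.

```python
import torch
import torch.nn as nn

class SmoothingPlotCNN(nn.Module):
    """Toy CNN mapping a grayscale postsmoothing plot to one of K smoothing classes."""
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 64x64 input halved twice by pooling -> 16 channels of 16x16 maps
        self.classifier = nn.Linear(16 * 16 * 16, n_classes)

    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))

# Forward pass on a batch of two fake 64x64 plot images.
model = SmoothingPlotCNN(n_classes=4)
logits = model(torch.randn(2, 1, 64, 64))
predicted = logits.argmax(dim=1)  # index of the chosen smoothing value per plot
```

In practice the network would be trained with cross-entropy loss on the human-classified plot images before `argmax` is trusted.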

Citations: 0
Combining Propensity Scores and Common Items for Test Score Equating.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 PSYCHOLOGY, MATHEMATICAL · Pub Date: 2025-07-30 · DOI: 10.1177/01466216251363240
Inga Laukaityte, Gabriel Wallin, Marie Wiberg

Ensuring that test scores are fair and comparable across different test forms and different test groups is a significant statistical challenge in educational testing. Methods to achieve score comparability, a process known as test score equating, often rely on including common test items or on assuming that test-taker groups are similar in key characteristics. This study explores a novel approach that combines propensity scores, based on test takers' background covariates, with information from common items, using kernel smoothing techniques for binary-scored test items. An empirical analysis using data from a high-stakes college admissions test evaluates the standard errors and differences in adjusted test scores. A simulation study examines the impact of factors such as the number of test takers, the number of common items, and the correlation between covariates and test scores on the method's performance. The findings demonstrate that integrating propensity scores with common-item information reduces standard errors and bias more effectively than using either source alone. This suggests that balancing the groups on the test takers' covariates enhances the fairness and accuracy of test score comparisons across different groups. The proposed method highlights the benefits of considering all the collected data to improve score comparability.
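
The equipercentile backbone underlying equating methods of this kind can be sketched as follows. This is a hedged illustration, not the authors' procedure: the paper's kernel smoothing of the score distributions and its propensity-score model are reduced here to an optional weight vector (e.g., weights derived from estimated propensity scores), and the function names are invented for the example.

```python
import numpy as np

def percentile_rank(scores, x, weights=None):
    """P(S < x) + 0.5 * P(S = x), optionally weighted (e.g., by propensity scores)."""
    scores = np.asarray(scores, dtype=float)
    w = np.ones_like(scores) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w[scores < x].sum() + 0.5 * w[scores == x].sum()

def equipercentile_equate(x, form_x, form_y, wx=None):
    """Map score x on form X to the form-Y score with the same percentile rank."""
    p = percentile_rank(form_x, x, wx)
    return np.quantile(np.asarray(form_y, dtype=float), p)

form_x = [10, 12, 14, 16, 18, 20]
form_y = [12, 14, 16, 18, 20, 22]  # form Y scores run 2 points higher
eq = equipercentile_equate(14, form_x, form_y)  # a bit above 16
```

A real application would replace the uniform weights with propensity-score-based weights and presmooth the discrete score distributions with a kernel before taking quantiles.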

Citations: 0
MCorrSeqPerm: Searching for the Maximum Statistically Significant System of Linear Correlations and its Application in Work Psychology.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 PSYCHOLOGY, MATHEMATICAL · Pub Date: 2025-07-21 · DOI: 10.1177/01466216251360562
Katarzyna Stapor, Grzegorz Kończak, Damian Grabowski, Marta Żywiołek-Szeja, Agata Chudzicka-Czupała

The paper addresses the problem of detecting a statistically significant subset among a set of considered relationships. The Pearson linear correlation coefficient calculated from a sample was used to determine the strength of a relationship. Testing the significance of many relationships simultaneously raises the issue of multiple hypothesis testing: in such a scenario, the probability of making a Type I error without proper error control is, in practice, much higher than the assumed significance level. The paper proposes an alternative approach: a new stepwise procedure (MCorrSeqPerm) that finds the maximum statistically significant system of linear correlations while keeping the error rate at the assumed level. The proposed procedure relies on a sequence of permutation tests. Its application to analyzing the relationships between stress experienced at work and job satisfaction was compared with Holm's classic method in terms of the number of significant correlations detected.
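
The single-correlation building block of such a procedure, a permutation test of one Pearson correlation, can be sketched as follows; the stepwise search over a whole system of correlations (the MCorrSeqPerm part) is not reproduced here.

```python
import random
import statistics

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_pvalue(x, y, n_perm=2000, seed=1):
    """Two-sided permutation p-value for H0: no linear correlation.
    Shuffling y breaks any real association, giving the null distribution of |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0

x = list(range(20))
y_strong = [2 * v + 1 for v in x]          # perfectly correlated with x
p_strong = permutation_pvalue(x, y_strong)  # very small p-value
```

A stepwise procedure in this spirit would repeat such tests over a growing system of correlations, recomputing the permutation reference at each step to control the overall error rate.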

Citations: 0
A Multidimensional Continuous Response Model for Measuring Unipolar Traits.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 PSYCHOLOGY, MATHEMATICAL · Pub Date: 2025-07-15 · DOI: 10.1177/01466216251360311
Pere J Ferrando, Fabia Morales-Vives, José M Casas, David Navarro-González

Unipolar constructs are encountered in a variety of non-cognitive measurement scenarios, including clinical and forensic assessments, symptom checklists, addictive behaviors, and irrational beliefs, among others. Item Response Theory (IRT) models intended for fitting and scoring measures of unipolar constructs, particularly log-logistic models, are fully developed at present, but they are limited to unidimensional structures. This paper proposes a novel multidimensional log-logistic IRT model intended for double-bounded continuous response items that measure unipolar constructs. The chosen response format is a natural fit for, and is increasingly used in, the scenarios for which the model is intended. The proposed model is remarkably simple, has interesting properties and, at the structural level, can be fitted by using linearizing transformations. Multidimensional item location and discrimination indices are developed, and procedures are proposed for fitting the model, scoring the respondents, and assessing conditional and marginal accuracy (including information curves). Everything proposed has been implemented in a freely available R program. The functioning of the model is illustrated with an empirical example based on 371 undergraduate students who answered the Depression and Anxiety subscales of the Brief Symptom Inventory 18 as well as the Rosenberg Self-Esteem Scale. The results show the usefulness of the new model for adequately interpreting unipolar variables, particularly in terms of the conditional reliability of trait estimates and external validity.
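
As a small illustration of why linearizing transformations work for log-logistic forms, the sketch below assumes the standard (unidimensional) log-logistic CDF, under which the logit of the response function is exactly linear in log x; the paper's multidimensional model is not reproduced here.

```python
import math

def loglogistic_cdf(x, alpha, beta):
    """Standard log-logistic CDF: F(x) = 1 / (1 + (x/alpha)**(-beta)), x > 0."""
    return 1.0 / (1.0 + (x / alpha) ** (-beta))

def logit(p):
    return math.log(p / (1.0 - p))

alpha, beta = 2.0, 1.5
xs = [0.5, 1.0, 2.0, 4.0, 8.0]

# logit(F(x)) = beta * (log x - log alpha), i.e., exactly linear in log x,
# so the slope between any two points on the (log x, logit F) scale equals beta.
pairs = [(math.log(x), logit(loglogistic_cdf(x, alpha, beta))) for x in xs]
slopes = [(v2 - v1) / (u2 - u1)
          for (u1, v1), (u2, v2) in zip(pairs, pairs[1:])]
```

This is the sense in which a log-logistic structural model can be estimated with ordinary linear machinery after a transformation of both axes.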

Citations: 0
A Study of Latent State-Trait Theory Framework in Piecewise Growth Models.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 PSYCHOLOGY, MATHEMATICAL · Pub Date: 2025-07-15 · DOI: 10.1177/01466216251360565
Ihnwhi Heo, Ren Liu, Haiyan Liu, Sarah Depaoli, Fan Jia

Latent state-trait (LST) theory provides a psychometric framework that facilitates the measurement of long-term trait change and short-term state variability in longitudinal data. While LST theory has guided the development and extension of linear latent growth models within its theoretical framework, the integration of piecewise growth models (PGMs) into the LST theory framework remains uninvestigated. PGMs are well suited for modeling nonlinear developmental processes composed of distinct stages, which frequently arise in psychological and educational research. Their ability to capture phase-specific changes makes them a useful tool for applied and methodological researchers. This paper introduces a novel measurement approach that integrates PGMs into the framework of LST theory by presenting single-indicator piecewise growth models (SI-PGMs) and multiple-indicator piecewise growth models (MI-PGMs). We detail the model specifications for both SI-PGMs and MI-PGMs. For SI-PGMs, we define the reliability coefficient; for MI-PGMs, we define the consistency coefficient, occasion specificity coefficient, and reliability coefficient. We then conduct simulations to evaluate the models' performance in accurately recovering growth parameters and capturing true reliability. The simulation results indicated that SI-PGMs and MI-PGMs successfully recovered growth parameters and performed comparably in the absence of situational influences. However, MI-PGMs outperformed SI-PGMs when situational influences were present. We conclude by outlining directions for future research and providing Mplus syntax to support the dissemination of the models.
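
A two-phase piecewise growth trajectory with occasion-specific "state" residuals, the kind of process SI-PGMs and MI-PGMs are built for, can be simulated in a few lines; all parameter values below are illustrative, not taken from the paper.

```python
import random

def piecewise_trajectory(t, intercept, slope1, slope2, knot):
    """Mean trajectory of a two-phase piecewise growth model:
    slope1 applies before the knot, slope2 after it."""
    return intercept + slope1 * min(t, knot) + slope2 * max(t - knot, 0.0)

def simulate_person(times, knot, rng, state_sd=0.5):
    """Person-specific trait-level growth plus occasion-specific state residuals,
    in the spirit of LST theory's trait/state decomposition."""
    intercept = rng.gauss(10.0, 1.0)   # person-specific trait level
    slope1, slope2 = 1.0, 0.2          # growth rates before/after the knot
    return [piecewise_trajectory(t, intercept, slope1, slope2, knot)
            + rng.gauss(0.0, state_sd) for t in times]

rng = random.Random(7)
times = [0, 1, 2, 3, 4, 5, 6]
mean_curve = [piecewise_trajectory(t, 10.0, 1.0, 0.2, knot=3) for t in times]
person = simulate_person(times, knot=3, rng=rng)
```

Fitting SI-PGMs or MI-PGMs amounts to recovering the intercept, the two slopes, and the variance components of such trajectories from observed data.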

Citations: 0
Structure-Based Classification Approach.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 PSYCHOLOGY, MATHEMATICAL · Pub Date: 2025-07-14 · DOI: 10.1177/01466216251360544
Jongwan Kim

This study introduces a novel structure-based classification (SBC) framework that leverages pairwise distance representations of rating data to enhance classification performance while mitigating individual differences in scale usage. Unlike conventional feature-based approaches that rely on absolute rating scores, SBC transforms rating data into structured representations by computing pairwise distances between rating dimensions. This transformation captures the relational structure of ratings, ensuring consistency between training and test datasets and enhancing model robustness. To evaluate the effectiveness of this approach, we conducted a simulation study in which participants rated stimuli across multiple affective dimensions, with systematic individual differences in scale usage. The results demonstrated that SBC successfully classified affective stimuli despite these variations, performing comparably to traditional classification methods. The findings suggest that relational structures among rating dimensions contain meaningful information for affective classification, akin to functional connectivity approaches in cognitive neuroscience. By focusing on rating interdependencies as well as absolute values, SBC provides a robust and generalizable method for analyzing subjective responses, with implications for psychological research.
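
The core idea, that pairwise distances between rating dimensions are unaffected by additive differences in scale usage, can be shown in a few lines; the paper's particular distance measure and classifier are not reproduced here.

```python
from itertools import combinations

def pairwise_distance_features(ratings):
    """Absolute differences between all pairs of rating dimensions.
    An additive shift in scale usage cancels out of every pair."""
    return [abs(a - b) for a, b in combinations(ratings, 2)]

# Two raters judge the same stimulus on four affective dimensions;
# rater B habitually uses the upper end of the scale (a constant +2 shift).
rater_a = [2, 5, 3, 4]
rater_b = [4, 7, 5, 6]

feat_a = pairwise_distance_features(rater_a)  # [3, 1, 2, 2, 1, 1]
feat_b = pairwise_distance_features(rater_b)  # identical features
```

Because the transformed features are identical for the two raters, a classifier trained on one rater's structured representation generalizes to the other despite their different absolute rating levels.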

Citations: 0
Including Empirical Prior Information in the Reliable Change Index.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 PSYCHOLOGY, MATHEMATICAL · Pub Date: 2025-07-10 · DOI: 10.1177/01466216251358492
R Philip Chalmers, Sarah Campbell

The reliable change index (RCI; Jacobson & Truax, 1991) is commonly used to assess whether individuals have changed across two measurement occasions, and it has seen many augmentations and improvements since its initial conception. In this study, we extend an item response theory version of the RCI presented by Jabrayilov et al. (2016) by including empirical priors in the associated RCI computations whenever group-level differences are quantifiable given post-test response information. Based on a reanalysis and extension of a previous simulation study, we demonstrate that although a small amount of bias is added to the estimates of the latent trait differences when no true change is present, including empirical prior information will generally improve the Type I error behavior of the model-based RCI. Consequently, when non-zero changes in the latent trait are present, the bias and sampling variability are shown to be more favorable than those of competing estimators, leading to an increase in power to detect non-zero changes.
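
The classical Jacobson and Truax formulation that this line of work extends can be computed directly; the numbers below are illustrative, and the IRT version with empirical priors studied in the paper is not reproduced.

```python
import math

def reliable_change_index(pre, post, sd_pre, reliability):
    """Jacobson & Truax (1991) RCI: observed change divided by the standard
    error of the difference between two fallible measurements."""
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)  # SEM of one score
    se_difference = se_measurement * math.sqrt(2.0)          # SE of a difference
    return (post - pre) / se_difference

# Illustrative values: a 10-point drop on a scale with SD 7.5 and reliability .88.
rci = reliable_change_index(pre=40.0, post=30.0, sd_pre=7.5, reliability=0.88)
reliable = abs(rci) > 1.96  # change unlikely to be measurement error alone
```

The IRT extension replaces the classical SEM with trait-estimate standard errors from an item response model, which is where group-level empirical priors can enter the computation.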

Citations: 0
Using Group Differences in True Score Relationships to Evaluate Measurement Bias.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 PSYCHOLOGY, MATHEMATICAL · Pub Date: 2025-07-07 · DOI: 10.1177/01466216251358491
Michael T Kane, Joanne Kane

This paper makes three contributions to our understanding of measurement bias and predictive bias in testing. First, we develop a linear model for assessing measurement bias across two tests and two groups in terms of the estimated true-score relationships between the two tests in the two groups. This new model for measurement bias is structurally similar to the Cleary model for predictive bias, but it relies on the Errors-in-Variables (EIV) regression model, rather than the Ordinary-Least-Squares (OLS) regression model. Second, we examine some differences between measurement bias and predictive bias in three cases in which two groups have different true-score means, and we illustrate how regression toward the mean in OLS regression can lead to questionable conclusions about test bias if the differences between measurement bias and predictive bias are ignored. Third, we reevaluate a body of empirical findings suggesting that the tests employed in college-admissions and employment-testing programs tend to over-predict criterion performance for minorities, and we show that these findings are consistent with the occurrence of substantial measurement bias against the minority group relative to the majority group.
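
The contrast between OLS and EIV regression rests on the classical attenuation result: measurement error in the predictor shrinks the expected OLS slope by the predictor's reliability, and the EIV estimator undoes that shrinkage. A minimal single-predictor sketch follows; the paper's full two-test, two-group model is not reproduced.

```python
def ols_slope_attenuation(true_slope, reliability):
    """With classical measurement error in the predictor, the expected OLS slope
    is attenuated toward zero: E[b_OLS] = reliability * true_slope."""
    return reliability * true_slope

def eiv_slope(ols_slope, reliability):
    """Errors-in-variables correction: divide the OLS slope by the predictor's
    reliability to recover the true-score (disattenuated) slope."""
    return ols_slope / reliability

# A true-score slope of 1.0 observed through a predictor with reliability .80
b_ols = ols_slope_attenuation(true_slope=1.0, reliability=0.8)  # attenuated to 0.8
b_eiv = eiv_slope(b_ols, reliability=0.8)                       # recovers 1.0
```

This attenuation is also why regression toward the mean can make an unbiased test look predictively biased when groups differ in their true-score means: the OLS line is flattened relative to the true-score relationship.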

本文对我们对测试中的测量偏差和预测偏差的理解做出了三个贡献。首先,我们建立了一个线性模型,根据两组中两个测试之间的估计真值关系来评估两个测试和两组之间的测量偏差。这种新的测量偏差模型在结构上类似于预测偏差的Cleary模型,但它依赖于变量误差(EIV)回归模型,而不是普通最小二乘(OLS)回归模型。其次,我们在两组真实得分均值不同的三种情况下检验了测量偏倚和预测偏倚之间的差异,并说明了如果忽略测量偏倚和预测偏倚之间的差异,OLS回归中的均值回归如何导致关于检验偏倚的可疑结论。第三,我们重新评估了一系列实证研究结果,这些研究结果表明,在大学录取和就业测试项目中使用的测试往往会高估少数族裔的标准表现,我们表明,这些发现与相对于多数群体而言,对少数族裔群体存在实质性的测量偏差是一致的。
{"title":"Using Group Differences in True Score Relationships to Evaluate Measurement Bias.","authors":"Michael T Kane, Joanne Kane","doi":"10.1177/01466216251358491","DOIUrl":"10.1177/01466216251358491","url":null,"abstract":"<p><p>This paper makes three contributions to our understanding of measurement bias and predictive bias in testing. First, we develop a linear model for assessing measurement bias across two tests and two groups in terms of the estimated true-score relationships between the two tests in the two groups. This new model for measurement bias is structurally similar to the Cleary model for predictive bias, but it relies on the Errors-in-Variables (EIV) regression model, rather than the Ordinary-Least-Squares (OLS) regression model. Second, we examine some differences between measurement bias and predictive bias in three cases in which two groups have different true-score means, and we illustrate how regression toward the mean in OLS regression can lead to questionable conclusions about test bias if the differences between measurement bias and predictive bias are ignored. 
Third, we reevaluate a body of empirical findings suggesting that the tests employed in college-admissions and employment-testing programs tend to over-predict criterion performance for minorities, and we show that these findings are consistent with the occurrence of substantial measurement bias against the minority group relative to the majority group.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251358491"},"PeriodicalIF":1.0,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12234520/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144601949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
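The EIV-versus-OLS distinction at the heart of this paper can be seen in a minimal simulation. The sketch below is illustrative only and is not the authors' model: it shows how measurement error in a predictor attenuates the OLS slope toward zero by the predictor's reliability, and how the classical errors-in-variables correction (dividing the OLS slope by the reliability) recovers the true-score slope. All variable names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# True scores T on the first test, and a second test Y linearly related to T.
T = rng.normal(50, 10, n)
Y = 0.8 * T + 5 + rng.normal(0, 4, n)

# Observed score X = T + measurement error E.
err_sd = 6.0
X = T + rng.normal(0, err_sd, n)

# OLS slope of Y on X is attenuated toward zero by the reliability of X.
b_ols = np.cov(X, Y)[0, 1] / np.var(X)

# Reliability = true-score variance / observed-score variance.
reliability = np.var(T) / np.var(X)

# EIV ("disattenuated") slope recovers the true-score relationship.
b_eiv = b_ols / reliability

print(round(b_ols, 2), round(reliability, 2), round(b_eiv, 2))
```

With these illustrative values the reliability is 100/136 ≈ 0.74, so the OLS slope lands near 0.59 while the corrected slope returns to roughly 0.8 — which is why comparing OLS regressions across groups with different amounts of measurement error can mislead.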
Standard Error Estimation for Subpopulation Non-invariance. 亚总体非不变性的标准误差估计。
IF 1 4区 心理学 Q4 PSYCHOLOGY, MATHEMATICAL Pub Date : 2025-07-05 DOI: 10.1177/01466216251351947
Paul A Jewsbury

Score linking is widely used to place scores from different assessments, or the same assessment under different conditions, onto a common scale. A central concern is whether the linking function is invariant across subpopulations, as violations may threaten fairness. However, evaluating subpopulation differences in linked scores is challenging because linking error is not independent of sampling and measurement error when the same data are used to estimate the linking function and to compare score distributions. We show that common approaches involving neglecting linking error or treating it as independent substantially overestimate the standard errors of subpopulation differences. We introduce new methods that account for linking error dependencies. Simulation results demonstrate the accuracy of the proposed methods, and a practical example with real data illustrates how improved standard error estimation enhances power for detecting subpopulation non-invariance.

分数挂钩被广泛用于将不同评估的分数,或在不同条件下的同一评估的分数放在一个共同的尺度上。一个中心问题是,连接函数在各个子种群之间是否不变,因为违反连接函数可能会威胁到公平性。然而,当使用相同的数据来估计连接函数和比较分数分布时,评估关联分数的亚群体差异是具有挑战性的,因为连接误差并不独立于抽样和测量误差。我们表明,包括忽略连接误差或将其视为独立的常见方法实质上高估了亚群体差异的标准误差。我们引入了新的方法来解释错误依赖关系的链接。仿真结果表明了所提方法的准确性,并用一个实际数据实例说明了改进的标准误差估计提高了检测亚种群非不变性的能力。
{"title":"Standard Error Estimation for Subpopulation Non-invariance.","authors":"Paul A Jewsbury","doi":"10.1177/01466216251351947","DOIUrl":"10.1177/01466216251351947","url":null,"abstract":"<p><p>Score linking is widely used to place scores from different assessments, or the same assessment under different conditions, onto a common scale. A central concern is whether the linking function is invariant across subpopulations, as violations may threaten fairness. However, evaluating subpopulation differences in linked scores is challenging because linking error is not independent of sampling and measurement error when the same data are used to estimate the linking function and to compare score distributions. We show that common approaches involving neglecting linking error or treating it as independent substantially overestimate the standard errors of subpopulation differences. We introduce new methods that account for linking error dependencies. Simulation results demonstrate the accuracy of the proposed methods, and a practical example with real data illustrates how improved standard error estimation enhances power for detecting subpopulation non-invariance.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251351947"},"PeriodicalIF":1.0,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12228644/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144585323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Detecting DIF with the Multi-Unidimensional Pairwise Preference Model: Lord's Chi-square and IPR-NCDIF Methods. 用多维配对偏好模型检测DIF: Lord卡方和IPR-NCDIF方法。
IF 1 4区 心理学 Q4 PSYCHOLOGY, MATHEMATICAL Pub Date : 2025-07-01 DOI: 10.1177/01466216251351949
Lavanya S Kumar, Naidan Tu, Sean Joo, Stephen Stark

Multidimensional forced choice (MFC) measures are gaining prominence in noncognitive assessment. Yet there has been little research on detecting differential item functioning (DIF) with models for forced choice measures. This research extended two well-known DIF detection methods to MFC measures. Specifically, the performance of Lord's chi-square and item parameter replication (IPR) methods with MFC tests based on the Multi-Unidimensional Pairwise Preference (MUPP) model was investigated. The Type I error rate and power of the DIF detection methods were examined in a Monte Carlo simulation that manipulated sample size, impact, DIF source, and DIF magnitude. Both methods showed consistent power and were found to control Type I error well across study conditions, indicating that established approaches to DIF detection work well with the MUPP model. Lord's chi-square outperformed the IPR method when DIF source was statement discrimination while the opposite was true when DIF source was statement threshold. Also, both methods performed similarly and showed better power when DIF source was statement location, in line with previous research. Study implications and practical recommendations for DIF detection with MFC tests, as well as limitations, are discussed.

多维强迫选择(MFC)方法在非认知评估中越来越受到重视。然而,利用强迫选择测量模型检测差异项目功能(DIF)的研究很少。本研究将两种著名的DIF检测方法扩展到MFC测量中。具体而言,研究了基于多维成对偏好(MUPP)模型的MFC检验的Lord’s卡方和项目参数复制(IPR)方法的性能。在蒙特卡罗模拟中检验了I型错误率和DIF检测方法的功率,该模拟控制了样本量、影响、DIF源和DIF幅度。两种方法都显示出一致的效果,并且在不同的研究条件下都能很好地控制I型误差,这表明已建立的DIF检测方法与MUPP模型一起工作得很好。当DIF源为语句判别时,Lord卡方法优于IPR法;当DIF源为语句阈值时,Lord卡方法优于IPR法。此外,当DIF源为语句位置时,两种方法的性能相似,并且表现出更好的能力,这与先前的研究一致。本文讨论了用MFC检测DIF的研究意义和实际建议,以及局限性。
{"title":"Detecting DIF with the Multi-Unidimensional Pairwise Preference Model: Lord's Chi-square and IPR-NCDIF Methods.","authors":"Lavanya S Kumar, Naidan Tu, Sean Joo, Stephen Stark","doi":"10.1177/01466216251351949","DOIUrl":"10.1177/01466216251351949","url":null,"abstract":"<p><p>Multidimensional forced choice (MFC) measures are gaining prominence in noncognitive assessment. Yet there has been little research on detecting differential item functioning (DIF) with models for forced choice measures. This research extended two well-known DIF detection methods to MFC measures. Specifically, the performance of Lord's chi-square and item parameter replication (IPR) methods with MFC tests based on the Multi-Unidimensional Pairwise Preference (MUPP) model was investigated. The Type I error rate and power of the DIF detection methods were examined in a Monte Carlo simulation that manipulated sample size, impact, DIF source, and DIF magnitude. Both methods showed consistent power and were found to control Type I error well across study conditions, indicating that established approaches to DIF detection work well with the MUPP model. Lord's chi-square outperformed the IPR method when DIF source was statement discrimination while the opposite was true when DIF source was statement threshold. Also, both methods performed similarly and showed better power when DIF source was statement location, in line with previous research. 
Study implications and practical recommendations for DIF detection with MFC tests, as well as limitations, are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251351949"},"PeriodicalIF":1.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12213542/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144561576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
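Lord's chi-square itself is straightforward to compute once group-specific item parameter estimates and their covariance matrices are on a common scale. The sketch below uses the standard form of the statistic, χ² = v′(Σ_R + Σ_F)⁻¹v with v the vector of parameter differences; the 2PL (a, b) estimates and covariance matrices are hypothetical, and `scipy` is assumed to be available.

```python
import numpy as np
from scipy.stats import chi2


def lords_chi_square(params_ref, params_foc, cov_ref, cov_foc):
    """Lord's chi-square for one item: compares parameter estimates from a
    reference and a focal group, assuming both are on a common scale."""
    v = np.asarray(params_ref, float) - np.asarray(params_foc, float)
    pooled = np.asarray(cov_ref, float) + np.asarray(cov_foc, float)
    stat = float(v @ np.linalg.solve(pooled, v))
    df = len(v)                      # one df per compared parameter
    return stat, chi2.sf(stat, df)


# Hypothetical 2PL estimates (a, b) and covariance matrices for one item.
stat, p = lords_chi_square(
    params_ref=[1.10, 0.20], params_foc=[0.95, 0.65],
    cov_ref=[[0.010, 0.001], [0.001, 0.020]],
    cov_foc=[[0.012, 0.001], [0.001, 0.025]],
)
print(round(stat, 2), round(p, 4))
```

A significant statistic flags the item for DIF across all compared parameters jointly, which is why the abstract's finding — that the statistic's power depends on whether the DIF sits in the discrimination, threshold, or location parameters — matters in practice.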