
Journal of Educational Measurement: Latest Articles

Modeling Nonlinear Effects of Person-by-Item Covariates in Explanatory Item Response Models: Exploratory Plots and Modeling Using Smooth Functions
IF 1.4 Zone 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-07-24 DOI: 10.1111/jedm.12410
Sun-Joo Cho, Amanda Goodwin, Matthew Naveiras, Paul De Boeck

Explanatory item response models (EIRMs) have been applied to investigate the effects of person covariates, item covariates, and their interactions in the fields of reading education and psycholinguistics. In practice, it is often assumed that the relationships between the covariates and the logit transformation of item response probability are linear. However, this linearity assumption obscures the differential effects of covariates over their range in the presence of nonlinearity. Therefore, this paper presents exploratory plots that describe the potential nonlinear effects of person and item covariates on binary outcome variables. This paper also illustrates the use of EIRMs with smooth functions to model these nonlinear effects. The smooth functions examined in this study include univariate smooths of continuous person or item covariates, tensor product smooths of continuous person and item covariates, and by-variable smooths between a continuous person covariate and a binary item covariate. Parameter estimation was performed using the mgcv R package through the maximum penalized likelihood estimation method. In the empirical study, we identified a nonlinear effect of the person-by-item covariate interaction and discussed its practical implications. Furthermore, parameter recovery, the model comparison method, and the hypothesis testing procedures presented were evaluated via simulation studies under the conditions observed in the empirical study.

Journal of Educational Measurement, vol. 61, no. 4, pp. 595-623
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12410
Citations: 0
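As a reading aid for the abstract above, one generic way to write an EIRM with a smooth term is sketched below. The symbols are assumptions chosen for illustration and are not taken from the paper: $y_{pi}$ is the binary response of person $p$ to item $i$, $x_{pi}$ is a person-by-item covariate, $f$ is a penalized smooth function, and $\theta_p$ and $\epsilon_i$ are random person and item effects.

```latex
\operatorname{logit} P(y_{pi} = 1) = \beta_0 + f(x_{pi}) + \theta_p + \epsilon_i,
\qquad \theta_p \sim N(0, \sigma_\theta^2), \quad \epsilon_i \sim N(0, \sigma_\epsilon^2)
```

The usual linear model is the special case $f(x_{pi}) = \beta_1 x_{pi}$, which forces the covariate effect to be constant over its whole range. In mgcv formula notation, univariate smooths, tensor product smooths, and by-variable smooths are typically written with `s()`, `te()`, and the `by` argument of `s()`; how the paper specifies its own models is not stated in the abstract.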
On the Choice of Parameters for the Lognormal Model for Response Times: Commentary on Becker et al. (2013)
IF 1.4 Zone 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-07-23 DOI: 10.1111/jedm.12411
Wim J. van der Linden

In a recently published article in this journal, Becker et al. claim that, because of a missing slope parameter, the lognormal model for response times on test items almost never holds in practice. However, the authors' critique rests on a misrepresentation of the model, which already does have the equivalent of a slope parameter. More importantly, their extra parameter spoils the interpretation of the parameters for the test-takers' speed and labor intensity of the items necessary for a response-time model to be empirically meaningful while their proposed interpretation of the extra parameter seems unwarranted. An analysis of the authors' earlier empirical comparison between the original and their alternative version of the model does not seem to support much of a conclusion about the relative fit of the two models. Also, their simulation study conducted to demonstrate the necessity of the extra slope parameter appears to be based on data simulated in favor of their parameter.

Journal of Educational Measurement, vol. 61, no. 4, pp. 624-633
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12411
Citations: 0
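For readers unfamiliar with the model under discussion, the lognormal response-time model is commonly written as below. The notation follows the usual textbook presentation and is an assumption here, since the abstract gives no formulas: $T_{ij}$ is the time test-taker $j$ spends on item $i$, $\tau_j$ is the test-taker's speed, $\beta_i$ is the item's time (labor) intensity, and $\alpha_i$ is an item-specific discrimination parameter.

```latex
\ln T_{ij} = \beta_i - \tau_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N\!\left(0, \alpha_i^{-2}\right)
```

Because $\alpha_i$ scales the residual spread of log times item by item, it is presumably the "equivalent of a slope parameter" that the commentary refers to; the abstract itself does not name the parameter.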
Using Automated Procedures to Score Educational Essays Written in Three Languages
IF 1.3 Zone 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-07-23 DOI: 10.1111/jedm.12406
Tahereh Firoozi, Hamid Mohammadi, Mark J. Gierl
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language‐agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were holistically scored using the Common European Framework of Reference for Languages. The AES system with mBERT produced results that were consistent with human raters overall across all three language groups. The system also produced accurate predictions for some but not all of the score levels within each language. The AES system with LaBSE produced results that were even more consistent with the human raters overall across all three language groups compared to mBERT. In addition, the system produced accurate predictions for the majority of the score levels within each language. The performance differences between mBERT and LaBSE can be explained by considering how each language embedding model is implemented. Implications of this study for educational testing are also discussed.
Citations: 0
Reckase, M. The Psychometrics of Standard Setting: Connecting Policy and Test Scores: First edition published 2023 by CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
IF 1.4 Zone 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-07-23 DOI: 10.1111/jedm.12407
Daniel Lewis, Sandip Sinharay
Journal of Educational Measurement, vol. 61, no. 4, pp. 773-779
Citations: 0
Model Selection Posterior Predictive Model Checking via Limited-Information Indices for Bayesian Diagnostic Classification Modeling
IF 1.4 Zone 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-07-15 DOI: 10.1111/jedm.12408
Jihong Zhang, Jonathan Templin, Xinya Liang

Recently, Bayesian diagnostic classification modeling (DCM) has become increasingly popular in health psychology, education, and sociology. Typically, information criteria are used for model selection when researchers want to choose the best model among alternative models. In Bayesian estimation, posterior predictive checking (PPC) is a flexible Bayesian model evaluation tool, which allows researchers to detect Q-matrix misspecification. However, model selection methods using PPC for Bayesian DCM have not been well investigated. Thus, this research aims to propose a novel model selection approach using posterior predictive checking with limited-information statistics for selecting the correct Q-matrix. A simulation study was conducted to examine the performance of the proposed method. Furthermore, an empirical example was provided to illustrate how it can be used in real scenarios.

Journal of Educational Measurement, vol. 61, no. 4, pp. 740-762
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12408
Citations: 0
A Generalized Objective Function for Computer Adaptive Item Selection
IF 1.3 Zone 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-07-02 DOI: 10.1111/jedm.12405
Harold Doran, Testsuhiro Yamada, Ted Diaz, Emre Gonulates, Vanessa Culver
Computer adaptive testing (CAT) is an increasingly common mode of test administration offering improved test security, better measurement precision, and the potential for shorter testing experiences. This article presents a new item selection algorithm based on a generalized objective function to support multiple types of testing conditions and principled assessment design. The generalized nature of the algorithm permits a wide array of test requirements, allowing experts to define what to measure and how to measure it; the algorithm is simply a means to an end to support better construct representation. This work also emphasizes the computational algorithm and its ability to scale to support faster computing and better cost‐containment in real‐world applications than other CAT algorithms. We make a significant effort to consolidate all information needed to build and scale the algorithm so that expert psychometricians and software developers can use this document as a self‐contained resource and specification document to build and deploy an operational CAT platform.
Citations: 0
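To make the idea of an item selection objective concrete, here is a minimal sketch of the classic maximum-information rule under a 2PL model. It is not the generalized objective function proposed in the article, whose form the abstract does not give; the function names, the item pool, and its parameter values are all hypothetical.

```python
import math


def prob_2pl(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_item(theta, pool, administered):
    """Return the id of the unused item with maximum information at theta.

    pool maps item id -> (a, b); administered is a set of used item ids.
    """
    candidates = [item for item in pool if item not in administered]
    return max(candidates, key=lambda item: fisher_info(theta, *pool[item]))


# Hypothetical three-item pool: (discrimination a, difficulty b)
pool = {"item1": (1.0, -1.0), "item2": (1.5, 0.0), "item3": (0.8, 1.2)}
first = select_item(0.0, pool, set())
second = select_item(0.0, pool, {first})
```

An operational objective function would add terms for content constraints, exposure control, and similar requirements; the sketch optimizes information alone.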
Likelihood-Based Estimation of Model-Derived Oral Reading Fluency
IF 1.4 Zone 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-06-22 DOI: 10.1111/jedm.12404
Cornelis Potgieter, Xin Qiao, Akihito Kamata, Yusuf Kara

As part of the effort to develop an improved oral reading fluency (ORF) assessment system, Kara et al. estimated the ORF scores based on a latent variable psychometric model of accuracy and speed for ORF data via a fully Bayesian approach. This study further investigates likelihood-based estimators for the model-derived ORF scores, including maximum likelihood estimator (MLE), maximum a posteriori (MAP), and expected a posteriori (EAP), as well as their standard errors. The proposed estimators were demonstrated with a real ORF assessment dataset. Also, the estimation of model-derived ORF scores and their standard errors by the proposed estimators were evaluated through a simulation study. The fully Bayesian approach was included as a comparison in the real data analysis and the simulation study. Results demonstrated that the three likelihood-based approaches for the model-derived ORF scores and their standard error estimation performed satisfactorily.

Journal of Educational Measurement, vol. 61, no. 3, pp. 542-559
Citations: 0
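The difference between the three estimators named in the abstract (MLE, MAP, EAP) can be illustrated with a deliberately simple stand-in model. The sketch below uses a Rasch model with a standard normal prior on a grid of ability values; it is not the ORF accuracy-and-speed model from the article, and the response pattern and item difficulties are made up for illustration.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_lik(theta, responses, difficulties):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for y, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def estimate(responses, difficulties, lo=-4.0, hi=4.0, steps=801):
    """Grid-based MLE, MAP, and EAP under a standard normal prior."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    ll = [log_lik(t, responses, difficulties) for t in grid]
    # Log posterior up to a constant: log-likelihood + log N(0, 1) kernel
    lp = [l - 0.5 * t * t for l, t in zip(ll, grid)]
    mle = grid[max(range(steps), key=ll.__getitem__)]
    map_ = grid[max(range(steps), key=lp.__getitem__)]
    top = max(lp)
    weights = [math.exp(v - top) for v in lp]  # rescaled for stability
    eap = sum(t * w for t, w in zip(grid, weights)) / sum(weights)
    return mle, map_, eap


# Hypothetical data: five items, three answered correctly
mle, map_, eap = estimate([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0])
```

MLE maximizes the likelihood alone, MAP maximizes the posterior, and EAP averages over it, so with a prior centered at zero the two Bayesian estimates are pulled toward zero relative to the MLE.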
Curvilinearity in the Reference Composite and Practical Implications for Measurement
IF 1.4 Zone 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-06-05 DOI: 10.1111/jedm.12402
Xiangyi Liao, Daniel M. Bolt, Jee-Seon Kim

Item difficulty and dimensionality often correlate, implying that unidimensional IRT approximations to multidimensional data (i.e., reference composites) can take a curvilinear form in the multidimensional space. Although this issue has been previously discussed in the context of vertical scaling applications, we illustrate how such a phenomenon can also easily occur within individual tests. Measures of reading proficiency, for example, often use different task types within a single assessment, a feature that may not only lead to multidimensionality, but also an association between item difficulty and dimensionality. Using a latent regression strategy, we demonstrate through simulations and empirical analysis how associations between dimensionality and difficulty yield a nonlinear reference composite where the weights of the underlying dimensions change across the scale continuum according to the difficulties of the items associated with the dimensions. We further show how this form of curvilinearity produces systematic forms of misspecification in traditional unidimensional IRT models (e.g., 2PL) and can be better accommodated by models such as monotone-polynomial or asymmetric IRT models. Simulations and a real-data example from the Early Childhood Longitudinal Study—Kindergarten are provided for demonstration. Some implications for measurement modeling and for understanding the effects of 2PL misspecification on measurement metrics are discussed.

Journal of Educational Measurement, vol. 61, no. 3, pp. 511-541
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12402
Citations: 0
Modeling Response Styles in Cross-Classified Data Using a Cross-Classified Multidimensional Nominal Response Model
IF 1.4 Zone 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-05-31 DOI: 10.1111/jedm.12401
Sijia Huang, Seungwon Chung, Carl F. Falk

In this study, we introduced a cross-classified multidimensional nominal response model (CC-MNRM) to account for various response styles (RS) in the presence of cross-classified data. The proposed model allows slopes to vary across items and can explore impacts of observed covariates on latent constructs. We applied a recently developed variant of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm to address the computational challenge of estimating the proposed model. To demonstrate our new approach, we analyzed empirical student evaluation of teaching (SET) data collected from a large public university with three models: a CC-MNRM with RS, a CC-MNRM with no RS, and a multilevel MNRM with RS. Results indicated that the three models led to different inferences regarding the observed covariates. Additionally, in the example, ignoring/incorporating RS led to changes in student substantive scores, while the instructor substantive scores were less impacted. Misspecifying the cross-classified data structure resulted in apparent changes on instructor scores. To further evaluate the proposed modeling approach, we conducted a preliminary simulation study and observed good parameter and score recovery. We concluded this study with discussions of limitations and future research directions.

Journal of Educational Measurement, vol. 61, no. 3, pp. 486-510
Citations: 0
Expanding the Lognormal Response Time Model Using Profile Similarity Metrics to Improve the Detection of Anomalous Testing Behavior
IF 1.4 Zone 4 Psychology Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-05-13 DOI: 10.1111/jedm.12395
Gregory M. Hurtz, Regi Mucino

The Lognormal Response Time (LNRT) model measures the speed of test-takers relative to the normative time demands of items on a test. The resulting speed parameters and model residuals are often analyzed for evidence of anomalous test-taking behavior associated with fast and poorly fitting response time patterns. Extending this model, we demonstrate the connection between the existing LNRT model parameters and the “level” component of profile similarity, and we define two new parameters for the LNRT model representing profile “dispersion” and “shape.” We show that while the LNRT model measures level (speed), profile dispersion and shape are conflated in model residuals, and that distinguishing them provides meaningful and useful parameters for identifying anomalous testing behavior. Results from data in a situation where many test-takers gained preknowledge of test items revealed that profile shape, not currently measured in the LNRT model, was the most sensitive response time index to the abnormal test-taking behavior patterns. Results strongly support expanding the LNRT model to measure not only each test-taker's level of speed, but also the dispersion and shape of their response time profiles.
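
To make the level/dispersion/shape distinction concrete, the following is a minimal sketch based on the generic profile-similarity decomposition, not the authors' parameterization, which the abstract does not state. It assumes a single test-taker's log response times and a vector of normative item time intensities; the function name `profile_components` and the specific formulas (mean difference for level, ratio of standard deviations for dispersion, Pearson correlation for shape) are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def profile_components(log_times, item_intensities):
    """Split one test-taker's log response-time profile into three parts.

    Hypothetical helper for illustration only; the paper's own
    definitions of dispersion and shape may differ.
    """
    if len(log_times) != len(item_intensities) or len(log_times) < 2:
        raise ValueError("need two equal-length profiles with at least 2 items")

    # Level: average gap between item time demands and observed log times,
    # so a faster test-taker gets a larger value (a speed-like quantity).
    level = mean(b - t for t, b in zip(log_times, item_intensities))

    sd_t = pstdev(log_times)
    sd_b = pstdev(item_intensities)
    if sd_t == 0 or sd_b == 0:
        raise ValueError("dispersion and shape are undefined for a flat profile")

    # Dispersion: spread of the person's log times relative to the
    # spread of the normative item time intensities.
    dispersion = sd_t / sd_b

    # Shape: Pearson correlation between the person's profile and the
    # normative profile (do they spend relatively longer on the same items?).
    mt, mb = mean(log_times), mean(item_intensities)
    cov = mean((t - mt) * (b - mb) for t, b in zip(log_times, item_intensities))
    shape = cov / (sd_t * sd_b)

    return level, dispersion, shape


# Assumed normative log-time demands for four items.
intensities = [3.0, 3.5, 4.0, 4.5]

# A test-taker who is uniformly faster but follows the normative pattern.
typical = profile_components([2.5, 3.0, 3.5, 4.0], intensities)

# A test-taker with the same overall speed but a reversed time pattern.
reversed_pattern = profile_components([4.0, 3.5, 3.0, 2.5], intensities)
```

In this toy example both test-takers have the same level and dispersion, yet their shape values have opposite signs, which is the kind of difference a speed parameter alone cannot show.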
