
Journal of Educational Measurement: Latest Articles

Sequential Reservoir Computing for Log File‐Based Behavior Process Data Analyses
IF 1.3 | Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2024-09-14 | DOI: 10.1111/jedm.12413
Jiawei Xiong, Shiyu Wang, Cheng Tang, Qidi Liu, Rufei Sheng, Bowen Wang, Huan Kuang, Allan S. Cohen, Xinhui Xiong
The use of process data in assessment has gained attention in recent years as more assessments are administered by computer. Process data, recorded in computer log files, capture the sequence of examinees' response activities during the assessment, for example, timestamped keystrokes. Traditional measurement methods are often inadequate for handling this type of data. In this paper, we propose a sequential reservoir method (SRM) based on a reservoir computing model that uses an echo state network, with particle swarm optimization and singular value decomposition as optimization procedures. The method is designed to regularize features from process data through a computational self-learning algorithm, and it has been evaluated using both simulated and empirical data. Simulation results suggest that, on the one hand, the model effectively transforms action sequences into standardized and meaningful features and, on the other hand, these features are instrumental in categorizing latent behavioral groups and predicting latent information. Empirical results further indicate that SRM can predict assessment efficiency. Correlation analysis verified that the features extracted by SRM are related to action sequence length. The proposed method enhances the extraction and accessibility of meaningful information from process data, presenting an alternative to existing process data technologies.
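As a rough illustration of the reservoir idea behind SRM, the sketch below shows how an echo state network can turn action sequences of different lengths into feature vectors of one fixed length. It is a minimal sketch under stated assumptions, not the authors' SRM: the one-hot action coding, the leak rate, the spectral radius of 0.9, and time-averaged pooling are illustrative choices, the function names are hypothetical, and the particle swarm optimization and singular value decomposition steps mentioned in the abstract are not reproduced.

```python
import numpy as np


def make_reservoir(n_inputs, n_reservoir=50, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Draw fixed random input and recurrent weights for an echo state network."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=(n_reservoir, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale so the largest absolute eigenvalue equals spectral_radius,
    # a common heuristic for keeping the reservoir in the echo state regime.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def sequence_features(actions, n_actions, w_in, w, leak=0.3):
    """Map a variable-length list of action codes to one fixed-length vector.

    Each action is one-hot encoded and fed through a leaky tanh reservoir;
    the reservoir states are averaged over time. Nothing is trained here.
    """
    if len(actions) == 0:
        raise ValueError("empty action sequence")
    state = np.zeros(w.shape[0])
    states = []
    for a in actions:
        u = np.zeros(n_actions)
        u[a] = 1.0
        state = (1.0 - leak) * state + leak * np.tanh(w_in @ u + w @ state)
        states.append(state)
    return np.mean(states, axis=0)


# Usage with a hypothetical vocabulary of 6 coded actions (e.g. click, type, scroll).
n_actions = 6
w_in, w = make_reservoir(n_actions)
short_log = sequence_features([0, 3, 5], n_actions, w_in, w)
long_log = sequence_features([0, 1, 1, 2, 3, 4, 4, 5, 5, 2], n_actions, w_in, w)
print(short_log.shape, long_log.shape)  # both (50,)
```

Because only fixed random weights are involved, the same reservoir can be applied to every examinee's log; in a fuller pipeline the resulting vectors would be passed to a downstream classifier or regression model.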
Citations: 0
Exploring Latent Constructs through Multimodal Data Analysis
IF 1.3 | Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2024-08-14 | DOI: 10.1111/jedm.12412
Shiyu Wang, Shushan Wu, Yinghan Chen, Luyang Fang, Liang Xiao, Feiming Li
This study presents a comprehensive analysis of three types of multimodal data (response accuracy, response times, and eye-tracking data) derived from a computer-based spatial rotation test. To address the challenges of high-dimensional data analysis, we have developed a methodological framework incorporating various statistical and machine learning methods. Our results reveal that hidden state transition probabilities, based on eye-tracking features, may be contingent on skill mastery estimated from the fluency cognitive diagnosis model (CDM). The hidden state trajectory offers additional diagnostic insight into spatial rotation problem solving beyond the information provided by the fluency CDM alone. Furthermore, the distribution of participants across hidden states reflects the intricacy of visualizing the objects in each item, adding a nuanced dimension to the characterization of item features. This complements the information obtained from the item parameters of the fluency CDM, which relies on response accuracy and response time. Our findings may pave the way for new psychometric and statistical models capable of seamlessly integrating various types of multimodal data. This integrated approach promises more meaningful and interpretable results, with implications for advancing the understanding of the cognitive processes involved in spatial rotation tests.
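The abstract speaks of hidden states, their transition probabilities, and hidden state trajectories derived from eye-tracking features. A discrete hidden Markov model is one standard way to formalize such states; the sketch below assumes that setting (the paper's framework may use a different or richer model) and shows the forward algorithm scoring one coded gaze sequence under two transition matrices that differ by skill mastery. All states, observation categories, and probabilities are hypothetical.

```python
import numpy as np


def forward_likelihood(obs, init, trans, emit):
    """P(observation sequence) under a discrete hidden Markov model.

    init[k]     : P(first hidden state is k)
    trans[j, k] : P(next hidden state is k | current hidden state is j)
    emit[k, o]  : P(observed category o | hidden state k)
    For long sequences, rescale alpha at each step (or work in logs) to avoid underflow.
    """
    alpha = init * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return float(alpha.sum())


# Hypothetical 2-state model: state 0 = "encoding a figure", state 1 = "comparing figures";
# observations are fixation areas coded 0 = left figure, 1 = right figure.
init = np.array([0.6, 0.4])
emit = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
# Hypothetical transition matrices for masters and non-masters of the rotation skill.
trans_master = np.array([[0.7, 0.3],
                         [0.4, 0.6]])
trans_nonmaster = np.array([[0.9, 0.1],
                            [0.1, 0.9]])

gaze = [0, 1, 0, 1, 1]
print(forward_likelihood(gaze, init, trans_master, emit),
      forward_likelihood(gaze, init, trans_nonmaster, emit))
```

Comparing the two likelihoods shows how the same gaze record can be more or less typical of switching behavior under each mastery-specific transition structure.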
Citations: 0
Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs
IF 1.3 | Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2024-08-01 | DOI: 10.1111/jedm.12409
Hyo Jeong Shin, Christoph König, Frederic Robin, Andreas Frey, Kentaro Yamamoto
Many international large-scale assessments (ILSAs) have switched to multistage adaptive testing (MST) designs to measure the skills of heterogeneous populations around the world more efficiently. In this context, previous studies have reported acceptable model parameter recovery under MST designs when current item response theory (IRT)-based scaling models are used. However, these studies have not considered the influence of realistic phenomena commonly observed in ILSA data, such as item-by-country interactions, repeated use of MST designs in subsequent cycles, and nonresponse, including omitted and not-reached items. The purpose of this study is to examine the robustness of current IRT-based scaling models to these three factors under MST designs, using the Programme for International Student Assessment (PISA) designs as an example. A series of simulation studies shows that the IRT scaling models used in PISA are robust to repeated use of the MST design in a subsequent cycle with fewer items and smaller sample sizes, while item-by-country interactions and not-reached items have negligible to modest effects on model parameter estimation, and omitted responses have the largest effect. The discussion section provides recommendations and implications for future MST designs and scaling models for ILSAs.
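To make the simulated conditions concrete, the sketch below generates one examinee's responses in a toy two-stage MST: a router module, number-correct routing to an easy or hard second-stage module, and trailing items coded as not reached. It is a hypothetical illustration only; operational designs such as PISA's are more elaborate, and the module sizes, item parameters, cut score, and function names here are assumptions.

```python
import numpy as np


def simulate_2pl(theta, a, b, rng):
    """Draw 0/1 responses for one examinee under the 2PL model."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return (rng.random(len(b)) < p).astype(float)


def route(stage1, cut):
    """Route to the harder module when the number-correct score reaches cut (NaN ignored)."""
    return "hard" if np.nansum(stage1) >= cut else "easy"


def administer_two_stage(theta, router, modules, cut, rng, n_not_reached=0):
    """Simulate one examinee through a router module and one second-stage module.

    The last n_not_reached stage-2 items are coded NaN to mimic not-reached responses.
    """
    r1 = simulate_2pl(theta, *router, rng)
    path = route(r1, cut)
    r2 = simulate_2pl(theta, *modules[path], rng)
    if n_not_reached > 0:
        r2[-n_not_reached:] = np.nan
    return path, np.concatenate([r1, r2])


# Hypothetical design: a 4-item router and two 5-item second-stage modules.
rng = np.random.default_rng(2024)
router = (np.full(4, 1.2), np.array([-0.5, -0.2, 0.2, 0.5]))
modules = {"easy": (np.full(5, 1.0), np.linspace(-2.0, -0.5, 5)),
           "hard": (np.full(5, 1.0), np.linspace(0.5, 2.0, 5))}
path, responses = administer_two_stage(theta=1.0, router=router, modules=modules,
                                       cut=3, rng=rng, n_not_reached=2)
print(path, responses)
```

Keeping not-reached responses as NaN rather than scoring them as incorrect leaves the treatment of nonresponse as an explicit choice in the subsequent scaling step.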
Citations: 0
Modeling Nonlinear Effects of Person‐by‐Item Covariates in Explanatory Item Response Models: Exploratory Plots and Modeling Using Smooth Functions
IF 1.3 | Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2024-07-25 | DOI: 10.1111/jedm.12410
Sun‐Joo Cho, Amanda Goodwin, Matthew Naveiras, Paul De Boeck
Explanatory item response models (EIRMs) have been applied to investigate the effects of person covariates, item covariates, and their interactions in the fields of reading education and psycholinguistics. In practice, it is often assumed that the relationships between the covariates and the logit transformation of item response probability are linear. However, this linearity assumption obscures the differential effects of covariates over their range in the presence of nonlinearity. Therefore, this paper presents exploratory plots that describe the potential nonlinear effects of person and item covariates on binary outcome variables. This paper also illustrates the use of EIRMs with smooth functions to model these nonlinear effects. The smooth functions examined in this study include univariate smooths of continuous person or item covariates, tensor product smooths of continuous person and item covariates, and by‐variable smooths between a continuous person covariate and a binary item covariate. Parameter estimation was performed using the mgcv R package through the maximum penalized likelihood estimation method. In the empirical study, we identified a nonlinear effect of the person‐by‐item covariate interaction and discussed its practical implications. Furthermore, the parameter recovery and the model comparison method and hypothesis testing procedures presented were evaluated via simulation studies under the same conditions observed in the empirical study.
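The core idea of estimating a smooth covariate effect on the logit by maximum penalized likelihood can be sketched outside R. The block below is a minimal Python sketch, not the mgcv implementation used in the paper: it uses a hypothetical truncated-line spline basis (mgcv's default is a thin plate regression spline), a fixed smoothing parameter (mgcv selects it automatically), and it omits the random person and item effects of an EIRM. The data and function names are hypothetical.

```python
import numpy as np


def spline_basis(x, knots):
    """Columns: intercept, x, and a truncated line (x - k)_+ for each knot k."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)


def fit_penalized_logit(X, y, lam, n_unpenalized=2, n_iter=100, tol=1e-10):
    """Maximize log-likelihood - (lam / 2) * ||knot coefficients||^2 by Newton-Raphson."""
    p = X.shape[1]
    penalty = lam * np.eye(p)
    penalty[:n_unpenalized, :n_unpenalized] = 0.0  # intercept and slope are not penalized
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - prob) - penalty @ beta
        info = X.T @ (X * (prob * (1.0 - prob))[:, None]) + penalty
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def deviance(X, y, beta):
    """-2 x Bernoulli log-likelihood, computed stably."""
    eta = X @ beta
    return float(2.0 * np.sum(np.logaddexp(0.0, eta) - y * eta))


# Hypothetical data: a standardized person covariate with a nonlinear effect on the logit.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=2000)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.5 * np.sin(1.5 * x))))
X_lin = spline_basis(x, [])
X_spl = spline_basis(x, np.linspace(-1.5, 1.5, 7))
beta_lin = fit_penalized_logit(X_lin, y, lam=0.0)
beta_spl = fit_penalized_logit(X_spl, y, lam=1.0)
print(deviance(X_lin, y, beta_lin), deviance(X_spl, y, beta_spl))
```

Because the penalty falls only on the knot coefficients, the linear-logit model is nested in the spline fit, so the spline's deviance can be no worse than the linear one; larger values of `lam` pull the curve back toward a straight line.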
Citations: 0
On the Choice of Parameters for the Lognormal Model for Response Times: Commentary on Becker et al. (2013)
IF 1.3 | Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2024-07-24 | DOI: 10.1111/jedm.12411
Wim J. van der Linden
In a recently published article in this journal, Becker et al. claim that, because of a missing slope parameter, the lognormal model for response times on test items almost never holds in practice. However, the authors' critique rests on a misrepresentation of the model, which already does have the equivalent of a slope parameter. More importantly, their extra parameter spoils the interpretation of the parameters for the test‐takers' speed and labor intensity of the items necessary for a response‐time model to be empirically meaningful while their proposed interpretation of the extra parameter seems unwarranted. An analysis of the authors' earlier empirical comparison between the original and their alternative version of the model does not seem to support much of a conclusion about the relative fit of the two models. Also, their simulation study conducted to demonstrate the necessity of the extra slope parameter appears to be based on data simulated in favor of their parameter.
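For reference while reading the commentary, the sketch below writes out the commonly cited form of van der Linden's lognormal response-time model, in which the log response time of test taker j on item i is normal with mean β_i − τ_j and standard deviation 1/α_i (τ_j is speed, β_i the item's time or labor intensity, α_i a discrimination-like precision). It is a minimal sketch that takes no position on the debate; the extra slope parameter proposed by Becker et al. is not shown, and all numeric values and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, median


def lognormal_rt_density(t, tau, alpha, beta):
    """Density of response time t > 0 for speed tau on an item with
    time intensity beta and discrimination alpha (1 / SD of log time)."""
    if t <= 0:
        raise ValueError("response time must be positive")
    z = alpha * (math.log(t) - (beta - tau))
    return alpha / (t * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)


def prob_within(t_max, tau, alpha, beta):
    """P(T <= t_max): log T is normal with mean beta - tau and SD 1 / alpha."""
    return NormalDist(beta - tau, 1.0 / alpha).cdf(math.log(t_max))


def simulate_rt(tau, alpha, beta, rng):
    """Draw one response time by exponentiating a normal draw for log T."""
    return math.exp(rng.gauss(beta - tau, 1.0 / alpha))


# Hypothetical values: item with beta = 4.0 (log seconds) and alpha = 2.0; speed tau = 0.5.
rng = random.Random(7)
times = [simulate_rt(0.5, 2.0, 4.0, rng) for _ in range(10_000)]
print(median(times), math.exp(4.0 - 0.5))  # sample median vs. model median e^(beta - tau)
print(prob_within(60.0, 0.5, 2.0, 4.0))    # chance of answering within 60 seconds
```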
Citations: 0
Reckase, M. The Psychometrics of Standard Setting: Connecting Policy and Test Scores. First edition published 2023 by CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
IF 1.3 | Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2024-07-24 | DOI: 10.1111/jedm.12407
Daniel Lewis, Sandip Sinharay
Citations: 0
Using Automated Procedures to Score Educational Essays Written in Three Languages
IF 1.3 | Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2024-07-23 | DOI: 10.1111/jedm.12406
Tahereh Firoozi, Hamid Mohammadi, Mark J. Gierl
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system: multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were holistically scored using the Common European Framework of Reference for Languages. The AES system with mBERT produced results that were consistent with human raters overall across all three language groups. The system also produced accurate predictions for some, but not all, of the score levels within each language. Compared to mBERT, the AES system with LaBSE produced results that were even more consistent with the human raters overall across all three language groups. In addition, the system produced accurate predictions for the majority of the score levels within each language. The performance differences between mBERT and LaBSE can be explained by considering how each language embedding model is implemented. Implications of this study for educational testing are also discussed.
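Agreement between an AES system and human raters on ordinal score levels is often summarized with quadratic weighted kappa. The abstract does not state which agreement statistics the study used, so the sketch below is only a hedged illustration of one common choice, with CEFR levels A1 to C2 coded 0 to 5 as a hypothetical convention and made-up scores. The same statistic is available as `sklearn.metrics.cohen_kappa_score(..., weights="quadratic")` if scikit-learn is installed.

```python
import numpy as np


def quadratic_weighted_kappa(human, machine, n_levels):
    """Quadratic weighted kappa for two vectors of integer scores in 0..n_levels-1.

    Undefined (division by zero) if both raters use a single level only.
    """
    observed = np.zeros((n_levels, n_levels))
    for h, m in zip(human, machine):
        observed[h, m] += 1.0
    observed /= observed.sum()
    # Expected joint distribution if the two raters were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_levels, n_levels))
    weights = (i - j) ** 2 / (n_levels - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Hypothetical CEFR levels A1..C2 coded 0..5 for eight essays.
human = [0, 1, 2, 2, 3, 4, 4, 5]
machine = [0, 1, 2, 3, 3, 4, 5, 5]
print(round(quadratic_weighted_kappa(human, machine, n_levels=6), 3))
```

The quadratic weights penalize a two-level disagreement four times as heavily as a one-level disagreement, which suits ordered proficiency levels.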
Citations: 0
A Generalized Objective Function for Computer Adaptive Item Selection
IF 1.3 | Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2024-07-02 | DOI: 10.1111/jedm.12405
Harold Doran, Testsuhiro Yamada, Ted Diaz, Emre Gonulates, Vanessa Culver
Computer adaptive testing (CAT) is an increasingly common mode of test administration, offering improved test security, better measurement precision, and the potential for shorter testing experiences. This article presents a new item selection algorithm based on a generalized objective function that supports multiple types of testing conditions and principled assessment design. The generalized nature of the algorithm accommodates a wide array of test requirements, allowing experts to define what to measure and how to measure it; the algorithm itself is simply a means to the end of better construct representation. This work also emphasizes the computational algorithm and its ability to scale, supporting faster computation and better cost containment in real-world applications than other CAT algorithms. We make a significant effort to consolidate all the information needed to build and scale the algorithm, so that expert psychometricians and software developers can use this document as a self-contained resource and specification for building and deploying an operational CAT platform.
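The abstract does not spell out the generalized objective function, so it is not reproduced here. For readers new to CAT item selection, the sketch below shows the classic baseline it is typically contrasted with: choosing the unused item with maximum Fisher information under the 2PL model at the current ability estimate. The item bank and function names are hypothetical, and content constraints, exposure control, and ability updating are all omitted.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_info_2pl(theta, a, b):
    """Item information a^2 * P * (1 - P) at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta_hat, bank, administered):
    """Index of the not-yet-administered (a, b) item with maximum information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda i: fisher_info_2pl(theta_hat, *bank[i]))


# Hypothetical four-item bank of (a, b) pairs.
bank = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.5, 2.0)]
print(select_next_item(0.1, bank, administered=set()))  # -> 1
print(select_next_item(0.1, bank, administered={1}))    # -> 2
```

A generalized objective would replace the single information criterion in `select_next_item` with a score that also reflects the test requirements experts specify.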
Citations: 0
Likelihood-Based Estimation of Model-Derived Oral Reading Fluency
IF 1.4 | Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2024-06-22 | DOI: 10.1111/jedm.12404 | Vol. 61, No. 3, pp. 542-559
Cornelis Potgieter, Xin Qiao, Akihito Kamata, Yusuf Kara

As part of the effort to develop an improved oral reading fluency (ORF) assessment system, Kara et al. estimated the ORF scores based on a latent variable psychometric model of accuracy and speed for ORF data via a fully Bayesian approach. This study further investigates likelihood-based estimators for the model-derived ORF scores, including maximum likelihood estimator (MLE), maximum a posteriori (MAP), and expected a posteriori (EAP), as well as their standard errors. The proposed estimators were demonstrated with a real ORF assessment dataset. Also, the estimation of model-derived ORF scores and their standard errors by the proposed estimators were evaluated through a simulation study. The fully Bayesian approach was included as a comparison in the real data analysis and the simulation study. Results demonstrated that the three likelihood-based approaches for the model-derived ORF scores and their standard error estimation performed satisfactorily.
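The three estimators named in the abstract (MLE, MAP, and EAP) differ only in how they summarize the likelihood and prior. The sketch below contrasts them on a grid for a simple 2PL model with known item parameters and a standard normal prior; it is an assumption-laden stand-in, not the paper's ORF model, which jointly models reading accuracy and speed. The items, response patterns, and function names are hypothetical.

```python
import numpy as np


def loglik_2pl(grid, responses, a, b):
    """Log-likelihood of a 0/1 response vector at every theta in grid."""
    logits = a[None, :] * (grid[:, None] - b[None, :])
    return (responses[None, :] * logits - np.logaddexp(0.0, logits)).sum(axis=1)


def estimate_theta(responses, a, b, grid=None):
    """Grid-based MLE, MAP, and EAP of theta with a standard normal prior.

    Returns (mle, se_mle, map_est, eap, posterior_sd). The MLE is restricted to
    the grid, so all-correct or all-wrong patterns hit the grid boundary.
    """
    if grid is None:
        grid = np.linspace(-4.0, 4.0, 801)
    responses = np.asarray(responses, dtype=float)
    ll = loglik_2pl(grid, responses, a, b)
    log_post = ll - 0.5 * grid ** 2  # log N(0, 1) prior up to a constant
    mle = grid[np.argmax(ll)]
    p = 1.0 / (1.0 + np.exp(-a * (mle - b)))
    se_mle = 1.0 / np.sqrt(np.sum(a ** 2 * p * (1.0 - p)))  # 1 / sqrt(test information)
    map_est = grid[np.argmax(log_post)]
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    eap = np.sum(w * grid)
    posterior_sd = np.sqrt(np.sum(w * (grid - eap) ** 2))
    return mle, se_mle, map_est, eap, posterior_sd


# Hypothetical 4-item test with known 2PL parameters.
a = np.ones(4)
b = np.array([-1.5, -0.5, 0.5, 1.5])
print(estimate_theta([1, 1, 1, 0], a, b))
```

With only four items, the prior pulls the MAP and EAP estimates noticeably toward zero relative to the MLE, and the posterior SD plays the role of the standard error for the Bayesian estimators.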

Citations: 0
Curvilinearity in the Reference Composite and Practical Implications for Measurement
IF 1.4; Zone 4 (Psychology); Q3 PSYCHOLOGY, APPLIED; Pub Date: 2024-06-05; DOI: 10.1111/jedm.12402
Xiangyi Liao, Daniel M. Bolt, Jee-Seon Kim

Item difficulty and dimensionality often correlate, implying that unidimensional IRT approximations to multidimensional data (i.e., reference composites) can take a curvilinear form in the multidimensional space. Although this issue has been previously discussed in the context of vertical scaling applications, we illustrate how such a phenomenon can also easily occur within individual tests. Measures of reading proficiency, for example, often use different task types within a single assessment, a feature that may not only lead to multidimensionality, but also an association between item difficulty and dimensionality. Using a latent regression strategy, we demonstrate through simulations and empirical analysis how associations between dimensionality and difficulty yield a nonlinear reference composite where the weights of the underlying dimensions change across the scale continuum according to the difficulties of the items associated with the dimensions. We further show how this form of curvilinearity produces systematic forms of misspecification in traditional unidimensional IRT models (e.g., 2PL) and can be better accommodated by models such as monotone-polynomial or asymmetric IRT models. Simulations and a real-data example from the Early Childhood Longitudinal Study—Kindergarten are provided for demonstration. Some implications for measurement modeling and for understanding the effects of 2PL misspecification on measurement metrics are discussed.

Journal of Educational Measurement, 61(3), 511–541. Published 2024-06-05. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12402
Citations: 0
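The abstract above argues that an association between item difficulty and dimensionality makes the dimension weights of the reference composite shift along the scale. A deliberately crude Python sketch shows the idea. It uses the usual definition of the reference composite: the eigenvector of A'A with the largest eigenvalue, where A holds the items' multidimensional discriminations. It computes that direction separately for the easier and harder halves of a made-up two-dimensional item pool. This split-half comparison is only a heuristic illustration, not the latent regression strategy the authors actually use. The item parameters and the `composite_angle` helper are hypothetical.

```python
import math


def composite_angle(items):
    """Reference-composite direction for two-dimensional items, in degrees from dimension 1.

    items: list of (a1, a2) discrimination pairs. The direction is the eigenvector
    of A'A with the largest eigenvalue, solved in closed form for the 2 x 2 case.
    """
    p = sum(a1 * a1 for a1, _ in items)
    q = sum(a1 * a2 for a1, a2 in items)
    r = sum(a2 * a2 for _, a2 in items)
    # principal axis of the symmetric matrix [[p, q], [q, r]]
    return math.degrees(0.5 * math.atan2(2.0 * q, p - r))


# hypothetical pool: (a1, a2, difficulty); easy items lean on dimension 1 and
# hard items lean on dimension 2, mimicking a difficulty-dimension association
pool = [
    (1.5, 0.2, -1.5), (1.4, 0.3, -1.0), (1.2, 0.4, -0.5),
    (0.4, 1.2, 0.5), (0.3, 1.4, 1.0), (0.2, 1.5, 1.5),
]

easy = [(a1, a2) for a1, a2, d in pool if d < 0]
hard = [(a1, a2) for a1, a2, d in pool if d >= 0]
full = [(a1, a2) for a1, a2, _ in pool]

angle_easy = composite_angle(easy)
angle_hard = composite_angle(hard)
angle_full = composite_angle(full)
print(f"easy {angle_easy:.1f}, full {angle_full:.1f}, hard {angle_hard:.1f}")
```

With these invented items, the easy half points roughly 12 degrees from dimension 1, the whole pool 45 degrees, and the hard half roughly 78 degrees. A single straight composite therefore averages over a direction that actually bends from dimension 1 toward dimension 2 as difficulty rises. That bending is the kind of curvilinearity the paper describes.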