
Journal of Educational Measurement: Latest Publications

Utilizing Response Time for Item Selection in On‐the‐Fly Multistage Adaptive Testing for PISA Assessment
IF 1.3 | CAS Zone 4 (Psychology) | Q1 Psychology | Pub Date: 2024-06-06 | DOI: 10.1111/jedm.12403
Xiuxiu Tang, Yi Zheng, Tong Wu, K. Hau, H. Chang
Multistage adaptive testing (MST) has been recently adopted for international large‐scale assessments such as Programme for International Student Assessment (PISA). MST offers improved measurement efficiency over traditional nonadaptive tests and improved practical convenience over single‐item‐adaptive computerized adaptive testing (CAT). As a third alternative adaptive test design to MST and CAT, Zheng and Chang proposed the “on‐the‐fly multistage adaptive testing” (OMST), which combines the benefits of MST and CAT and offsets their limitations. In this study, we adopted the OMST design while also incorporating response time (RT) in item selection. Via simulations emulating the PISA 2018 reading test, including using the real item attributes and replicating PISA 2018 reading test's MST design, we compared the performance of our OMST designs against the simulated MST design in (1) measurement accuracy of test takers’ ability, (2) test time efficiency and consistency, and (3) expected gains in precision by design. We also investigated the performance of OMST in item bank usage and constraints management. Results show great potential for the proposed RT‐incorporated OMST designs to be used for PISA and potentially other international large‐scale assessments.
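The abstract describes incorporating response time (RT) into item selection but does not spell out the criterion here. A common RT-aware rule in the adaptive-testing literature is to maximize Fisher information per expected second, with expected time taken from van der Linden's lognormal RT model. The sketch below is an illustrative assumption, not the authors' exact design; all item parameters are invented.

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def expected_rt(tau, beta, alpha):
    """Expected time under a lognormal RT model:
    log T ~ Normal(beta - tau, 1/alpha^2), so E[T] = exp(beta - tau + 0.5/alpha^2)."""
    return np.exp(beta - tau + 0.5 / alpha**2)

def select_item(theta_hat, tau_hat, pool, administered):
    """Greedy RT-aware selection: pick the unadministered item that
    maximizes information per expected second at the current estimates."""
    best, best_val = None, -np.inf
    for i, (a, b, beta, alpha) in enumerate(pool):
        if i in administered:
            continue
        val = item_information(theta_hat, a, b) / expected_rt(tau_hat, beta, alpha)
        if val > best_val:
            best, best_val = i, val
    return best

# Hypothetical three-item pool: (discrimination a, difficulty b,
# time intensity beta, time discrimination alpha).
pool = [(1.0, 0.0, 4.0, 2.0), (1.5, 0.5, 4.5, 2.0), (0.8, -1.0, 3.5, 2.0)]
next_item = select_item(0.0, 0.0, pool, administered={0})
```

In an OMST design this selection would be applied per stage (choosing a block of items at once) rather than per item, but the time-weighted criterion is the same idea.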
Citations: 0
Sensemaking of Process Data from Evaluation Studies of Educational Games: An Application of Cross‐Classified Item Response Theory Modeling
IF 1.3 | CAS Zone 4 (Psychology) | Q1 Psychology | Pub Date: 2024-06-05 | DOI: 10.1111/jedm.12396
Tianying Feng, Li Cai
Process information collected from educational games can illuminate how students approach interactive tasks, complementing assessment outcomes routinely examined in evaluation studies. However, the two sources of information are historically analyzed and interpreted separately, and diagnostic process information is often underused. To tackle these issues, we present a new application of cross‐classified item response theory modeling, using indicators of knowledge misconceptions and item‐level assessment data collected from a multisite game‐based randomized controlled trial. This application addresses (a) the joint modeling of students' pretest and posttest item responses and game‐based processes described by indicators of misconceptions; (b) integration of gameplay information when gauging the intervention effect of an educational game; (c) relationships among game‐based misconception, pretest initial status, and pre‐to‐post change; and (d) nesting of students within schools, a common aspect in multisite research. We also demonstrate how to structure the data and set up the model to enable our proposed application, and how our application compares to three other approaches to analyzing gameplay and assessment data. Lastly, we note the implications for future evaluation studies and for using analytic results to inform learning and instruction.
Citations: 0
Curvilinearity in the Reference Composite and Practical Implications for Measurement
IF 1.4 | CAS Zone 4 (Psychology) | Q3 Psychology, Applied | Pub Date: 2024-06-05 | DOI: 10.1111/jedm.12402
Xiangyi Liao, Daniel M. Bolt, Jee-Seon Kim

Item difficulty and dimensionality often correlate, implying that unidimensional IRT approximations to multidimensional data (i.e., reference composites) can take a curvilinear form in the multidimensional space. Although this issue has been previously discussed in the context of vertical scaling applications, we illustrate how such a phenomenon can also easily occur within individual tests. Measures of reading proficiency, for example, often use different task types within a single assessment, a feature that may not only lead to multidimensionality, but also an association between item difficulty and dimensionality. Using a latent regression strategy, we demonstrate through simulations and empirical analysis how associations between dimensionality and difficulty yield a nonlinear reference composite where the weights of the underlying dimensions change across the scale continuum according to the difficulties of the items associated with the dimensions. We further show how this form of curvilinearity produces systematic forms of misspecification in traditional unidimensional IRT models (e.g., 2PL) and can be better accommodated by models such as monotone-polynomial or asymmetric IRT models. Simulations and a real-data example from the Early Childhood Longitudinal Study—Kindergarten are provided for demonstration. Some implications for measurement modeling and for understanding the effects of 2PL misspecification on measurement metrics are discussed.

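The phenomenon the abstract describes can be illustrated numerically: in a two-dimensional pool where easy items load mostly on one dimension and hard items on the other, the information-weighted direction of a unidimensional composite shifts across the ability range. This is a toy illustration with invented item parameters, not the authors' latent-regression analysis.

```python
import numpy as np

def p_m2pl(theta, a, b):
    """Compensatory multidimensional 2PL: P(correct) = logistic(a.theta - b)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) - b)))

# Invented pool in which difficulty and dimensionality correlate:
# easy items load mostly on dimension 1, hard items on dimension 2.
easy = [(np.array([1.2, 0.2]), -1.5)] * 10
hard = [(np.array([0.2, 1.2]), 1.5)] * 10
items = easy + hard

def composite_weights(theta):
    """Approximate direction of the reference composite at theta by
    weighting each item's discrimination vector by its information p(1-p)."""
    w = np.zeros(2)
    for a, b in items:
        p = p_m2pl(theta, a, b)
        w += p * (1.0 - p) * a
    return w / w.sum()

# The effective weights change along the scale: dimension 1 dominates at
# low ability, dimension 2 at high ability -- a curvilinear composite.
low = composite_weights(np.array([-2.0, -2.0]))
high = composite_weights(np.array([2.0, 2.0]))
```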
Citations: 0
Modeling Response Styles in Cross-Classified Data Using a Cross-Classified Multidimensional Nominal Response Model
IF 1.4 | CAS Zone 4 (Psychology) | Q3 Psychology, Applied | Pub Date: 2024-05-31 | DOI: 10.1111/jedm.12401
Sijia Huang, Seungwon Chung, Carl F. Falk

In this study, we introduced a cross-classified multidimensional nominal response model (CC-MNRM) to account for various response styles (RS) in the presence of cross-classified data. The proposed model allows slopes to vary across items and can explore impacts of observed covariates on latent constructs. We applied a recently developed variant of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm to address the computational challenge of estimating the proposed model. To demonstrate our new approach, we analyzed empirical student evaluation of teaching (SET) data collected from a large public university with three models: a CC-MNRM with RS, a CC-MNRM with no RS, and a multilevel MNRM with RS. Results indicated that the three models led to different inferences regarding the observed covariates. Additionally, in the example, ignoring/incorporating RS led to changes in student substantive scores, while the instructor substantive scores were less impacted. Misspecifying the cross-classified data structure resulted in apparent changes on instructor scores. To further evaluate the proposed modeling approach, we conducted a preliminary simulation study and observed good parameter and score recovery. We concluded this study with discussions of limitations and future research directions.

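The multidimensional nominal response model underlying the abstract gives each response category its own slope vector across substantive and response-style dimensions. A minimal sketch of the category probabilities follows; the Likert item and its slope matrix are invented for illustration, and the cross-classified random effects and MH-RM estimation are omitted.

```python
import numpy as np

def mnrm_probs(theta, a_cat, c_cat):
    """Multidimensional nominal response model: category k has
    P(k) proportional to exp(a_k . theta + c_k), where a_k is a vector of
    category slopes over the latent dimensions."""
    z = a_cat @ theta + c_cat
    z -= z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical 4-point Likert item: dimension 1 is the substantive trait,
# dimension 2 an extreme-response-style dimension that pushes mass toward
# the end categories.
a = np.array([[0.0,  1.0],
              [1.0, -1.0],
              [2.0, -1.0],
              [3.0,  1.0]])
c = np.zeros(4)

p_neutral = mnrm_probs(np.array([0.0, 0.0]), a, c)
p_extreme = mnrm_probs(np.array([0.0, 2.0]), a, c)
```

Holding the substantive trait fixed, raising the style dimension inflates the endpoint categories, which is the kind of response-style effect the model is built to absorb.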
Citations: 0
Expanding the Lognormal Response Time Model Using Profile Similarity Metrics to Improve the Detection of Anomalous Testing Behavior
IF 1.4 | CAS Zone 4 (Psychology) | Q3 Psychology, Applied | Pub Date: 2024-05-13 | DOI: 10.1111/jedm.12395
Gregory M. Hurtz, Regi Mucino

The Lognormal Response Time (LNRT) model measures the speed of test-takers relative to the normative time demands of items on a test. The resulting speed parameters and model residuals are often analyzed for evidence of anomalous test-taking behavior associated with fast and poorly fitting response time patterns. Extending this model, we demonstrate the connection between the existing LNRT model parameters and the “level” component of profile similarity, and we define two new parameters for the LNRT model representing profile “dispersion” and “shape.” We show that while the LNRT model measures level (speed), profile dispersion and shape are conflated in model residuals, and that distinguishing them provides meaningful and useful parameters for identifying anomalous testing behavior. Results from data in a situation where many test-takers gained preknowledge of test items revealed that profile shape, not currently measured in the LNRT model, was the most sensitive response time index to the abnormal test-taking behavior patterns. Results strongly support expanding the LNRT model to measure not only each test-taker's level of speed, but also the dispersion and shape of their response time profiles.

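A rough sketch of the pieces involved: under the lognormal RT model each person gets a speed parameter, and the person's vector of standardized residuals can be decomposed into level, dispersion, and shape in the spirit of classical profile similarity. The estimators below are simplified illustrations (known item parameters, precision-weighted speed), not the authors' formulation.

```python
import numpy as np

def lnrt_residuals(log_times, beta, alpha):
    """Lognormal RT model: log T_ij ~ Normal(beta_i - tau_j, 1/alpha_i^2).
    Returns each person's precision-weighted speed estimate tau_j and the
    matrix of standardized residuals."""
    w = alpha**2
    tau = ((beta - log_times) * w).sum(axis=1) / w.sum()
    resid = alpha * (log_times - (beta - tau[:, None]))
    return tau, resid

def profile_metrics(resid_row):
    """Level / dispersion / shape decomposition of one person's residual
    profile: level is the mean, dispersion the spread around it, and shape
    the standardized pattern that remains."""
    level = resid_row.mean()
    centered = resid_row - level
    dispersion = centered.std()
    shape = centered / dispersion if dispersion > 0 else centered
    return level, dispersion, shape

# Invented example: 2 items, 2 persons, log response times supplied directly.
beta = np.array([3.0, 3.0])    # item time intensities
alpha = np.array([1.0, 1.0])   # item time discriminations
log_t = np.array([[2.0, 2.0],
                  [3.0, 4.0]])
tau, resid = lnrt_residuals(log_t, beta, alpha)
```

Person 0 is uniformly fast (high tau, flat residual profile); person 1 is average in speed but has an uneven profile, which is the kind of pattern the dispersion and shape parameters are meant to pick up.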
Citations: 0
A Nonparametric Composite Group DIF Index for Focal Groups Stemming from Multicategorical Variables
IF 1.4 | CAS Zone 4 (Psychology) | Q3 Psychology, Applied | Pub Date: 2024-05-12 | DOI: 10.1111/jedm.12394
Corinne Huggins-Manley, Anthony W. Raborn, Peggy K. Jones, Ted Myers

The purpose of this study is to develop a nonparametric DIF method that (a) compares focal groups directly to the composite group that will be used to develop the reported test score scale, and (b) allows practitioners to explore for DIF related to focal groups stemming from multicategorical variables that constitute a small proportion of the overall testing population. We propose the nonparametric root expected proportion squared difference (REPSD) index that evaluates the statistical significance of composite group DIF for relatively small focal groups stemming from multicategorical focal variables, with decisions of statistical significance based on quasi-exact p values obtained from Monte Carlo permutations of the DIF statistic under the null distribution. We conduct a simulation to evaluate conditions under which the index produces acceptable Type I error and power rates, as well as an application to a school district assessment. Practitioners can calculate the REPSD index in a freely available package we created in the R environment.

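The quasi-exact p value the abstract mentions can be illustrated with a generic Monte Carlo permutation scheme: compute a focal-versus-composite proportion-difference statistic, then recompute it under random relabelings of the focal group. This is a deliberately simplified analogue of the REPSD index (it ignores matching on ability), with invented data; see the authors' R package for the real implementation.

```python
import numpy as np

def repsd_like(correct, focal):
    """Simplified analogue of the REPSD idea: root mean squared difference
    between focal-group and composite-group proportions correct, across
    items. `correct` is persons x items (0/1); `focal` a boolean mask."""
    p_comp = correct.mean(axis=0)
    p_focal = correct[focal].mean(axis=0)
    return np.sqrt(((p_focal - p_comp) ** 2).mean())

def permutation_pvalue(correct, focal, n_perm=199, seed=1):
    """Quasi-exact p value: permute the focal labels under the null and
    count permuted statistics at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = repsd_like(correct, focal)
    exceed = sum(repsd_like(correct, rng.permutation(focal)) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)
```

The "+1" in numerator and denominator is the standard correction that keeps a Monte Carlo permutation p value strictly positive and valid.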
Citations: 0
Does Timed Testing Affect the Interpretation of Efficiency Scores?—A GLMM Analysis of Reading Components
IF 1.4 | CAS Zone 4 (Psychology) | Q3 Psychology, Applied | Pub Date: 2024-05-12 | DOI: 10.1111/jedm.12393
Frank Goldhammer, Ulf Kroehne, Carolin Hahnel, Johannes Naumann, Paul De Boeck

The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting only effective ability or effective speed as efficiency may be challenging because of the within-person dependency between both variables (speed-ability tradeoff, SAT). The present study measures efficiency as effective ability conditional on speed by controlling speed experimentally. Item-level time limits control the stimulus presentation time and the time window for responding (timed condition). The overall goal was to examine the construct validity of effective ability scores obtained from the untimed and timed conditions by comparing the effects of theory-based item properties on item difficulty. If such effects exist, the scores reflect how well the test-takers were able to cope with the theory-based requirements. A German subsample from PISA 2012 completed two reading component skills tasks (i.e., word recognition and semantic integration) with and without item-level time limits. Overall, the included linguistic item properties showed stronger effects on item difficulty in the timed than the untimed condition. In the semantic integration task, item properties explained the time required in the untimed condition. The results suggest that effective ability scores in the timed condition better reflect how well test-takers were able to cope with the theoretically relevant task demands.

Citations: 0
von Davier, Alina, Mislevy, Robert J., and Hao, Jiangang (Eds.) (2021). Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-74394-9_1
IF 1.4 | CAS Zone 4 (Psychology) | Q3 Psychology, Applied | Pub Date: 2024-04-24 | DOI: 10.1111/jedm.12392
Hong Jiao
Citations: 0
A One-Parameter Diagnostic Classification Model with Familiar Measurement Properties
IF 1.4 | CAS Zone 4 (Psychology) | Q3 Psychology, Applied | Pub Date: 2024-04-24 | DOI: 10.1111/jedm.12390
Matthew J. Madison, Stefanie A. Wind, Lientje Maas, Kazuhiro Yamaguchi, Sergio Haab

Diagnostic classification models (DCMs) are psychometric models designed to classify examinees according to their proficiency or nonproficiency of specified latent characteristics. These models are well suited for providing diagnostic and actionable feedback to support intermediate and formative assessment efforts. Several DCMs have been developed and applied in different settings. This study examines a DCM with functional form similar to the 1-parameter logistic item response theory model. Using data from a large-scale mathematics education research study, we demonstrate and prove that the proposed DCM has measurement properties akin to the Rasch and one-parameter logistic item response theory models, including sum score sufficiency, item-free and person-free measurement, and invariant item and person ordering. We introduce some potential applications for this model, and discuss the implications and limitations of these developments, as well as directions for future research.

{"title":"A One-Parameter Diagnostic Classification Model with Familiar Measurement Properties","authors":"Matthew J. Madison,&nbsp;Stefanie A. Wind,&nbsp;Lientje Maas,&nbsp;Kazuhiro Yamaguchi,&nbsp;Sergio Haab","doi":"10.1111/jedm.12390","DOIUrl":"10.1111/jedm.12390","url":null,"abstract":"<p>Diagnostic classification models (DCMs) are psychometric models designed to classify examinees according to their proficiency or nonproficiency of specified latent characteristics. These models are well suited for providing diagnostic and actionable feedback to support intermediate and formative assessment efforts. Several DCMs have been developed and applied in different settings. This study examines a DCM with functional form similar to the 1-parameter logistic item response theory model. Using data from a large-scale mathematics education research study, we demonstrate and prove that the proposed DCM has measurement properties akin to the Rasch and one-parameter logistic item response theory models, including sum score sufficiency, item-free and person-free measurement, and invariant item and person ordering. We introduce some potential applications for this model, and discuss the implications and limitations of these developments, as well as directions for future research.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140798136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Modeling the Intraindividual Relation of Ability and Speed within a Test
IF 1.4 CAS Tier 4 Psychology Q3 PSYCHOLOGY, APPLIED Pub Date : 2024-04-19 DOI: 10.1111/jedm.12391
Augustin Mutak, Robert Krause, Esther Ulitzsch, Sören Much, Jochen Ranger, Steffi Pohl

Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to ensure a fair assessment. Different approaches exist for estimating this relationship, relying either on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating this relationship. We propose the intraindividual speed-ability-relation (ISAR) model, which relies on nonstationarity of speed and ability over the course of the test. The ISAR model explicitly models intraindividual change in ability and speed within a test and assesses the intraindividual relation of speed and ability by evaluating the relationship between the two latent change variables. Model estimation performs well when the data contain interindividual differences in speed and ability changes. In empirical data from PISA, we found that the intraindividual relationship between speed and ability is not universally negative for all individuals and varies across competence domains and countries. We discuss possible explanations for this relationship.

{"title":"Modeling the Intraindividual Relation of Ability and Speed within a Test","authors":"Augustin Mutak,&nbsp;Robert Krause,&nbsp;Esther Ulitzsch,&nbsp;Sören Much,&nbsp;Jochen Ranger,&nbsp;Steffi Pohl","doi":"10.1111/jedm.12391","DOIUrl":"10.1111/jedm.12391","url":null,"abstract":"<p>Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to assure a fair assessment. Different approaches exist for estimating this relationship, that either rely on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating this relationship. We propose the intraindividual speed-ability-relation (ISAR) model, which relies on nonstationarity of speed and ability over the course of the test. The ISAR model explicitly models intraindividual change in ability and speed within a test and assesses the intraindividual relation of speed and ability by evaluating the relationship of both latent change variables. Model estimation is good, when there are interindividual differences in speed and ability changes in the data. In empirical data from PISA, we found that the intraindividual relationship between speed and ability is not universally negative for all individuals and varies across different competence domains and countries. 
We discuss possible explanations for this relationship.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12391","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140630432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0