
Latest publications in the Journal of Educational Measurement

A Factor Mixture Model for Item Responses and Certainty of Response Indices to Identify Student Knowledge Profiles
IF 1.3 · Psychology (CAS Tier 4) · Q3 PSYCHOLOGY, APPLIED · Pub Date: 2022-10-10 · DOI: 10.1111/jedm.12344
Chia-Wen Chen, Björn Andersson, Jinxin Zhu

The certainty of response index (CRI) measures respondents' confidence level when answering an item. In conjunction with the answers to the items, previous studies have used descriptive statistics and arbitrary thresholds to identify student knowledge profiles with CRIs. Because this approach overlooks the measurement error of the observed item responses and indices, we propose a factor mixture model that integrates a latent class model, to detect student subgroups, with a measurement model, to control for student ability and confidence level. Applying the model to 773 seventh graders' responses to an algebra test, in which some items involved new material that had not been taught in class, we found two subgroups: (1) students who had high confidence in answering items involving the new material; and (2) students who had low confidence in answering items involving the new material but higher general self-confidence than the first group. We regressed the posterior probability of group membership on gender, prior achievement, and preview behavior, and found preview behavior to be a significant factor associated with membership. Finally, we discuss the implications of the study for teaching practices and future research.
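The "posterior probability of group membership" used as the regression outcome above can be illustrated with a minimal Bayes-rule sketch. The two-class log-likelihoods and mixing weights below are hypothetical; the paper's full factor mixture model is not reproduced.

```python
import math

def class_posteriors(log_lik_by_class, weights):
    """Posterior class-membership probabilities for one examinee in a
    finite mixture, via Bayes' rule: P(class | data) is proportional to
    the class likelihood times the class mixing weight."""
    joint = [math.exp(ll) * w for ll, w in zip(log_lik_by_class, weights)]
    total = sum(joint)
    return [j / total for j in joint]
```

For example, with equal mixing weights and class likelihoods 0.2 and 0.1, the examinee's posterior probability for the first class is 2/3.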

{"title":"A Factor Mixture Model for Item Responses and Certainty of Response Indices to Identify Student Knowledge Profiles","authors":"Chia-Wen Chen,&nbsp;Björn Andersson,&nbsp;Jinxin Zhu","doi":"10.1111/jedm.12344","DOIUrl":"10.1111/jedm.12344","url":null,"abstract":"<p>The certainty of response index (CRI) measures respondents' confidence level when answering an item. In conjunction with the answers to the items, previous studies have used descriptive statistics and arbitrary thresholds to identify student knowledge profiles with the CRIs. Whereas this approach overlooked the measurement error of the observed item responses and indices, we address this by proposing a factor mixture model that integrates a latent class model to detect student subgroups and a measurement model to control for student ability and confidence level. Applying the model to 773 seventh graders' responses to an algebra test, where some items were related to new material that had not been taught in class, we found two subgroups: (1) students who had high confidence in answering items involving the new material; and (2) students who had low confidence in answering items involving the new material but higher general self-confidence than the first group. We regressed the posterior probability of the group membership on gender, prior achievement, and preview behavior and found preview behavior a significant factor associated with the membership. 
Finally, we discussed the implications of the current study for teaching practices and future research.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"28-51"},"PeriodicalIF":1.3,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12344","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43460732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Betty Lanteigne, Christine Coombe, & James Dean Brown. 2021. Challenges in Language Testing around the World: Insights for language test users. Singapore: Springer, 2021, 129.99 € (hardcover), ISBN 978-981-33-4232-3 (eBook). xxiii + 553 pp. https://doi.org/10.1007/978-981-33-4232-3
IF 1.3 · Psychology (CAS Tier 4) · Q3 PSYCHOLOGY, APPLIED · Pub Date: 2022-09-25 · DOI: 10.1111/jedm.12343
Bahram Kazemian, Shafigeh Mohammadian
{"title":"Betty Lanteigne, Christine Coombe, & James Dean Brown. 2021. Challenges in Language Testing around the World: Insights for language test users. Singapore: Springer, 2021, 129.99 € (hardcover), ISBN 978-981-33-4232-3 (eBook). xxiii + 553 pp. https://doi.org/10.1007/978-981-33-4232-3","authors":"Bahram Kazemian,&nbsp;Shafigeh Mohammadian","doi":"10.1111/jedm.12343","DOIUrl":"10.1111/jedm.12343","url":null,"abstract":"","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 4","pages":"536-544"},"PeriodicalIF":1.3,"publicationDate":"2022-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45401317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Using Item Scores and Distractors in Person-Fit Assessment
IF 1.3 · Psychology (CAS Tier 4) · Q3 PSYCHOLOGY, APPLIED · Pub Date: 2022-09-16 · DOI: 10.1111/jedm.12345
Kylie Gorney, James A. Wollack

To detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the $l_z$ and $l_z^*$ person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can both be used as indicators of aberrance. Through detailed simulations, we show that the new statistics are more powerful than existing statistics in detecting several types of aberrant behavior, and that they are able to control the Type I error rate in instances where the model does not exactly fit the data. A real data example is also provided to demonstrate the utility of the new statistics in an operational setting.
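For the dichotomous-score case, the baseline $l_z$ statistic that the authors extend can be sketched as follows. This is a minimal illustration under a 2PL model with known item parameters and a fixed ability; the distractor-based extension proposed in the paper is not reproduced here.

```python
import math

def lz_statistic(responses, a, b, theta):
    """Standardized log-likelihood person-fit statistic l_z for
    dichotomous responses under a 2PL model: the observed pattern
    log-likelihood, centered and scaled by its model-implied mean
    and standard deviation at the given theta."""
    l0 = exp_l = var_l = 0.0
    for u, ai, bi in zip(responses, a, b):
        p = 1.0 / (1.0 + math.exp(-ai * (theta - bi)))  # P(correct)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)
        exp_l += p * math.log(p) + q * math.log(q)
        var_l += p * q * math.log(p / q) ** 2
    return (l0 - exp_l) / math.sqrt(var_l)
```

A Guttman-consistent pattern (correct on easy items, incorrect on hard ones) yields a higher $l_z$ than the reversed, aberrant pattern, which is flagged by its large negative value.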

{"title":"Using Item Scores and Distractors in Person-Fit Assessment","authors":"Kylie Gorney,&nbsp;James A. Wollack","doi":"10.1111/jedm.12345","DOIUrl":"10.1111/jedm.12345","url":null,"abstract":"<p>In order to detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the <math>\u0000 <semantics>\u0000 <msub>\u0000 <mi>l</mi>\u0000 <mi>z</mi>\u0000 </msub>\u0000 <annotation>$l_z$</annotation>\u0000 </semantics></math> and <math>\u0000 <semantics>\u0000 <msubsup>\u0000 <mi>l</mi>\u0000 <mi>z</mi>\u0000 <mo>∗</mo>\u0000 </msubsup>\u0000 <annotation>$l_z^*$</annotation>\u0000 </semantics></math> person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can be used as indicators of aberrance. Through detailed simulations, we show that the new statistics are more powerful than existing statistics in detecting several types of aberrant behavior, and that they are able to control the Type I error rate in instances where the model does not exactly fit the data. A real data example is also provided to demonstrate the utility of the new statistics in an operational setting.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"3-27"},"PeriodicalIF":1.3,"publicationDate":"2022-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12345","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48816866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A New Bayesian Person-Fit Analysis Method Using Pivotal Discrepancy Measures
IF 1.3 · Psychology (CAS Tier 4) · Q3 PSYCHOLOGY, APPLIED · Pub Date: 2022-09-02 · DOI: 10.1111/jedm.12342
Adam Combs

A common method of checking person fit in Bayesian item response theory (IRT) is the posterior-predictive (PP) method. In recent years, more powerful approaches have been proposed that are based on resampling methods using the popular $L_z^*$ statistic. A new Bayesian model-checking method based on pivotal discrepancy measures (PDMs) has also been proposed. A PDM T is a discrepancy measure that is a pivotal quantity with a known reference distribution. A posterior sample of T can be generated using standard Markov chain Monte Carlo output, and a p-value is obtained from probability bounds computed on order statistics of the sample. In this paper, we propose a general procedure for applying the PDM method to person-fit checking in IRT models. We illustrate this using the $L_z$ and $L_z^*$ measures. Simulation studies compare these with the PP method and one of the more recent resampling methods. The results show that the PDM method is more powerful than the PP method; it is more powerful than the resampling method under some conditions and less powerful under others. The PDM method is also applied to a real data set.

{"title":"A New Bayesian Person-Fit Analysis Method Using Pivotal Discrepancy Measures","authors":"Adam Combs","doi":"10.1111/jedm.12342","DOIUrl":"10.1111/jedm.12342","url":null,"abstract":"<p>A common method of checking person-fit in Bayesian item response theory (IRT) is the posterior-predictive (PP) method. In recent years, more powerful approaches have been proposed that are based on resampling methods using the popular <math>\u0000 <semantics>\u0000 <msubsup>\u0000 <mi>L</mi>\u0000 <mi>z</mi>\u0000 <mo>∗</mo>\u0000 </msubsup>\u0000 <annotation>$L_{z}^{*}$</annotation>\u0000 </semantics></math> statistic. There has also been proposed a new Bayesian model checking method based on pivotal discrepancy measures (PDMs). A PDM <i>T</i> is a discrepancy measure that is a pivotal quantity with a known reference distribution. A posterior sample of <i>T</i> can be generated using standard Markov chain Monte Carlo output, and a <i>p</i>-value is obtained from probability bounds computed on order statistics of the sample. In this paper, we propose a general procedure to apply this PDM method to person-fit checking in IRT models. We illustrate this using the <math>\u0000 <semantics>\u0000 <msub>\u0000 <mi>L</mi>\u0000 <mi>z</mi>\u0000 </msub>\u0000 <annotation>$L_{z}$</annotation>\u0000 </semantics></math> and <math>\u0000 <semantics>\u0000 <msubsup>\u0000 <mi>L</mi>\u0000 <mi>z</mi>\u0000 <mo>∗</mo>\u0000 </msubsup>\u0000 <annotation>$L_{z}^{*}$</annotation>\u0000 </semantics></math> measures. Simulation studies are done comparing these with the PP method and one of the more recent resampling methods. The results show that the PDM method is more powerful than the PP method. Under certain conditions, it is more powerful than the resampling method, while in others, it is less. 
The PDM method is also applied to a real data set.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"52-75"},"PeriodicalIF":1.3,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46358680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Several Variations of Simple-Structure MIRT Equating
IF 1.3 · Psychology (CAS Tier 4) · Q3 PSYCHOLOGY, APPLIED · Pub Date: 2022-07-28 · DOI: 10.1111/jedm.12341
Stella Y. Kim, Won-Chan Lee

The current study proposed several variants of simple-structure multidimensional item response theory equating procedures. Four distinct data sets were used to demonstrate the feasibility of the proposed equating methods under two equating designs: a random groups design and a common-item nonequivalent groups design. Findings indicated notable differences between the multidimensional and unidimensional approaches when the data exhibited evidence of multidimensionality. In addition, some of the proposed methods succeeded in providing equating results for both section-level and composite-level scores, which most existing methodologies have not achieved. The traditional method of using a set of quadrature points and weights for equating proved computationally intensive, particularly for data with higher dimensions, so the study suggested an alternative Monte Carlo approach for such data. This study also proposed a simple-structure true-score equating procedure that does not rely on a multivariate observed-score distribution.
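The appeal of the Monte Carlo alternative to quadrature can be sketched as follows: instead of summing over a grid of ability points whose size grows exponentially with the number of dimensions, one averages over random ability draws. The sketch below estimates the population expected total score under a hypothetical two-dimensional simple-structure 2PL with correlated abilities; it is an illustration of the idea only, not the paper's full equating chain.

```python
import math
import random

def expected_total_score_mc(items, rho, n_draws=5000, seed=1):
    """Monte Carlo estimate of the expected total score under a
    simple-structure two-dimensional 2PL. Each item is (a, b, dim) with
    dim in {0, 1}; abilities are standard normal with correlation rho,
    generated via a Cholesky-style transform of independent normals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        theta = (z1, rho * z1 + math.sqrt(1 - rho * rho) * z2)
        total += sum(1.0 / (1.0 + math.exp(-a * (theta[d] - b)))
                     for a, b, d in items)
    return total / n_draws
```

With two items of difficulty 0, symmetry gives an expected total score of about 1.0 regardless of the ability correlation.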

{"title":"Several Variations of Simple-Structure MIRT Equating","authors":"Stella Y. Kim,&nbsp;Won-Chan Lee","doi":"10.1111/jedm.12341","DOIUrl":"10.1111/jedm.12341","url":null,"abstract":"<p>The current study proposed several variants of simple-structure multidimensional item response theory equating procedures. Four distinct sets of data were used to demonstrate feasibility of proposed equating methods for two different equating designs: a random groups design and a common-item nonequivalent groups design. Findings indicated some notable differences between the multidimensional and unidimensional approaches when data exhibited evidence for multidimensionality. In addition, some of the proposed methods were successful in providing equating results for both section-level and composite-level scores, which has not been achieved by most of the existing methodologies. The traditional method of using a set of quadrature points and weights for equating turned out to be computationally intensive, particularly for the data with higher dimensions. The study suggested an alternative way of using the Monte-Carlo approach for such data. This study also proposed a simple-structure true-score equating procedure that does not rely on a multivariate <i>observed</i>-score distribution.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"76-105"},"PeriodicalIF":1.3,"publicationDate":"2022-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49051834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment
IF 1.3 · Psychology (CAS Tier 4) · Q3 PSYCHOLOGY, APPLIED · Pub Date: 2022-07-08 · DOI: 10.1111/jedm.12331
David W. Dorsey, Hillary R. Michaels

Advances in technology have dramatically expanded our ability to create rich, complex, and effective assessments across a range of uses. Artificial Intelligence (AI)-enabled assessments represent one such area of advancement, one that has captured our collective interest and imagination. Scientists and practitioners within the domains of organizational and workforce assessment have increasingly used AI in assessment, and its use is now becoming more common in education. While these types of solutions offer their users the promise of efficiency, effectiveness, and a "wow factor," users need to maintain high standards for validity and fairness in high-stakes settings. Due to the complexity of some AI methods and tools, this requirement may challenge our traditional approaches to building validity and fairness arguments. In this edition, we review what these challenges may look like as validity arguments meet AI in educational assessment domains. We specifically explore how AI affects Evidence-Centered Design (ECD) and development, from assessment concept and coding to scoring and reporting. We also present ways to ensure that bias is not built into these systems. Lastly, we discuss future horizons, many of them almost here, for maximizing what AI offers while minimizing negative effects on test takers and programs.

{"title":"Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment","authors":"David W. Dorsey,&nbsp;Hillary R. Michaels","doi":"10.1111/jedm.12331","DOIUrl":"https://doi.org/10.1111/jedm.12331","url":null,"abstract":"<p>We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technology advancement. Artificial Intelligence (AI) enabled assessments represent one such area of advancement—one that has captured our collective interest and imagination. Scientists and practitioners within the domains of organizational and workforce assessment have increasingly used AI in assessment, and its use is now becoming more common in education. While these types of solutions offer their users the promise of efficiency, effectiveness, and a “wow factor,” users need to maintain high standards for validity and fairness in high stakes settings. Due to the complexity of some AI methods and tools, this requirement for adherence to standards may challenge our traditional approaches to building validity and fairness arguments. In this edition, we review what these challenges may look like as validity arguments meet AI in educational assessment domains. We specifically explore how AI impacts Evidence-Centered Design (ECD) and development from assessment concept and coding to scoring and reporting. We also present information on ways to ensure that bias is not built into these systems. 
Lastly, we discuss future horizons, many that are almost here, for maximizing what AI offers while minimizing negative effects on test takers and programs.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 3","pages":"267-271"},"PeriodicalIF":1.3,"publicationDate":"2022-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"137805809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Deterministic Gated Lognormal Response Time Model to Identify Examinees with Item Preknowledge
IF 1.3 · Psychology (CAS Tier 4) · Q3 PSYCHOLOGY, APPLIED · Pub Date: 2022-07-07 · DOI: 10.1111/jedm.12340
Murat Kasli, Cengiz Zopluoglu, Sarah L. Toton

Response times (RTs) have recently attracted a significant amount of attention in the literature as they may provide meaningful information about item preknowledge. In this study, a new model, the Deterministic Gated Lognormal Response Time (DG-LNRT) model, is proposed to identify examinees with item preknowledge using RTs. The proposed model was applied to two different data sets and performance was assessed with false-positive rates, true-positive rates, and precision. The results were compared with another recently proposed Z-statistic. Follow-up simulation studies were also conducted to examine model performance in settings similar to the real data sets. The results indicate that the proposed model is viable and can help detect item preknowledge under certain conditions. However, its performance is highly dependent on the correct specification of the compromised items.
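The lognormal building block behind such RT models can be sketched as follows, using van der Linden's lognormal response time model with assumed known item parameters. The residual computation is an illustration only; the deterministic gating and estimation of the DG-LNRT model itself are not reproduced here.

```python
def rt_residuals(log_times, alphas, betas, tau):
    """Standardized residuals under a lognormal response time model where
    log T_j ~ N(beta_j - tau, 1 / alpha_j^2): beta_j is item time
    intensity, alpha_j is item time discrimination, tau is examinee speed.
    Large negative residuals mark suspiciously fast responses, the kind of
    signal preknowledge-detection methods look for on compromised items."""
    return [a * (lt - (b - tau)) for lt, a, b in zip(log_times, alphas, betas)]
```

For an examinee of average speed (tau = 0), an item answered a full log-second faster than its intensity predicts yields a residual of -2 when alpha = 2.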

{"title":"A Deterministic Gated Lognormal Response Time Model to Identify Examinees with Item Preknowledge","authors":"Murat Kasli,&nbsp;Cengiz Zopluoglu,&nbsp;Sarah L. Toton","doi":"10.1111/jedm.12340","DOIUrl":"https://doi.org/10.1111/jedm.12340","url":null,"abstract":"<p>Response times (RTs) have recently attracted a significant amount of attention in the literature as they may provide meaningful information about item preknowledge. In this study, a new model, the Deterministic Gated Lognormal Response Time (DG-LNRT) model, is proposed to identify examinees with item preknowledge using RTs. The proposed model was applied to two different data sets and performance was assessed with false-positive rates, true-positive rates, and precision. The results were compared with another recently proposed Z-statistic. Follow-up simulation studies were also conducted to examine model performance in settings similar to the real data sets. The results indicate that the proposed model is viable and can help detect item preknowledge under certain conditions. However, its performance is highly dependent on the correct specification of the compromised items.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"148-169"},"PeriodicalIF":1.3,"publicationDate":"2022-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50123901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cognitive Diagnostic Multistage Testing by Partitioning Hierarchically Structured Attributes
IF 1.3 · Psychology (CAS Tier 4) · Q3 PSYCHOLOGY, APPLIED · Pub Date: 2022-07-05 · DOI: 10.1111/jedm.12339
Rae Yeong Kim, Yun Joo Yoo

In cognitive diagnostic models (CDMs), a set of fine-grained attributes is required to characterize complex problem solving and provide detailed diagnostic information about an examinee. However, it is challenging to ensure reliable estimation and to control computational complexity when the test aims to identify the examinee's attribute profile in a large-scale map of attributes. To address this problem, this study proposes cognitive diagnostic multistage testing by partitioning hierarchically structured attributes (CD-MST-PH) as a multistage testing framework for CDMs. In CD-MST-PH, multiple testlets can be constructed from separate attribute groups before testing occurs, which retains the advantages of multistage testing over fully adaptive testing or on-the-fly approaches. Moreover, testlets are offered sequentially and adaptively, improving test accuracy and efficiency. An item information measure is proposed to compute the discrimination power of an item for each attribute, and a module assembly method is presented to construct modules anchored at each separate attribute group. Several module selection indices for CD-MST-PH are also proposed by modifying the item selection indices used in cognitive diagnostic computerized adaptive testing. Simulation results show that CD-MST-PH can improve test accuracy and efficiency relative to a conventional test without adaptive stages.

{"title":"Cognitive Diagnostic Multistage Testing by Partitioning Hierarchically Structured Attributes","authors":"Rae Yeong Kim,&nbsp;Yun Joo Yoo","doi":"10.1111/jedm.12339","DOIUrl":"10.1111/jedm.12339","url":null,"abstract":"<p>In cognitive diagnostic models (CDMs), a set of fine-grained attributes is required to characterize complex problem solving and provide detailed diagnostic information about an examinee. However, it is challenging to ensure reliable estimation and control computational complexity when The test aims to identify the examinee's attribute profile in a large-scale map of attributes. To address this problem, this study proposes a cognitive diagnostic multistage testing by partitioning hierarchically structured attributes (CD-MST-PH) as a multistage testing for CDM. In CD-MST-PH, multiple testlets can be constructed based on separate attribute groups before testing occurs, which retains the advantages of multistage testing over fully adaptive testing or the on-the-fly approach. Moreover, testlets are offered sequentially and adaptively, thus improving test accuracy and efficiency. An item information measure is proposed to compute the discrimination power of an item for each attribute, and a module assembly method is presented to construct modules anchored at each separate attribute group. Several module selection indices for CD-MST-PH are also proposed by modifying the item selection indices used in cognitive diagnostic computerized adaptive testing. 
The results of simulation study show that CD-MST-PH can improve test accuracy and efficiency relative to the conventional test without adaptive stages.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"126-147"},"PeriodicalIF":1.3,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45947771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Estimating Classification Accuracy and Consistency Indices for Multiple Measures with the Simple Structure MIRT Model
IF 1.3 · Psychology (CAS Tier 4) · Q3 PSYCHOLOGY, APPLIED · Pub Date: 2022-06-20 · DOI: 10.1111/jedm.12338
Seohee Park, Kyung Yong Kim, Won-Chan Lee

Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the widespread use of multiple measures, there is little research on their classification consistency and accuracy. Accordingly, this study introduces an approach to estimating classification consistency and accuracy indices for multiple measures under four possible decision rules: (1) complementary, (2) conjunctive, (3) compensatory, and (4) pairwise combinations of the three. The current study uses an IRT-recursive-based approach with the simple-structure multidimensional IRT model (SS-MIRT) to estimate classification consistency and accuracy for multiple measures. Theoretical formulations of the four decision rules with a binary decision (Pass/Fail) are presented. The estimation procedures are illustrated using an empirical data example based on SS-MIRT. In addition, this study applies the estimation procedures in the unidimensional IRT (UIRT) context, given that UIRT is more widely used in practice. This application shows that the proposed procedure could be used with a UIRT model for individual measures as an alternative to SS-MIRT.
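The recursive machinery behind IRT-based classification indices is the Lord-Wingersky recursion, which builds the number-correct score distribution at a fixed ability one item at a time. The sketch below covers a single unidimensional measure only; the SS-MIRT extension and the four decision rules described in the abstract are not reproduced.

```python
def score_distribution(probs):
    """Lord-Wingersky recursion: P(X = x | theta) for the number-correct
    score X over dichotomous items, given each item's probability of a
    correct response at that theta. Each step convolves the running score
    distribution with one more Bernoulli item."""
    dist = [1.0]  # P(score = 0) before any items
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, mass in enumerate(dist):
            new[x] += mass * (1 - p)      # item answered incorrectly
            new[x + 1] += mass * p        # item answered correctly
        dist = new
    return dist
```

Classification accuracy for a Pass/Fail cut then follows by summing this distribution over scores at or above the cut and integrating over the ability distribution.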

{"title":"Estimating Classification Accuracy and Consistency Indices for Multiple Measures with the Simple Structure MIRT Model","authors":"Seohee Park,&nbsp;Kyung Yong Kim,&nbsp;Won-Chan Lee","doi":"10.1111/jedm.12338","DOIUrl":"10.1111/jedm.12338","url":null,"abstract":"<p>Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the popular usages of multiple measures, there is little research on classification consistency and accuracy of multiple measures. Accordingly, this study introduces an approach to estimate classification consistency and accuracy indices for multiple measures under four possible decision rules: (1) complementary, (2) conjunctive, (3) compensatory, and (4) pairwise combinations of the three. The current study uses the IRT-recursive-based approach with the simple-structure multidimensional IRT model (SS-MIRT) to estimate the classification consistency and accuracy for multiple measures. Theoretical formulations of the four decision rules with a binary decision (Pass/Fail) are presented. The estimation procedures are illustrated using an empirical data example based on SS-MIRT. In addition, this study applies the estimation procedures to the unidimensional IRT (UIRT) context, considering that UIRT is practically used more. 
This application shows that the proposed procedure of classification consistency and accuracy could be used with a UIRT model for individual measures as an alternative method of SS-MIRT.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"106-125"},"PeriodicalIF":1.3,"publicationDate":"2022-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45264295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
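The four decision rules can be made concrete with a small Monte Carlo sketch. This is not the article's IRT-recursive SS-MIRT procedure: it uses classical true and observed scores, and the cut scores, error SD, and between-measure correlation are assumed values chosen only for illustration. Accuracy compares the observed-score decision with the true-score decision; consistency compares decisions across two parallel forms.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000                 # simulated examinees
cut = 0.0                  # per-measure Pass/Fail cut (assumed)
comp_cut = 0.0             # cut on the summed score for the compensatory rule (assumed)

# True scores on two correlated measures, plus two parallel observed forms
true = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
obs1 = true + rng.normal(0.0, 0.5, true.shape)   # measurement error, SD assumed
obs2 = true + rng.normal(0.0, 0.5, true.shape)

def classify(scores, rule):
    """Binary Pass decision for one decision rule applied across the measures."""
    if rule == "conjunctive":     # must pass every measure
        return (scores >= cut).all(axis=1)
    if rule == "complementary":   # passing any one measure suffices
        return (scores >= cut).any(axis=1)
    if rule == "compensatory":    # a strong measure can offset a weak one
        return scores.sum(axis=1) >= comp_cut
    raise ValueError(rule)

results = {}
for rule in ("complementary", "conjunctive", "compensatory"):
    d_true = classify(true, rule)
    d1, d2 = classify(obs1, rule), classify(obs2, rule)
    accuracy = float((d1 == d_true).mean())   # observed decision vs true-score decision
    consistency = float((d1 == d2).mean())    # agreement across parallel forms
    results[rule] = (accuracy, consistency)
    print(f"{rule:13s} accuracy={accuracy:.3f} consistency={consistency:.3f}")
```

The article's fourth rule, a pairwise combination, would simply apply two of the rules above and combine their Pass/Fail outcomes.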
Latent Space Model for Process Data 过程数据的潜在空间模型
IF 1.3 4区 心理学 Q3 PSYCHOLOGY, APPLIED Pub Date : 2022-06-12 DOI: 10.1111/jedm.12337
Yi Chen, Jingru Zhang, Yi Yang, Young-Sun Lee

The development of human-computer interactive items in educational assessments provides opportunities to extract useful process information about problem-solving. However, the complex, intensive, and noisy nature of process data makes it challenging to model with traditional psychometric methods. Social network methods have been applied to visualize and analyze process data, but research on statistically modeling process information with such methods is still limited. This article explores the application of the latent space model (LSM) to process data in educational assessment. An adjacency matrix of transitions between actions was created from the weighted, directed network of action sequences and related auxiliary information. This adjacency matrix was then modeled with LSM to identify lower-dimensional latent positions of the actions. Three applications based on the LSM results are introduced: action clustering, error analysis, and performance measurement. A simulation study showed that LSM can cluster actions belonging to the same problem-solving strategy and can measure students' performance by comparing their action sequences with the optimal strategy. Finally, we analyzed empirical data from PISA 2012 as a real case to illustrate how to use LSM.

Journal of Educational Measurement, 59(4), 517-535.
引用次数: 1
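The front end of the pipeline the abstract describes can be sketched in a few lines: build the weighted, directed adjacency matrix of action-to-action transitions from logged sequences, then fit latent positions so that actions that frequently follow one another sit close together. The action labels below are invented, and the simple Poisson latent space model fit by gradient ascent is a stand-in for the fuller estimation (e.g., Bayesian MCMC) an LSM analysis would typically use.

```python
import numpy as np

# Toy logged action sequences from an interactive item (labels assumed)
sequences = [
    ["start", "A", "B", "check", "end"],
    ["start", "A", "B", "B", "check", "end"],
    ["start", "C", "check", "end"],
]

actions = sorted({a for seq in sequences for a in seq})
idx = {a: i for i, a in enumerate(actions)}
n = len(actions)

# Weighted, directed adjacency matrix: adj[i, j] = count of transitions i -> j
adj = np.zeros((n, n))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        adj[idx[a], idx[b]] += 1.0

def fit_lsm(adj, dim=2, steps=500, lr=0.01, seed=0):
    """Poisson LSM: E[count i->j] = exp(alpha - ||z_i - z_j||), fit by gradient ascent."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    z = rng.normal(scale=0.1, size=(n, dim))       # latent positions of actions
    alpha = np.log(adj.sum() / (n * (n - 1)))      # baseline intensity
    off_diag = ~np.eye(n, dtype=bool)              # ignore self-transitions
    for _ in range(steps):
        diff = z[:, None, :] - z[None, :, :]       # pairwise position differences
        dist = np.linalg.norm(diff, axis=-1) + 1e-9
        mu = np.exp(alpha - dist)                  # expected transition counts
        resid = np.where(off_diag, mu - adj, 0.0)  # d(loglik)/d(distance) per edge
        coef = resid + resid.T                     # edges (i,j) and (j,i) both move z_i
        z += lr * (coef[:, :, None] * diff / dist[:, :, None]).sum(axis=1)
        alpha += lr * np.where(off_diag, adj - mu, 0.0).sum()
    return alpha, z

alpha, z = fit_lsm(adj)
for a in actions:
    print(a, np.round(z[idx[a]], 2))
```

When the model over-predicts a transition count (mu > adj), the ascent step pushes the two actions apart; when it under-predicts, it pulls them together, which is what yields interpretable low-dimensional positions for clustering and comparison against an optimal strategy.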