
Journal of Educational Measurement: Latest Publications

A Unified Comparison of IRT-Based Effect Sizes for DIF Investigations
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-11-07 | DOI: 10.1111/jedm.12347
R. Philip Chalmers

Several marginal effect size (ES) statistics suitable for quantifying the magnitude of differential item functioning (DIF) have been proposed in the area of item response theory; for instance, the Differential Functioning of Items and Tests (DFIT) statistics, signed and unsigned item difference in the sample statistics (SIDS, UIDS, NSIDS, and NUIDS), the standardized indices of impact, and the differential response functioning (DRF) statistics. However, the relationship between these proposed statistics has not been fully discussed, particularly with respect to population parameter definitions and recovery performance across independent samples. To address these issues, this article provides a unified presentation of competing DIF ES definitions and estimators, and evaluates the recovery efficacy of these competing estimators using a set of Monte Carlo simulation experiments. Statistical and inferential properties of the estimators are discussed, as well as future areas of research in this model-based area of bias quantification.
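To make the flavor of these marginal ES statistics concrete, below is a minimal numerical sketch (not the author's code) of SIDS/UIDS-style indices under a two-parameter logistic model: the signed and unsigned average difference between group-specific item response functions, evaluated at the focal group's ability values. The item parameters and ability draws are hypothetical.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def dif_effect_sizes(theta_focal, a_ref, b_ref, a_foc, b_foc):
    """SIDS/UIDS-style effect sizes: the signed and unsigned mean difference
    between the reference- and focal-group item response functions, averaged
    over the focal group's ability values."""
    diff = p_2pl(theta_focal, a_ref, b_ref) - p_2pl(theta_focal, a_foc, b_foc)
    return {"SIDS": diff.mean(), "UIDS": np.abs(diff).mean()}

# Hypothetical example: one item whose difficulty differs across groups.
rng = np.random.default_rng(1)
theta_focal = rng.normal(0.0, 1.0, size=5000)
print(dif_effect_sizes(theta_focal, a_ref=1.2, b_ref=0.0, a_foc=1.2, b_foc=0.4))
```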

Citations: 3
A Statistical Test for the Detection of Item Compromise Combining Responses and Response Times
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-10-28 | DOI: 10.1111/jedm.12346
Wim J. van der Linden, Dmitry I. Belov

A test of item compromise is presented that combines test takers' responses and response times (RTs) into a statistic defined as the number of correct responses on the item among test takers whose RTs are flagged as suspicious. The test has null and alternative distributions belonging to the well-known family of compound binomial distributions, is simple to calculate, and yields results that are easy to interpret. In a set of empirical examples, it demonstrated nearly perfect power for detecting compromise of the more difficult and discriminating items with no more than 10 test takers having preknowledge; for the easier and less discriminating items, the presence of some 20 test takers with preknowledge still sufficed. A test based on the reverse statistic, the total time spent by test takers whose responses are flagged as suspicious, may seem a natural alternative but lacks the monotone likelihood ratio property needed to decide whether the test should be left- or right-sided.
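The null distribution described above lends itself to a direct calculation. The sketch below (an illustration under stated assumptions, not the authors' implementation) computes a right-tailed p-value for the number of correct responses among RT-flagged test takers, treating that count as a compound (Poisson-)binomial variable whose success probabilities are the examinees' model-implied probabilities of a correct response. The probabilities in the example are hypothetical.

```python
import numpy as np

def poisson_binomial_pmf(probs):
    """PMF of a sum of independent Bernoulli(p_j) variables, built by dynamic programming."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.concatenate([pmf * (1 - p), [0.0]]) + np.concatenate([[0.0], pmf * p])
    return pmf

def compromise_p_value(observed_correct, null_probs):
    """Right-tailed p-value: probability of at least as many correct responses
    among flagged examinees as were actually observed."""
    pmf = poisson_binomial_pmf(null_probs)
    return pmf[observed_correct:].sum()

# Hypothetical example: 8 flagged examinees with model-implied success
# probabilities, of whom 7 answered the item correctly.
null_probs = [0.35, 0.42, 0.50, 0.28, 0.61, 0.45, 0.38, 0.55]
print(compromise_p_value(7, null_probs))
```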

Citations: 1
Fully Gibbs Sampling Algorithms for Bayesian Variable Selection in Latent Regression Models
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-10-25 | DOI: 10.1111/jedm.12348
Kazuhiro Yamaguchi, Jihong Zhang

This study proposed Gibbs sampling algorithms for variable selection in a latent regression model under a unidimensional two-parameter logistic item response theory model. Three types of shrinkage priors were employed to obtain shrinkage estimates: double-exponential (i.e., Laplace), horseshoe, and horseshoe+ priors. These shrinkage priors were compared to a uniform prior case in both simulation and real data analysis. The simulation study revealed that the two types of horseshoe priors had smaller root mean square errors and shorter 95% credible interval lengths than the double-exponential or uniform priors. In addition, the horseshoe+ prior was slightly more stable than the horseshoe prior. The real data example demonstrated the utility of the horseshoe and horseshoe+ priors in selecting effective predictive covariates for math achievement.
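As a concrete illustration of the horseshoe machinery (a minimal sketch under simplifying assumptions, not the authors' algorithm), the following Gibbs sampler applies horseshoe shrinkage to an ordinary linear regression, standing in for the latent-regression step with abilities treated as observed. It uses the inverse-gamma auxiliary-variable representation of the horseshoe prior.

```python
import numpy as np

def horseshoe_gibbs(X, y, n_iter=2000, seed=0):
    """Gibbs sampler for y = X beta + e with a horseshoe prior on beta."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, sigma2, tau2 = np.zeros(p), 1.0, 1.0
    lam2, nu, xi = np.ones(p), np.ones(p), 1.0
    draws = np.empty((n_iter, p))

    def inv_gamma(shape, rate):
        # If G ~ Gamma(shape, scale=1/rate), then 1/G ~ InvGamma(shape, rate).
        return 1.0 / rng.gamma(shape, 1.0 / rate)

    XtX, Xty = X.T @ X, X.T @ y
    for it in range(n_iter):
        # beta | rest: normal with ridge-like precision from the local/global scales
        A = XtX + np.diag(1.0 / (tau2 * lam2))
        A_inv = np.linalg.inv(A)
        cov = sigma2 * (A_inv + A_inv.T) / 2.0          # symmetrize for sampling
        beta = rng.multivariate_normal(A_inv @ Xty, cov)
        # sigma^2 | rest
        resid = y - X @ beta
        sigma2 = inv_gamma(0.5 * (n + p),
                           0.5 * (resid @ resid + np.sum(beta**2 / (tau2 * lam2))))
        # local shrinkage lambda_j^2 and auxiliaries nu_j
        lam2 = inv_gamma(1.0, 1.0 / nu + beta**2 / (2.0 * tau2 * sigma2))
        nu = inv_gamma(1.0, 1.0 + 1.0 / lam2)
        # global shrinkage tau^2 and auxiliary xi
        tau2 = inv_gamma(0.5 * (p + 1),
                         1.0 / xi + np.sum(beta**2 / lam2) / (2.0 * sigma2))
        xi = inv_gamma(1.0, 1.0 + 1.0 / tau2)
        draws[it] = beta
    return draws

# Hypothetical data: 3 active covariates out of 10.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X[:, :3] @ np.array([1.0, -0.8, 0.5]) + rng.normal(scale=1.0, size=200)
print(horseshoe_gibbs(X, y)[1000:].mean(axis=0).round(2))
```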

Citations: 0
A Factor Mixture Model for Item Responses and Certainty of Response Indices to Identify Student Knowledge Profiles
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-10-10 | DOI: 10.1111/jedm.12344
Chia-Wen Chen, Björn Andersson, Jinxin Zhu

The certainty of response index (CRI) measures respondents' confidence level when answering an item. In conjunction with the answers to the items, previous studies have used descriptive statistics and arbitrary thresholds to identify student knowledge profiles with CRIs. Because this approach overlooks the measurement error in the observed item responses and indices, we propose a factor mixture model that integrates a latent class model, which detects student subgroups, with a measurement model that controls for student ability and confidence level. Applying the model to 773 seventh graders' responses to an algebra test, in which some items involved new material that had not been taught in class, we found two subgroups: (1) students who had high confidence in answering items involving the new material; and (2) students who had low confidence in answering items involving the new material but higher general self-confidence than the first group. We regressed the posterior probability of group membership on gender, prior achievement, and preview behavior, and found preview behavior to be a significant factor associated with membership. Finally, we discussed the implications of the current study for teaching practices and future research.
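As a small, hypothetical illustration of the quantity being regressed on the covariates (not the authors' model code), the snippet below computes posterior class-membership probabilities for a two-class mixture from class priors and per-class log-likelihoods of each student's responses and CRIs.

```python
import numpy as np

def posterior_membership(log_lik_by_class, class_priors):
    """log_lik_by_class: (n_students, n_classes) log-likelihoods of each student's
    responses and CRIs under each latent class; returns posterior probabilities."""
    log_post = np.log(class_priors) + log_lik_by_class
    log_post -= log_post.max(axis=1, keepdims=True)      # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical log-likelihoods for 3 students under 2 classes.
ll = np.array([[-10.2, -12.5], [-14.0, -11.1], [-9.8, -9.9]])
print(posterior_membership(ll, class_priors=np.array([0.6, 0.4])).round(3))
```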

Citations: 1
Betty Lanteigne, Christine Coombe, & James Dean Brown. 2021. Challenges in Language Testing around the World: Insights for language test users. Singapore: Springer, 2021, 129.99 € (hardcover), ISBN 978-981-33-4232-3 (eBook). xxiii + 553 pp. https://doi.org/10.1007/978-981-33-4232-3
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-09-25 | DOI: 10.1111/jedm.12343
Bahram Kazemian, Shafigeh Mohammadian
{"title":"Betty Lanteigne, Christine Coombe, & James Dean Brown. 2021. Challenges in Language Testing around the World: Insights for language test users. Singapore: Springer, 2021, 129.99 € (hardcover), ISBN 978-981-33-4232-3 (eBook). xxiii + 553 pp. https://doi.org/10.1007/978-981-33-4232-3","authors":"Bahram Kazemian,&nbsp;Shafigeh Mohammadian","doi":"10.1111/jedm.12343","DOIUrl":"10.1111/jedm.12343","url":null,"abstract":"","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2022-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45401317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Using Item Scores and Distractors in Person-Fit Assessment
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-09-16 | DOI: 10.1111/jedm.12345
Kylie Gorney, James A. Wollack

In order to detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the $l_z$ and $l_z^*$ person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can be used as indicators of aberrance. Through detailed simulations, we show that the new statistics are more powerful than existing statistics in detecting several types of aberrant behavior, and that they are able to control the Type I error rate in instances where the model does not exactly fit the data. A real data example is also provided to demonstrate the utility of the new statistics in an operational setting.
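For reference, here is a minimal sketch of the standard $l_z$ statistic for dichotomous scores under a 2PL model, the statistic the paper extends with distractor information; the item parameters and responses below are hypothetical.

```python
import numpy as np

def lz_statistic(responses, theta, a, b):
    """Standardized log-likelihood person-fit statistic l_z.
    responses: 0/1 vector; theta: scalar ability; a, b: 2PL item parameters."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - expected) / np.sqrt(variance)

# Hypothetical 10-item example; large negative values suggest aberrant responding.
rng = np.random.default_rng(2)
a, b = rng.uniform(0.8, 2.0, 10), rng.normal(0, 1, 10)
responses = rng.binomial(1, 1.0 / (1.0 + np.exp(-a * (0.5 - b))))
print(lz_statistic(responses, theta=0.5, a=a, b=b))
```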

Citations: 3
A New Bayesian Person-Fit Analysis Method Using Pivotal Discrepancy Measures
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-09-02 | DOI: 10.1111/jedm.12342
Adam Combs

A common method of checking person-fit in Bayesian item response theory (IRT) is the posterior-predictive (PP) method. In recent years, more powerful approaches have been proposed that are based on resampling methods using the popular $L_z^*$ statistic. A new Bayesian model checking method based on pivotal discrepancy measures (PDMs) has also been proposed. A PDM T is a discrepancy measure that is a pivotal quantity with a known reference distribution. A posterior sample of T can be generated using standard Markov chain Monte Carlo output, and a p-value is obtained from probability bounds computed on order statistics of the sample. In this paper, we propose a general procedure to apply this PDM method to person-fit checking in IRT models. We illustrate this using the $L_z$ and $L_z^*$ measures. Simulation studies are done comparing these with the PP method and one of the more recent resampling methods. The results show that the PDM method is more powerful than the PP method. Under certain conditions, it is more powerful than the resampling method, while in others it is less powerful. The PDM method is also applied to a real data set.
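As a point of reference, the snippet below is a sketch of the baseline PP check mentioned above (not the proposed PDM procedure): for each posterior draw of ability, replicate responses are simulated and a discrepancy measure is compared between observed and replicated data. The discrepancy, item parameters, and posterior draws are hypothetical.

```python
import numpy as np

def pp_person_fit_pvalue(responses, theta_draws, a, b, discrepancy, seed=0):
    """Posterior-predictive person-fit p-value for one examinee.
    responses: 0/1 vector; theta_draws: posterior draws of ability;
    a, b: 2PL item parameters; discrepancy(resp, p) -> scalar."""
    rng = np.random.default_rng(seed)
    exceed = 0
    for theta in theta_draws:
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        rep = rng.binomial(1, p)                      # replicated response pattern
        if discrepancy(rep, p) >= discrepancy(responses, p):
            exceed += 1
    return exceed / len(theta_draws)

# Hypothetical use with a simple outfit-like discrepancy measure.
outfit = lambda u, p: np.mean((u - p) ** 2 / (p * (1 - p)))
rng = np.random.default_rng(3)
a, b = rng.uniform(0.8, 2.0, 15), rng.normal(0, 1, 15)
u = rng.binomial(1, 1.0 / (1.0 + np.exp(-a * (0.0 - b))))
theta_draws = rng.normal(0.0, 0.3, 500)
print(pp_person_fit_pvalue(u, theta_draws, a, b, outfit))
```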

Citations: 0
Several Variations of Simple-Structure MIRT Equating
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-07-28 | DOI: 10.1111/jedm.12341
Stella Y. Kim, Won-Chan Lee

The current study proposed several variants of simple-structure multidimensional item response theory equating procedures. Four distinct sets of data were used to demonstrate feasibility of proposed equating methods for two different equating designs: a random groups design and a common-item nonequivalent groups design. Findings indicated some notable differences between the multidimensional and unidimensional approaches when data exhibited evidence for multidimensionality. In addition, some of the proposed methods were successful in providing equating results for both section-level and composite-level scores, which has not been achieved by most of the existing methodologies. The traditional method of using a set of quadrature points and weights for equating turned out to be computationally intensive, particularly for the data with higher dimensions. The study suggested an alternative way of using the Monte-Carlo approach for such data. This study also proposed a simple-structure true-score equating procedure that does not rely on a multivariate observed-score distribution.
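To ground the terminology, here is a minimal sketch (an illustration under simplifying assumptions, not the proposed simple-structure procedure) of unidimensional IRT true-score equating, the special case that the proposed methods generalize to section-level and composite-level scores: a Form X true score is inverted to an ability value by root finding and then mapped to the Form Y true score. The item parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

def true_score(theta, a, b):
    """Expected number-correct score under a 2PL model."""
    return np.sum(1.0 / (1.0 + np.exp(-a * (theta - b))))

def equate_true_score(score_x, a_x, b_x, a_y, b_y):
    """Find the theta giving the target true score on Form X, then return the
    corresponding Form Y true score."""
    theta = brentq(lambda t: true_score(t, a_x, b_x) - score_x, -6.0, 6.0)
    return true_score(theta, a_y, b_y)

# Hypothetical 20-item forms X and Y.
rng = np.random.default_rng(4)
a_x, b_x = rng.uniform(0.8, 2.0, 20), rng.normal(0.0, 1.0, 20)
a_y, b_y = rng.uniform(0.8, 2.0, 20), rng.normal(0.2, 1.0, 20)
print(round(equate_true_score(12.0, a_x, b_x, a_y, b_y), 2))
```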

Citations: 1
Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-07-08 | DOI: 10.1111/jedm.12331
David W. Dorsey, Hillary R. Michaels

We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technology advancement. Artificial Intelligence (AI) enabled assessments represent one such area of advancement—one that has captured our collective interest and imagination. Scientists and practitioners within the domains of organizational and workforce assessment have increasingly used AI in assessment, and its use is now becoming more common in education. While these types of solutions offer their users the promise of efficiency, effectiveness, and a “wow factor,” users need to maintain high standards for validity and fairness in high stakes settings. Due to the complexity of some AI methods and tools, this requirement for adherence to standards may challenge our traditional approaches to building validity and fairness arguments. In this edition, we review what these challenges may look like as validity arguments meet AI in educational assessment domains. We specifically explore how AI impacts Evidence-Centered Design (ECD) and development from assessment concept and coding to scoring and reporting. We also present information on ways to ensure that bias is not built into these systems. Lastly, we discuss future horizons, many that are almost here, for maximizing what AI offers while minimizing negative effects on test takers and programs.

Citations: 0
A Deterministic Gated Lognormal Response Time Model to Identify Examinees with Item Preknowledge
IF 1.3 | CAS Tier 4 (Psychology) | Q1 Psychology | Pub Date: 2022-07-07 | DOI: 10.1111/jedm.12340
Murat Kasli, Cengiz Zopluoglu, Sarah L. Toton

Response times (RTs) have recently attracted a significant amount of attention in the literature as they may provide meaningful information about item preknowledge. In this study, a new model, the Deterministic Gated Lognormal Response Time (DG-LNRT) model, is proposed to identify examinees with item preknowledge using RTs. The proposed model was applied to two different data sets and performance was assessed with false-positive rates, true-positive rates, and precision. The results were compared with another recently proposed Z-statistic. Follow-up simulation studies were also conducted to examine model performance in settings similar to the real data sets. The results indicate that the proposed model is viable and can help detect item preknowledge under certain conditions. However, its performance is highly dependent on the correct specification of the compromised items.
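Below is a minimal sketch of the kind of evidence such models draw on (an assumption-laden illustration, not the DG-LNRT estimator itself): under van der Linden's lognormal response-time model, log T_ij is normal with mean beta_i - tau_j and standard deviation 1/alpha_i, so unexpectedly fast responding on a suspected compromised item subset shows up as strongly negative standardized log-RT residuals. All parameters below are hypothetical.

```python
import numpy as np

def rt_residual_statistic(log_times, alpha, beta, tau, suspect_items):
    """Average standardized residual of log response times on suspect items;
    strongly negative values indicate unexpectedly fast responding."""
    z = alpha * (log_times - (beta - tau))              # standardized log-RT residuals
    return z[suspect_items].mean()

# Hypothetical examinee with preknowledge of items 0-4 (responds noticeably faster).
rng = np.random.default_rng(5)
alpha, beta = rng.uniform(1.5, 2.5, 20), rng.normal(4.0, 0.3, 20)
tau = 0.1                                               # examinee speed parameter
log_t = rng.normal(beta - tau, 1.0 / alpha)
log_t[:5] -= 0.5                                        # speed-up on compromised items
print(rt_residual_statistic(log_t, alpha, beta, tau, suspect_items=np.arange(5)))
```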

Citations: 0