Score Comparability between Online Proctored and In-Person Credentialing Exams

Journal of Educational Measurement · IF 1.4 · JCR Q3 (Psychology, Applied) · CAS Q4 (Psychology) · Pub Date: 2022-04-27 · DOI: 10.1111/jedm.12320
Paul Jones, Ye Tong, Jinghua Liu, Joshua Borglum, Vince Primoli
Journal of Educational Measurement, Vol. 59, No. 2, pp. 180–207.
Citations: 4

Abstract

This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," in which the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The calibrations from all three groups were used to score the TC2 cohort, which was designated the validation sample. The TC1 item parameters, and the TC1-based thetas and pass rates, were closer to the native TC2 values than the OP1-based values were, indicating mode effects, but the score and pass/fail decision differences were small. In Study 2, we used a "cross-modal repeater approach," in which test takers who failed their first attempt in one modality took the test again in either the same or a different modality. The two pairs of repeater groups (TC → TC vs. TC → OP, and OP → OP vs. OP → TC) were matched exactly on their first-attempt scores. Results showed an increased pass rate and greater score variability in all conditions involving OP, with mode effects noticeable in the TC → OP condition and, less strongly, in the OP → TC condition. Limitations of the study and implications for exam developers are discussed.
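The exact matching in Study 2 (pairing repeaters across retest modalities by their identical first-attempt scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure; the function name and the example data are hypothetical:

```python
from collections import defaultdict, deque

def exact_match_on_score(group_a, group_b):
    """Pair examinees from two repeater groups so that each matched pair
    shares the same first-attempt score; unmatched examinees are dropped.
    Each group is a list of (examinee_id, first_attempt_score) tuples."""
    by_score = defaultdict(deque)
    for examinee_id, score in group_b:
        by_score[score].append(examinee_id)
    pairs = []
    for examinee_id, score in group_a:
        if by_score[score]:  # a same-score partner is still available
            partner = by_score[score].popleft()
            pairs.append((examinee_id, partner, score))
    return pairs

# Hypothetical first-attempt failers, retested in TC vs. in OP:
tc_to_tc = [("a1", 62), ("a2", 58), ("a3", 62)]
tc_to_op = [("b1", 62), ("b2", 60), ("b3", 58)]
matched = exact_match_on_score(tc_to_tc, tc_to_op)
# matched → [("a1", "b1", 62), ("a2", "b3", 58)]
```

Because matching is exact on the first-attempt score, any second-attempt difference between the matched groups cannot be attributed to first-attempt ability differences, which is what allows the pass-rate comparison across retest modalities.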

Source journal: Journal of Educational Measurement
CiteScore: 2.30
Self-citation rate: 7.70%
Articles per year: 46
Journal description: The Journal of Educational Measurement (JEM) publishes original measurement research, provides reviews of measurement publications, and reports on innovative measurement applications. The topics addressed will interest those concerned with the practice of measurement in field settings as well as measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM also serves as a vehicle for improving educational measurement applications in a variety of settings.
Latest articles in this journal:
Sequential Reservoir Computing for Log File-Based Behavior Process Data Analyses
Issue Information
Exploring Latent Constructs through Multimodal Data Analysis
Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs
Modeling Nonlinear Effects of Person-by-Item Covariates in Explanatory Item Response Models: Exploratory Plots and Modeling Using Smooth Functions