Using Simulated Retests to Estimate the Reliability of Diagnostic Assessment Systems
W. Jake Thompson, Brooke Nash, Amy K. Clark, Jeffrey C. Hoover
Journal of Educational Measurement, 60(3), 455-475. Published February 19, 2023. DOI: 10.1111/jedm.12359
https://onlinelibrary.wiley.com/doi/10.1111/jedm.12359
Citations: 0
Abstract
As diagnostic classification models become more widely used in large-scale operational assessments, we must consider the methods for estimating and reporting reliability. Researchers must explore alternatives to traditional reliability methods that are consistent with the design, scoring, and reporting levels of diagnostic assessment systems. In this article, we describe and evaluate a method for simulating retests to summarize reliability evidence at multiple reporting levels. We evaluate how reliability estimates from simulated retests perform relative to other measures of classification consistency and accuracy that have previously been described in the literature but that limit the level at which reliability can be reported. Overall, the findings show that reliability estimates from simulated retests are an accurate measure of reliability and are consistent with other reliability measures for diagnostic assessments. We then apply the method to real data from the Examination for the Certificate of Proficiency in English to demonstrate it in practice and to compare the simulated-retest estimates with those obtained from observed data. Finally, we discuss implications for the field and possible next directions.
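To make the simulated-retest idea concrete, the sketch below illustrates one plausible version of the procedure under a DINA diagnostic classification model: item responses are generated from examinees' estimated attribute profiles using the fitted item parameters, the simulated responses are rescored with the same fixed model, and the rescored classifications are compared to the originals at both the attribute and profile levels. This is a minimal illustration, not the authors' exact implementation; the Q-matrix, slip/guess parameters, and estimated profiles are all randomly generated stand-ins for values that would come from a real calibrated assessment.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2023)

# ---- Hypothetical fitted DINA model (all values below are illustrative) ----
K, J, N = 3, 9, 500                       # attributes, items, examinees
Q = rng.integers(0, 2, size=(J, K))       # Q-matrix (stand-in for a real test design)
Q[Q.sum(axis=1) == 0, 0] = 1              # ensure every item measures >= 1 attribute
slip = rng.uniform(0.05, 0.20, size=J)    # item slip parameters
guess = rng.uniform(0.05, 0.20, size=J)   # item guessing parameters
patterns = np.array(list(product([0, 1], repeat=K)))       # all 2^K profiles
est_profiles = patterns[rng.integers(0, 2**K, size=N)]     # examinees' estimated profiles

def simulate_responses(alpha, Q, slip, guess, rng):
    """Draw DINA responses: P(correct) = 1 - slip if the examinee has mastered
    every attribute the item requires, and guess otherwise."""
    eta = (alpha @ Q.T) == Q.sum(axis=1)  # N x J indicator of full mastery
    p = np.where(eta, 1 - slip, guess)
    return rng.binomial(1, p)

def classify(X, Q, slip, guess, patterns):
    """Maximum-likelihood profile assignment under the fixed fitted model."""
    eta = (patterns @ Q.T) == Q.sum(axis=1)           # 2^K x J mastery indicators
    p = np.where(eta, 1 - slip, guess)                # P(correct | profile, item)
    ll = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T  # N x 2^K log-likelihoods
    return patterns[np.argmax(ll, axis=1)]

# Simulated retests: generate responses from the estimated profiles, rescore
# with the same model, and track agreement with the original classifications.
n_retests = 200
attr_agree = np.zeros(K)   # attribute-level consistency
prof_agree = 0.0           # full-profile consistency
for _ in range(n_retests):
    X = simulate_responses(est_profiles, Q, slip, guess, rng)
    rescored = classify(X, Q, slip, guess, patterns)
    attr_agree += (rescored == est_profiles).mean(axis=0)
    prof_agree += (rescored == est_profiles).all(axis=1).mean()

print("attribute-level consistency:", np.round(attr_agree / n_retests, 3))
print("profile-level consistency:  ", round(prof_agree / n_retests, 3))
```

Because each retest is scored at the same levels the assessment reports (individual attributes, full profiles, or aggregates of either), averaging agreement across retests yields reliability evidence at whatever reporting level is needed, which is the flexibility the article emphasizes over single-level consistency indices.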
Journal Introduction:
The Journal of Educational Measurement (JEM) publishes original measurement research, reviews measurement publications, and reports on innovative measurement applications. The topics addressed will interest those concerned with the practice of measurement in field settings as well as measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM serves as a vehicle for improving educational measurement applications in a variety of settings.