Using ROC Analysis to Refine Cut Scores Following a Standard Setting Process.

IF 2.3; Zone 3, Psychology; JCR Q2, Mathematics, Interdisciplinary Applications; Educational and Psychological Measurement; Pub Date: 2025-04-01; Epub Date: 2024-09-24; DOI: 10.1177/00131644241278925

Dongwei Wang, Lisa A Keller

Educational and Psychological Measurement, pp. 313-335. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562877/pdf/

Citations: 0

Abstract

In educational assessment, cut scores are often defined through standard setting by a group of subject matter experts. This study investigates the impact of several factors on classification accuracy using receiver operating characteristic (ROC) analysis, to provide statistical and theoretical evidence when the cut score needs to be refined. Factors examined in the study include the sample distribution relative to the cut score, the prevalence of the positive event, and the cost ratio. Forty item responses were simulated for examinees from four sample distributions. In addition, the prevalence and the cost ratio between false negatives and false positives were manipulated to examine their impact on classification accuracy. The optimal cut score is identified using the Youden Index J. The results showed that the optimal cut score identified by the evaluation criterion tended to be pulled closer to the mode of the proficiency distribution. In addition, the optimal cut score shifts with the prevalence of the positive event and the cost ratio. With the item parameters used to simulate the data and the simulated sample distributions, it was found that when passing the exam is a low-prevalence event in the population, increasing the cut score operationally improves the classification; when passing the exam is a high-prevalence event, the cut score should be reduced to achieve optimality. As the cost ratio increases, the optimal cut score suggested by the evaluation criterion decreases. In three of the four sample distributions examined in this study, increasing the cut score enhanced the classification irrespective of the cost ratio when the prevalence in the population is 50%. This study provides statistical evidence when the cut score needs to be refined for policy reasons.
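To make the evaluation criterion concrete, the following is a minimal sketch of how a cut score could be chosen from observed scores and known true classifications. It is not the authors' code. The function names, the toy data, and in particular the expected-cost formula in `min_cost_cut` are assumptions made for illustration; the abstract does not state how prevalence and cost ratio enter the criterion. Youden's J itself is the standard definition, J = sensitivity + specificity - 1.

```python
import numpy as np

def classification_rates(scores, is_master, cut):
    """Sensitivity and specificity when examinees with score >= cut pass."""
    scores = np.asarray(scores)
    is_master = np.asarray(is_master, dtype=bool)
    passed = scores >= cut
    sensitivity = np.mean(passed[is_master])     # true masters who pass
    specificity = np.mean(~passed[~is_master])   # true non-masters who fail
    return sensitivity, specificity

def youden_cut(scores, is_master):
    """Return the candidate cut maximizing Youden's J, and that J value."""
    candidates = np.unique(scores)               # sorted observed scores
    j_values = []
    for cut in candidates:
        se, sp = classification_rates(scores, is_master, cut)
        j_values.append(se + sp - 1.0)
    best = int(np.argmax(j_values))              # first maximum on ties
    return candidates[best], j_values[best]

def min_cost_cut(scores, is_master, prevalence, cost_ratio):
    """Return the cut minimizing an assumed expected misclassification cost.

    Assumption (not from the paper): cost_ratio = cost(FN) / cost(FP), and
    expected cost = cost_ratio * prevalence * FNR + (1 - prevalence) * FPR.
    """
    candidates = np.unique(scores)
    costs = []
    for cut in candidates:
        se, sp = classification_rates(scores, is_master, cut)
        costs.append(cost_ratio * prevalence * (1.0 - se)
                     + (1.0 - prevalence) * (1.0 - sp))
    best = int(np.argmin(costs))
    return candidates[best], costs[best]

# Toy data for illustration only; these are not values from the study.
toy_scores = [10, 12, 15, 18, 20, 22, 25, 28, 30, 33]
toy_master = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

print(youden_cut(toy_scores, toy_master))
print(min_cost_cut(toy_scores, toy_master, prevalence=0.5, cost_ratio=1.0))
print(min_cost_cut(toy_scores, toy_master, prevalence=0.5, cost_ratio=5.0))
```

Under this assumed cost function, equal costs at 50% prevalence select the same cut as Youden's J, and making false negatives more expensive moves the selected cut downward on the toy data, which matches the direction of the effect reported in the abstract.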

Source journal

Educational and Psychological Measurement (Medicine; Mathematics, Interdisciplinary Applications)

CiteScore

5.50

Self-citation rate

7.40%

Number of articles published

Review time

6-12 weeks

Journal description: Educational and Psychological Measurement (EPM) publishes refereed scholarly work from all academic disciplines interested in the study of measurement theory, problems, and issues. Theoretical articles address new developments and techniques, and applied articles deal with innovative applications.