Using ROC Analysis to Refine Cut Scores Following a Standard Setting Process.

Educational and Psychological Measurement (IF 2.1; Zone 3, Psychology; JCR Q2, Mathematics, Interdisciplinary Applications) Pub Date: 2024-09-24 DOI: 10.1177/00131644241278925
Dongwei Wang, Lisa A Keller
Citations: 0

Abstract

In educational assessment, cut scores are often defined through standard setting by a group of subject matter experts. This study uses receiver operating characteristic (ROC) analysis to investigate how several factors affect classification accuracy, in order to provide statistical and theoretical evidence when the cut score needs to be refined. The factors examined are the sample distribution relative to the cut score, the prevalence of the positive event, and the cost ratio. Responses to forty items were simulated for examinees from four sample distributions. In addition, the prevalence and the cost ratio between false negatives and false positives were manipulated to examine their impact on classification accuracy. The optimal cut score was identified using the Youden Index J. The results showed that the evaluation criterion tended to pull the optimal cut score closer to the mode of the proficiency distribution. The optimal cut score also shifted with the prevalence of the positive event and the cost ratio. With the item parameters used to simulate the data and the simulated sample distributions, it was found that when passing the exam is a low-prevalence event in the population, increasing the cut score operationally improves the classification; when passing the exam is a high-prevalence event, the cut score should be reduced to achieve optimality. As the cost ratio increases, the optimal cut score suggested by the evaluation criterion decreases. In three of the four sample distributions examined, increasing the cut score enhanced the classification irrespective of the cost ratio when the prevalence in the population was 50%. This study provides statistical evidence for use when the cut score needs to be refined for policy reasons.
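The selection rule described in the abstract can be illustrated with a minimal sketch. The Youden Index is J = sensitivity + specificity - 1, and the candidate cut score with the largest J is taken as optimal. The abstract does not state how prevalence and the cost ratio enter the criterion, so the second function below is an assumption: it minimizes a standard expected misclassification cost, with the cost of a false positive fixed at 1 and `cost_ratio` read as cost(false negative) / cost(false positive). The data, the function names, and the variable names are all hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def rates_at_cut(scores: Sequence[float], is_master: Sequence[bool],
                 cut: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= cut count as a pass.

    Assumes at least one true master and one true non-master are present.
    """
    pairs = list(zip(scores, is_master))
    tp = sum(1 for s, m in pairs if m and s >= cut)
    fn = sum(1 for s, m in pairs if m and s < cut)
    tn = sum(1 for s, m in pairs if not m and s < cut)
    fp = sum(1 for s, m in pairs if not m and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cut(scores, is_master, candidates):
    """Candidate cut score that maximizes J = sensitivity + specificity - 1."""
    def j(cut):
        se, sp = rates_at_cut(scores, is_master, cut)
        return se + sp - 1.0
    return max(candidates, key=j)


def min_cost_cut(scores, is_master, candidates, prevalence, cost_ratio):
    """Candidate cut score that minimizes expected misclassification cost.

    ASSUMPTION (not from the source): cost = p * r * (1 - Se) + (1 - p) * (1 - Sp),
    where p is the prevalence of true masters and r = cost(FN) / cost(FP).
    """
    def expected_cost(cut):
        se, sp = rates_at_cut(scores, is_master, cut)
        return prevalence * cost_ratio * (1.0 - se) + (1.0 - prevalence) * (1.0 - sp)
    return min(candidates, key=expected_cost)


# Hypothetical toy data: observed test scores and true mastery status.
SCORES = [10, 12, 14, 18, 19, 20, 21, 22, 24, 26, 30, 34, 38]
IS_MASTER = [False, False, False, False, True, False, False,
             True, False, True, True, True, True]
CANDIDATES = sorted(set(SCORES))  # evaluate each observed score as a cut

print(youden_cut(SCORES, IS_MASTER, CANDIDATES))              # 22
print(min_cost_cut(SCORES, IS_MASTER, CANDIDATES, 0.5, 1.0))  # 22
print(min_cost_cut(SCORES, IS_MASTER, CANDIDATES, 0.5, 5.0))  # 19
print(min_cost_cut(SCORES, IS_MASTER, CANDIDATES, 0.1, 1.0))  # 26
print(min_cost_cut(SCORES, IS_MASTER, CANDIDATES, 0.9, 1.0))  # 19
```

On this toy data the sketch reproduces the direction of the reported effects: at 50% prevalence and equal costs the expected-cost rule agrees with the Youden Index, a higher cost ratio or a higher prevalence lowers the selected cut score, and a lower prevalence raises it. The specific numbers say nothing about the magnitudes found in the study.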

Source journal
Educational and Psychological Measurement (Medicine; Mathematics, Interdisciplinary Applications)
CiteScore: 5.50
Self-citation rate: 7.40%
Articles published: 49
Review time: 6-12 weeks
Journal description: Educational and Psychological Measurement (EPM) publishes refereed scholarly work from all academic disciplines interested in the study of measurement theory, problems, and issues. Theoretical articles address new developments and techniques, and applied articles deal with innovative applications.
Latest articles from this journal
Optimal Number of Replications for Obtaining Stable Dynamic Fit Index Cutoffs. Invariance: What Does Measurement Invariance Allow Us to Claim? Detecting Differential Item Functioning Using Response Time. Assessing the Speed-Accuracy Tradeoff in Psychological Testing Using Experimental Manipulations. On Latent Structure Examination of Behavioral Measuring Instruments in Complex Empirical Settings.