Assessing quality of selection procedures: Lower bound of false positive rate as a function of inter-rater reliability

IF 1.5 · CAS Tier 3 (Psychology) · JCR Q3 (Mathematics, Interdisciplinary Applications) · British Journal of Mathematical & Statistical Psychology, 77(3), 651-671 · Pub Date: 2024-04-15 · DOI: 10.1111/bmsp.12343
František Bartoš, Patrícia Martinková
{"title":"评估筛选程序的质量:假阳性率下限与评分者间可靠性的关系","authors":"František Bartoš,&nbsp;Patrícia Martinková","doi":"10.1111/bmsp.12343","DOIUrl":null,"url":null,"abstract":"<p>Inter-rater reliability (IRR) is one of the commonly used tools for assessing the quality of ratings from multiple raters. However, applicant selection procedures based on ratings from multiple raters usually result in a binary outcome; the applicant is either selected or not. This final outcome is not considered in IRR, which instead focuses on the ratings of the individual subjects or objects. We outline the connection between the ratings' measurement model (used for IRR) and a binary classification framework. We develop a simple way of approximating the probability of correctly selecting the best applicants which allows us to compute error probabilities of the selection procedure (i.e., false positive and false negative rate) or their lower bounds. We draw connections between the IRR and the binary classification metrics, showing that binary classification metrics depend solely on the IRR coefficient and proportion of selected applicants. We assess the performance of the approximation in a simulation study and apply it in an example comparing the reliability of multiple grant peer review selection procedures. We also discuss other possible uses of the explored connections in other contexts, such as educational testing, psychological assessment, and health-related measurement, and implement the computations in the R package <span>IRR2FPR</span>.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 3","pages":"651-671"},"PeriodicalIF":1.5000,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12343","citationCount":"0","resultStr":"{\"title\":\"Assessing quality of selection procedures: Lower bound of false positive rate as a function of inter-rater reliability\",\"authors\":\"František Bartoš,&nbsp;Patrícia Martinková\",\"doi\":\"10.1111/bmsp.12343\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Inter-rater reliability (IRR) is one of the commonly used tools for assessing the quality of ratings from multiple raters. However, applicant selection procedures based on ratings from multiple raters usually result in a binary outcome; the applicant is either selected or not. This final outcome is not considered in IRR, which instead focuses on the ratings of the individual subjects or objects. We outline the connection between the ratings' measurement model (used for IRR) and a binary classification framework. We develop a simple way of approximating the probability of correctly selecting the best applicants which allows us to compute error probabilities of the selection procedure (i.e., false positive and false negative rate) or their lower bounds. We draw connections between the IRR and the binary classification metrics, showing that binary classification metrics depend solely on the IRR coefficient and proportion of selected applicants. We assess the performance of the approximation in a simulation study and apply it in an example comparing the reliability of multiple grant peer review selection procedures. 
We also discuss other possible uses of the explored connections in other contexts, such as educational testing, psychological assessment, and health-related measurement, and implement the computations in the R package <span>IRR2FPR</span>.</p>\",\"PeriodicalId\":55322,\"journal\":{\"name\":\"British Journal of Mathematical & Statistical Psychology\",\"volume\":\"77 3\",\"pages\":\"651-671\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2024-04-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12343\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"British Journal of Mathematical & Statistical Psychology\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/bmsp.12343\",\"RegionNum\":3,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"British Journal of Mathematical & Statistical Psychology","FirstCategoryId":"102","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/bmsp.12343","RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Inter-rater reliability (IRR) is one of the commonly used tools for assessing the quality of ratings from multiple raters. However, applicant selection procedures based on ratings from multiple raters usually result in a binary outcome; the applicant is either selected or not. This final outcome is not considered in IRR, which instead focuses on the ratings of the individual subjects or objects. We outline the connection between the ratings' measurement model (used for IRR) and a binary classification framework. We develop a simple way of approximating the probability of correctly selecting the best applicants which allows us to compute error probabilities of the selection procedure (i.e., false positive and false negative rate) or their lower bounds. We draw connections between the IRR and the binary classification metrics, showing that binary classification metrics depend solely on the IRR coefficient and proportion of selected applicants. We assess the performance of the approximation in a simulation study and apply it in an example comparing the reliability of multiple grant peer review selection procedures. We also discuss other possible uses of the explored connections in other contexts, such as educational testing, psychological assessment, and health-related measurement, and implement the computations in the R package IRR2FPR.
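To make the abstract's central claim concrete (that the binary classification metrics depend solely on the IRR coefficient and the proportion of selected applicants), the following minimal R sketch works through one plausible set of assumptions: the observed composite rating and the true score are bivariate standard normal with correlation sqrt(IRR), and the top proportion p of applicants by observed rating is selected. The function name selection_error_rates and this particular derivation are illustrative only; they are not the interface or the exact approximation used in the IRR2FPR package.

```r
# Illustrative sketch only (assumed classical measurement model); not the IRR2FPR package API.
# X = observed composite rating, T = true score; (X, T) bivariate standard normal
# with correlation sqrt(irr). Applicants in the top proportion p of X are selected;
# applicants in the top proportion p of T are the "truly best".
library(mvtnorm)  # provides pmvnorm() for bivariate normal probabilities

selection_error_rates <- function(irr, p_selected) {
  q <- qnorm(1 - p_selected)   # cut-off so that P(X > q) = p_selected
  r <- sqrt(irr)               # correlation between observed rating and true score
  # Joint probability of being selected AND truly best: P(X > q, T > q)
  p_both <- mvtnorm::pmvnorm(
    lower = c(q, q), upper = c(Inf, Inf),
    corr  = matrix(c(1, r, r, 1), nrow = 2)
  )[1]
  tpr <- p_both / p_selected                       # P(selected | truly best)
  fpr <- (p_selected - p_both) / (1 - p_selected)  # P(selected | not truly best)
  fnr <- 1 - tpr                                   # P(not selected | truly best)
  c(true_positive_rate = tpr, false_positive_rate = fpr, false_negative_rate = fnr)
}

# Example: a hypothetical panel with IRR = 0.5 that funds 10% of applications
selection_error_rates(irr = 0.5, p_selected = 0.10)
```

Under this idealized model the computed false positive rate can be read as the kind of lower bound referred to in the title; the values reported in the paper come from the authors' own approximation as implemented in IRR2FPR rather than from this sketch.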


Source journal: British Journal of Mathematical & Statistical Psychology
CiteScore: 5.00
Self-citation rate: 3.80%
Annual publications: 34
Review time: >12 weeks
Journal description: The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology which have a greater mathematical or statistical aspect of their argument than is usually acceptable to other journals, including:
• mathematical psychology
• statistics
• psychometrics
• decision making
• psychophysics
• classification
• relevant areas of mathematics, computing and computer software
These include articles that address substantive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.
Latest articles from this journal:
A new Q-matrix validation method based on signal detection theory.
Discriminability around polytomous knowledge structures and polytomous functions.
Understanding linear interaction analysis with causal graphs.
Identifiability analysis of the fixed-effects one-parameter logistic positive exponent model.
Regularized Bayesian algorithms for Q-matrix inference based on saturated cognitive diagnosis modelling.