Forced-Choice Ranking Models for Raters’ Ranking Data

IF 1.9 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH Journal of Educational and Behavioral Statistics Pub Date : 2022-07-07 DOI:10.3102/10769986221104207

Su-Pin Hung, Hung-Yu Huang

{"title":"Forced-Choice Ranking Models for Raters’ Ranking Data","authors":"Su-Pin Hung, Hung-Yu Huang","doi":"10.3102/10769986221104207","DOIUrl":null,"url":null,"abstract":"To address response style or bias in rating scales, forced-choice items are often used to request that respondents rank their attitudes or preferences among a limited set of options. The rating scales used by raters to render judgments on ratees’ performance also contribute to rater bias or errors; consequently, forced-choice items have recently been employed for raters to rate how a ratee performs in certain defined traits. This study develops forced-choice ranking models (FCRMs) for data analysis when performance is evaluated by external raters or experts in a forced-choice ranking format. The proposed FCRMs consider different degrees of raters’ leniency/severity when modeling the selection probability in the generalized unfolding item response theory framework. They include an additional topic facet when multiple tasks are evaluated and incorporate variations in leniency parameters to capture the interactions between ratees and raters. The simulation results indicate that the parameters of the new models can be satisfactorily recovered and that better parameter recovery is associated with more item blocks, larger sample sizes, and a complete ranking design. A technological creativity assessment is presented as an empirical example with which to demonstrate the applicability and implications of the new models.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"47 1","pages":"603 - 634"},"PeriodicalIF":1.9000,"publicationDate":"2022-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Educational and Behavioral Statistics","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.3102/10769986221104207","RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}

引用次数: 1

Abstract

To address response style or bias in rating scales, forced-choice items are often used to request that respondents rank their attitudes or preferences among a limited set of options. The rating scales used by raters to render judgments on ratees’ performance also contribute to rater bias or errors; consequently, forced-choice items have recently been employed for raters to rate how a ratee performs in certain defined traits. This study develops forced-choice ranking models (FCRMs) for data analysis when performance is evaluated by external raters or experts in a forced-choice ranking format. The proposed FCRMs consider different degrees of raters’ leniency/severity when modeling the selection probability in the generalized unfolding item response theory framework. They include an additional topic facet when multiple tasks are evaluated and incorporate variations in leniency parameters to capture the interactions between ratees and raters. The simulation results indicate that the parameters of the new models can be satisfactorily recovered and that better parameter recovery is associated with more item blocks, larger sample sizes, and a complete ranking design. A technological creativity assessment is presented as an empirical example with which to demonstrate the applicability and implications of the new models.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

评分者排名数据的强迫选择排名模型

为了解决评分量表中的回答风格或偏见，通常使用强迫选择项目来要求受访者在有限的一组选项中对他们的态度或偏好进行排名。评价员用来对评价者的表现作出判断的评价表也会造成评价员的偏见或错误;因此，评分员最近使用了强制选择项目来评估一个人在某些特定特征中的表现。本研究开发了用于数据分析的强迫选择排名模型(fcrm)，当性能由外部评分者或专家以强迫选择排名格式进行评估时。在广义展开项目反应理论框架中，提出的fcrm模型在建模选择概率时考虑了不同程度的评分者的宽严程度。当评估多个任务时，它们包括一个额外的主题方面，并包含宽大参数的变化，以捕获评价者和评价者之间的交互。仿真结果表明，新模型能较好地恢复参数，且参数恢复越好，项目块越多，样本量越大，排序设计越完善。技术创造力评估是一个实证的例子，其中展示了新模型的适用性和影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Educational and Behavioral Statistics Multiple-

CiteScore

4.40

自引率

4.20%

发文量

期刊介绍： Journal of Educational and Behavioral Statistics, sponsored jointly by the American Educational Research Association and the American Statistical Association, publishes articles that are original and provide methods that are useful to those studying problems and issues in educational or behavioral research. Typical papers introduce new methods of analysis. Critical reviews of current practice, tutorial presentations of less well known methods, and novel applications of already-known methods are also of interest. Papers discussing statistical techniques without specific educational or behavioral interest or focusing on substantive results without developing new statistical methods or models or making novel use of existing methods have lower priority. Simulation studies, either to demonstrate properties of an existing method or to compare several existing methods (without providing a new method), also have low priority. The Journal of Educational and Behavioral Statistics provides an outlet for papers that are original and provide methods that are useful to those studying problems and issues in educational or behavioral research. Typical papers introduce new methods of analysis, provide properties of these methods, and an example of use in education or behavioral research. Critical reviews of current practice, tutorial presentations of less well known methods, and novel applications of already-known methods are also sometimes accepted. Papers discussing statistical techniques without specific educational or behavioral interest or focusing on substantive results without developing new statistical methods or models or making novel use of existing methods have lower priority. Simulation studies, either to demonstrate properties of an existing method or to compare several existing methods (without providing a new method), also have low priority.