Stephen D. Holmes, M. Meadows, I. Stockford, Qingping He
{"title":"Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling","authors":"Stephen D. Holmes, M. Meadows, I. Stockford, Qingping He","doi":"10.1080/15305058.2018.1486316","DOIUrl":null,"url":null,"abstract":"The relationship of expected and actual difficulty of items on six mathematics question papers designed for 16-year olds in England was investigated through paired comparison using experts and testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers were taken by 2933 students using an equivalent-groups design, allowing the actual difficulty of the items to be placed on the same measurement scale. It was found that the expected difficulty derived using the comparative judgement approach and the actual difficulty derived from the test data was reasonably strongly correlated. This suggests that comparative judgement may be an effective way to investigate the comparability of difficulty of examinations. The approach could potentially be used as a proxy for pretesting high-stakes tests in situations where pretesting is not feasible due to reasons of security or other risks.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"18 1","pages":"366 - 391"},"PeriodicalIF":1.0000,"publicationDate":"2018-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1486316","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Testing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/15305058.2018.1486316","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOCIAL SCIENCES, INTERDISCIPLINARY","Score":null,"Total":0}
引用次数: 3
Abstract
The relationship of expected and actual difficulty of items on six mathematics question papers designed for 16-year olds in England was investigated through paired comparison using experts and testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers were taken by 2933 students using an equivalent-groups design, allowing the actual difficulty of the items to be placed on the same measurement scale. It was found that the expected difficulty derived using the comparative judgement approach and the actual difficulty derived from the test data was reasonably strongly correlated. This suggests that comparative judgement may be an effective way to investigate the comparability of difficulty of examinations. The approach could potentially be used as a proxy for pretesting high-stakes tests in situations where pretesting is not feasible due to reasons of security or other risks.