Stephen D. Holmes, M. Meadows, I. Stockford, Qingping He
International Journal of Testing, vol. 18, pp. 366-391. Published 2018-09-14. DOI: 10.1080/15305058.2018.1486316 (https://doi.org/10.1080/15305058.2018.1486316)
Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling
The relationship between the expected and actual difficulty of items on six mathematics question papers designed for 16-year-olds in England was investigated through paired comparison by experts and testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers were taken by 2933 students using an equivalent-groups design, allowing the actual difficulty of the items to be placed on the same measurement scale. The expected difficulty derived using the comparative judgement approach and the actual difficulty derived from the test data were reasonably strongly correlated. This suggests that comparative judgement may be an effective way to investigate the comparability of difficulty of examinations. The approach could potentially be used as a proxy for pretesting high-stakes tests in situations where pretesting is not feasible for reasons of security or other risks.
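To illustrate how paired comparisons can be turned into a difficulty scale, the following is a minimal sketch and not the authors' implementation. The abstract does not say which Rasch variant was used; the sketch assumes the Bradley-Terry form, in which the probability that item i is judged harder than item j is exp(d_i - d_j) / (1 + exp(d_i - d_j)), and fits it with a simple minorisation-maximisation iteration. The function name `fit_pairwise_difficulty` and the judgement data are hypothetical.

```python
import math
from collections import defaultdict


def fit_pairwise_difficulty(comparisons, n_iter=500, tol=1e-10):
    """Estimate item difficulties (in logits) from paired judgements.

    comparisons: list of (harder, easier) item-id pairs, one per judgement.
    Assumes the Bradley-Terry form of the pairwise Rasch model (an
    assumption of this sketch, not a detail taken from the paper).
    """
    wins = defaultdict(int)     # times an item was judged the harder one
    losses = defaultdict(int)   # times an item was judged the easier one
    n_pair = defaultdict(int)   # number of judgements per unordered pair
    for harder, easier in comparisons:
        wins[harder] += 1
        losses[easier] += 1
        n_pair[frozenset((harder, easier))] += 1

    items = sorted(set(wins) | set(losses))
    # The estimates are finite only if every item both "wins" and "loses".
    for item in items:
        if wins[item] == 0 or losses[item] == 0:
            raise ValueError(f"item {item!r} needs at least one win and one loss")

    strength = {item: 1.0 for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                n_pair[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and frozenset((i, j)) in n_pair
            )
            updated[i] = wins[i] / denom
        # Fix the origin of the scale: mean difficulty of zero logits.
        mean_log = sum(math.log(v) for v in updated.values()) / len(updated)
        updated = {i: v / math.exp(mean_log) for i, v in updated.items()}
        change = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if change < tol:
            break

    return {i: math.log(strength[i]) for i in items}


# Hypothetical judgements: each tuple is (item judged harder, item judged easier).
judgements = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
scale = fit_pairwise_difficulty(judgements)
for item, difficulty in sorted(scale.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {difficulty:+.3f} logits")
```

The resulting values form a scale of expected difficulty on which larger numbers mean an item was more often judged the harder of a pair. Comparing such a scale with difficulties estimated from student responses, for example through a correlation coefficient, is the kind of check the abstract describes.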