Linking essay-writing tests using many-facet models and neural automated essay scoring.

Behavior Research Methods (IF 3.9; Zone 2, Psychology; JCR Q1, PSYCHOLOGY, EXPERIMENTAL). Pub Date: 2024-12-01. Epub Date: 2024-08-20. DOI: 10.3758/s13428-024-02485-2

Masaki Uto, Kota Aramaki

Behavior Research Methods, pages 8450-8479. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525454/pdf/

Abstract

For essay-writing tests, challenges arise when scores assigned to essays are influenced by the characteristics of raters, such as rater severity and consistency. Item response theory (IRT) models incorporating rater parameters have been developed to tackle this issue, exemplified by the many-facet Rasch models. These IRT models enable the estimation of examinees' abilities while accounting for the impact of rater characteristics, thereby enhancing the accuracy of ability measurement. However, difficulties can arise when different groups of examinees are evaluated by different sets of raters. In such cases, test linking is essential for unifying the scale of model parameters estimated for individual examinee-rater groups. Traditional test-linking methods typically require administrators to design groups in which either examinees or raters are partially shared. However, this is often impractical in real-world testing scenarios. To address this, we introduce a novel method for linking the parameters of IRT models with rater parameters that uses neural automated essay scoring technology. Our experimental results indicate that our method successfully accomplishes test linking with accuracy comparable to that of linear linking using few common examinees.
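The abstract names two building blocks without giving their equations: an IRT model with rater parameters (a many-facet Rasch model) and linear linking through common examinees. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a rating-scale many-facet Rasch form and the textbook mean-sigma method for linear linking; all function and variable names (`mfrm_category_probs`, `mean_sigma_constants`, `link_ability`) are hypothetical and chosen for illustration.

```python
import math
from statistics import mean, pstdev


def mfrm_category_probs(theta, rater_severity, step_difficulties):
    """Score-category probabilities under an assumed rating-scale
    many-facet Rasch model.

    P(score = k) is proportional to
    exp(sum over m = 1..k of (theta - rater_severity - step_difficulties[m-1])),
    with an empty sum (zero) for k = 0.
    """
    logits = [0.0]
    for d in step_difficulties:
        logits.append(logits[-1] + (theta - rater_severity - d))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def mean_sigma_constants(theta_ref, theta_new):
    """Slope A and intercept K of the mean-sigma linear linking.

    theta_ref and theta_new hold ability estimates for the SAME common
    examinees, estimated on the reference scale and on the new scale.
    """
    A = pstdev(theta_ref) / pstdev(theta_new)
    K = mean(theta_ref) - A * mean(theta_new)
    return A, K


def link_ability(theta, A, K):
    """Map an ability estimate from the new scale onto the reference scale."""
    return A * theta + K


if __name__ == "__main__":
    # A more severe rater shifts probability mass toward lower scores.
    lenient = mfrm_category_probs(0.0, -1.0, [-0.5, 0.5])
    severe = mfrm_category_probs(0.0, 1.0, [-0.5, 0.5])
    print("lenient rater:", lenient)
    print("severe rater: ", severe)

    # Illustrative (made-up) ability estimates for three common examinees.
    A, K = mean_sigma_constants([-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5])
    print("linked ability:", link_ability(0.5, A, K))
```

The sketch only links ability estimates. In a model with rater parameters, the rater and item parameters must be transformed consistently with the same constants, and with few common examinees the estimates of A and K become noisy, which is the practical limitation the abstract points to.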

Source journal

Behavior Research Methods Multiple-

CiteScore: 10.30
Self-citation rate: 9.30%
Articles published: 266

Journal description: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.