A Comparative Study of Pretrained Language Models for Automated Essay Scoring with Adversarial Inputs

2020 IEEE Region 10 Conference (TENCON). Pub Date: 2020-11-16. DOI: 10.1109/TENCON50793.2020.9293930

Phakawat Wangkriangkri, Chanissara Viboonlarp, Attapol T. Rutherford, E. Chuangsuwanich

Citations: 3

Abstract

Automated Essay Scoring (AES) is the task of grading written essays automatically, without human intervention. This study compares the performance of three AES models that use different text embedding methods: Global Vectors for Word Representation (GloVe), Embeddings from Language Models (ELMo), and Bidirectional Encoder Representations from Transformers (BERT). We used two evaluation metrics: Quadratic Weighted Kappa (QWK) and a novel "robustness" metric, which quantifies a model's ability to detect adversarial essays created by modifying normal essays to make them less coherent. We found that: (1) the BERT-based model achieved the greatest robustness, followed by the GloVe-based and ELMo-based models, respectively, and (2) fine-tuning the embeddings improves QWK but lowers robustness. These findings can inform the choice of model, and the decision whether to fine-tune it, according to how much emphasis an AES system places on correctly grading adversarial essays.
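For readers unfamiliar with QWK, the following is a minimal sketch of the standard Quadratic Weighted Kappa computation between human and model scores. It is not the authors' code: the function name `quadratic_weighted_kappa`, its parameters, and the assumption of an integer score scale are illustrative. The abstract does not define the robustness metric precisely, so it is not sketched here.

```python
def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Standard QWK between two lists of integer scores (illustrative helper)."""
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_score - min_score + 1  # number of distinct score levels
    if n < 2:
        raise ValueError("need at least two score levels")

    # Observed agreement matrix: rows are human scores, columns are model scores.
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1

    total = len(human)
    hist_human = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement penalty
            expected = hist_human[i] * hist_model[j] / total  # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters gave every essay the same single score.
        return 1.0
    return 1.0 - numerator / denominator
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.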


