A Comparative Study of Pretrained Language Models for Automated Essay Scoring with Adversarial Inputs

Phakawat Wangkriangkri, Chanissara Viboonlarp, Attapol T. Rutherford, E. Chuangsuwanich

2020 IEEE Region 10 Conference (TENCON), 16 November 2020. DOI: 10.1109/TENCON50793.2020.9293930
Automated Essay Scoring (AES) is the task of grading written essays automatically, without human intervention. This study compares the performance of three AES models that use different text embedding methods: Global Vectors for Word Representation (GloVe), Embeddings from Language Models (ELMo), and Bidirectional Encoder Representations from Transformers (BERT). We used two evaluation metrics: Quadratic Weighted Kappa (QWK) and a novel "robustness" metric, which quantifies a model's ability to detect adversarial essays created by modifying normal essays so that they become less coherent. We found that (1) the BERT-based model achieved the greatest robustness, followed by the GloVe-based and then the ELMo-based model, and (2) fine-tuning the embeddings improves QWK but lowers robustness. These findings can inform the choice of model, and whether to fine-tune it, depending on how much emphasis an AES application places on properly grading adversarial essays.
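QWK is a standard agreement metric for ordinal ratings such as essay scores. As a rough illustration of the metric itself (not the authors' code, and the function name here is our own), a minimal pure-Python sketch:

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """QWK between two integer rating lists on [min_rating, max_rating].

    1.0 means perfect agreement; 0.0 means agreement no better than
    chance given each rater's marginal score distribution.
    """
    n = max_rating - min_rating + 1
    # Observed co-occurrence matrix O: O[i][j] counts essays that the
    # human rated (i + min_rating) and the model rated (j + min_rating).
    O = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        O[t - min_rating][p - min_rating] += 1
    total = len(y_true)
    # Marginal histograms, used to build the chance-expected matrix E.
    hist_t = [sum(row) for row in O]
    hist_p = [sum(O[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic disagreement weight: penalty grows with the
            # squared distance between the two assigned scores.
            w = (i - j) ** 2 / (n - 1) ** 2
            num += w * O[i][j]
            den += w * hist_t[i] * hist_p[j] / total
    return 1.0 - num / den
```

Perfect agreement yields 1.0, while predictions that merely match the score histogram by chance yield 0.0; in practice the same computation is available as `sklearn.metrics.cohen_kappa_score(..., weights="quadratic")`.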