Predicting CEFR levels in learners of English: The use of microsystem criterial features in a machine learning approach
Thomas Gaillat, A. Simpkin, Nicolas Ballier, Bernardo Stearns, Annanda Sousa, Manon Bouyé, Manel Zarrouk
ReCALL, 34(1), pp. 130-146. Published 2021-11-10. DOI: https://doi.org/10.1017/S095834402100029X
Citations: 8
Abstract
This paper focuses on automatically assessing language proficiency levels according to linguistic complexity in learner English. We implement a supervised learning approach as part of an automatic essay scoring system. The objective is to uncover Common European Framework of Reference for Languages (CEFR) criterial features in writings by learners of English as a foreign language. Our method relies on the concept of microsystems with features related to learner-specific linguistic systems in which several forms operate paradigmatically. Results on internal data show that different microsystems help classify writings from A1 to C2 levels (82% balanced accuracy). Overall results on external data show that a combination of lexical, syntactic, cohesive and accuracy features yields the most efficient classification across several corpora (59.2% balanced accuracy).
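Both results above are reported as balanced accuracy. The abstract does not spell out how the figure is computed, so the sketch below assumes the common definition: the unweighted mean of per-class recall, which keeps a frequent CEFR level from dominating the score. The function name `balanced_accuracy` and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    total = defaultdict(int)    # number of gold items per class
    correct = defaultdict(int)  # number of correctly predicted items per class
    for gold_label, pred_label in zip(y_true, y_pred):
        total[gold_label] += 1
        if gold_label == pred_label:
            correct[gold_label] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (NOT data from the paper): A1 is over-represented.
gold = ["A1", "A1", "A1", "A1", "B1", "B1", "C2"]
pred = ["A1", "A1", "A1", "A2", "B1", "B2", "C2"]

score = balanced_accuracy(gold, pred)
print(f"{score:.3f}")  # per-class recall: A1 = 0.75, B1 = 0.50, C2 = 1.00
```

On these toy labels plain accuracy is 5/7 (about 0.714), while the balanced figure is 0.750, because each level contributes equally regardless of how many essays it contains.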