Scored and Error-annotated Essay Dataset of Chinese EFL/ESL Learners
Kai Jin, Wuying Liu
Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval, published 2021-12-17
DOI: 10.1145/3508230.3508245
Abstract
A reasonably large, finely annotated essay dataset of EFL/ESL (English as a foreign or second language) learners is not only an important resource for language research and teaching but also useful material for language-related computing research. Unfortunately, such data openly available on the Internet are limited in quantity and uneven in quality, especially for Chinese learners. We collected 147 essays written by Chinese EFL/ESL learners, had four teachers score them under the same criteria and one teacher annotate the major errors, and also had the essays scored by the Pigai scoring system. We then structured the score file, the error-annotated files, and the essay files together with context information, and built the Scored and Error-annotated Essay Dataset of Chinese EFL/ESL Learners (SeedCel), which is openly available on the Internet and will be updated incrementally. This paper explains how SeedCel was constructed, describes its contents in detail, and discusses where it can be used.
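The abstract does not specify SeedCel's file formats. As a minimal sketch, assuming each essay is stored with its four teacher scores, its Pigai score, its error annotations, and some context information, a single record could be modeled in Python as below. The class names, field names, and the offset-based error representation are all hypothetical and are not taken from the dataset itself.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ErrorAnnotation:
    # Hypothetical representation: character offsets into the essay text plus an error label.
    start: int
    end: int
    label: str


@dataclass
class EssayRecord:
    # Hypothetical layout; SeedCel's actual file structure is not described in the abstract.
    essay_id: str
    text: str
    teacher_scores: list[float]  # one score per teacher (four teachers in SeedCel)
    pigai_score: float           # score assigned by the Pigai scoring system
    errors: list[ErrorAnnotation] = field(default_factory=list)
    context: dict[str, str] = field(default_factory=dict)  # e.g. prompt or learner background

    def mean_teacher_score(self) -> float:
        # Average of the human scores, all given under the same criteria.
        return mean(self.teacher_scores)

    def score_gap(self) -> float:
        # Positive when Pigai scores the essay higher than the teachers' average.
        return self.pigai_score - self.mean_teacher_score()

    def error_counts(self) -> dict[str, int]:
        # Number of annotated errors per label.
        counts: dict[str, int] = {}
        for err in self.errors:
            counts[err.label] = counts.get(err.label, 0) + 1
        return counts
```

For example, a hypothetical essay scored 10, 11, 9 and 10 by the teachers and 11.5 by Pigai would have a mean teacher score of 10.0 and a score gap of 1.5. Keeping human and automatic scores side by side in one record makes this kind of comparison straightforward.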