{"title":"Classification and identification level ambiguity in error annotation","authors":"Alexandros Tantos, Nikolaos Amvrazis","doi":"10.1016/j.acorp.2022.100035","journal":{"name":"Applied Corpus Linguistics","volume":"2 3","pages":"Article 100035"},"publicationDate":"2022-12-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666799122000193"}
Classification and identification level ambiguity in error annotation
The vast majority of corpus annotation projects go through a piloting phase in which the annotation scheme is gradually shaped through iterative annotation cycles until its final version is produced and applied to the collected data. Differences in annotators’ choices are usually recorded and reflected in the ‘Inter-annotator Agreement’ (IAA), which serves as a proxy for understanding and resolving the issues raised. However, little has been reported on how to formulate a systematic approach to (i) tracing the source of the differences in the annotators’ choices and (ii) providing attainable solutions that would considerably increase IAA. In this paper, the ‘Greek Learner Corpus II’ (GLCII), the largest online Greek learner corpus, serves as a basis for shedding light on two commonly encountered types of ambiguity in error annotation that are closely related to target languages in which syncretism is ubiquitous in the grammar (e.g., Greek and Romanian): classification-level ambiguity and identification-level ambiguity.