{"title":"Lemmatization for Ancient Greek","authors":"A. Vatri, Barbara McGillivray","doi":"10.1163/15699846-02002001","url":"https://doi.org/10.1163/15699846-02002001","journal":{"name":"Journal of Greek Linguistics","volume":"20 1","pages":"179-196"},"publicationDate":"2020-11-12","publicationTypes":"Journal Article"}
This article presents the results of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek evaluated the output of the CLTK lemmatizer, the CLTK backoff lemmatizer, and GLEM, together with the lemmatizations offered by the Diorisis corpus and the Lemmatized Ancient Greek Texts repository. The texts chosen for this experiment are Homer, Iliad 1.1–279, and Lysias 7. The results suggest that lemmatization methods that use large lexica together with part-of-speech tagging, such as those employed by the Diorisis corpus and the CLTK backoff lemmatizer, are more reliable than methods that rely more heavily on machine learning and use smaller lexica.
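To make the idea of lexicon-plus-tagging lemmatization concrete, the following is a minimal sketch of a backoff chain, together with a helper that turns evaluator judgements into an accuracy figure. It is an illustration only, not the CLTK, GLEM, or Diorisis implementation: the lexicon entries, the function names (`lemmatize`, `by_form_and_pos`, `by_form`, `identity`, `accuracy`), and the part-of-speech labels are all assumptions made for this example.

```python
import unicodedata
from typing import Callable, Optional, Sequence


def nfc(text: str) -> str:
    """Normalize to NFC so that differently encoded accents compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy lexica; a real system would load many thousands of entries.
_POS_ENTRIES = {("μῆνιν", "noun"): "μῆνις", ("ἄειδε", "verb"): "ἀείδω"}
_FORM_ENTRIES = {"θεὰ": "θεά"}
POS_LEXICON = {(nfc(f), p): nfc(l) for (f, p), l in _POS_ENTRIES.items()}
FORM_LEXICON = {nfc(f): nfc(l) for f, l in _FORM_ENTRIES.items()}

Lookup = Callable[[str, Optional[str]], Optional[str]]


def by_form_and_pos(form: str, pos: Optional[str]) -> Optional[str]:
    """Most specific step: needs a part-of-speech tag to disambiguate."""
    return POS_LEXICON.get((form, pos)) if pos is not None else None


def by_form(form: str, pos: Optional[str]) -> Optional[str]:
    """Fallback step: look up the surface form alone."""
    return FORM_LEXICON.get(form)


def identity(form: str, pos: Optional[str]) -> Optional[str]:
    """Last resort: return the token unchanged."""
    return form


DEFAULT_CHAIN: Sequence[Lookup] = (by_form_and_pos, by_form, identity)


def lemmatize(form: str, pos: Optional[str] = None,
              chain: Sequence[Lookup] = DEFAULT_CHAIN) -> str:
    """Try each lookup in order and return the first lemma found."""
    form = nfc(form)
    for lookup in chain:
        lemma = lookup(form, pos)
        if lemma is not None:
            return lemma
    return form


def accuracy(judgements: Sequence[bool]) -> float:
    """Share of tokens that an evaluator marked as correctly lemmatized."""
    return sum(judgements) / len(judgements) if judgements else 0.0


if __name__ == "__main__":
    tagged = [("μῆνιν", "noun"), ("ἄειδε", "verb"), ("θεὰ", None)]
    print([lemmatize(form, pos) for form, pos in tagged])
    print(accuracy([True, True, False, True]))
```

The ordering of the chain is the design choice that matters here: the tagged lookup runs first because it can separate homographs, and the less specific steps only run when it finds nothing. The identity step guarantees that every token receives some lemma, which is also why an independent human evaluation is needed to measure how often that lemma is right.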