Lemmatization for Ancient Greek

Journal of Greek Linguistics (Language & Linguistics, IF 0.5). Pub Date: 2020-11-12. DOI: 10.1163/15699846-02002001

A. Vatri, Barbara McGillivray

Journal of Greek Linguistics, vol. 20, pp. 179-196.

Citations: 1

Abstract

This article presents the result of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek evaluated the output of the CLTK lemmatizer, of the CLTK backoff lemmatizer, and of GLEM, together with the lemmatizations offered by the Diorisis corpus and the Lemmatized Ancient Greek Texts repository. The texts chosen for this experiment are Homer, Iliad 1.1–279 and Lysias 7. The results suggest that lemmatization methods using large lexica as well as part-of-speech tagging—such as those employed by the Diorisis corpus and the CLTK backoff lemmatizer—are more reliable than methods that rely more heavily on machine learning and use smaller lexica.
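The contrast the abstract draws, between lexicon-plus-part-of-speech lookup and other approaches, can be illustrated with a small backoff chain. The sketch below is not the implementation of the CLTK backoff lemmatizer, Diorisis, or GLEM; the toy lexicon, the tag names, and every function name are assumptions made up for illustration. It only shows the general idea of trying the most specific lookup first and falling back to less informed guesses.

```python
import unicodedata
from typing import Callable, List, Optional


def nfc(text: str) -> str:
    """Normalize to NFC so precomposed and combining accents compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy lexicon keyed by (surface form, part of speech).
# A real lexicon would hold many thousands of entries.
LEXICON = {
    (nfc("μῆνιν"), "noun"): nfc("μῆνις"),
    (nfc("ἄειδε"), "verb"): nfc("ἀείδω"),
    (nfc("θεά"), "noun"): nfc("θεά"),
}

# Form-only view of the same lexicon, used when the POS tag does not match.
FORM_ONLY = {form: lemma for (form, _pos), lemma in LEXICON.items()}


def lookup_with_pos(token: str, pos: str) -> Optional[str]:
    """Most specific step: the form must match together with its POS tag."""
    return LEXICON.get((token, pos))


def lookup_form_only(token: str, pos: str) -> Optional[str]:
    """Less specific step: ignore the POS tag and match on the form alone."""
    return FORM_ONLY.get(token)


def identity(token: str, pos: str) -> Optional[str]:
    """Last resort: return the token itself as its own lemma."""
    return token


# Steps are tried in order; the first one that returns a lemma wins.
BACKOFF_CHAIN: List[Callable[[str, str], Optional[str]]] = [
    lookup_with_pos,
    lookup_form_only,
    identity,
]


def lemmatize(token: str, pos: str) -> str:
    """Run the token through the backoff chain and return the first hit."""
    token = nfc(token)
    for step in BACKOFF_CHAIN:
        lemma = step(token, pos)
        if lemma is not None:
            return lemma
    return token


if __name__ == "__main__":
    for form, tag in [("μῆνιν", "noun"), ("ἄειδε", "noun"), ("Ἀχιλῆος", "noun")]:
        print(form, tag, lemmatize(form, tag))
```

In this sketch a larger lexicon means more tokens are resolved by the first two steps rather than by the identity fallback, which is one plausible reading of why the abstract associates large lexica with reliability.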
