Advances in the automatic lemmatization of Old English: class V strong verbs (L-Y)
Roberto Torre Alonso
Revista de Lingüística y Lenguas Aplicadas, journal article, published 2022-07-28
DOI: 10.4995/rlyla.2022.16132 (https://doi.org/10.4995/rlyla.2022.16132)
Citation count: 1
Abstract
The grammatical description of Old English lacks a complete and systematic lemmatization, which hinders Natural Language Processing studies of the language, since these rely heavily on large annotated corpora. Moreover, the inflectional features of Old English preclude token-based automatic lemmatization. Goal-specific applications must therefore be developed to lemmatize particular variable (inflecting) categories automatically. This article designs an automatic lemmatizer within the framework of Morphological Generation to address the type-based lemmatization of Old English class V strong verbs (L-Y). The lemmatizer is implemented with rules that account for inflectional, derivational and morphophonological variation. The generated forms are validated against the most relevant corpora of Old English before being assigned a lemma. The lemmatizer succeeds in supplying form-lemma associations not yet recorded in the literature and in identifying mismatches and areas that require manual revision.
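The generate-then-validate idea in the abstract can be illustrated with a small sketch. This is not the author's lemmatizer; it is a minimal illustration under stated assumptions: a simplified class V ablaut series (present e, preterite singular æ, preterite plural ǣ, past participle e), a small hypothetical inventory of endings, an optional s/r alternation (Verner's law, as in lesan, lǣron) standing in for morphophonological rules, and a ge- prefixed participle standing in for derivational variation. The names `generate_forms`, `lemmatize`, `needs_review` and `TOY_CORPUS` are invented for illustration; a real run would compare against the attested word types of an actual Old English corpus.

```python
"""Sketch of type-based lemmatization by morphological generation.

Hypothetical, simplified rules (not the article's rule set): class V ablaut
e / æ / ǣ / e, a few endings, optional Verner s/r alternation, optional ge-.
"""

# Hypothetical ending inventories (deliberately incomplete).
PRESENT_ENDINGS = ["an", "e", "est", "eþ", "aþ", "en", "ende", "enne"]
PRET_PL_ENDINGS = ["on", "en", "e"]  # "e" also covers 2nd sg preterite (mǣte)
PP_ENDINGS = ["en"]

# Toy word list standing in for the attested types of a real corpus.
TOY_CORPUS = {"metan", "mæt", "mǣton", "gemeten", "læs", "lǣron",
              "sprecan", "spræc", "cyning"}


def ablaut(stem: str, vowel: str) -> str:
    """Replace the last root vowel 'e' of the stem with an ablaut vowel."""
    i = stem.rfind("e")
    if i < 0:
        raise ValueError(f"no root vowel 'e' in stem {stem!r}")
    return stem[:i] + vowel + stem[i + 1:]


def generate_forms(infinitive: str, verner: bool = False) -> set[str]:
    """Generate candidate inflected forms of a class V strong verb."""
    if not infinitive.endswith("an"):
        raise ValueError("expected an infinitive ending in -an")
    stem = infinitive[:-2]               # metan -> met
    pret_sg = ablaut(stem, "æ")          # met -> mæt
    pret_pl = ablaut(stem, "ǣ")          # met -> mǣt
    pp = stem                            # past participle keeps e
    if verner and stem.endswith("s"):    # lesan -> lǣron, leren
        pret_pl = pret_pl[:-1] + "r"
        pp = pp[:-1] + "r"
    forms = {stem + e for e in PRESENT_ENDINGS}
    forms.add(pret_sg)
    forms |= {pret_pl + e for e in PRET_PL_ENDINGS}
    for e in PP_ENDINGS:
        forms.add(pp + e)
        forms.add("ge" + pp + e)         # optional ge- prefixed variant
    return forms


def lemmatize(lemmas: dict[str, bool],
              corpus_types: set[str]) -> dict[str, set[str]]:
    """Keep only generated forms attested in the corpus, mapped to lemmas.

    `lemmas` maps each infinitive to whether the s/r alternation applies.
    """
    candidates: dict[str, set[str]] = {}
    for lemma, verner in lemmas.items():
        for form in generate_forms(lemma, verner):
            if form in corpus_types:
                candidates.setdefault(form, set()).add(lemma)
    return candidates


def needs_review(candidates: dict[str, set[str]]) -> set[str]:
    """Flag forms matched by more than one lemma for manual revision."""
    return {form for form, lemmas in candidates.items() if len(lemmas) > 1}


if __name__ == "__main__":
    found = lemmatize({"metan": False, "lesan": True}, TOY_CORPUS)
    for form, lemmas in sorted(found.items()):
        print(form, "->", ", ".join(sorted(lemmas)))
    print("to review:", sorted(needs_review(found)))
```

Generated forms that never occur in the word list are discarded, so only attested form-lemma pairs survive; forms claimed by several lemmas are surfaced by `needs_review` rather than resolved automatically. The sketch deliberately omits the 2nd and 3rd person singular present forms with raised vowel i, as well as the spelling variation a real rule set would have to cover.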
Journal description:
The Revista de Lingüística y Lenguas Aplicadas aims to contribute to the dissemination of scholarly research in the field of language study, especially that of specialised languages. Contributions discussing any of the following areas, from either a theoretical or a practical perspective, are of particular interest: Discourse Analysis, Language Teaching, Terminology and Translation, Languages for Specific Purposes (LSP), and Computer-Assisted Language Learning (CALL). It is a peer-reviewed yearly journal of linguistic studies, designed for an international readership and intended to promote knowledge of applied linguistics.