English–Georgian Parallel Corpus and Its Application in Georgian Lexicography
T. Margalitadze, G. Meladze, Z. Pourtskhvanidze
Lexikos, 2022. DOI: 10.5788/32-2-1701 (https://doi.org/10.5788/32-2-1701)
Abstract
Georgian, the official language of Georgia, is the only written member of the Kartvelian language family, the indigenous language family of the Caucasus region. Georgian philology and lexicography have a long-standing tradition, and English–Georgian lexicography is no exception. Given the increasing use of ample electronic text corpora for lexicographical purposes, the team of Georgian lexicographers working on the Comprehensive English–Georgian Dictionary (CEGD), subsequently the Comprehensive English–Georgian Online Dictionary (CEGOD), decided to compile an English–Georgian Parallel Corpus (EGPC). The aim of the project was to develop a methodology for building a parallel corpus of Georgian and to assess its efficiency for Georgian bilingual lexicography. Work on the corpus has been under way for over a decade. The ultimate aim is to create a standard for Georgian bilingual corpora compiled in the future. The article describes the content and composition of the EGPC, its structure, functionalities and search engines. It also discusses studies conducted over the years to assess and enhance the value, applicability and efficiency of the EGPC for the automatic or semi-automatic recognition, tagging and extraction of terminology, and for the compilation of terminological entries as well as entries for the English–Georgian Dictionary and the Georgian–English Learner's Dictionary. Particular emphasis is laid on the actual or potential applicability of the corpus to lexicographical work and to machine translation projects. The findings of the study may be of interest for other under-resourced languages like Georgian.
Keywords: parallel corpus, terminological entries, English–Georgian dictionary, Georgian–English dictionary
About the journal:
Lexikos (Greek for "of or for words") is a journal for the lexicographical specialist. It is the only journal in Africa which is exclusively devoted to lexicography. Articles dealing with all aspects of lexicography and terminology or the implications that research in related disciplines such as linguistics, computer and information science, etc. has for lexicography will be considered for publication. Articles may be written in Afrikaans, English, Dutch, German and French.