English–Georgian Parallel Corpus and Its Application in Georgian Lexicography

Lexikos (Language & Linguistics, IF 0.9). Pub Date: 2022-01-01. DOI: 10.5788/32-2-1701
T. Margalitadze, G. Meladze, Z. Pourtskhvanidze

Abstract

The Georgian language, the official language of Georgia, is the only written member of the Kartvelian language family, the indigenous language family of the Caucasus region. Georgian philology and lexicography have a long-standing tradition, English–Georgian lexicography being no exception. Given the increasing use of large electronic text corpora for lexicographical purposes, the team of Georgian lexicographers working on the Comprehensive English–Georgian Dictionary (CEGD), subsequently the Comprehensive English–Georgian Online Dictionary (CEGOD), decided to compile an English–Georgian Parallel Corpus (EGPC). The aim of the project was to develop a methodology for building a parallel corpus of Georgian and to assess its efficiency for Georgian bilingual lexicography. Work on the corpus has been going on for over a decade. The ultimate aim is to create a standard for Georgian bilingual corpora that will be compiled in the future. The article describes the content and composition of the EGPC, its structure, functionalities and search engines. It also deals with various studies conducted over the years to assess and enhance the value, applicability and efficiency of the EGPC for the automatic or semi-automatic recognition, tagging and extraction of terminology, and for the compilation of terminological entries as well as entries for the English–Georgian Dictionary and the Georgian–English Learner's Dictionary. Particular emphasis is laid upon the actual or potential applicability of the corpus to lexicographical activities and to machine translation projects. The findings of the study may be of interest for other under-resourced languages like Georgian.

Keywords: parallel corpus, terminological entries, English–Georgian dictionary, Georgian–English dictionary
Source journal: Lexikos
CiteScore: 1.00
Self-citation rate: 25.00%
Articles published: 15
Review time: 7 weeks
About the journal: Lexikos (Greek for "of or for words") is a journal for the lexicographical specialist. It is the only journal in Africa which is exclusively devoted to lexicography. Articles dealing with all aspects of lexicography and terminology, or with the implications that research in related disciplines such as linguistics and computer and information science has for lexicography, will be considered for publication. Articles may be written in Afrikaans, English, Dutch, German and French.