Automated annotation of parallel bible corpora with cross-lingual semantic concordance

IF 2.3 3区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Natural Language Engineering Pub Date : 2024-01-25 DOI:10.1017/s135132492300058x
Jens Dörpinghaus
{"title":"Automated annotation of parallel bible corpora with cross-lingual semantic concordance","authors":"Jens Dörpinghaus","doi":"10.1017/s135132492300058x","DOIUrl":null,"url":null,"abstract":"<p>Here we present an improved approach for automated annotation of New Testament corpora with cross-lingual semantic concordance based on Strong’s numbers. Based on already annotated texts, they provide references to the original Greek words. Since scientific editions and translations of biblical texts are often not available for scientific purposes and are rarely freely available, there is a lack of up-to-date training data. In addition, since annotation, curation, and quality control of alignments between these texts are expensive, there is a lack of available biblical resources for scholars. We present two improved approaches to the problem, based on dictionaries and already annotated biblical texts. We provide a detailed evaluation of annotated and unannotated translations. We also discuss a proof of concept based on English and German New Testament translations. The results presented in this paper are novel and, to our knowledge, unique. They show promising performance, although further research is needed.</p>","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":"10 3 1","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1017/s135132492300058x","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Here we present an improved approach for automated annotation of New Testament corpora with cross-lingual semantic concordance based on Strong’s numbers. Based on already annotated texts, they provide references to the original Greek words. Since scientific editions and translations of biblical texts are often not available for scientific purposes and are rarely freely available, there is a lack of up-to-date training data. In addition, since annotation, curation, and quality control of alignments between these texts are expensive, there is a lack of available biblical resources for scholars. We present two improved approaches to the problem, based on dictionaries and already annotated biblical texts. We provide a detailed evaluation of annotated and unannotated translations. We also discuss a proof of concept based on English and German New Testament translations. The results presented in this paper are novel and, to our knowledge, unique. They show promising performance, although further research is needed.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用跨语言语义对照自动注释平行圣经语料库
在此,我们介绍一种基于斯特朗数的新约语料库跨语言语义对照自动注释改进方法。在已注释文本的基础上,它们提供了希腊文原词的参考。由于圣经文本的科学版本和译本通常不用于科学目的,也很少免费提供,因此缺乏最新的训练数据。此外,由于这些文本之间的注释、整理和对齐质量控制费用高昂,因此学者们缺乏可用的圣经资源。针对这一问题,我们提出了两种基于词典和已注释圣经文本的改进方法。我们对有注释和无注释的翻译进行了详细评估。我们还讨论了基于英语和德语新约翻译的概念验证。据我们所知,本文介绍的结果是新颖独特的。尽管还需要进一步的研究,但它们显示出了良好的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Natural Language Engineering
Natural Language Engineering COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-
CiteScore
5.90
自引率
12.00%
发文量
60
审稿时长
>12 weeks
期刊介绍: Natural Language Engineering meets the needs of professionals and researchers working in all areas of computerised language processing, whether from the perspective of theoretical or descriptive linguistics, lexicology, computer science or engineering. Its aim is to bridge the gap between traditional computational linguistics research and the implementation of practical applications with potential real-world use. As well as publishing research articles on a broad range of topics - from text analysis, machine translation, information retrieval and speech analysis and generation to integrated systems and multi modal interfaces - it also publishes special issues on specific areas and technologies within these topics, an industry watch column and book reviews.
期刊最新文献
Start-up activity in the LLM ecosystem Anisotropic span embeddings and the negative impact of higher-order inference for coreference resolution: An empirical analysis Automated annotation of parallel bible corpora with cross-lingual semantic concordance How do control tokens affect natural language generation tasks like text simplification Emerging trends: When can users trust GPT, and when should they intervene?
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1