Machine Translation for Historical Research: A Case Study of Aramaic-Ancient Hebrew Translations

ACM Journal on Computing and Cultural Heritage. Pub Date: 2023-10-16. DOI: 10.1145/3627168. IF 2.2; Zone 3 (Computer Science); JCR Q3 (Computer Science, Interdisciplinary Applications).

Chaya Liebeskind, Shmuel Liebeskind, Dan Bouhnik


Citations: 0

Abstract

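In this article, we investigate Machine Translation (MT) in a cultural heritage domain, specifically the translation of Aramaic into another spoken language, for two primary purposes: evaluating the quality of ancient translations and preserving Aramaic (an endangered language). First, we detail the construction of a publicly available Biblical parallel Aramaic-Hebrew corpus based on two ancient (early 2nd to late 4th century) Hebrew-Aramaic translations: Targum Onkelus and Targum Jonathan. Then, using the Statistical Machine Translation (SMT) approach, which in our use case significantly outperforms Neural Machine Translation (NMT), we validate the expected high quality of the translations. The trained model failed to translate Aramaic texts of other dialects. However, when we trained the same SMT model on another Aramaic-Hebrew corpus of a different dialect (the Zohar, 13th century), it achieved a very high translation score. We also examined an additional important cultural heritage source of Aramaic texts, the Babylonian Talmud (early 3rd to late 5th century). Since we do not have a parallel Aramaic-Hebrew corpus of the Talmud, we used the model trained on the Bible corpus for translation. We analyze the results and suggest promising directions for future research.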
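The abstract does not say how the parallel corpus was assembled. As a rough illustration only, the sketch below assumes that both texts are keyed by a (book, chapter, verse) reference and pairs the lines that share a key, then holds out every n-th pair for evaluation. The names `align_by_verse` and `split_corpus`, the key layout, and the placeholder strings are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical key layout: (book, chapter, verse).
VerseKey = Tuple[str, int, int]
Pair = Tuple[str, str]


def align_by_verse(aramaic: Dict[VerseKey, str],
                   hebrew: Dict[VerseKey, str]) -> List[Pair]:
    """Pair Aramaic and Hebrew lines that share the same verse key."""
    pairs: List[Pair] = []
    # Only verses present in both texts can form a parallel pair.
    for key in sorted(aramaic.keys() & hebrew.keys()):
        src = " ".join(aramaic[key].split())  # collapse whitespace runs
        tgt = " ".join(hebrew[key].split())
        if src and tgt:  # skip verses that are empty on either side
            pairs.append((src, tgt))
    return pairs


def split_corpus(pairs: List[Pair],
                 test_every: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Hold out every `test_every`-th pair (starting at index 0) for testing."""
    train = [p for i, p in enumerate(pairs) if i % test_every != 0]
    test = [p for i, p in enumerate(pairs) if i % test_every == 0]
    return train, test


if __name__ == "__main__":
    # Placeholder strings stand in for real verse text.
    aramaic = {("Genesis", 1, 1): "aramaic  verse one",
               ("Genesis", 1, 2): "aramaic verse two"}
    hebrew = {("Genesis", 1, 1): "hebrew verse one",
              ("Genesis", 1, 2): "hebrew verse two",
              ("Exodus", 1, 1): "hebrew verse with no aramaic partner"}
    train, test = split_corpus(align_by_verse(aramaic, hebrew), test_every=2)
    print(len(train), len(test))
```

Aligned pairs of this kind are the usual input for both SMT and NMT toolkits, which is why a verse-level key is a convenient unit for Biblical text; the corpus described in the abstract may have been built differently.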

Source journal

ACM Journal on Computing and Cultural Heritage (Arts and Humanities: Conservation)

CiteScore

4.60

Self-citation rate

8.30%

Articles published

Journal description: ACM Journal on Computing and Cultural Heritage (JOCCH) publishes papers of significant and lasting value in all areas relating to the use of information and communication technologies (ICT) in support of Cultural Heritage. The journal encourages the submission of manuscripts that demonstrate innovative use of technology for the discovery, analysis, interpretation and presentation of cultural material, as well as manuscripts that illustrate applications in the Cultural Heritage sector that challenge the computational technologies and suggest new research opportunities in computer science.