Translating a low-resource language using GPT-3 and a human-readable dictionary

Special Interest Group on Computational Morphology and Phonology Workshop, 2023. DOI: 10.18653/v1/2023.sigmorphon-1.2

M. Elsner, Jordan Needle


Abstract

We investigate how well words in the polysynthetic language Inuktitut can be translated by combining dictionary definitions, without use of a neural machine translation model trained on parallel text. Such a translation system would allow natural language technology to benefit from resources designed for community use in a language revitalization or education program, rather than requiring a separate parallel corpus. We show that the text-to-text generation capabilities of GPT-3 allow it to perform this task with BLEU scores of up to 18.5. We investigate prompting GPT-3 to provide multiple translations, which can help slightly, and providing it with grammar information, which is mostly ineffective. Finally, we test GPT-3’s ability to derive morpheme definitions from whole-word translations, but find this process is prone to errors including hallucinations.
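The abstract does not specify how the dictionary definitions are combined into a prompt. The following is a minimal sketch, under stated assumptions, of one way to do it: segment a word against a glossary by greedy longest match, then list each piece's definition in a text prompt that a model such as GPT-3 could complete. `GLOSSARY`, `segment`, `build_prompt` and the prompt wording are hypothetical and are not the authors' method. The glosses are simplified from the commonly cited example word *tusaatsiarunnanngittualuujunga* ("I can't hear very well"). They are keyed on surface chunks, so the morphophonological alternations a real segmenter must handle are ignored. No model is called here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified glossary keyed on surface chunks of one example word
# (illustration only; not taken from the paper's dictionary).
GLOSSARY: Dict[str, str] = {
    "tusaa": "hear",
    "tsia": "well",
    "runna": "be able to",
    "nngit": "not",
    "tualuu": "very much",
    "junga": "I (1st person singular)",
}


def segment(word: str, glossary: Dict[str, str]) -> Optional[List[Tuple[str, str]]]:
    """Greedy longest-match segmentation; returns None if some span is unknown."""
    pieces: List[Tuple[str, str]] = []
    i = 0
    while i < len(word):
        # Pick the longest glossary entry that matches at position i.
        match = max(
            (m for m in glossary if word.startswith(m, i)),
            key=len,
            default=None,
        )
        if match is None:
            return None  # no entry covers this position; no backtracking is tried
        pieces.append((match, glossary[match]))
        i += len(match)
    return pieces


def build_prompt(word: str, pieces: List[Tuple[str, str]]) -> str:
    """Format a dictionary-based translation prompt (wording is an assumption)."""
    lines = [
        "Translate the Inuktitut word into English using the morpheme definitions.",
        f"Word: {word}",
        "Morphemes:",
    ]
    lines += [f"- {m}: {gloss}" for m, gloss in pieces]
    lines.append("English:")
    return "\n".join(lines)


if __name__ == "__main__":
    word = "tusaatsiarunnanngittualuujunga"
    pieces = segment(word, GLOSSARY)
    if pieces is None:
        print("could not segment", word)
    else:
        print(build_prompt(word, pieces))
```

Greedy matching without backtracking is the simplest choice but can fail on words where a shorter early match would let the rest segment. A fuller system would search over segmentations or use a proper morphological analyzer.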



