An empirical study of a novel multimodal dataset for low-resource machine translation

Knowledge and Information Systems (IF 2.5; Region 4, Computer Science; JCR Q3, Computer Science, Artificial Intelligence). Pub Date: 2024-07-29. DOI: 10.1007/s10115-024-02087-6

Loitongbam Sanayai Meetei, Thoudam Doren Singh, Sivaji Bandyopadhyay

Citations: 0

Abstract

Cues from multiple modalities have been applied successfully in several areas of natural language processing, including machine translation (MT). However, the application of multimodal cues in low-resource MT (LRMT) remains an open research problem. The main challenge of LRMT is the lack of abundant parallel data, which makes it difficult to build MT systems that produce reasonable output. Multimodal cues can provide additional context and information that help to mitigate this challenge. To this end, we present a multimodal machine translation (MMT) dataset for a low-resource language pair, Manipuri–English, consisting of images, audio and corresponding parallel text. The text is collected from news articles in local daily newspapers and subsequently translated into the target language by native-speaker translators. Audio of the Manipuri text, read by native speakers, is recorded for the experiments. The study also investigates whether correlated audio-visual cues enhance the performance of the MT system. Several experiments are conducted to evaluate systematically the effectiveness of using multiple modalities. Using automatic metrics and human evaluation, we carry out a detailed analysis of MT systems trained with text-only and multimodal inputs. Experimental results show that MT systems in low-resource settings can be improved significantly, by up to +2.7 BLEU, by incorporating correlated modalities. The human evaluation reveals that the type of correlated auxiliary modality affects the adequacy and fluency of the MMT systems. Our results emphasize the potential of cues from auxiliary modalities to enhance MT systems, particularly in situations with limited resources.
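
To make the reported "+2.7 BLEU" concrete, the following is a minimal sketch of how a corpus-level BLEU difference between two systems can be computed. It is not the authors' evaluation code: the abstract does not say which BLEU implementation or tokenization was used, so this sketch assumes whitespace tokenization, a single reference per segment, and no smoothing. The function names and the toy English sentences are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (not taken from the paper): whitespace tokenization,
    one reference per hypothesis, uniform n-gram weights, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: each hypothesis n-gram counts at most as
            # often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical toy data, not drawn from the dataset described above.
references = ["the chief minister inaugurated the new bridge today"]
text_only_output = ["the minister opened the new bridge today"]
multimodal_output = ["the chief minister inaugurated a new bridge today"]

text_only_bleu = corpus_bleu(text_only_output, references)
multimodal_bleu = corpus_bleu(multimodal_output, references)
bleu_gain = multimodal_bleu - text_only_bleu
print(f"text-only: {text_only_bleu:.1f}  multimodal: {multimodal_bleu:.1f}  gain: {bleu_gain:+.1f}")
```

A gain reported this way is only meaningful when both systems are scored against the same references with the same tokenization, which is why published comparisons usually rely on a standardized tool rather than a hand-written scorer like this one.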

Source journal

Knowledge and Information Systems (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 5.70

Self-citation rate: 7.40%

Articles published: 152

Review time: 7.2 months

Journal description: Knowledge and Information Systems (KAIS) provides an international forum for researchers and professionals to share their knowledge and report new advances on all topics related to knowledge systems and advanced information systems. This monthly peer-reviewed archival journal publishes state-of-the-art research reports on emerging topics in KAIS, reviews of important techniques in related areas, and application papers of interest to a general readership.