{"title":"An empirical study of a novel multimodal dataset for low-resource machine translation","authors":"Loitongbam Sanayai Meetei, Thoudam Doren Singh, Sivaji Bandyopadhyay","doi":"10.1007/s10115-024-02087-6","DOIUrl":null,"url":null,"abstract":"<p>Cues from multiple modalities have been successfully applied in several fields of natural language processing including machine translation (MT). However, the application of multimodal cues in low-resource MT (LRMT) is still an open research problem. The main challenge of LRMT is the lack of abundant parallel data which makes it difficult to build MT systems for a reasonable output. Using multimodal cues can provide additional context and information that can help to mitigate this challenge. To address this challenge, we present a multimodal machine translation (MMT) dataset of low-resource languages. The dataset consists of images, audio and corresponding parallel text for a low-resource language pair that is Manipuri–English. The text dataset is collected from the news articles of local daily newspapers and subsequently translated into the target language by translators of the native speakers. The audio version by native speakers for the Manipuri text is recorded for the experiments. The study also investigates whether the correlated audio-visual cues enhance the performance of the machine translation system. Several experiments are conducted for a systematic evaluation of the effectiveness utilizing multiple modalities. With the help of automatic metrics and human evaluation, a detailed analysis of the MT systems trained with text-only and multimodal inputs is carried out. Experimental results attest that the MT systems in low-resource settings could be significantly improved up to +2.7 BLEU score by incorporating correlated modalities. The human evaluation reveals that the type of correlated auxiliary modality affects the adequacy and fluency performance in the MMT systems. Our results emphasize the potential of using cues from auxiliary modalities to enhance machine translation systems, particularly in situations with limited resources.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"3 1","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge and Information Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10115-024-02087-6","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Cues from multiple modalities have been successfully applied in several fields of natural language processing (NLP), including machine translation (MT). However, the application of multimodal cues in low-resource MT (LRMT) remains an open research problem. The main challenge in LRMT is the scarcity of parallel data, which makes it difficult to build MT systems that produce reasonable output. Multimodal cues can provide additional context and information that helps mitigate this challenge. To address it, we present a multimodal machine translation (MMT) dataset for a low-resource language pair, Manipuri–English, consisting of images, audio, and the corresponding parallel text. The text is collected from news articles in local daily newspapers and subsequently translated into the target language by native-speaker translators. Audio recordings of the Manipuri text, read by native speakers, are produced for the experiments. The study also investigates whether correlated audio-visual cues enhance the performance of the MT system. Several experiments are conducted to systematically evaluate the effectiveness of utilizing multiple modalities. Using automatic metrics and human evaluation, we carry out a detailed analysis of MT systems trained with text-only and multimodal inputs. Experimental results attest that MT systems in low-resource settings can be improved significantly, by up to +2.7 BLEU, by incorporating correlated modalities. The human evaluation reveals that the type of correlated auxiliary modality affects adequacy and fluency in the MMT systems. Our results underscore the potential of cues from auxiliary modalities to enhance machine translation, particularly in resource-constrained settings.
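To make the idea of "incorporating correlated modalities" concrete, the sketch below shows one common, generic way to condition a Transformer MT encoder on visual context: project a pooled image feature and prepend it to the source-token embeddings. This is a minimal PyTorch illustration under assumed dimensions, not the architecture used in the paper; the class name `MultimodalFusion` and parameters such as `img_dim` are illustrative.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Minimal sketch: condition an MT encoder on an image by prepending
    one projected visual feature to the token embeddings.
    (Illustrative only; not the paper's architecture.)"""

    def __init__(self, vocab_size=8000, d_model=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # img_dim assumes pooled CNN features, e.g. from a ResNet backbone
        self.img_proj = nn.Linear(img_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens, img_feat):
        # tokens: (batch, seq_len) int64 ids; img_feat: (batch, img_dim)
        x = self.embed(tokens)                     # (batch, seq, d_model)
        v = self.img_proj(img_feat).unsqueeze(1)   # (batch, 1, d_model)
        return self.encoder(torch.cat([v, x], dim=1))  # visual token first
```

The abstract reports automatic evaluation in BLEU; a corpus-level score of the kind behind the +2.7 figure can be computed with the sacrebleu library, as sketched below. The hypothesis and reference strings are invented placeholders, not sentences from the Manipuri–English dataset.

```python
import sacrebleu  # pip install sacrebleu

# Placeholder sentences for illustration only.
hypotheses = ["the local market reopened in imphal on monday"]
references = [["the local market reopened in imphal on monday"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
# A multimodal system's gain is reported as its score minus the text-only baseline's.
```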
Journal Introduction:
Knowledge and Information Systems (KAIS) provides an international forum for researchers and professionals to share their knowledge and report new advances on all topics related to knowledge systems and advanced information systems. This monthly peer-reviewed archival journal publishes state-of-the-art research reports on emerging topics in KAIS, reviews of important techniques in related areas, and application papers of interest to a general readership.