An empirical study of a novel multimodal dataset for low-resource machine translation

IF 2.5 | Zone 4 (Computer Science) | JCR Q3: Computer Science, Artificial Intelligence | Knowledge and Information Systems | Pub Date: 2024-07-29 | DOI: 10.1007/s10115-024-02087-6
Loitongbam Sanayai Meetei, Thoudam Doren Singh, Sivaji Bandyopadhyay
Citations: 0

Abstract

Cues from multiple modalities have been applied successfully in several areas of natural language processing, including machine translation (MT). However, applying multimodal cues to low-resource MT (LRMT) remains an open research problem. The main challenge in LRMT is the scarcity of parallel data, which makes it difficult to build MT systems that produce reasonable output. Multimodal cues can supply additional context that helps mitigate this scarcity. To that end, we present a multimodal machine translation (MMT) dataset for a low-resource language pair, Manipuri–English, consisting of images, audio, and the corresponding parallel text. The text is collected from news articles in local daily newspapers and then translated into the target language by native-speaker translators. Native speakers also recorded audio versions of the Manipuri text for the experiments. The study investigates whether correlated audio-visual cues improve the performance of the MT system. Several experiments systematically evaluate the effectiveness of using multiple modalities. Using automatic metrics and human evaluation, we analyze in detail MT systems trained on text-only and multimodal inputs. The experimental results show that MT systems in low-resource settings can be significantly improved, by up to +2.7 BLEU, by incorporating correlated modalities. The human evaluation reveals that the type of correlated auxiliary modality affects the adequacy and fluency of the MMT systems. Our results highlight the potential of cues from auxiliary modalities to enhance MT systems, particularly when resources are limited.
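
To make the reported BLEU gain concrete, the following is a minimal sketch of how a corpus-level BLEU difference between a text-only and a multimodal system could be computed. It is not the authors' evaluation script: the whitespace tokenization, the single reference per sentence, the absence of smoothing, and the toy sentences are all assumptions made for illustration, and the function names `ngrams` and `corpus_bleu` are hypothetical. Published evaluations typically rely on an established tool such as sacreBLEU rather than a hand-written scorer.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches for each order
    totals = [0] * max_n   # hypothesis n-gram counts for each order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()  # assumption: whitespace tokens
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each count by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any empty n-gram order zeroes the score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical toy outputs, not data from the paper.
references = ["the minister visited the flood affected village on monday"]
text_only = ["the minister visited the village on monday"]
multimodal = ["the minister visited the flood affected village monday"]

delta = corpus_bleu(multimodal, references) - corpus_bleu(text_only, references)
print(f"BLEU difference (multimodal - text-only): {delta:+.1f}")
```

A figure such as "+2.7 BLEU" is a difference of this kind, measured on a held-out test set rather than on a single sentence.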

Source journal
Knowledge and Information Systems (Engineering and Technology: Computer Science, Artificial Intelligence)
CiteScore: 5.70
Self-citation rate: 7.40%
Articles published: 152
Review time: 7.2 months
Journal description: Knowledge and Information Systems (KAIS) provides an international forum for researchers and professionals to share their knowledge and report new advances on all topics related to knowledge systems and advanced information systems. This monthly peer-reviewed archival journal publishes state-of-the-art research reports on emerging topics in KAIS, reviews of important techniques in related areas, and application papers of interest to a general readership.