An error analysis for image-based multi-modal neural machine translation.

IF 2.1 · Q3 · COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · MACHINE TRANSLATION · Pub Date: 2019-01-01 · Epub Date: 2019-04-08 · DOI: 10.1007/s10590-019-09226-9
Iacer Calixto, Qun Liu
{"title":"基于图像的多模态神经机器翻译的误差分析。","authors":"Iacer Calixto,&nbsp;Qun Liu","doi":"10.1007/s10590-019-09226-9","DOIUrl":null,"url":null,"abstract":"<p><p>In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models, that use <i>global</i> and <i>local</i> image features: the latter encode an image globally, i.e. there is one feature vector representing an entire image, whereas the former encode spatial information, i.e. there are multiple feature vectors, each encoding different portions of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both <i>visual and non-visual terms</i>. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that not only translations of terms with a strong visual connotation are improved, but almost all kinds of errors decreased when using multi-modal models.</p>","PeriodicalId":44400,"journal":{"name":"MACHINE TRANSLATION","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/s10590-019-09226-9","citationCount":"9","resultStr":"{\"title\":\"An error analysis for image-based multi-modal neural machine translation.\",\"authors\":\"Iacer Calixto,&nbsp;Qun Liu\",\"doi\":\"10.1007/s10590-019-09226-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models, that use <i>global</i> and <i>local</i> image features: the latter encode an image globally, i.e. there is one feature vector representing an entire image, whereas the former encode spatial information, i.e. there are multiple feature vectors, each encoding different portions of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both <i>visual and non-visual terms</i>. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. 
We also find that not only translations of terms with a strong visual connotation are improved, but almost all kinds of errors decreased when using multi-modal models.</p>\",\"PeriodicalId\":44400,\"journal\":{\"name\":\"MACHINE TRANSLATION\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2019-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1007/s10590-019-09226-9\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"MACHINE TRANSLATION\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s10590-019-09226-9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2019/4/8 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"MACHINE TRANSLATION","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10590-019-09226-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2019/4/8 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 9

Abstract




In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models that use global and local image features: the former encode an image globally, i.e. there is one feature vector representing an entire image, whereas the latter encode spatial information, i.e. there are multiple feature vectors, each encoding different portions of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both visual and non-visual terms. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that not only are translations of terms with a strong visual connotation improved, but that almost all kinds of errors decrease when using multi-modal models.
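The global/local distinction in the abstract is concrete enough to sketch in code. Below is a minimal illustration of how the two feature types are typically extracted from a pretrained CNN; the choice of ResNet-50 from torchvision and the resulting 7×7 spatial grid are assumptions for illustration, not the exact setup used in the paper.

```python
# A minimal sketch of the global vs. local image feature distinction
# described in the abstract. ResNet-50 is an illustrative assumption.
import torch
import torchvision.models as models

cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
cnn.eval()

# Everything up to (but excluding) the global pooling and classifier
# layers: the output is a spatial feature map of shape (batch, 2048, 7, 7).
backbone = torch.nn.Sequential(*list(cnn.children())[:-2])

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)            # a dummy preprocessed image
    fmap = backbone(image)                         # (1, 2048, 7, 7)

    # Local features: 49 vectors, each encoding one portion of the image.
    local_feats = fmap.flatten(2).transpose(1, 2)  # (1, 49, 2048)

    # Global feature: one vector representing the entire image,
    # obtained here by average-pooling the spatial grid.
    global_feat = fmap.mean(dim=(2, 3))            # (1, 2048)
```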
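Similarly, one common way to integrate a global visual feature into the encoder, in the spirit of the models analysed in the paper, is to project it into the recurrent encoder's state space and use it as the initial hidden state. The sketch below is a hedged illustration only: the GRU choice, the layer sizes, and the ImageInitEncoder class are hypothetical, not the paper's exact architecture.

```python
# A hedged sketch of injecting a *global* image feature into a
# sequence encoder via its initial hidden state. All names and
# dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class ImageInitEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=256, hid_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Maps the global image vector into the encoder state space.
        self.img_proj = nn.Linear(img_dim, hid_dim)

    def forward(self, src_tokens, global_feat):
        # (1, batch, hid_dim): image-conditioned initial hidden state.
        h0 = torch.tanh(self.img_proj(global_feat)).unsqueeze(0)
        outputs, h_n = self.gru(self.embed(src_tokens), h0)
        return outputs, h_n

# Usage with the global feature from the previous sketch:
enc = ImageInitEncoder()
src = torch.randint(0, 10000, (1, 12))           # a dummy source sentence
outputs, h_n = enc(src, torch.randn(1, 2048))    # stand-in global feature
```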

Source journal
MACHINE TRANSLATION (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Self-citation rate: 5.30%
Articles published: 1
Journal introduction: Machine Translation covers all branches of computational linguistics and language engineering, wherever they incorporate a multilingual aspect. It features papers that cover the theoretical, descriptive or computational aspects of any of the following topics:
•machine translation and machine-aided translation
•human translation theory and practice
•multilingual text composition and generation
•multilingual information retrieval
•multilingual natural language interfaces
•multilingual dialogue systems
•multilingual message understanding systems
Latest articles in this journal
•Introduction
•Machine Translation: 18th China Conference, CCMT 2022, Lhasa, China, August 6–10, 2022, Revised Selected Papers
•Joint source–target encoding with pervasive attention
•Investigating the roles of sentiment in machine translation
•Augmenting training data with syntactic phrasal-segments in low-resource neural machine translation