An error analysis for image-based multi-modal neural machine translation.

IF 2.1 · Q3 · COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · MACHINE TRANSLATION · Pub Date: 2019-01-01 · Epub Date: 2019-04-08 · DOI: 10.1007/s10590-019-09226-9
Iacer Calixto, Qun Liu
{"title":"基于图像的多模态神经机器翻译的误差分析。","authors":"Iacer Calixto,&nbsp;Qun Liu","doi":"10.1007/s10590-019-09226-9","DOIUrl":null,"url":null,"abstract":"<p><p>In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models, that use <i>global</i> and <i>local</i> image features: the latter encode an image globally, i.e. there is one feature vector representing an entire image, whereas the former encode spatial information, i.e. there are multiple feature vectors, each encoding different portions of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both <i>visual and non-visual terms</i>. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that not only translations of terms with a strong visual connotation are improved, but almost all kinds of errors decreased when using multi-modal models.</p>","PeriodicalId":44400,"journal":{"name":"MACHINE TRANSLATION","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/s10590-019-09226-9","citationCount":"9","resultStr":"{\"title\":\"An error analysis for image-based multi-modal neural machine translation.\",\"authors\":\"Iacer Calixto,&nbsp;Qun Liu\",\"doi\":\"10.1007/s10590-019-09226-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models, that use <i>global</i> and <i>local</i> image features: the latter encode an image globally, i.e. there is one feature vector representing an entire image, whereas the former encode spatial information, i.e. there are multiple feature vectors, each encoding different portions of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both <i>visual and non-visual terms</i>. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. 
We also find that not only translations of terms with a strong visual connotation are improved, but almost all kinds of errors decreased when using multi-modal models.</p>\",\"PeriodicalId\":44400,\"journal\":{\"name\":\"MACHINE TRANSLATION\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2019-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1007/s10590-019-09226-9\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"MACHINE TRANSLATION\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s10590-019-09226-9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2019/4/8 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"MACHINE TRANSLATION","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10590-019-09226-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2019/4/8 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 9

Abstract




In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models that use global and local image features: the former encode an image globally, i.e. there is one feature vector representing an entire image, whereas the latter encode spatial information, i.e. there are multiple feature vectors, each encoding different portions of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both visual and non-visual terms. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that not only are translations of terms with a strong visual connotation improved, but that almost all kinds of errors decrease when using multi-modal models.
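The global/local distinction in the abstract is concrete enough to sketch in code. Below is a minimal illustration of how the two feature types are typically extracted from a pretrained CNN; the choice of ResNet-50 from torchvision and the resulting 7×7 spatial grid are assumptions for illustration, not the exact setup used in the paper.

```python
# A minimal sketch of the global vs. local image feature distinction
# described in the abstract. ResNet-50 is an illustrative assumption.
import torch
import torchvision.models as models

cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
cnn.eval()

# Everything up to (but excluding) the global pooling and classifier
# layers: the output is a spatial feature map of shape (batch, 2048, 7, 7).
backbone = torch.nn.Sequential(*list(cnn.children())[:-2])

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)            # a dummy preprocessed image
    fmap = backbone(image)                         # (1, 2048, 7, 7)

    # Local features: 49 vectors, each encoding one portion of the image.
    local_feats = fmap.flatten(2).transpose(1, 2)  # (1, 49, 2048)

    # Global feature: one vector representing the entire image,
    # obtained here by average-pooling the spatial grid.
    global_feat = fmap.mean(dim=(2, 3))            # (1, 2048)
```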
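Similarly, one common way to integrate a global visual feature into the encoder, in the spirit of the models analysed in the paper, is to project it into the recurrent encoder's state space and use it as the initial hidden state. The sketch below is a hedged illustration only: the GRU choice, the layer sizes, and the ImageInitEncoder class are hypothetical, not the paper's exact architecture.

```python
# A hedged sketch of injecting a *global* image feature into a
# sequence encoder via its initial hidden state. All names and
# dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class ImageInitEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=256, hid_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Maps the global image vector into the encoder state space.
        self.img_proj = nn.Linear(img_dim, hid_dim)

    def forward(self, src_tokens, global_feat):
        # (1, batch, hid_dim): image-conditioned initial hidden state.
        h0 = torch.tanh(self.img_proj(global_feat)).unsqueeze(0)
        outputs, h_n = self.gru(self.embed(src_tokens), h0)
        return outputs, h_n

# Usage with the global feature from the previous sketch:
enc = ImageInitEncoder()
src = torch.randint(0, 10000, (1, 12))           # a dummy source sentence
outputs, h_n = enc(src, torch.randn(1, 2048))    # stand-in global feature
```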

Source journal
MACHINE TRANSLATION (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Self-citation rate: 5.30%
Articles published: 1
Journal introduction: Machine Translation covers all branches of computational linguistics and language engineering, wherever they incorporate a multilingual aspect. It features papers that cover the theoretical, descriptive or computational aspects of any of the following topics:
•machine translation and machine-aided translation
•human translation theory and practice
•multilingual text composition and generation
•multilingual information retrieval
•multilingual natural language interfaces
•multilingual dialogue systems
•multilingual message understanding systems
Latest articles in this journal
•Introduction
•Machine Translation: 18th China Conference, CCMT 2022, Lhasa, China, August 6–10, 2022, Revised Selected Papers
•Joint source–target encoding with pervasive attention
•Investigating the roles of sentiment in machine translation
•Augmenting training data with syntactic phrasal-segments in low-resource neural machine translation