Authors: Lin Li, Turghun Tayir
Venue: 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)
Published: 2021-09-01
DOI: https://doi.org/10.1109/MIPR51284.2021.00050
Multimodal Machine Translation Enhancement by Fusing Multimodal-attention and Fine-grained Image Features
With the recent development of multimodal machine translation (MMT) network architectures, recurrent models have largely been replaced by attention mechanisms, and translation results have been enhanced with the assistance of fine-grained image information. Although attention is a powerful and ubiquitous mechanism, both the number of attention heads and the granularity of the image features aligned by attention affect the quality of multimodal machine translation. To address these issues, this paper proposes a method for enhancing multimodal machine translation by fusing multimodal attention and fine-grained image features. The method builds several sub-models by introducing image features of different granularity into a multimodal-attention mechanism with different numbers of heads. These sub-models are then randomly fused to obtain fusion models. Experimental results on the Multi30k dataset show that pruning attention heads improves translation results. Finally, our fusion model obtained the best results according to the automatic evaluation metric BLEU, compared with the sub-models and some baselines.
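The abstract does not give implementation details, so the following is only a rough sketch of the two ideas it names: text states attending over image features with a configurable number of heads, and combining a random subset of sub-models. Everything here is an assumption made for illustration: the function names (`multi_head_attention`, `fuse_submodels`), the omission of learned projection matrices, the toy dimensions, and in particular the choice to fuse by averaging output distributions, which may differ from the fusion the authors actually use.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(text_states, image_feats, num_heads):
    """Text states (queries) attend over image features (keys and values).

    Assumption: learned projection matrices are omitted for brevity, so each
    head simply works on its own slice of the feature dimension.
    """
    dim = len(text_states[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    outputs = []
    for query in text_states:
        fused = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            q = query[lo:hi]
            # Scaled dot-product score against every image feature.
            scores = [
                sum(a * b for a, b in zip(q, feat[lo:hi])) / math.sqrt(head_dim)
                for feat in image_feats
            ]
            weights = softmax(scores)
            # Weighted sum of the image-feature slices for this head.
            fused.extend(
                sum(w * feat[lo + i] for w, feat in zip(weights, image_feats))
                for i in range(head_dim)
            )
        outputs.append(fused)
    return outputs


def fuse_submodels(distributions, k, rng):
    """Average the output distributions of k randomly chosen sub-models.

    Assumption: averaging is a stand-in; the abstract only says "randomly fused".
    """
    chosen = rng.sample(distributions, k)
    vocab = len(chosen[0])
    return [sum(d[i] for d in chosen) / k for i in range(vocab)]


rng = random.Random(0)
text_states = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]   # 3 tokens
coarse_feats = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]  # e.g. 2x2 grid
fine_feats = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(16)]   # e.g. 4x4 grid

# Hypothetical sub-model variants: (image granularity, number of heads).
variants = [(coarse_feats, 1), (coarse_feats, 4), (fine_feats, 2), (fine_feats, 8)]
contexts = [multi_head_attention(text_states, feats, heads) for feats, heads in variants]

# Stand-in next-word distributions, one per sub-model, over a toy 5-word vocabulary.
sub_dists = [softmax(ctx[0][:5]) for ctx in contexts]
fused = fuse_submodels(sub_dists, k=2, rng=rng)
```

Each entry in `variants` plays the role of one sub-model configuration, so varying the head count or swapping `coarse_feats` for `fine_feats` mirrors the two factors the abstract says influence translation quality. In a real system the per-sub-model distributions would come from trained decoders rather than from slices of the attention output.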