Improvement of attention modules for image captioning using pixel-wise semantic information
Zhihao Chen, Keisuke Doman, Y. Mekada
International Conference on Digital Image Processing, published 2022-10-12
DOI: 10.1117/12.2644743 (https://doi.org/10.1117/12.2644743)
Citations: 0
Abstract
Although attention mechanisms are effective for generating image captions, obtaining ideal image regions within such a mechanism is difficult in practice, because the attention weights must be computed between image and text data. To improve attention modules for image captioning, we propose an algorithm that handles pixel-wise semantic information obtained as the output of semantic segmentation. The proposed method feeds this pixel-wise semantic information into the attention modules for image captioning together with the input text data and image features. We conducted evaluation experiments and confirmed that our method obtains more reasonable weighted image features and better image captions, achieving a BLEU-4 score of 0.306 compared with 0.243 for the original attention model.
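The abstract describes augmenting an attention module with per-region semantic information from a segmentation map. A minimal toy sketch of this idea, assuming (hypothetically) that each image region carries a visual feature vector plus a semantic descriptor (e.g. per-class pixel fractions from the segmentation output), and that attention scores come from a dot product with the decoder's query; the paper's actual architecture is not specified here:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(region_feats, seg_feats, query):
    """Toy attention step with pixel-wise semantic information.

    region_feats: visual feature vector per image region
    seg_feats:    semantic descriptor per region (hypothetical:
                  per-class pixel fractions from segmentation)
    query:        decoder-state query vector (len = visual + semantic dim)
    """
    # Concatenate each region's visual feature with its semantic
    # descriptor, so attention scoring can see both modalities.
    augmented = [v + s for v, s in zip(region_feats, seg_feats)]
    scores = [dot(f, query) for f in augmented]
    weights = softmax(scores)
    # Attention-weighted sum of the augmented features -> context vector
    dim = len(augmented[0])
    context = [sum(w * f[d] for w, f in zip(weights, augmented))
               for d in range(dim)]
    return weights, context
```

For example, a region whose semantic descriptor aligns with the query receives a larger attention weight, which is the intended effect of injecting segmentation information into the module.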