Research on Image Description Generation Method Based on G-AoANet

Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition. Pub Date: 2022-09-23. DOI: 10.1145/3573942.3574072

Pi Qiao, Ruixue Shen, Yuan Li

Abstract

Most image description generation methods in the attention-based encoder-decoder framework extract local features from images. Although local features carry relatively high-level semantics, these methods still face two problems: object loss, where important objects are omitted from the generated description, and prediction error, where an object is assigned to the wrong class. This paper proposes G-AoANet, a model that addresses both problems by using an attention mechanism to combine global features with local features. In this way, the model can selectively focus on both object and contextual information, improving the quality of the generated descriptions. Experimental results show that the model improves on the initially reported best CIDEr-D and SPICE scores on the MS COCO dataset by 9.3% and 5.1%, respectively.
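
The abstract does not specify how the attention mechanism combines the two feature types, so the following is only a minimal sketch of one generic way to do it, not the G-AoANet architecture itself. It assumes plain scaled dot-product attention in which a single global feature vector is appended to the list of local region features, so the decoder query can weight object regions and whole-image context together. The function names (`softmax`, `attend`) and the toy vectors are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, local_feats, global_feat):
    """Scaled dot-product attention over local features plus one global feature.

    Assumption: the global vector is treated as one extra candidate
    alongside the local region vectors. The paper may fuse them differently.
    """
    candidates = local_feats + [global_feat]
    scale = math.sqrt(len(query))
    # One similarity score per candidate feature vector.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / scale
        for feat in candidates
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of all candidates.
    context = [
        sum(w * feat[i] for w, feat in zip(weights, candidates))
        for i in range(len(query))
    ]
    return context, weights


# Toy data: two local region features and one global image feature.
local_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
global_feat = [0.5, 0.5, 0.5]
query = [0.0, 2.0, 0.0]  # hypothetical decoder state

context, weights = attend(query, local_feats, global_feat)
```

In this toy setup the query aligns most strongly with the second region, so that region receives the largest weight, while the global vector still contributes contextual information to the resulting context vector.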
