OCR-oriented Master Object for Text Image Captioning

Proceedings of the 2022 International Conference on Multimedia Retrieval. Pub Date: 2022-06-27. DOI: 10.1145/3512527.3531431

Wenliang Tang, Zhenzhen Hu, Zijie Song, Richang Hong

Citations: 5

Abstract

Text image captioning aims to understand the scene text in images for image caption generation. The key issue of this challenging task is to understand the relationship between the text OCR tokens and images. In this paper, we propose a novel text image captioning method by purifying the OCR-oriented scene graph with the master object. The master object is the object to which the OCR token is attached, and it serves as the semantic bridge between the OCR token and the image. We consider the master object as a proxy to connect OCR tokens and other regions in the image. By exploring the master object for each OCR token, we build the purified scene graph based on the master objects and then enrich the visual embedding with a Graph Convolutional Network (GCN). Furthermore, we cluster the OCR tokens and feed in the resulting hierarchical information to provide a richer representation. Experiments on the TextCaps validation and test sets demonstrate the effectiveness of the proposed method.
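To make the idea of a master object more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the master object is chosen, so the sketch assumes a simple geometric heuristic: the master object of an OCR token is the smallest detected object box that covers at least a fraction `min_ratio` of the token's box. The function names (`find_master_objects`, `purified_edges`), the box format, and the threshold are all hypothetical.

```python
from typing import List, Optional, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this format is an assumption.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap_ratio(ocr: Box, obj: Box) -> float:
    """Fraction of the OCR box area that lies inside the object box."""
    inter = area((max(ocr[0], obj[0]), max(ocr[1], obj[1]),
                  min(ocr[2], obj[2]), min(ocr[3], obj[3])))
    a = area(ocr)
    return inter / a if a > 0 else 0.0


def find_master_objects(ocr_boxes: List[Box], obj_boxes: List[Box],
                        min_ratio: float = 0.5) -> List[Optional[int]]:
    """For each OCR box, return the index of its assumed master object.

    Heuristic (an assumption, not from the paper): choose the smallest
    object box covering at least `min_ratio` of the OCR box, or None
    when no object qualifies.
    """
    masters: List[Optional[int]] = []
    for ocr in ocr_boxes:
        best, best_area = None, float("inf")
        for j, obj in enumerate(obj_boxes):
            if overlap_ratio(ocr, obj) >= min_ratio and area(obj) < best_area:
                best, best_area = j, area(obj)
        masters.append(best)
    return masters


def purified_edges(masters: List[Optional[int]]) -> List[Tuple[int, int]]:
    """Keep only (ocr_index, object_index) edges to each token's master."""
    return [(i, m) for i, m in enumerate(masters) if m is not None]


# Toy example: a whole-scene box, a sign, and a bottle.
objects = [(0, 0, 100, 100), (10, 10, 50, 40), (60, 60, 90, 90)]
tokens = [(15, 15, 45, 30), (65, 70, 85, 80), (200, 200, 210, 210)]
masters = find_master_objects(tokens, objects)
edges = purified_edges(masters)
```

In this toy setup the first token is assigned to the sign rather than the whole-scene box because the sign is the smaller enclosing region, and the third token, which lies outside every object, gets no master. The resulting edge list is the kind of sparse structure over which a GCN could then propagate features.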


