Mixture-Kernel Graph Attention Network for Situation Recognition

2019 IEEE/CVF International Conference on Computer Vision (ICCV), published 2019-10-01, DOI: 10.1109/ICCV.2019.01046

M. Suhail, L. Sigal

Citations: 22

Abstract

Understanding images beyond salient actions involves reasoning about scene context, objects, and the roles they play in the captured event. Situation recognition has recently been introduced as the task of jointly reasoning about the verbs (actions) and a set of semantic-role and entity (noun) pairs in the form of action frames. Labeling an image with an action frame requires assigning values (nouns) to the roles based on the observed image content. Among the inherent challenges are the rich conditional structured dependencies between the output role assignments and the overall semantic sparsity. In this paper, we propose a novel mixture-kernel attention graph neural network (GNN) architecture designed to address these challenges. Our GNN enables a dynamic graph structure during training and inference through the use of a graph attention mechanism and context-aware interactions between role pairs. We illustrate the efficacy of our model and design choices by conducting experiments on the imSitu benchmark dataset, with accuracy improvements of up to 10% over the state of the art.
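To make the abstract's two ideas concrete (attention-based, input-dependent edges between role nodes, and a mixture of kernels for role-pair interactions), here is a minimal sketch in plain Python. It is not the authors' formulation: the function `mixture_kernel_gat_layer`, its parameters, the use of a verb-like `context` vector to produce the mixture weights, and sharing one mixed kernel across all edges are all illustrative assumptions. It shows one message-passing step over a fully connected graph of role nodes, where the transform is a softmax-weighted mixture of K basis kernels and the edge weights are recomputed by scaled dot-product attention for every input.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_kernel_gat_layer(
    nodes: List[Vector],
    kernels: List[Matrix],
    context: Vector,
    mix_proj: Matrix,
) -> List[Vector]:
    """One hypothetical message-passing step over a fully connected role graph.

    nodes:    one feature vector per semantic role, all of dimension d
    kernels:  K hypothetical basis transforms, each d x d
    context:  image-level context vector (assumed here to be a verb embedding)
    mix_proj: K x len(context) matrix mapping the context to mixture logits
    """
    # Mixture weights over the K basis kernels, conditioned on the context.
    pi = softmax(matvec(mix_proj, context))
    d = len(nodes[0])
    # Effective kernel W = sum_k pi_k * W_k (shared by all edges in this sketch).
    w = [[sum(p * k[r][c] for p, k in zip(pi, kernels)) for c in range(d)]
         for r in range(d)]
    projected = [matvec(w, h) for h in nodes]
    updated = []
    for i, h_i in enumerate(projected):
        # Attention of role i over every role j (including itself). The weights
        # depend on the current features, so the effective graph is dynamic.
        scores = [dot(h_i, h_j) / math.sqrt(d) for h_j in projected]
        alpha = softmax(scores)
        message = [sum(a * h_j[c] for a, h_j in zip(alpha, projected))
                   for c in range(d)]
        # Residual update followed by a ReLU nonlinearity.
        updated.append([max(0.0, x + m) for x, m in zip(nodes[i], message)])
    return updated


# Usage: three role nodes, two basis kernels (identity and swap).
roles = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
out = mixture_kernel_gat_layer(roles, kernels, [1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]])
```

Conditioning the mixture weights on the context lets the same small set of basis kernels produce different role-to-role transforms for different situations. The paper's actual conditioning, attention scoring, and update rule may differ from this sketch.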
