QAHOI: Query-Based Anchors for Human-Object Interaction Detection

2023 18th International Conference on Machine Vision and Applications (MVA). Pub Date: 2021-12-16. DOI: 10.23919/MVA57639.2023.10215534

Junwen Chen, Keiji Yanai

Citations: 22

Abstract

Human-object interaction (HOI) detection, a downstream task of object detection, requires localizing human-object pairs and recognizing the interactions between them. Recent one-stage approaches focus on detecting possible interaction points or filtering human-object pairs, ignoring how much the location and size of different objects vary across spatial scales. In this paper, we propose a transformer-based method, QAHOI (Query-Based Anchors for Human-Object Interaction detection), which leverages a multi-scale architecture to extract features from different spatial scales and uses query-based anchors to predict all the elements of an HOI instance. We further show that a powerful backbone significantly increases the accuracy of QAHOI, and that QAHOI with a transformer-based backbone outperforms recent state-of-the-art methods by large margins on the HICO-DET benchmark.
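To make "all the elements of an HOI instance" concrete, here is a minimal Python sketch of what a single query's output might be decoded into: a human box, an object box, an object class, and one or more interaction labels. The abstract does not specify how anchors are parameterized or decoded, so everything below is an illustrative assumption, not the authors' implementation: the names `HOIInstance`, `box_from_anchor`, and `decode_query`, the offset-plus-size box encoding, and the 0.5 score threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Boxes are (x_min, y_min, x_max, y_max), normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class HOIInstance:
    """One human-object interaction: who, what, and how they interact."""
    human_box: Box
    object_box: Box
    object_label: int
    interaction_labels: List[int]


def _clamp(v: float) -> float:
    """Keep a normalized coordinate inside the image."""
    return max(0.0, min(1.0, v))


def box_from_anchor(anchor: Tuple[float, float],
                    pred: Tuple[float, float, float, float]) -> Box:
    """Hypothetical decoding: center = anchor + (dx, dy); (w, h) is the size."""
    ax, ay = anchor
    dx, dy, w, h = pred
    cx, cy = ax + dx, ay + dy
    return (_clamp(cx - w / 2), _clamp(cy - h / 2),
            _clamp(cx + w / 2), _clamp(cy + h / 2))


def decode_query(anchor: Tuple[float, float],
                 human_pred: Tuple[float, float, float, float],
                 object_pred: Tuple[float, float, float, float],
                 object_scores: Sequence[float],
                 interaction_scores: Sequence[float],
                 threshold: float = 0.5) -> HOIInstance:
    """Turn one query's raw predictions into a single HOI instance."""
    # Single-label object class: index of the highest score.
    object_label = max(range(len(object_scores)), key=lambda i: object_scores[i])
    # Multi-label interactions: every class whose score reaches the threshold.
    interaction_labels = [i for i, s in enumerate(interaction_scores)
                          if s >= threshold]
    return HOIInstance(
        human_box=box_from_anchor(anchor, human_pred),
        object_box=box_from_anchor(anchor, object_pred),
        object_label=object_label,
        interaction_labels=interaction_labels,
    )


if __name__ == "__main__":
    instance = decode_query(
        anchor=(0.5, 0.5),
        human_pred=(-0.1, 0.0, 0.2, 0.4),
        object_pred=(0.1, 0.1, 0.2, 0.2),
        object_scores=[0.1, 0.7, 0.2],
        interaction_scores=[0.9, 0.2, 0.6],
    )
    print(instance)
```

In this sketch both boxes are decoded relative to the same anchor point, so one query yields the full human-object pair together with its labels. That is one plausible reading of "query-based anchors", chosen here only for illustration.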


