UAHOI: Uncertainty-aware robust interaction learning for HOI detection

IF 4.3 | CAS Tier 3 (Computer Science) | JCR Q2 (Computer Science, Artificial Intelligence)
Computer Vision and Image Understanding | Pub Date: 2024-07-20 | DOI: 10.1016/j.cviu.2024.104091 | Article: https://www.sciencedirect.com/science/article/pii/S1077314224001723
Citations: 0

Abstract

This paper focuses on Human–Object Interaction (HOI) detection, addressing the challenge of identifying and understanding the interactions between humans and objects within a given image or video frame. Spearheaded by Detection Transformer (DETR), recent developments have led to significant improvements by replacing traditional region proposals with a set of learnable queries. However, despite the powerful representation capabilities provided by Transformers, existing HOI detection methods still yield low confidence levels when dealing with complex interactions and are prone to overlooking interactive actions. To address these issues, we propose a novel approach, UAHOI (Uncertainty-aware Robust Human–Object Interaction Learning), which explicitly estimates prediction uncertainty during training to refine both detection and interaction predictions. Our model not only predicts the HOI triplets but also quantifies the uncertainty of these predictions. Specifically, we model this uncertainty through the variance of predictions and incorporate it into the optimization objective, allowing the model to adaptively adjust its confidence threshold based on prediction variance. This integration helps mitigate the adverse effects of incorrect or ambiguous predictions that are common in traditional methods, without any hand-designed components, serving as an automatic confidence threshold. Our method integrates flexibly with existing HOI detection methods and demonstrates improved accuracy. We evaluate UAHOI on two standard benchmarks in the field, V-COCO and HICO-DET, which represent challenging scenarios for HOI detection. Through extensive experiments, we demonstrate that UAHOI achieves significant improvements over existing state-of-the-art methods, enhancing both the accuracy and robustness of HOI detection.
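The abstract describes folding prediction variance into the optimization objective so that uncertain predictions are down-weighted and the confidence threshold adapts automatically. The paper's exact formulation is not given in the abstract; below is a minimal sketch of the general idea using the common heteroscedastic-uncertainty loss, where the model predicts a log-variance s alongside each score. All function names and the `scale` parameter are illustrative, not from the paper.

```python
import math

def uncertainty_weighted_loss(base_loss, log_variance):
    """Attenuate a per-prediction loss term by its predicted uncertainty.

    With s = log(sigma^2) predicted by the network, the term
    exp(-s) * L + s/2 shrinks the contribution of high-variance
    (uncertain) predictions, while the s/2 penalty prevents the model
    from claiming unbounded uncertainty for every sample.
    """
    return math.exp(-log_variance) * base_loss + 0.5 * log_variance

def adaptive_threshold(base_threshold, variance, scale=0.5):
    """Raise the acceptance threshold for predictions with high variance,
    so ambiguous HOI triplets need a higher score to be kept."""
    return base_threshold + scale * variance
```

Under this sketch, a prediction with zero log-variance is penalized by its raw loss, while one that admits more uncertainty (larger s) contributes a smaller weighted loss but pays the regularization term, which is the mechanism that lets confidence calibration emerge from training rather than a hand-tuned threshold.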

Source Journal

Computer Vision and Image Understanding
Category: Engineering & Technology – Engineering: Electrical & Electronic
CiteScore: 7.80
Self-citation rate: 4.40%
Articles per year: 112
Review time: 79 days
Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems
Latest articles from this journal:
• Deformable surface reconstruction via Riemannian metric preservation
• Estimating optical flow: A comprehensive review of the state of the art
• A lightweight convolutional neural network-based feature extractor for visible images
• LightSOD: Towards lightweight and efficient network for salient object detection
• Triple-Stream Commonsense Circulation Transformer Network for Image Captioning