Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Michele Mazzamuto, Francesco Ragusa, Antonino Furnari, Giovanni Signorello, Giovanni Maria Farinella
{"title":"Weakly Supervised Attended Object Detection Using Gaze Data as Annotations","authors":"Michele Mazzamuto, F. Ragusa, Antonino Furnari, G. Signorello, G. Farinella","doi":"10.48550/arXiv.2204.07090","DOIUrl":null,"url":null,"abstract":"We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of costs and time, we propose a weakly supervised version of the task which leans only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We hence compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following url: https://iplab.dmi.unict.it/WS_OBJ_DET/","PeriodicalId":74527,"journal":{"name":"Proceedings of the ... International Conference on Image Analysis and Processing. International Conference on Image Analysis and Processing","volume":"78 1","pages":"263-274"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... International Conference on Image Analysis and Processing. International Conference on Image Analysis and Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2204.07090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one that best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of cost and time, we propose a weakly supervised version of the task which relies only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We then compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following URL: https://iplab.dmi.unict.it/WS_OBJ_DET/
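The fully supervised pipeline the abstract refers to (run an object detector on the frame, then keep the detection that best overlaps the gaze point) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Detection structure, the containment-plus-smallest-box rule, and the nearest-center fallback are illustrative choices; detections could come from any off-the-shelf detector such as Faster R-CNN.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: str
    score: float


def attended_object(detections: List[Detection],
                    gaze: Tuple[float, float]) -> Optional[Detection]:
    """Select the detection that best overlaps the gaze point.

    Boxes containing the gaze point are preferred; among them, the
    smallest box wins since it is the most specific. If no box
    contains the gaze, fall back to the box whose center is closest.
    """
    gx, gy = gaze
    containing = [
        d for d in detections
        if d.box[0] <= gx <= d.box[2] and d.box[1] <= gy <= d.box[3]
    ]
    if containing:
        return min(containing,
                   key=lambda d: (d.box[2] - d.box[0]) * (d.box[3] - d.box[1]))
    if detections:
        def center_dist_sq(d: Detection) -> float:
            cx = (d.box[0] + d.box[2]) / 2
            cy = (d.box[1] + d.box[3]) / 2
            return (cx - gx) ** 2 + (cy - gy) ** 2
        return min(detections, key=center_dist_sq)
    return None
```

In the weakly supervised setting studied in the paper, such per-frame bounding boxes are not available at training time: the only supervision comes from the gaze coordinates and a frame-level label indicating the class of the attended object.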