Michele Mazzamuto, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella
Journal: ACM Journal on Computing and Cultural Heritage
DOI: 10.1145/3647999 (https://doi.org/10.1145/3647999)
Published: 2024-02-13
Learning to Detect Attended Objects in Cultural Sites with Gaze Signals and Weak Object Supervision
Cultural sites such as museums and monuments are popular tourist destinations worldwide. Visitors come to these places to learn about the cultures, histories and arts of a particular region or country. However, for many cultural sites, traditional visiting approaches are limited and may fail to engage visitors. To enhance visitors' experiences, previous works have explored how wearable devices can be exploited in this context. Among the many functions that these devices can offer, understanding which artwork or detail the user is attending to is fundamental to provide additional information on the observed artworks, understand the visitor's tastes and provide recommendations. This motivates the development of algorithms for understanding visitor attention from egocentric images. We consider the attended object detection task, which involves detecting and recognizing the object observed by the camera wearer from an input RGB image and gaze signals. To study the problem, we collected a dataset of egocentric images acquired by subjects visiting a museum. Since collecting and labeling data in cultural sites for real applications is time-consuming, we present a study comparing unsupervised, weakly supervised, and fully supervised approaches for attended object detection. We evaluate the considered approaches on the collected dataset, also assessing the impact of training models on external datasets such as COCO and EGO-CH. The experiments show that weakly supervised approaches requiring only a 2D point label related to the gaze can be an effective alternative to fully supervised approaches for attended object detection. To encourage research on the topic, we publicly release the code and the dataset at the following URL: https://iplab.dmi.unict.it/EGO-CH-Gaze/.
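To make the task concrete, the core idea behind gaze-based attended object detection can be sketched as follows: given candidate object boxes from any off-the-shelf detector and the 2D gaze point, the attended object is the one whose box contains the gaze. This is a minimal illustrative sketch, not the authors' implementation; the function names and the smallest-enclosing-box tie-break are assumptions for the example.

```python
# Illustrative sketch (not the paper's method): select the attended object
# as the smallest candidate box that contains the 2D gaze point.

def box_area(box):
    """Area of an (x1, y1, x2, y2) box; zero for degenerate boxes."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def attended_object(boxes, gaze):
    """Return the smallest box containing the gaze point, or None.

    boxes: list of (x1, y1, x2, y2) tuples from any object detector.
    gaze:  (x, y) fixation point in the same image coordinates.
    """
    gx, gy = gaze
    hits = [b for b in boxes if b[0] <= gx <= b[2] and b[1] <= gy <= b[3]]
    if not hits:
        return None  # gaze falls on no detected object
    # When boxes overlap (e.g. an artwork and a detail on it),
    # prefer the most specific, i.e. smallest, enclosing box.
    return min(hits, key=box_area)

if __name__ == "__main__":
    candidates = [(0, 0, 100, 100), (30, 30, 60, 60)]
    print(attended_object(candidates, (40, 40)))  # the inner, smaller box
    print(attended_object(candidates, (200, 200)))  # no box contains the gaze
```

In this simplification, only the 2D gaze point is needed at inference time to pick among detections, which mirrors why a single point label per image can serve as weak supervision for training.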
Journal description:
ACM Journal on Computing and Cultural Heritage (JOCCH) publishes papers of significant and lasting value in all areas relating to the use of information and communication technologies (ICT) in support of Cultural Heritage. The journal encourages the submission of manuscripts that demonstrate innovative use of technology for the discovery, analysis, interpretation and presentation of cultural material, as well as manuscripts that illustrate applications in the Cultural Heritage sector that challenge the computational technologies and suggest new research opportunities in computer science.