{"title":"RGBE-Gaze: A Large-Scale Event-Based Multimodal Dataset for High Frequency Remote Gaze Tracking","authors":"Guangrong Zhao;Yiran Shen;Chenlong Zhang;Zhaoxin Shen;Yuanfeng Zhou;Hongkai Wen","doi":"10.1109/TPAMI.2024.3474858","DOIUrl":null,"url":null,"abstract":"High-frequency gaze tracking demonstrates significant potential in various critical applications, such as foveated rendering, gaze-based identity verification, and the diagnosis of mental disorders. However, existing eye-tracking systems based on CCD/CMOS cameras either provide tracking frequencies below 200 Hz or employ high-speed cameras, causing high power consumption and bulky devices. While there have been some high-speed eye-tracking datasets and methods based on event cameras, they are primarily tailored for near-eye camera scenarios. They lack the advantages associated with remote camera scenarios, such as the absence of the need for direct contact, improved user comfort, and head pose freedom. In this work, we present RGBE-Gaze, the first large-scale multimodal dataset for high-frequency remote gaze tracking, built by synchronizing RGB and event cameras. The dataset is collected from 66 participants spanning diverse genders and age groups. Our setup captures 3.6 million RGB images and 26.3 billion event samples. Additionally, the dataset includes 10.7 million gaze references from the Gazepoint GP3 HD eye tracker and 15,972 sparse points of gaze (PoG) ground truth obtained through manual stimuli clicks by participants. We present dataset characteristics such as head pose, gaze direction, and pupil size. Furthermore, we introduce a hybrid frame-event gaze estimation method specifically designed for the collected dataset. Finally, we perform extensive evaluations of different benchmarking methods under various gaze-related factors. The evaluation results illustrate that introducing the event stream as a new modality improves gaze tracking frequency and yields greater estimation robustness across diverse gaze-related factors.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 1","pages":"601-615"},"PeriodicalIF":18.6000,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10706089","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10706089/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
High-frequency gaze tracking demonstrates significant potential in various critical applications, such as foveated rendering, gaze-based identity verification, and the diagnosis of mental disorders. However, existing eye-tracking systems based on CCD/CMOS cameras either provide tracking frequencies below 200 Hz or employ high-speed cameras, causing high power consumption and bulky devices. While there have been some high-speed eye-tracking datasets and methods based on event cameras, they are primarily tailored for near-eye camera scenarios. They lack the advantages associated with remote camera scenarios, such as the absence of the need for direct contact, improved user comfort, and head pose freedom. In this work, we present RGBE-Gaze, the first large-scale multimodal dataset for high-frequency remote gaze tracking, built by synchronizing RGB and event cameras. The dataset is collected from 66 participants spanning diverse genders and age groups. Our setup captures 3.6 million RGB images and 26.3 billion event samples. Additionally, the dataset includes 10.7 million gaze references from the Gazepoint GP3 HD eye tracker and 15,972 sparse points of gaze (PoG) ground truth obtained through manual stimuli clicks by participants. We present dataset characteristics such as head pose, gaze direction, and pupil size. Furthermore, we introduce a hybrid frame-event gaze estimation method specifically designed for the collected dataset. Finally, we perform extensive evaluations of different benchmarking methods under various gaze-related factors. The evaluation results illustrate that introducing the event stream as a new modality improves gaze tracking frequency and yields greater estimation robustness across diverse gaze-related factors.
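To make the frame-plus-event idea concrete, the sketch below shows two common building blocks in hybrid pipelines of this kind: accumulating raw event samples (timestamp, x, y, polarity tuples) into a per-pixel count map, and upsampling sparse frame-rate gaze estimates to event-rate timestamps. This is a minimal, hypothetical illustration only; the function names are invented here, the upsampling is plain linear interpolation rather than the paper's learned hybrid frame-event method, and the event representation is just one standard convention for event-camera output.

```python
import numpy as np

def events_to_count_map(events, height, width):
    """Accumulate a batch of events into a signed per-pixel count map.

    `events` is an (N, 4) array of (t, x, y, p) tuples with polarity
    p in {-1, +1} -- a common raw representation of event-camera output.
    """
    count = np.zeros((height, width), dtype=np.int32)
    for t, x, y, p in events:
        count[int(y), int(x)] += int(p)
    return count

def upsample_gaze(frame_times, frame_gaze, query_times):
    """Upsample sparse frame-based gaze estimates (one 2-D point per RGB
    frame) to high-rate query timestamps by per-axis linear interpolation.
    A simple baseline, not the paper's hybrid estimator."""
    frame_gaze = np.asarray(frame_gaze, dtype=float)
    gx = np.interp(query_times, frame_times, frame_gaze[:, 0])
    gy = np.interp(query_times, frame_times, frame_gaze[:, 1])
    return np.stack([gx, gy], axis=1)

# Toy usage: three events on a 3x3 sensor, two frame-rate gaze samples
# upsampled to an intermediate event timestamp.
events = np.array([[0.00, 1, 2, +1],
                   [0.01, 1, 2, +1],
                   [0.02, 0, 0, -1]])
count_map = events_to_count_map(events, height=3, width=3)
gaze = upsample_gaze([0.0, 1.0], [[0.0, 0.0], [10.0, 20.0]], [0.5])
```

A learned hybrid method would replace the interpolation step with a network that consumes both the RGB frames and the event maps, but the data plumbing above reflects how the two modalities are typically aligned in time.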