{"title":"RGBE-Gaze: A Large-Scale Event-Based Multimodal Dataset for High Frequency Remote Gaze Tracking","authors":"Guangrong Zhao;Yiran Shen;Chenlong Zhang;Zhaoxin Shen;Yuanfeng Zhou;Hongkai Wen","doi":"10.1109/TPAMI.2024.3474858","DOIUrl":null,"url":null,"abstract":"High-frequency gaze tracking demonstrates significant potential in various critical applications, such as foveated rendering, gaze-based identity verification, and the diagnosis of mental disorders. However, existing eye-tracking systems based on CCD/CMOS cameras either provide tracking frequencies below 200 Hz or employ high-speed cameras, causing high power consumption and bulky devices. While there have been some high-speed eye-tracking datasets and methods based on event cameras, they are primarily tailored for near-eye camera scenarios. They lack the advantages associated with remote camera scenarios, such as the absence of the need for direct contact, improved user comfort, and head pose freedom. In this work, we present RGBE-Gaze, the first large-scale multimodal dataset for high-frequency remote gaze tracking, built by synchronizing RGB and event cameras. The dataset is collected from 66 participants spanning diverse genders and age groups. Our setup captures 3.6 million RGB images and 26.3 billion event samples. Additionally, the dataset includes 10.7 million gaze references from the Gazepoint GP3 HD eye tracker and 15,972 sparse points of gaze (PoG) ground truth obtained through manual stimuli clicks by participants. We present dataset characteristics such as head pose, gaze direction, and pupil size. Furthermore, we introduce a hybrid frame-event gaze estimation method specifically designed for the collected dataset. Finally, we perform extensive evaluations of different benchmarking methods under various gaze-related factors. The evaluation results illustrate that introducing the event stream as a new modality improves gaze tracking frequency and yields greater estimation robustness across diverse gaze-related factors.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 1","pages":"601-615"},"PeriodicalIF":18.6000,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10706089","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10706089/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
High-frequency gaze tracking demonstrates significant potential in various critical applications, such as foveated rendering, gaze-based identity verification, and the diagnosis of mental disorders. However, existing eye-tracking systems based on CCD/CMOS cameras either provide tracking frequencies below 200 Hz or employ high-speed cameras, causing high power consumption and bulky devices. While there have been some high-speed eye-tracking datasets and methods based on event cameras, they are primarily tailored for near-eye camera scenarios. They lack the advantages associated with remote camera scenarios, such as the absence of the need for direct contact, improved user comfort, and head pose freedom. In this work, we present RGBE-Gaze, the first large-scale multimodal dataset for high-frequency remote gaze tracking, built by synchronizing RGB and event cameras. The dataset is collected from 66 participants spanning diverse genders and age groups. Our setup captures 3.6 million RGB images and 26.3 billion event samples. Additionally, the dataset includes 10.7 million gaze references from the Gazepoint GP3 HD eye tracker and 15,972 sparse points of gaze (PoG) ground truth obtained through manual stimuli clicks by participants. We present dataset characteristics such as head pose, gaze direction, and pupil size. Furthermore, we introduce a hybrid frame-event gaze estimation method specifically designed for the collected dataset. Finally, we perform extensive evaluations of different benchmarking methods under various gaze-related factors. The evaluation results illustrate that introducing the event stream as a new modality improves gaze tracking frequency and yields greater estimation robustness across diverse gaze-related factors.
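To make the frame-plus-event idea concrete, the sketch below shows two common building blocks in hybrid pipelines of this kind: accumulating raw event samples (timestamp, x, y, polarity tuples) into a per-pixel count map, and upsampling sparse frame-rate gaze estimates to event-rate timestamps. This is a minimal, hypothetical illustration only; the function names are invented here, the upsampling is plain linear interpolation rather than the paper's learned hybrid frame-event method, and the event representation is just one standard convention for event-camera output.

```python
import numpy as np

def events_to_count_map(events, height, width):
    """Accumulate a batch of events into a signed per-pixel count map.

    `events` is an (N, 4) array of (t, x, y, p) tuples with polarity
    p in {-1, +1} -- a common raw representation of event-camera output.
    """
    count = np.zeros((height, width), dtype=np.int32)
    for t, x, y, p in events:
        count[int(y), int(x)] += int(p)
    return count

def upsample_gaze(frame_times, frame_gaze, query_times):
    """Upsample sparse frame-based gaze estimates (one 2-D point per RGB
    frame) to high-rate query timestamps by per-axis linear interpolation.
    A simple baseline, not the paper's hybrid estimator."""
    frame_gaze = np.asarray(frame_gaze, dtype=float)
    gx = np.interp(query_times, frame_times, frame_gaze[:, 0])
    gy = np.interp(query_times, frame_times, frame_gaze[:, 1])
    return np.stack([gx, gy], axis=1)

# Toy usage: three events on a 3x3 sensor, two frame-rate gaze samples
# upsampled to an intermediate event timestamp.
events = np.array([[0.00, 1, 2, +1],
                   [0.01, 1, 2, +1],
                   [0.02, 0, 0, -1]])
count_map = events_to_count_map(events, height=3, width=3)
gaze = upsample_gaze([0.0, 1.0], [[0.0, 0.0], [10.0, 20.0]], [0.5])
```

A learned hybrid method would replace the interpolation step with a network that consumes both the RGB frames and the event maps, but the data plumbing above reflects how the two modalities are typically aligned in time.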