Ning Chen; Yiran Shen; Tongyu Zhang; Yanni Yang; Hongkai Wen

EX-Gaze: High-Frequency and Low-Latency Gaze Tracking with Hybrid Event-Frame Cameras for On-Device Extended Reality

IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 5, pp. 2299-2309, published 2025-03-10. DOI: 10.1109/TVCG.2025.3549565 (https://ieeexplore.ieee.org/document/10918853/)
Citations: 0
Abstract
The integration of gaze/eye tracking into virtual and augmented reality devices has unlocked new possibilities, offering a novel human-computer interaction (HCI) modality for on-device extended reality (XR). Emerging XR applications such as low-effort user authentication, mental health diagnosis, and foveated rendering demand real-time eye tracking at high frequencies, a capability that current solutions struggle to deliver. To address this challenge, we present EX-Gaze, an event-based real-time eye tracking system designed for on-device extended reality. EX-Gaze achieves a high tracking frequency of 2 kHz while providing decent accuracy and low tracking latency. This exceptional tracking frequency is achieved through the use of event cameras: cutting-edge, bio-inspired vision hardware that delivers event-stream output at high temporal resolution. We have developed a lightweight tracking framework that enables real-time pupil region localization and tracking on mobile devices. To effectively leverage the sparse nature of event streams, we introduce a sparse event-patch representation and a corresponding sparse event-patch transformer as key components to reduce computation time. Implemented on the Jetson Orin Nano, a low-cost, small-sized mobile device whose hybrid GPU and CPU can process multiple deep neural networks in parallel, EX-Gaze maximizes the device's computational power through sophisticated computation scheduling and offloading between the GPU and CPU. This enables EX-Gaze to achieve real-time tracking at 2 kHz without accumulating latency. Evaluation on public datasets demonstrates that EX-Gaze outperforms other event-based eye tracking methods by striking the best balance between accuracy and efficiency on mobile devices. These results highlight EX-Gaze's potential as a groundbreaking technology to support XR applications that require high-frequency, real-time eye tracking.
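The abstract does not spell out the details of the sparse event-patch representation; as a rough illustration of the underlying idea, the sketch below bins a stream of (x, y, t, polarity) events into per-polarity count patches and keeps only the patches that actually contain events, so downstream compute scales with event activity rather than sensor resolution. The array layout, sensor size, and patch size here are hypothetical, not taken from the paper.

```python
import numpy as np

def events_to_sparse_patches(events, sensor_hw=(260, 346), patch=16):
    """Bin (x, y, t, p) events into a 2-channel count image, then keep
    only the patches containing at least one event.

    events: (N, 4) array of [x, y, t, polarity in {0, 1}] (assumed layout).
    Returns (tokens, coords): per-patch feature vectors and their grid positions.
    """
    H, W = sensor_hw
    counts = np.zeros((2, H, W), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = events[:, 3].astype(int)
    np.add.at(counts, (p, y, x), 1.0)  # accumulate per-polarity event counts

    ph, pw = H // patch, W // patch
    # Crop to a multiple of the patch size, then reshape into a patch grid.
    grid = counts[:, :ph * patch, :pw * patch].reshape(2, ph, patch, pw, patch)
    grid = grid.transpose(1, 3, 0, 2, 4).reshape(ph, pw, -1)  # (ph, pw, 2*patch*patch)

    occupied = grid.sum(axis=-1) > 0   # most patches see no events between updates
    coords = np.argwhere(occupied)     # (M, 2) patch-grid positions
    tokens = grid[occupied]            # (M, 2*patch*patch) tokens for a transformer
    return tokens, coords
```

Only the occupied patches would be fed to a transformer, which is what makes a sparse representation attractive on event data: between tracking updates most of the sensor is silent, so M is typically far smaller than the full patch grid.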
The code is available at https://github.com/Ningreka/EX-Gaze.
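The abstract attributes the sustained 2 kHz rate to computation scheduling and offloading between the CPU and GPU; the exact scheduling scheme is not described here. As a minimal, hypothetical sketch of the general pattern, the snippet below overlaps CPU-side preprocessing with (stubbed) GPU inference through a bounded queue, so neither stage waits idle and latency is bounded by back-pressure rather than accumulating. The stage functions are stand-ins, not EX-Gaze's actual kernels.

```python
import queue
import threading
import time

def cpu_preprocess(chunk):
    # Stand-in for CPU-side work, e.g. event patchification.
    time.sleep(0.001)
    return chunk

def gpu_infer(batch):
    # Stand-in for a network forward pass on the GPU.
    time.sleep(0.002)
    return [x * 2 for x in batch]

def pipelined_tracking(chunks, depth=4):
    """Overlap CPU preprocessing with inference via a bounded queue.

    A bounded queue gives back-pressure: if inference falls behind, the
    producer blocks instead of building up an ever-growing backlog, which
    is one simple way to keep per-update latency from accumulating.
    """
    q = queue.Queue(maxsize=depth)
    results = []

    def producer():
        for c in chunks:
            q.put(cpu_preprocess(c))  # blocks when the queue is full
        q.put(None)  # sentinel: no more work

    t = threading.Thread(target=producer)
    t.start()
    while True:
        item = q.get()
        if item is None:
            break
        results.append(gpu_infer(item))
    t.join()
    return results
```

In a real deployment the consumer would dispatch to the GPU asynchronously (e.g. via CUDA streams) so the two stages truly run concurrently; the queue-with-sentinel structure stays the same.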