EX-Gaze: High-Frequency and Low-Latency Gaze Tracking with Hybrid Event-Frame Cameras for On-Device Extended Reality

Ning Chen, Yiran Shen, Tongyu Zhang, Yanni Yang, Hongkai Wen
{"title":"EX-Gaze: High-Frequency and Low-Latency Gaze Tracking with Hybrid Event-Frame Cameras for On-Device Extended Reality","authors":"Ning Chen;Yiran Shen;Tongyu Zhang;Yanni Yang;Hongkai Wen","doi":"10.1109/TVCG.2025.3549565","DOIUrl":null,"url":null,"abstract":"The integration of gaze/eye tracking into virtual and augmented reality devices has unlocked new possibilities, offering a novel human-computer interaction (HCI) modality for on-device extended reality (XR). Emerging applications in XR, such as low-effort user authentication, mental health diagnosis, and foveated rendering, demand real-time eye tracking at high frequencies, a capability that current solutions struggle to deliver. To address this challenge, we present EX-Gaze, an event-based real-time eye tracking system designed for on-device extended reality. EX-Gaze achieves a high tracking frequency of 2KHz, providing decent accuracy and low tracking latency. The exceptional tracking frequency of EX-Gaze is achieved through the use of event cameras, cutting-edge, bio-inspired vision hardware that delivers event-stream output at high temporal resolution. We have developed a lightweight tracking framework that enables real-time pupil region localization and tracking on mobile devices. To effectively leverage the sparse nature of event-streams, we introduce the sparse event-patch representation and the corresponding sparse event patches transformer as key components to reduce computational time. Implemented on Jetson Orin Nano, a low-cost, small-sized mobile device with hybrid GPU and CPU components capable of parallel processing of multiple deep neural networks, EX-Gaze maximizes the computation power of Jetson Orin Nano through sophisticated computation scheduling and offloading between GPUs and CPUs. This enables EX-Gaze to achieve real-time tracking at 2KHz without accumulating latency. Evaluation on public datasets demonstrates that EX-Gaze outperforms other event-based eye tracking methods by striking the best balance between accuracy and efficiency on mobile devices. These results highlight EX-Gaze's potential as a groundbreaking technology to support XR applications that require high-frequency and real-time eye tracking. The code is available at https://github.com/Ningreka/EX-Gaze.","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"31 5","pages":"2299-2309"},"PeriodicalIF":6.5000,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on visualization and computer graphics","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10918853/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The integration of gaze/eye tracking into virtual and augmented reality devices has unlocked new possibilities, offering a novel human-computer interaction (HCI) modality for on-device extended reality (XR). Emerging XR applications, such as low-effort user authentication, mental health diagnosis, and foveated rendering, demand real-time eye tracking at high frequencies, a capability that current solutions struggle to deliver. To address this challenge, we present EX-Gaze, an event-based real-time eye tracking system designed for on-device extended reality. EX-Gaze achieves a high tracking frequency of 2 kHz while providing competitive accuracy and low tracking latency. This exceptional tracking frequency is achieved through the use of event cameras: cutting-edge, bio-inspired vision sensors that deliver event-stream output at high temporal resolution. We have developed a lightweight tracking framework that enables real-time pupil region localization and tracking on mobile devices. To effectively leverage the sparse nature of event streams, we introduce the sparse event-patch representation and the corresponding sparse event-patch transformer as key components for reducing computation time. EX-Gaze is implemented on the Jetson Orin Nano, a low-cost, small form-factor mobile device whose hybrid GPU and CPU can process multiple deep neural networks in parallel; EX-Gaze maximizes the device's computational power through sophisticated computation scheduling and offloading between the GPU and CPU. This enables EX-Gaze to achieve real-time tracking at 2 kHz without accumulating latency. Evaluation on public datasets demonstrates that EX-Gaze outperforms other event-based eye tracking methods by striking the best balance between accuracy and efficiency on mobile devices. These results highlight EX-Gaze's potential as a groundbreaking technology to support XR applications that require high-frequency, real-time eye tracking. The code is available at https://github.com/Ningreka/EX-Gaze.
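The sparse event-patch representation is the core efficiency idea: instead of rasterizing the event stream into dense frames, only the small image tiles that actually received events are featurized and passed to the transformer as tokens. The sketch below illustrates this binning step under stated assumptions; the patch size, sensor resolution, feature choice, and all names (`to_sparse_patches`, `min_events`) are illustrative and are not taken from the paper's implementation.

```python
# Minimal sketch of a sparse event-patch representation, assuming a
# generic event camera emitting (x, y, t, polarity) tuples. Patch size,
# sensor resolution, and all names are illustrative, not EX-Gaze's
# actual implementation.
import numpy as np

PATCH = 8                        # patch side length in pixels (assumed)
SENSOR_W, SENSOR_H = 346, 260    # DAVIS346-class resolution (assumed)
GRID_W = (SENSOR_W + PATCH - 1) // PATCH
GRID_H = (SENSOR_H + PATCH - 1) // PATCH

def to_sparse_patches(events: np.ndarray, min_events: int = 4):
    """Bin events into non-empty PATCH x PATCH tiles.

    events: (N, 4) array of (x, y, t, polarity) with polarity in {0, 1}.
    Returns (coords, feats): grid coordinates of the active patches and
    a two-channel event-count feature (one channel per polarity) per
    patch. Only active patches become tokens; empty tiles are skipped.
    """
    gx = events[:, 0].astype(int) // PATCH
    gy = events[:, 1].astype(int) // PATCH
    pol = events[:, 3].astype(int)

    counts = np.zeros((GRID_H, GRID_W, 2), dtype=np.float32)
    np.add.at(counts, (gy, gx, pol), 1.0)       # scatter-add event counts

    active = counts.sum(axis=-1) >= min_events  # drop near-empty tiles
    coords = np.argwhere(active)                # (K, 2) patch positions
    feats = counts[active]                      # (K, 2) polarity counts
    return coords, feats

# Example: one 0.5 ms event slice (a 2 kHz update interval) of synthetic
# events clustered near a hypothetical pupil position.
rng = np.random.default_rng(0)
n = 2000
ev = np.column_stack([
    rng.normal(170, 15, n).clip(0, SENSOR_W - 1),  # x near pupil centre
    rng.normal(130, 15, n).clip(0, SENSOR_H - 1),  # y near pupil centre
    rng.uniform(0.0, 5e-4, n),                     # timestamps in seconds
    rng.integers(0, 2, n),                         # ON/OFF polarity
])
coords, feats = to_sparse_patches(ev)
print(f"{len(coords)} active patches out of {GRID_H * GRID_W}")
```

Because event activity concentrates around the moving pupil contour, most tiles stay empty in any given slice, so the downstream transformer attends over a short token sequence rather than a dense patch grid; this sparsity is the intuition behind the efficiency gains the abstract describes.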