Frequency-spatial interaction network for gaze estimation

Displays, vol. 86, Article 102878 | Impact Factor: 3.7 | Zone 2 (Engineering & Technology) | JCR Q1 (Computer Science, Hardware & Architecture) | Pub Date: 2024-11-21 | DOI: 10.1016/j.displa.2024.102878
Yuanning Jia, Zhi Liu, Ying Lv, Xiaofeng Lu, Xuefeng Liu, Jie Chen
Citations: 0

Abstract

Gaze estimation is a fundamental task in computer vision that determines the direction in which a person is looking. With advancements in Convolutional Neural Networks (CNNs) and the availability of large-scale datasets, appearance-based models have made significant progress. Nonetheless, CNNs are limited in their ability to extract global information from features, which constrains gaze estimation performance. Inspired by the properties of the Fourier transform in signal processing, we propose the Frequency-Spatial Interaction network for Gaze estimation (FSIGaze), which integrates residual modules and Frequency-Spatial Synergistic (FSS) modules. Specifically, the FSS module is a dual-branch structure with a spatial branch and a frequency branch. The frequency branch employs the Fast Fourier Transform to map a latent representation to the frequency domain and applies an adaptive frequency filter to achieve an image-size receptive field. The spatial branch, in contrast, extracts local detailed features. Acknowledging the synergistic benefits of global and local information in gaze estimation, we introduce a Dual-domain Interaction Block (DIB) to enhance the capability of the model. Furthermore, we implement a multi-task learning strategy, incorporating eye region detection as an auxiliary task to refine facial features. Extensive experiments demonstrate that our model surpasses other state-of-the-art gaze estimation models on three three-dimensional (3D) datasets and delivers competitive results on two two-dimensional (2D) datasets.
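The idea behind the frequency branch can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `frequency_branch`, the tensor sizes, and the random complex weights (standing in for a learned adaptive filter) are all assumptions made for illustration. The sketch only shows why filtering in the frequency domain yields an image-size receptive field: changing a single input pixel changes every output position in that channel.

```python
import numpy as np


def frequency_branch(features, freq_filter):
    """Filter a (C, H, W) feature map in the frequency domain.

    features:    real array of shape (C, H, W)
    freq_filter: complex array of shape (C, H, W // 2 + 1), one weight per
                 frequency bin (hypothetical stand-in for a learned filter)
    """
    _, h, w = features.shape
    # 2D real FFT over the spatial axes of every channel.
    spectrum = np.fft.rfft2(features, axes=(-2, -1), norm="ortho")
    # Element-wise weighting of each frequency bin.
    filtered = spectrum * freq_filter
    # Back to the spatial domain at the original resolution.
    return np.fft.irfft2(filtered, s=(h, w), axes=(-2, -1), norm="ortho")


rng = np.random.default_rng(0)
C, H, W = 4, 8, 8  # illustrative sizes, not taken from the paper
x = rng.standard_normal((C, H, W))
bins = (C, H, W // 2 + 1)
weights = rng.standard_normal(bins) + 1j * rng.standard_normal(bins)

y = frequency_branch(x, weights)

# An all-ones filter leaves the features unchanged (FFT round trip).
roundtrip = frequency_branch(x, np.ones(bins, dtype=complex))

# Perturb one pixel in channel 0 and measure how much of the output moves.
x_perturbed = x.copy()
x_perturbed[0, 0, 0] += 1.0
delta = frequency_branch(x_perturbed, weights) - y
changed_fraction = np.mean(np.abs(delta[0]) > 1e-12)

print("output shape:", y.shape)
print("fraction of channel-0 outputs affected by one pixel:", changed_fraction)
```

By the convolution theorem, the element-wise product in the frequency domain is equivalent to a circular convolution whose kernel is as large as the feature map, which is the sense in which the receptive field is image-sized. A small spatial convolution, by comparison, would only affect outputs within its kernel footprint.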
Source journal
Displays (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 4.60
Self-citation rate: 25.60%
Articles published: 138
Review time: 92 days
About the journal: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.