Large Generative Model Impulsed Lightweight Gaze Estimator via Deformable Approximate Large Kernel Pursuit

Xuanhong Chen;Muchun Chen;Yugang Chen;Yinxin Lin;Bilian Ke;Bingbing Ni
IEEE Transactions on Image Processing, vol. 34, pp. 1149-1162, published 2025-01-20. DOI: 10.1109/TIP.2025.3529379. https://ieeexplore.ieee.org/document/10847727/
Citation count: 0

Abstract

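Efficient, highly accurate lightweight gaze estimation methods have been receiving increasing research attention with the emergence of mobile interactive platforms such as mobile devices and AR/VR. State-of-the-art deep-learning-based gaze estimation models suffer either from heavy computational architectures that are infeasible for mobile deployment or from limited generalization capability, which leaves them unable to cope with the large diversity of eye textures or to distinguish subtle and frequent pupil movements. To mitigate these challenges, we propose a novel lightweight network structure featuring a deformable approximate large kernel, which effectively extends the receptive field to handle complicated eye movements and highly varying eye/gaze-region appearance under a very tight computational budget. In addition, we embed the training of the gaze estimator into a control-information extraction module, which serves as a gaze-parameter input that conditions a large generative model (Stable Diffusion V1.5) to output gaze-specific eye images. In this way, the strong generalization capability of the large generative model can be implicitly distilled into our lightweight gaze model. Extensive comparisons with various state-of-the-art gaze estimation methods demonstrate the superiority of the proposed model and training scheme in terms of both accuracy and model complexity.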
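The abstract does not say how the deformable approximate large kernel is constructed. The sketch below assumes one common construction and is not the authors' implementation: a large kernel is approximated by a small depthwise convolution followed by a dilated depthwise convolution (the decomposition used by Large Kernel Attention in the Visual Attention Network, here without its 1×1 pointwise convolution), and every tap is shifted by a fractional offset read through bilinear interpolation, as in deformable convolution. The function names (`deform_dw_conv`, `approx_large_kernel`, `receptive_field`) are hypothetical, and the offsets are passed in rather than predicted by a network.

```python
import numpy as np


def bilinear_sample(x, py, px):
    """Bilinearly interpolate 2-D array x at fractional (py, px); zero outside."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy, xx]
    return val


def deform_dw_conv(x, weight, offsets, dilation=1):
    """Deformable convolution on one channel (cross-correlation, stride 1,
    'same' zero padding). weight is (k, k) with k odd; offsets has shape
    (H, W, k*k, 2) and holds a (dy, dx) shift for every tap at every pixel."""
    h, w = x.shape
    k = weight.shape[0]
    r = (k - 1) // 2 * dilation  # half-width of the dilated sampling window
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for n, (a, b) in enumerate(np.ndindex(k, k)):
                dy, dx = offsets[i, j, n]
                # Regular (dilated) grid position plus a learned fractional shift.
                py = i - r + a * dilation + dy
                px = j - r + b * dilation + dx
                acc += weight[a, b] * bilinear_sample(x, py, px)
            out[i, j] = acc
    return out


def approx_large_kernel(x, w_local, w_dilated, off_local, off_dilated, dilation=3):
    """Hypothetical approximate large kernel: a small deformable depthwise conv
    followed by a dilated deformable depthwise conv."""
    y = deform_dw_conv(x, w_local, off_local)
    return deform_dw_conv(y, w_dilated, off_dilated, dilation)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convs given (kernel, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf


def zero_offsets(size, k):
    """Offsets that leave every tap on the regular grid."""
    return np.zeros((size, size, k * k, 2))


if __name__ == "__main__":
    size = 31
    x = np.zeros((size, size))
    x[15, 15] = 1.0  # impulse input: the output traces the effective kernel footprint
    out = approx_large_kernel(x, np.ones((5, 5)), np.ones((7, 7)),
                              zero_offsets(size, 5), zero_offsets(size, 7))
    print(receptive_field([(5, 1), (7, 3)]))  # 23
    print(np.count_nonzero(out[15]))          # 23 taps wide on the centre row
    print(5 * 5 + 7 * 7, 23 * 23)             # 74 529 weights per channel
```

With a 5×5 kernel followed by a 7×7 kernel at dilation 3, the pair covers a 23×23 window while storing 25 + 49 = 74 weights per channel instead of 23 × 23 = 529. This illustrates how an approximate large kernel can enlarge the receptive field within a tight computational budget, and the offsets let the sampling grid bend toward irregular eye and pupil structures.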