Jiahui Hu , Yonghua Lu , Xiyuan Ye , Qiang Feng , Lihua Zhou
{"title":"用于凝视估计的带有自适应参考样本的快速差分网络","authors":"Jiahui Hu , Yonghua Lu , Xiyuan Ye , Qiang Feng , Lihua Zhou","doi":"10.1016/j.cviu.2024.104156","DOIUrl":null,"url":null,"abstract":"<div><div>Most non-invasive gaze estimation methods do not consider the inter-individual differences in anatomical structure, but directly regress the gaze direction from the appearance image information, which limits the accuracy of individual-independent gaze estimation networks. In addition, existing gaze estimation methods tend to consider only how to improve the model’s generalization performance, ignoring the crucial issue of efficiency, which leads to bulky models that are difficult to deploy and have questionable cost-effectiveness in practical use. This paper makes the following contributions: (1) A differential network for gaze estimation using adaptive reference samples is proposed, which can adaptively select reference samples based on scene and individual characteristics. (2) The knowledge distillation is used to transfer the knowledge structure of robust teacher networks into lightweight networks so that our networks can execute quickly and at low computational cost, dramatically increasing the prospect and value of applying gaze estimation. (3) Integrating the above innovations, a novel fast differential neural network (Diff-Net) named FDAR-Net is constructed and achieved excellent results on MPIIGaze, UTMultiview and EyeDiap.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"249 ","pages":"Article 104156"},"PeriodicalIF":4.3000,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A fast differential network with adaptive reference sample for gaze estimation\",\"authors\":\"Jiahui Hu , Yonghua Lu , Xiyuan Ye , Qiang Feng , Lihua Zhou\",\"doi\":\"10.1016/j.cviu.2024.104156\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Most non-invasive gaze estimation methods do not consider the inter-individual differences in anatomical structure, but directly regress the gaze direction from the appearance image information, which limits the accuracy of individual-independent gaze estimation networks. In addition, existing gaze estimation methods tend to consider only how to improve the model’s generalization performance, ignoring the crucial issue of efficiency, which leads to bulky models that are difficult to deploy and have questionable cost-effectiveness in practical use. This paper makes the following contributions: (1) A differential network for gaze estimation using adaptive reference samples is proposed, which can adaptively select reference samples based on scene and individual characteristics. (2) The knowledge distillation is used to transfer the knowledge structure of robust teacher networks into lightweight networks so that our networks can execute quickly and at low computational cost, dramatically increasing the prospect and value of applying gaze estimation. 
(3) Integrating the above innovations, a novel fast differential neural network (Diff-Net) named FDAR-Net is constructed and achieved excellent results on MPIIGaze, UTMultiview and EyeDiap.</div></div>\",\"PeriodicalId\":50633,\"journal\":{\"name\":\"Computer Vision and Image Understanding\",\"volume\":\"249 \",\"pages\":\"Article 104156\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-10-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision and Image Understanding\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1077314224002376\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314224002376","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A fast differential network with adaptive reference sample for gaze estimation
Most non-invasive gaze estimation methods do not account for inter-individual differences in anatomical structure; instead, they regress the gaze direction directly from appearance image information, which limits the accuracy of individual-independent gaze estimation networks. In addition, existing gaze estimation methods tend to consider only how to improve the model’s generalization performance and ignore the crucial issue of efficiency, which leads to bulky models that are difficult to deploy and whose cost-effectiveness in practical use is questionable. This paper makes the following contributions: (1) A differential network for gaze estimation with adaptive reference samples is proposed, which adaptively selects reference samples based on scene and individual characteristics. (2) Knowledge distillation is used to transfer the knowledge structure of a robust teacher network into a lightweight network, so that our network runs quickly and at low computational cost, greatly increasing the practical prospects and value of gaze estimation. (3) Integrating the above innovations, a novel fast differential neural network (Diff-Net) named FDAR-Net is constructed, achieving excellent results on MPIIGaze, UTMultiview and EyeDiap.
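The two ideas in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the module names (LightweightDiffNet, select_reference, training_step), the feature sizes, and the loss weighting are all assumptions, chosen only to show how a gaze estimate can be composed as the gaze of an adaptively chosen reference sample plus a predicted difference, and how a frozen teacher network can supervise a lightweight student via a distillation term.

```python
# Minimal sketch (PyTorch), assuming a simplified architecture; not the paper's FDAR-Net.
# (a) predict gaze as reference gaze + predicted difference;
# (b) distill a heavier teacher into a lightweight student.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LightweightDiffNet(nn.Module):
    """Tiny student: encodes query and reference eye images and regresses
    the gaze difference (pitch, yaw) between them."""

    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.diff_head = nn.Linear(2 * feat_dim, 2)  # (d_pitch, d_yaw)

    def forward(self, query_img, ref_img):
        fq = self.encoder(query_img)
        fr = self.encoder(ref_img)
        return self.diff_head(torch.cat([fq, fr], dim=1))


def select_reference(query_feat, ref_feats):
    """Assumed adaptive selection rule: pick the reference sample whose
    appearance feature is closest to the query's."""
    dists = torch.cdist(query_feat.unsqueeze(0), ref_feats).squeeze(0)
    return torch.argmin(dists).item()


def training_step(student, teacher, query_img, ref_img, ref_gaze, gt_gaze,
                  alpha=0.5):
    """One step: supervised loss on the composed gaze plus a distillation
    loss matching the (frozen) teacher's difference prediction."""
    d_student = student(query_img, ref_img)
    pred_gaze = ref_gaze + d_student            # compose final gaze estimate
    with torch.no_grad():
        d_teacher = teacher(query_img, ref_img)
    loss_gt = F.l1_loss(pred_gaze, gt_gaze)     # ground-truth supervision
    loss_kd = F.mse_loss(d_student, d_teacher)  # knowledge distillation
    return loss_gt + alpha * loss_kd
```

In this sketch the final prediction is the reference gaze plus the predicted difference, so subject-specific offsets are largely absorbed by the reference sample, while the distillation term lets the compact student mimic a stronger teacher at a fraction of its computational cost.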
Journal Introduction:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems