Tracking a person with 3-D motion by integrating optical flow and depth
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840656
R. Okada, Y. Shirai, J. Miura
This paper describes a method of tracking a person with 3D translation and rotation by integrating optical flow and depth. The target region is first extracted based on the probability of each pixel belonging to the target person. The target state (3D position, posture, motion) is estimated based on the shape and the position of the target region in addition to optical flow and depth. Multiple target states are maintained when the image measurements give rise to ambiguities about the target state. Experimental results with real image sequences show the effectiveness of our method.
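The region-extraction and multiple-hypothesis ideas can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the state representation (a predicted depth and image flow per hypothesis), the Gaussian likelihoods, and all parameters are invented, not the authors' formulation.

```python
import numpy as np

def target_probability(depth, flow, state, sigma_d=0.15, sigma_f=1.5):
    """Per-pixel probability that a pixel belongs to the tracked person,
    combining agreement of the measured depth with the depth predicted by
    the state and agreement of the measured optical flow with the motion
    predicted by the state."""
    p_depth = np.exp(-((depth - state["depth"]) ** 2) / (2 * sigma_d ** 2))
    flow_err = np.linalg.norm(flow - state["flow"], axis=-1)
    p_flow = np.exp(-(flow_err ** 2) / (2 * sigma_f ** 2))
    return p_depth * p_flow

def update_hypotheses(hypotheses, depth, flow, threshold=0.5, keep=5):
    """Score each candidate target state against the measurements and keep
    the best few, so ambiguous measurements retain multiple hypotheses."""
    scored = []
    for state in hypotheses:
        prob = target_probability(depth, flow, state)
        region = prob > threshold          # extracted target region
        scored.append((prob[region].sum(), state))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [state for _, state in scored[:keep]]
```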
{"title":"Tracking a person with 3-D motion by integrating optical flow and depth","authors":"R. Okada, Y. Shirai, J. Miura","doi":"10.1109/AFGR.2000.840656","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840656","url":null,"abstract":"This paper describes a method of tracking a person with 3D translation and rotation by integrating optical flow and depth. The target region is first extracted based on the probability of each pixel belonging to the target person. The target state (3D position, posture, motion) is estimated based on the shape and the position of the target region in addition to optical flow and depth. Multiple target states are maintained when the image measurements give rise to ambiguities about the target state. Experimental results with real image sequences show the effectiveness of our method.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129877919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face recognition algorithms as models of human face processing
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840689
A. O’Toole, Y. Cheng, B. Ross, Heather A. Wild, P. Phillips
We evaluated the adequacy of computational algorithms as models of human face processing by looking at how the algorithms and humans process individual faces. By comparing model- and human-generated measures of the similarity between pairs of faces, we were able to assess the accord between several automatic face recognition algorithms and human perceivers. Multidimensional scaling (MDS) was used to create a spatial representation of the subject response patterns. Next, the model response patterns were projected into this space. The results revealed a common bimodal structure for both the subjects and for most of the models. The bimodal subject structure reflected strategy differences in making similarity decisions. For the models, the bimodal structure was related to combined aspects of the representations and the distance metrics used in the implementations.
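A minimal sketch of this analysis, assuming each subject's or model's response pattern is a vector of pairwise similarity ratings. The shapes, the Euclidean pattern distance, and the weighted-average projection of models into the subject space are all assumptions for illustration, not details given in the abstract.

```python
import numpy as np
from sklearn.manifold import MDS

# One response pattern per human subject; one column per face pair.
# Random placeholders stand in for real similarity ratings.
rng = np.random.default_rng(0)
subject_patterns = rng.random((20, 190))   # 20 subjects, 190 face pairs
model_patterns = rng.random((4, 190))      # 4 algorithms, same pairs

def pattern_distances(patterns):
    """Euclidean distances between all pairs of response patterns."""
    diff = patterns[:, None, :] - patterns[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# MDS spatial representation of the subject response patterns.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
subject_space = mds.fit_transform(pattern_distances(subject_patterns))

def project(pattern, patterns, coords):
    """Crude projection of a model's response pattern into the subject
    space: a distance-weighted average of the subjects' MDS coordinates."""
    d = np.linalg.norm(patterns - pattern, axis=1)
    w = 1.0 / (d + 1e-9)
    return (coords * w[:, None]).sum(axis=0) / w.sum()

model_space = np.array([project(m, subject_patterns, subject_space)
                        for m in model_patterns])
```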
{"title":"Face recognition algorithms as models of human face processing","authors":"A. O’Toole, Y. Cheng, B. Ross, Heather A. Wild, P. Phillips","doi":"10.1109/AFGR.2000.840689","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840689","url":null,"abstract":"We evaluated the adequacy of computational algorithms as models of human face processing by looking at how the algorithms and humans process individual faces. By comparing model- and human-generated measures of the similarity between pairs of faces, we were able to assess the accord between several automatic face recognition algorithms and human perceivers. Multidimensional scaling (MDS) was used to create a spatial representation of the subject response patterns. Next, the model response patterns were projected into this space. The results revealed a common bimodal structure for both the subjects and for most of the models. The bimodal subject structure reflected strategy differences in making similarity decisions. For the models, the bimodal structure was related to combined aspects of the representations and the distance metrics used in the implementations.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116992665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bimodal emotion recognition
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840655
L. D. Silva, Pei Chi Ng
This paper describes the use of statistical techniques and hidden Markov models (HMM) in the recognition of emotions. The method aims to classify 6 basic emotions (anger, dislike, fear, happiness, sadness and surprise) from both facial expressions (video) and emotional speech (audio). The emotions of 2 human subjects were recorded and analyzed. The findings show that the audio and video information can be combined using a rule-based system to improve the recognition rate.
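The rule-based combination might look like the following sketch. The per-emotion modality assignments and weights are invented for illustration; the paper's actual rules are not reproduced here.

```python
# Scores are dicts {emotion: likelihood}, e.g. HMM output probabilities
# from the audio classifier and the video (facial expression) classifier.
EMOTIONS = ["anger", "dislike", "fear", "happiness", "sadness", "surprise"]

AUDIO_DOMINANT = {"anger", "sadness"}       # assumed more audible (illustrative)
VIDEO_DOMINANT = {"happiness", "surprise"}  # assumed more visible (illustrative)

def fuse(audio_scores, video_scores):
    """Weight the modality assumed more reliable for each emotion,
    then return the emotion with the highest fused score."""
    fused = {}
    for e in EMOTIONS:
        if e in AUDIO_DOMINANT:
            fused[e] = 0.7 * audio_scores[e] + 0.3 * video_scores[e]
        elif e in VIDEO_DOMINANT:
            fused[e] = 0.3 * audio_scores[e] + 0.7 * video_scores[e]
        else:
            fused[e] = 0.5 * (audio_scores[e] + video_scores[e])
    return max(fused, key=fused.get)
```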
{"title":"Bimodal emotion recognition","authors":"L. D. Silva, Pei Chi Ng","doi":"10.1109/AFGR.2000.840655","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840655","url":null,"abstract":"This paper describes the use of statistical techniques and hidden Markov models (HMM) in the recognition of emotions. The method aims to classify 6 basic emotions (anger, dislike, fear, happiness, sadness and surprise) from both facial expressions (video) and emotional speech (audio). The emotions of 2 human subjects were recorded and analyzed. The findings show that the audio and video information can be combined using a rule-based system to improve the recognition rate.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121374012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Crane gesture recognition using pseudo 3-D hidden Markov models
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840665
Stefan Müller, S. Eickeler, G. Rigoll
A recognition technique based on novel pseudo 3D hidden Markov models, which can integrate spatially as well as temporally derived features, is presented. The approach allows the recognition of dynamic gestures, such as waving hands, as well as static gestures, such as standing in a special pose. Pseudo 3D hidden Markov models (P3DHMM) are an extension of the pseudo 2D case, which has been successfully used for the classification of images and the recognition of faces. In the P3DHMM case the so-called superstates contain P2DHMMs, so whole image sequences can be generated by these models. Our approach has been evaluated on a crane signal database, which consists of 12 different predefined gestures for maneuvering cranes.
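The nesting of superstates can be pictured with a structural sketch. This shows only the containment hierarchy (training and decoding are omitted), and the field names are invented:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HMM1D:
    n_states: int
    trans: List[List[float]]        # state transition probabilities

@dataclass
class P2DHMM:
    """A 1D HMM over image rows whose superstates are 1D HMMs over
    the blocks within a row; models a single image."""
    row_superstates: List[HMM1D]
    trans: List[List[float]]        # transitions between row superstates

@dataclass
class P3DHMM:
    """A 1D HMM over time whose superstates are P2DHMMs; models a
    whole image sequence, as used for the dynamic crane gestures."""
    time_superstates: List[P2DHMM]
    trans: List[List[float]]        # transitions between temporal superstates
```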
{"title":"Crane gesture recognition using pseudo 3-D hidden Markov models","authors":"Stefan Müller, S. Eickeler, G. Rigoll","doi":"10.1109/AFGR.2000.840665","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840665","url":null,"abstract":"A recognition technique based on novel pseudo 3D hidden Markov models, which can integrate spatial as well as temporal derived features is presented. The approach allows the recognition of dynamic gestures such as waving hands as well as static gestures such as standing in a special pose. Pseudo 3D hidden Markov models (P3DHMM) are an extension of the pseudo 2D case, which has been successfully used for the classification of images and the recognition of faces. In the P3DHMM case the so-called superstates contain P2DHMM and thus whole image sequences can be generated by these models. Our approach has been evaluated on a crane signal database, which consists of 12 different predefined gestures for maneuvering cranes.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123353035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of the illuminant colour from human skin colour
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840613
M. Störring, H. J. Andersen, E. Granum
Colour is an important and useful feature for object tracking and recognition in computer vision. Its main difficulty is that the apparent colour of an object changes when the illuminant colour changes; under a known illuminant colour, however, it becomes a robust feature. A growing number of computer vision applications track humans, for example in interfaces for human-computer interaction or automatic camera operators, and skin colour is an often-used feature in them. Hence, knowing the illuminant colour would be of significant importance in such applications. This paper proposes a novel method to estimate the current illuminant colour from skin colour observations. The method is based on a physical model of reflections, the assumption that illuminant colours are located close to the Planckian locus, and knowledge of the camera parameters. The method is empirically tested using real images. The average estimation error of the correlated colour temperature is as small as 180 K. Applications include colour-based tracking, to adapt to changes in lighting, and visualisation, to re-render image colours to their appearance under canonical viewing conditions.
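One ingredient of such an estimate, mapping an average skin-colour observation to a correlated colour temperature via CIE chromaticity, can be sketched as follows. The sRGB-to-XYZ matrix and McCamy's CCT approximation are standard formulas, not the paper's method, which additionally models reflection physics and the camera parameters.

```python
import numpy as np

def cct_from_rgb(rgb):
    """Estimate the correlated colour temperature (K) from a mean linear
    RGB skin observation, assuming the illuminant chromaticity lies near
    the Planckian locus (sketch only)."""
    # Linear sRGB -> CIE XYZ (D65 primaries).
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    X, Y, Z = M @ np.asarray(rgb, dtype=float)
    x, y = X / (X + Y + Z), Y / (X + Y + Z)
    # McCamy's cubic approximation of CCT from (x, y) chromaticity.
    n = (x - 0.3320) / (0.1858 - y)
    return 449.0 * n**3 + 3525.0 * n**2 + 6823.3 * n + 5520.33
```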
{"title":"Estimation of the illuminant colour from human skin colour","authors":"M. Störring, H. J. Andersen, E. Granum","doi":"10.1109/AFGR.2000.840613","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840613","url":null,"abstract":"Colour is an important and useful feature for object tracking and recognition in computer vision. However, it has the difficulty that the colour of the object changes if the illuminant colour changes. But under known illuminant colour it becomes a robust feature. There are more and more computer vision applications tracking humans, for example in interfaces for human computer interaction or automatic camera men, where skin colour is an often-used feature. Hence, it would be of significant importance to know the illuminant colour in such applications. This paper proposes a novel method to estimate the current illuminant colour from skin colour observations. The method is based on a physical model of reflections, the assumption that illuminant colours are located close to the Planckian locus, and the knowledge about the camera parameters. The method is empirically tested using real images. The average estimation error of the correlated colour temperature is as small as 180 K. Applications are for example in colour-based tracking to adapt to changes in lighting and in visualisation to re-render image colours to their appearance under canonical viewing conditions.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124125704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Support vector regression and classification based multi-view face detection and recognition
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840650
Yongmin Li, S. Gong, H. Liddell
A support vector machine-based multi-view face detection and recognition framework is described. Face detection is carried out by constructing several detectors, each in charge of one specific view. The symmetry of face images is exploited to reduce the complexity of the modelling. The estimation of head pose, achieved using support vector regression, provides crucial information for choosing the appropriate face detector. This helps to improve the accuracy and reduce the computation of multi-view face detection compared to other methods. For video sequences, further computational reduction can be achieved by using a pose change smoothing strategy. When the face detectors find a face in frontal view, a support vector machine-based multi-class classifier is activated for face recognition. All of the above issues are integrated under a support vector machine framework. Test results on four video sequences are presented: the detection rate is above 95%, recognition accuracy is above 90%, the average pose estimation error is around 10°, and the full detection and recognition speed is up to 4 frames per second on a Pentium II 300 MHz PC.
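A toy sketch of the pose-indexed detector selection, using scikit-learn-style SVR/SVC models. The features, view bins, placeholder data, and the detectors interface are all invented for illustration:

```python
import numpy as np
from sklearn.svm import SVR, SVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 400))           # flattened 20x20 face patches
yaw_train = rng.uniform(-90, 90, 200)      # head yaw in degrees
id_train = rng.integers(0, 10, 200)        # subject identity labels

pose_estimator = SVR(kernel="rbf").fit(X_train, yaw_train)   # SV regression
recogniser = SVC(kernel="rbf").fit(X_train, id_train)        # multi-class SVM

VIEW_BINS = [(-90, -30), (-30, 30), (30, 90)]  # left / frontal / right

def detect_and_recognise(patch, detectors):
    """Estimate pose, run the view-specific detector chosen by the pose,
    and run identity recognition only when the face is frontal."""
    yaw = float(np.clip(pose_estimator.predict(patch[None, :])[0], -90, 89.9))
    view = next(i for i, (lo, hi) in enumerate(VIEW_BINS) if lo <= yaw < hi)
    if not detectors[view](patch):
        return None                            # no face under this view model
    if view == 1:                              # frontal: run recognition
        return recogniser.predict(patch[None, :])[0]
    return "face (non-frontal)"
```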
Comparative evaluation of face sequence matching for content-based video access
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840629
S. Satoh
The paper presents a comparative evaluation of methods for matching face sequences obtained from actual videos. Face information is quite important in videos, especially in news programs, dramas, and movies. Accurate face sequence matching enables many multimedia applications, including content-based face retrieval, automated face annotation, and video authoring. However, face sequences in videos are subject to variation in lighting conditions, pose, facial expression, etc., which makes face matching difficult. To cope with this problem, several face sequence matching methods are proposed, extending still-image face matching as well as traditional and recent pattern recognition techniques. They are expected to be applicable to face sequences extracted from actual videos. The performance of these methods is evaluated as the accuracy of face sequence annotation achieved with each of them, measured on a considerable amount of actual drama video. The evaluation results reveal the merits and demerits of these methods and indicate future research directions for face matching in videos.
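One of the simplest ways to extend still-image matching to sequences, shown here as an illustrative baseline; the abstract does not specify the details of the compared methods, and real face features would replace the raw vectors below.

```python
import numpy as np

def closest_pair_distance(seq_a, seq_b):
    """Distance between two face sequences, defined as the smallest
    distance between any feature vector in one and any in the other."""
    d = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=-1)
    return d.min()

def annotate(query_seq, labelled_seqs):
    """Annotate a query face sequence with the name whose labelled
    sequence is closest (a toy version of face sequence annotation)."""
    return min(labelled_seqs,
               key=lambda name: closest_pair_distance(query_seq,
                                                      labelled_seqs[name]))
```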
{"title":"Comparative evaluation of face sequence matching for content-based video access","authors":"S. Satoh","doi":"10.1109/AFGR.2000.840629","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840629","url":null,"abstract":"The paper presents a comparative evaluation of matching methods of face sequences obtained from actual videos. Face information is quite important in videos, especially in news programs, dramas, and movies. Accurate face sequence matching enables many multimedia applications including content-based face retrieval, automated face annotation, video authoring, etc. However, face sequences in videos are subject to variation in lighting condition, pose, facial expression, etc., which cause difficulty in face matching. In order to cope with this problem, several face sequence matching methods are proposed by extending face still image matching, traditional pattern recognition, and recent pattern recognition techniques. They are expected to be applicable to face sequences extracted from actual videos. The performance of these methods is evaluated as the accuracy of face sequence annotation using the methods. The accuracy is evaluated using a considerable amount of actual drama videos. The evaluation results reveal merits and demerits of these methods, and indicate future research directions of face matching for videos.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116972306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning-based approach to real time tracking and analysis of faces
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840618
Vinay P. Kumar, T. Poggio
This paper describes a trainable system capable of tracking faces and facial features such as eyes and nostrils and estimating basic mouth features such as degree of openness and smile in real time. In developing this system, we have addressed the twin issues of image representation and algorithms for learning. We have used the invariance properties of image representations based on Haar wavelets to robustly capture various facial features. Unlike previous approaches, this system is entirely trained from examples and does not rely on a priori (hand-crafted) models of facial features based on optical flow or facial musculature. The system works in several stages, beginning with face detection, followed by localization of facial features and estimation of mouth parameters. Each of these stages is formulated as a problem in supervised learning from examples. We apply the new and robust technique of support vector machines (SVM) for classification in the stages of skin segmentation, face detection and eye detection. Estimation of mouth parameters is modeled as regression from a sparse subset of coefficients (basis functions) of an overcomplete dictionary of Haar wavelets.
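The Haar-wavelet representation rests on rectangle sums, which an integral image makes cheap. A minimal sketch of one such feature follows; the specific feature layout is illustrative, and the paper's dictionary and regression are not reproduced.

```python
import numpy as np

def integral_image(img):
    """Summed-area table, zero-padded so any rectangle sum costs four lookups."""
    return np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] read from the padded integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_vertical(ii, r, c, h, w):
    """Two-rectangle Haar feature: left half minus right half of a window,
    the kind of coefficient an overcomplete Haar dictionary contains."""
    half = w // 2
    return (rect_sum(ii, r, c, r + h, c + half)
            - rect_sum(ii, r, c + half, r + h, c + w))
```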
{"title":"Learning-based approach to real time tracking and analysis of faces","authors":"Vinay P. Kumar, T. Poggio","doi":"10.1109/AFGR.2000.840618","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840618","url":null,"abstract":"This paper describes a trainable system capable of tracking faces and facial features like eyes and nostrils and estimating basic mouth features such as degrees of openness and smile in real time. In developing this system, we have addressed the twin issues of image representation and algorithms for learning. We have used the invariance properties of image representations based on Haar wavelets to robustly capture various facial features. Similarly, unlike previous approaches this system is entirely trained using examples and does not rely on a priori (hand-crafted) models of facial features based on an optical flow or facial musculature. The system works in several stages that begin with face detection, followed by localization of facial features and estimation of mouth parameters. Each of these stages is formulated as a problem in supervised learning from examples. We apply the new and robust technique of support vector machines (SVM) for classification in the stage of skin segmentation, face detection and eye detection. Estimation of mouth parameters is modeled as a regression from a sparse subset of coefficients (basis functions) of an overcomplete dictionary of Haar wavelets.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124333615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time multiple face detection using active illumination
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840605
C. Morimoto, M. Flickner
This paper presents a multiple face detector based on a robust pupil detection technique. The pupil detector uses active illumination that exploits the retro-reflectivity of eyes to facilitate detection. The detection range of this method is appropriate for interactive desktop and kiosk applications. Once the locations of the pupil candidates are computed, the candidates are filtered and grouped into pairs that correspond to faces using heuristic rules. To demonstrate the robustness of the face detection technique, a dual-mode face tracker was developed, which is initialized with the most salient detected face. Recursive estimators are used to guarantee the stability of the process and to combine the measurements from the multi-face detector and a feature correlation tracker. The estimated position of the face is used to control a pan-tilt servo mechanism in real time, moving the camera to keep the tracked face centered in the image.
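The differencing and pairing steps might look like the following sketch. Thresholds and the pairing rule are invented, and blob extraction (grouping candidate pixels into pupil centers) is omitted between the two steps.

```python
import numpy as np

def pupil_candidates(bright_frame, dark_frame, threshold=40):
    """With on-axis IR illumination the pupils retro-reflect (bright pupil);
    with off-axis illumination they do not (dark pupil). Their difference
    highlights pupils; return the candidate pixel coordinates."""
    diff = bright_frame.astype(int) - dark_frame.astype(int)
    return np.argwhere(diff > threshold)

def pair_into_faces(centers, min_sep=20, max_sep=120):
    """Heuristic grouping: two pupil centers at a plausible interocular
    distance and roughly level vertically form one face candidate."""
    faces = []
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            dy = abs(centers[i][0] - centers[j][0])
            dx = abs(centers[i][1] - centers[j][1])
            if min_sep < dx < max_sep and dy < 0.3 * dx:
                faces.append((tuple(centers[i]), tuple(centers[j])))
    return faces
```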
{"title":"Real-time multiple face detection using active illumination","authors":"C. Morimoto, M. Flickner","doi":"10.1109/AFGR.2000.840605","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840605","url":null,"abstract":"This paper presents a multiple face detector based on a robust pupil detection technique. The pupil detector uses active illumination that exploits the retro-reflectivity property of eyes to facilitate detection. The detection range of this method is appropriate for interactive desktop and kiosk applications. Once the location of the pupil candidates are computed, the candidates are filtered and grouped into pairs that correspond to faces using heuristic rules. To demonstrate the robustness of the face detection technique, a dual-mode face tracker was developed, which is initialized with the most salient detected face. Recursive estimators are used to guarantee the stability of the process and combine the measurements from the multi-face detector and a feature correlation tracker. The estimated position of the face is used to control a pan-tilt servo mechanism in real-time, that moves the camera to keep the tracked face always centered in the image.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122065988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection and tracking of facial features in real time using a synergistic approach of spatio-temporal models and generalized Hough-transform techniques
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840621
A. Schubert
The proposed algorithm requires a description of the facial features as 3D polygons (optionally extended by additional intensity information), which are assembled into a 3D model of the head provided in separate data files. Detection is achieved with a special implementation of the generalized Hough transform (GHT), for which the template forms are generated by projecting the 3D model into the image plane. In the initialization phase, a comparatively wide range of relative positions and attitudes between head and camera has to be tested. Aiming for illumination independence, only the sign of the difference between the expected intensities on the two sides of a polygon edge may additionally be used in the GHT. Once a feature is found, the search for the remaining features can be restricted by the use of the 3D model. The detection of a minimum number of features starts the tracking phase, which is performed using an extended Kalman filter (EKF) and assuming a first- or second-order dynamical model for the state variables describing the position and attitude of the head. Synergistic advantages between the GHT and the EKF can be realized, since the EKF and the projection into the image plane yield a rather good prediction of the forms to be detected by the GHT; this considerably reduces the search space in the image and in the parameter space. Conversely, the GHT offers a solution to the matching problem between image and object features. During the tracking phase, the GHT can be further enhanced by monitoring the actual intensities along the polygon edges, assigning them to the corresponding 3D object features, and using them for feature selection during the accumulation process. The algorithm runs in real time on a dual Pentium II 333 MHz with a cycle time of 40 ms.
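The filtering half of the synergy can be sketched as a linear Kalman step over a first-order (constant-velocity) model; in the actual system the measurement involves projecting the 3D model into the image, whose Jacobian is what makes the filter an extended one. All matrices below are illustrative placeholders.

```python
import numpy as np

dt = 0.04                                      # 40 ms cycle time
F = np.eye(12)
F[:6, 6:] = dt * np.eye(6)                     # first-order dynamical model:
                                               # pose (6) plus its velocity (6)
Q = 1e-4 * np.eye(12)                          # process noise (placeholder)
H = np.hstack([np.eye(6), np.zeros((6, 6))])   # pretend the GHT measures pose
R = 1e-2 * np.eye(6)                           # measurement noise (placeholder)

def ekf_step(x, P, z):
    """One predict/update cycle. The prediction x_pred is what narrows the
    GHT search: votes are only accumulated near the projected prediction."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)      # z: pose found by the GHT
    P_new = (np.eye(12) - K @ H) @ P_pred
    return x_new, P_new
```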