Detection and tracking of facial features in real time using a synergistic approach of spatio-temporal models and generalized Hough-transform techniques
Authors: A. Schubert
Published in: Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580), 2000-03-26
DOI: 10.1109/AFGR.2000.840621 (https://doi.org/10.1109/AFGR.2000.840621)
Citations: 14
Abstract
The proposed algorithm requires the facial features to be described as 3D polygons (optionally extended by additional intensity information), which are assembled into a 3D model of the head supplied in separate data files. Detection uses a special implementation of the generalized Hough transform (GHT) whose forms are generated by projecting the 3D model into the image plane. In the initialization phase, a comparatively wide range of relative positions and attitudes between head and camera has to be tested. To remain independent of illumination, the only additional information that may be used in the GHT is the sign of the difference between the expected intensities on the two sides of each polygon edge. Once a feature is found, the search for the remaining features can be restricted by means of the 3D model. Detecting a minimum number of features starts the tracking phase, which is performed by an extended Kalman filter (EKF) that assumes a first- or second-order dynamical model for the state variables describing the position and attitude of the head. GHT and EKF benefit from each other: the EKF prediction, projected into the image plane, gives a rather good estimate of the forms the GHT has to detect, which considerably reduces the search space both in the image and in the parameter space. The GHT, in turn, solves the matching problem between image features and object features. During the tracking phase the GHT can be further enhanced by monitoring the actual intensities along the polygon edges, assigning them to the corresponding 3D object features, and using them for feature selection during the accumulation process. The algorithm runs in real time on a dual Pentium II 333 MHz machine with a cycle time of 40 ms.
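The interplay between prediction and Hough voting described in the abstract can be illustrated with a small sketch. It is not the author's implementation: all function names, the yaw-only rotation, the focal length, and the sample numbers are assumptions made for illustration. Only the state prediction of a first-order (constant-velocity) model is shown, so the covariance handling of a real EKF is omitted, and the voting is reduced to translation-only offsets without the intensity-sign check.

```python
import math
from collections import Counter

DT = 0.040  # cycle time stated in the abstract: 40 ms


def predict_constant_velocity(state, dt=DT):
    """First-order model: every pose value advances by its rate times dt.

    `state` maps a (hypothetical) variable name to a (value, rate) pair.
    """
    return {name: (value + rate * dt, rate) for name, (value, rate) in state.items()}


def project_polygon(vertices, pose, focal=500.0):
    """Pinhole projection of 3D polygon vertices (head frame) into the image.

    pose = (tx, ty, tz, yaw). Only a rotation about the vertical axis is
    modelled to keep the sketch short; `focal` is an assumed value in pixels.
    """
    tx, ty, tz, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    points = []
    for x, y, z in vertices:
        xc = c * x + s * z + tx
        yc = y + ty
        zc = -s * x + c * z + tz
        points.append((focal * xc / zc, focal * yc / zc))
    return points


def ght_translation_votes(template_pts, edge_pts, radius):
    """Translation-only Hough voting restricted to a window around zero offset.

    Each (edge point, template point) pair votes for the pixel offset that
    would align the two. Because the template is already projected at the
    predicted pose, only offsets within `radius` pixels are accumulated.
    """
    accumulator = Counter()
    for ex, ey in edge_pts:
        for px, py in template_pts:
            dx, dy = round(ex - px), round(ey - py)
            if abs(dx) <= radius and abs(dy) <= radius:
                accumulator[(dx, dy)] += 1
    return accumulator


# Usage with made-up numbers: an eye outline about 0.6 m from the camera.
eye = [(-0.03, 0.0, 0.0), (-0.01, 0.01, 0.0), (0.01, 0.0, 0.0), (-0.01, -0.01, 0.0)]
state = {"tx": (0.0, 0.05), "ty": (0.0, 0.0), "tz": (0.6, 0.0), "yaw": (0.1, 0.0)}

predicted = predict_constant_velocity(state)
pose = tuple(predicted[name][0] for name in ("tx", "ty", "tz", "yaw"))
template = project_polygon(eye, pose)

# Synthetic "detected" edge points: the template displaced by (3, -2) pixels.
edges = [(u + 3.0, v - 2.0) for u, v in template]

accumulator = ght_translation_votes(template, edges, radius=5)
best_offset, votes = accumulator.most_common(1)[0]
```

In this toy setup the accumulator peak recovers the displacement between the predicted and the observed outline. Restricting the votes to a small window is what the prediction makes possible; without it, during initialization, a much wider range of offsets and poses would have to be accumulated.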