{"title":"Robust Head Tracking Based on a Multi-State Particle Filter","authors":"Yuan Li, H. Ai, Chang Huang, S. Lao","doi":"10.1109/FGR.2006.96","DOIUrl":"https://doi.org/10.1109/FGR.2006.96","url":null,"abstract":"This paper proposes a novel method for robust, automatic, real-time head tracking that fuses face and head cues within a multi-state particle filter. Due to the large appearance variability of the human head, most existing head tracking methods use little object-specific prior knowledge, resulting in limited discriminative power. In contrast, the face is a distinctive pattern that is much easier to capture, which motivates us to incorporate a vector-boosted multi-view face detector (C. Huang, et al., 2005) to lend strong support to general head observation cues such as color and contour edges. To simultaneously and collaboratively perform temporal inference of both the face state and the head state, a Markov-network-based particle filter is constructed using sequential belief propagation Monte Carlo (G. Hua, et al., 2004). Our approach is tested on sequences used by previous researchers as well as on new data sets that include many challenging real-world cases, and shows robustness against various unfavorable conditions","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131278534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
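The abstract's tracker rests on the standard predict-weight-resample loop of a particle filter (the paper couples two such chains, for the face and head states, through a Markov network). The following is a minimal single-chain sketch on a 1-D toy state; the noise parameters and Gaussian likelihood are illustrative assumptions, not the authors' observation model:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, observation, motion_std=1.0, obs_std=2.0):
    """One predict-weight-resample iteration of a basic particle filter
    on a 1-D state (illustrative stand-in for a head-position state)."""
    # Predict: diffuse particles with a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Weight: assumed Gaussian observation likelihood around the measurement.
    w = np.exp(-0.5 * ((particles - observation) / obs_std) ** 2)
    w /= w.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

particles = rng.uniform(-10.0, 10.0, size=500)
for obs in [0.0, 0.5, 1.0, 1.5]:   # synthetic 1-D "head position" observations
    particles = particle_filter_step(particles, obs)
print(np.mean(particles))          # posterior estimate near the observed track
```

The paper's contribution is in running two such filters jointly, with face-detector responses supplying a strong likelihood for one chain; the loop above only illustrates the shared inference skeleton.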
{"title":"Automatic Gait Recognition using Dynamic Variance Features","authors":"Yanmei Chai, Jinchang Ren, R. Zhao, Jingping Jia","doi":"10.1109/FGR.2006.24","DOIUrl":"https://doi.org/10.1109/FGR.2006.24","url":null,"abstract":"Human gait recognition is currently one of the most active research topics in computer vision. In our opinion, existing recognition methods suffer from one of two shortcomings: expensive computation or poor identification performance; we therefore propose a new method to overcome both. Firstly, we detect the binary silhouette of a walking person in each of the monocular image sequences. Then, we extract the pixel values at the same pixel position over one gait cycle to form a dynamic variation signal (DVS). Next, the variance features of all the DVS are computed and a matrix is constructed to describe the dynamic gait signature of an individual. Finally, the correlation coefficient measure based on the gait cycles and two different classification methods (NN and KNN) are used to recognize different subjects. Experimental results show that our method is not only computationally efficient but also very effective, with correct recognition rates over 90% on both the UCSD and CMU databases","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130953110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
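The DVS variance feature described in the abstract can be sketched directly: stack the binary silhouettes of one gait cycle, take each pixel position's values over time as its dynamic variation signal, and use the per-pixel temporal variance as the feature matrix. A minimal sketch on synthetic data (function names and the toy silhouettes are illustrative, not the authors' code):

```python
import numpy as np

def dvs_variance_matrix(silhouettes):
    """Given binary silhouettes over one gait cycle, shape (T, H, W),
    each pixel position yields a dynamic variation signal (DVS) of
    length T; its temporal variance forms an (H, W) feature matrix."""
    return np.var(silhouettes.astype(float), axis=0)

def correlation_score(feat_a, feat_b):
    """Similarity of two variance matrices via the Pearson correlation
    coefficient of their flattened entries (the abstract's measure)."""
    return np.corrcoef(feat_a.ravel(), feat_b.ravel())[0, 1]

# Toy example: two synthetic "gait cycles" of random binary silhouettes.
rng = np.random.default_rng(0)
cycle_a = (rng.random((10, 32, 24)) > 0.5).astype(np.uint8)
cycle_b = (rng.random((10, 32, 24)) > 0.5).astype(np.uint8)
m_a = dvs_variance_matrix(cycle_a)
print(m_a.shape)  # (32, 24)
print(correlation_score(m_a, dvs_variance_matrix(cycle_b)))
```

Classification then reduces to nearest-neighbor (NN/KNN) search over these correlation scores against gallery subjects.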
{"title":"Face recognition based on a 3D morphable model","authors":"V. Blanz","doi":"10.1109/FGR.2006.42","DOIUrl":"https://doi.org/10.1109/FGR.2006.42","url":null,"abstract":"This paper summarizes the main concepts of morphable models of 3D faces, and describes two algorithms for 3D surface reconstruction and face recognition. The first algorithm is based on an analysis-by-synthesis technique that estimates shape and pose by fully reproducing the appearance of the face in the image. The second algorithm is based on a set of feature point locations, producing high-resolution shape estimates in computation times of 0.25 seconds. A variety of different application paradigms for model-based face recognition are discussed","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121476246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining PCA and LFA for Surface Reconstruction from a Sparse Set of Control Points","authors":"Reinhard Knothe, S. Romdhani, T. Vetter","doi":"10.1109/FGR.2006.31","DOIUrl":"https://doi.org/10.1109/FGR.2006.31","url":null,"abstract":"This paper presents a novel method for 3D surface reconstruction based on a sparse set of 3D control points. For object classes such as human heads, prior information about the class is used in order to constrain the results. A common strategy for representing object classes in a reconstruction application is to build holistic models, such as PCA models. Using holistic models involves a trade-off between reconstruction of the measured points and plausibility of the result. We introduce a novel object representation that provides local adaptation of the surface, able to fit 3D control points exactly without affecting areas of the surface distant from the control points. The method is based on an interpolation scheme, as opposed to the approximation schemes generally used for surface reconstruction. Our interpolation method reduces the Euclidean distance between a reconstruction and its ground truth while preserving its smoothness and increasing its perceptual quality","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"746 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122965883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
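The interpolation-versus-approximation distinction the abstract draws can be made concrete with a generic radial basis function (RBF) interpolant: solving a kernel system makes the surface pass exactly through the control points, whereas a least-squares (approximation) fit generally does not. This is a generic illustration under a Gaussian kernel, not the paper's PCA+LFA construction:

```python
import numpy as np

def rbf_interpolate(ctrl_pts, ctrl_vals, query_pts, sigma=1.0):
    """Gaussian RBF interpolation: the reconstructed function passes
    exactly through the control points, unlike least-squares fits."""
    # Kernel matrix between control points; Gaussian kernels are
    # strictly positive definite for distinct points, so K is invertible.
    d = np.linalg.norm(ctrl_pts[:, None, :] - ctrl_pts[None, :, :], axis=-1)
    K = np.exp(-0.5 * (d / sigma) ** 2)
    w = np.linalg.solve(K, ctrl_vals)          # exact fit at control points
    dq = np.linalg.norm(query_pts[:, None, :] - ctrl_pts[None, :, :], axis=-1)
    return np.exp(-0.5 * (dq / sigma) ** 2) @ w

# Five 2D control points with scalar "height" values (toy surface).
ctrl = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
vals = np.array([0.0, 1.0, 2.0, 3.0, 1.5])
recovered = rbf_interpolate(ctrl, vals, ctrl)
print(np.abs(recovered - vals).max())  # ~0: control points matched exactly
```

The locality property the paper targets also appears here: a Gaussian basis function's influence decays with distance, so moving one control point barely perturbs far-away surface regions.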
{"title":"Toward an efficient and accurate AAM fitting on appearance varying faces","authors":"Hugo Mercier, Julien Peyras, P. Dalle","doi":"10.1109/FGR.2006.104","DOIUrl":"https://doi.org/10.1109/FGR.2006.104","url":null,"abstract":"Automatic extraction of facial feature deformations (whether due to identity change or expression) is a challenging task and could form the basis of a facial expression interpretation system. We use active appearance models and the simultaneous inverse compositional algorithm to extract facial deformations as a starting point, and propose a modified version that addresses the problem of facial appearance variation in an efficient manner. Handling significant variation in facial appearance is a first step toward a realistic facial feature deformation extraction system able to adapt to a new face or to track a face under changing video conditions. Moreover, in order to evaluate fittings, we design an experimental protocol that takes human inaccuracies into account when building a ground truth","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123752456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relighting of Facial Images","authors":"Péter Csákány, A. Hilton","doi":"10.1109/FGR.2006.93","DOIUrl":"https://doi.org/10.1109/FGR.2006.93","url":null,"abstract":"We present a novel method to relight video sequences given known surface shape and illumination. The method preserves fine visual details. It requires single view video frames, approximate 3D shape and standard studio illumination only, making it applicable in studio production. The technique is demonstrated for relighting video sequences of faces","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124697892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face recognition using the classified appearance-based quotient image","authors":"Masashi Nishiyama, Osamu Yamaguchi","doi":"10.1109/FGR.2006.46","DOIUrl":"https://doi.org/10.1109/FGR.2006.46","url":null,"abstract":"We propose a new method for synthesizing an illumination-normalized image from a face image containing diffuse reflection, specular reflection, attached shadows and cast shadows. The method is derived from the self-quotient image (SQI), which is defined as the ratio of a pixel value to a locally smoothed pixel value. However, the SQI cannot be synthesized correctly from an image containing shadows or specular reflections. Since these regions appear as areas of high or low albedo, they cannot be discriminated from diffuse reflection using only a single image. To classify the appearances, we utilize a simple model defined by a number of basis images which represent diffuse reflection on a generic face. Through experimental results we show the effectiveness of this method for face identification on the Yale Face Database B and on a real-world database, using only a single image per individual in training","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127366061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
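The self-quotient image the abstract builds on is simply each pixel divided by a locally smoothed version of the image, which cancels slowly varying illumination while keeping local texture. A minimal numpy-only sketch (a box filter stands in for the Gaussian smoothing usually used; the paper's contribution, classifying shadow/specular regions with basis images, is not reproduced here):

```python
import numpy as np

def box_smooth(img, k=9):
    """Separable box filter with edge padding -- an assumed, simple
    stand-in for the Gaussian smoothing typically used for the SQI."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    ker = np.ones(k) / k
    rows = np.apply_along_axis(lambda r: np.convolve(r, ker, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, ker, mode="valid"), 0, rows)

def self_quotient_image(img, k=9, eps=1e-6):
    """Self-quotient image: pixel value over locally smoothed pixel
    value; smooth illumination divides out, local detail remains."""
    return img / (box_smooth(img, k) + eps)

flat = np.full((16, 16), 100.0)   # uniformly lit, textureless patch
q = self_quotient_image(flat)
print(q.max() - q.min())          # ~0: illumination fully divided out
```

On a real face image, shadow boundaries and specular highlights violate the smooth-illumination assumption, which is exactly the failure case the paper's classified-appearance model addresses.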
{"title":"Minimum variance estimation of 3D face shape from multi-view","authors":"ZhenQiu Zhang, Yuxiao Hu, Tianli Yu, Thomas S. Huang","doi":"10.1109/FGR.2006.77","DOIUrl":"https://doi.org/10.1109/FGR.2006.77","url":null,"abstract":"A minimum variance estimation framework for 3D face reconstruction from multiple views and a new 3D surface reconstruction algorithm based on a deformable subdivision mesh are proposed in this paper. First, an efficient 2D-to-3D integrated face reconstruction approach is introduced to reconstruct a personalized 3D face model from a single frontal face image using minimum variance estimation. Then, the new deformable-mesh-based surface reconstruction algorithm is applied to images from different views to obtain further observations of the 3D face, especially depth information, which cannot be obtained from a single image directly. Based on the result of the 3D surface reconstruction, we use minimum variance estimation again to refine the estimate of the 3D face. We combine the texture from the different views, and the result looks photorealistic","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127384248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
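When independent unbiased estimates of the same quantity (e.g. a vertex depth seen from several views) come with known uncertainties, the minimum variance combination is inverse-variance weighting. This textbook step is the core of such a framework; the sketch below is generic, not the paper's full 2D-to-3D pipeline:

```python
import numpy as np

def min_variance_fuse(estimates, variances):
    """Fuse independent unbiased estimates of one quantity by
    inverse-variance weighting: the linear combination with minimum
    variance. The fused variance shrinks as more views are added."""
    w = 1.0 / np.asarray(variances, dtype=float)
    est = np.asarray(estimates, dtype=float)
    fused = (w * est).sum(axis=0) / w.sum(axis=0)
    return fused, 1.0 / w.sum(axis=0)

# Two equally uncertain depth estimates of the same vertex.
fused, var = min_variance_fuse([1.0, 3.0], [1.0, 1.0])
print(fused, var)  # 2.0 0.5
```

With unequal variances the weighting pulls the result toward the more reliable view, which is exactly why re-running the estimation after gathering multi-view depth observations refines the face shape.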
{"title":"Articulated hand tracking by PCA-ICA approach","authors":"Makoto Kato, Yenwei Chen, Gang Xu","doi":"10.1109/FGR.2006.21","DOIUrl":"https://doi.org/10.1109/FGR.2006.21","url":null,"abstract":"This paper introduces a new representation of hand motions for tracking and recognizing hand-finger gestures in an image sequence. A human hand has 15 joints, and this high dimensionality makes hand motions difficult to model. It is therefore important to represent a hand motion in a low-dimensional space. Principal component analysis (PCA) has been proposed to reduce the dimensionality. However, the PCA basis vectors represent only global features, which are not optimal for representing intrinsic features. This paper proposes an efficient representation of hand motions based on independent component analysis (ICA). The ICA basis vectors represent local features, each of which corresponds to the motion of a particular finger. This representation is more efficient for modeling hand motions when tracking and recognizing hand-finger gestures in an image sequence. This paper demonstrates the effectiveness of our method by tracking hands in real image sequences","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125385739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
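The PCA stage of such a pipeline reduces joint-angle vectors to a low-dimensional subspace; an ICA rotation (e.g. FastICA) would then be applied to the PCA coefficients to obtain the localized, per-finger basis the abstract describes. A minimal numpy sketch of the PCA step only, on synthetic data (the 20-D pose vectors and sample count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 200 hand poses, each a 20-D joint-angle vector.
poses = rng.normal(size=(200, 20))

mean = poses.mean(axis=0)
centered = poses - mean
# PCA via SVD: rows of Vt are the global basis vectors, ordered by
# the variance they capture (singular values S).
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 5
coeffs = centered @ Vt[:k].T    # low-dimensional representation
recon = coeffs @ Vt[:k] + mean  # approximate reconstruction from k components
print(coeffs.shape)  # (200, 5)
# An ICA step (not shown) would rotate these k coefficients so each
# resulting basis vector captures a statistically independent, local
# motion -- ideally one finger per component.
```

Tracking then searches this k-dimensional coefficient space instead of the full joint-angle space, which is what makes the representation practical.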
{"title":"Gesture Spotting in Low-Quality Video with Features Based on Curvature Scale Space","authors":"Myung-Cheol Roh, W. Christmas, J. Kittler, Seong-Whan Lee","doi":"10.1109/FGR.2006.59","DOIUrl":"https://doi.org/10.1109/FGR.2006.59","url":null,"abstract":"Spotting a player's gestures and actions in sports video is a key task in high-level automatic analysis of the video material. In many sports views, the camera covers a large part of the sports arena, so the player's region is small and exhibits large motion. This makes determining the player's gestures and actions a challenging task. To overcome these problems, we propose a method based on curvature scale space templates of the player's silhouette. The use of curvature scale space makes the method robust to noise, and our method is also robust to significant shape corruption of part of the player's silhouette. We also propose a new recognition method which is robust to noisy posture sequences and needs only a small amount of training data, an essential characteristic for many practical applications","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127039867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
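A curvature scale space representation evaluates the curvature of the silhouette contour after Gaussian smoothing at successive scales; coarse scales suppress the noise and partial corruption the abstract mentions. One level of this can be sketched with numpy (circular convolution via FFT for the closed contour; parameters are illustrative, not the paper's template construction):

```python
import numpy as np

def contour_curvature(x, y, sigma=2.0):
    """Curvature of a closed contour after Gaussian smoothing at scale
    sigma -- one level of a curvature scale space description."""
    n = len(x)
    g = np.exp(-0.5 * ((np.arange(n) - n // 2) / sigma) ** 2)
    g /= g.sum()
    G = np.fft.fft(np.fft.ifftshift(g))          # kernel centered at index 0
    sx = np.real(np.fft.ifft(np.fft.fft(x) * G)) # circular (closed-contour)
    sy = np.real(np.fft.ifft(np.fft.fft(y) * G)) # Gaussian smoothing
    dx, dy = np.gradient(sx), np.gradient(sy)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    # Signed curvature of a parametric curve.
    return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5

# Sanity check on a circle of radius 10: curvature should be ~1/10.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
kappa = contour_curvature(10.0 * np.cos(t), 10.0 * np.sin(t))
print(np.median(kappa))   # close to 0.1
```

Repeating this over increasing sigma and recording where the curvature changes sign yields the scale space image matched against templates for spotting.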