We present a robust frontal face detection method that identifies face positions in images by combining the results of a low-resolution whole-face classifier and individual face-part classifiers. Our approach uses face-part information and changes the identification strategy based on the results from the individual face-part classifiers, which are implemented using AdaBoost. Moreover, we propose a novel decision-tree method to improve the performance of face detectors on occluded faces: the decision tree distinguishes partially occluded faces based on the results from the individual classifiers. Preliminary experiments on a test set containing both non-occluded and occluded faces indicated that our method achieves better results than conventional methods, and experiments on general images also showed better results.
{"title":"Component-based robust face detection using AdaBoost and decision tree","authors":"K. Ichikawa, T. Mita, O. Hori","doi":"10.1109/FGR.2006.33","DOIUrl":"https://doi.org/10.1109/FGR.2006.33","url":null,"abstract":"We present a robust frontal face detection method that enables the identification of face positions in images by combining the results of a low-resolution whole face and individual face parts classifiers. Our approach is to use face parts information and change the identification strategy based on the results from individual face parts classifiers. These classifiers were implemented based on AdaBoost. Moreover, we propose a novel method based on a decision tree to improve performance of face detectors for occluded faces. The proposed decision tree method distinguishes partially occluded faces based on the results from the individual classifies. Preliminarily experiments on a test sample set containing non-occluded faces and occluded faces indicated that our method achieved better results than conventional methods. Actual experimental results containing general images also showed better results","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128168321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we propose a new general framework for real-time multi-view face recognition in real-world conditions, based on a novel nonlinear dimensionality reduction method, IsoScale, and generalized linear models (GLMs). Multi-view face sequences of freely moving people are obtained from several stereo cameras installed in an ordinary room, and IsoScale is used to map the faces into a low-dimensional space where the manifold structure of the view-varied faces is preserved but the face classes are forced to be linearly separable. Then, a GLM-based linear map is learnt between the low-dimensional face representation and the classes, providing posterior probabilities of class membership for the test faces. The benefits of the proposed method are illustrated in a typical HCI application.
{"title":"Multi-view face recognition by nonlinear dimensionality reduction and generalized linear models","authors":"B. Raytchev, Ikushi Yoda, K. Sakaue","doi":"10.1109/FGR.2006.82","DOIUrl":"https://doi.org/10.1109/FGR.2006.82","url":null,"abstract":"In this paper we propose a new general framework for real-time multi-view face recognition in real-world conditions, based on a novel nonlinear dimensionality reduction method IsoScale and generalized linear models (GLMs). Multi-view face sequences of freely moving people are obtained from several stereo cameras installed in an ordinary room, and IsoScale is used to map the faces into a low-dimensional space where the manifold structure of the view-varied faces is preserved, but the face classes are forced to be linearly separable. Then, a GLM-based linear map is learnt between the low-dimensional face representation and the classes, providing posterior probabilities of class membership for the test faces. The benefits of the proposed method are illustrated in a typical HCl application","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114401125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a new approach for recognizing task-oriented actions based on stochastic context-free grammar (SCFG). We focus on actions in the Japanese tea ceremony, which can be described by a context-free grammar; our aim is to recognize the actions of the tea service. Existing SCFG approaches consist of generating a symbolic string, parsing it, and recognizing the action. The symbolic string often includes uncertainty, so the parsing process must recover from errors introduced at the entry stage. This paper proposes a segmentation method, as error-free as possible, that divides an action into a string of finer actions. The method, based on the acceleration of body motion, produces fine actions corresponding to terminal symbols with little error. After translating the sequence of fine actions into a set of symbolic strings, SCFG-based parsing of this set leaves only a small number of strings that can be derived. Among the remaining strings, a Bayesian classifier returns the action name with the maximum posterior probability. By assigning multiple probabilities to a single SCFG rule, one SCFG can recognize multiple actions.
{"title":"Bayesian classification of task-oriented actions based on stochastic context-free grammar","authors":"Masanobu Yamamoto, Humikazu Mitomi, F. Fujiwara, Taisuke Sato","doi":"10.1109/FGR.2006.28","DOIUrl":"https://doi.org/10.1109/FGR.2006.28","url":null,"abstract":"This paper proposes a new approach for recognition of task-oriented actions based on stochastic context-free grammar (SCFG). Our attention puts on actions in the Japanese tea ceremony, where the action can be described by context-free grammar. Our aim is to recognize the action in the tea services. Existing SCFG approach consists of generating symbolic string, parsing it and recognition. The symbolic string often includes uncertainty. Therefore, the parsing process needs to recover the errors at the entry process. This paper proposes a segmentation method errorless as much as possible to segment an action into a string of finer actions. This method, based on an acceleration of the body motion, can produce the fine action corresponding to a terminal symbol with little error. After translating the sequence of fine actions into a set of symbolic strings, SCFG-based parsing of this set leaves small number of ones to be derived. Among the remaining strings, Bayesian classifier answers the action name with a maximum posterior probability. Giving one SCFG rule the multiple probabilities, one SCFG can recognize multiple actions","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129348123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper explores the extraction of features from color imagery for recognition tasks, especially face recognition. The well-known Gabor filter, typically defined as a complex function, is extended to the hypercomplex (quaternion) domain. Several proposed modes of this extension are discussed, and a preferred formulation is selected. To quantify the effectiveness of this novel filter for color-based feature extraction, an elastic-graph implementation for human face recognition is extended to color images, and the performance of the corresponding monochromatic and color recognition systems is compared. Our experiments show an improvement of 3% to 17% in recognition accuracy over the analysis of monochromatic images using complex Gabor filters.
{"title":"Color face recognition by hypercomplex Gabor analysis","authors":"Creed F. Jones, A. L. Abbott","doi":"10.1109/FGR.2006.30","DOIUrl":"https://doi.org/10.1109/FGR.2006.30","url":null,"abstract":"This paper explores the extraction of features from color imagery for recognition tasks, especially face recognition. The well-known Gabor filter, which is typically defined as a complex function, has been extended to the hypercomplex (quaternion) domain. Several proposed modes of this extension are discussed, and a preferred formulation is selected. To quantify the effectiveness of this novel filter for color-based feature extraction, an elastic graph implementation for human face recognition has been extended to color images, and performance of the corresponding monochromatic and color recognition systems have been compared. Our experiments have shown an improvement of 3% to 17% in recognition accuracy over the analysis of monochromatic images using complex Gabor filters","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132971287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An unsupervised nonparametric approach is proposed to automatically extract representative face samples (exemplars) from a video sequence or image set for multiple-shot face recognition. Motivated by a nonlinear dimensionality reduction algorithm called Isomap, we use local neighborhood information to approximate the geodesic distances between face images. A hierarchical agglomerative clustering (HAC) algorithm is then applied to group similar faces based on the estimated geodesic distances, which approximate the faces' locations on the appearance manifold. We define the exemplars as cluster centers for template matching at the subsequent testing stage. The final recognition is the outcome of a majority-voting scheme that combines the decisions from all individual frames in the test set. Experimental results on a 40-subject video database demonstrate the effectiveness and flexibility of the proposed method.
{"title":"Face recognition with image sets using hierarchically extracted exemplars from appearance manifolds","authors":"Wei-liang Fan, D. Yeung","doi":"10.1109/FGR.2006.47","DOIUrl":"https://doi.org/10.1109/FGR.2006.47","url":null,"abstract":"An unsupervised nonparametric approach is proposed to automatically extract representative face samples (exemplars) from a video sequence or an image set for multiple-shot face recognition. Motivated by a nonlinear dimensionality reduction algorithm called Isomap, we use local neighborhood information to approximate the geodesic distances between face images. A hierarchical agglomerative clustering (HAC) algorithm is then applied to group similar faces together based on the estimated geodesic distances which approximate their locations on the appearance manifold. We define the exemplars as cluster centers for template matching at the subsequent testing stage. The final recognition is the outcome of a majority voting scheme which combines the decisions from all the individual frames in the test set. Experimental results on a 40-subject video database demonstrate the effectiveness and flexibility of our proposed method","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133884706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We describe a methodology for classification of gait (walk, run, jog, etc.) and recognition of individuals based on gait, using two successive stages of principal component analysis (PCA) on kinematic data. In psychophysical studies, we have found that observers are sensitive to specific "motion features" that characterize human gait. These spatiotemporal motion features closely correspond to the first few principal components (PCs) of the kinematic data. The first few PCs represent an individual gait as a trajectory along a low-dimensional manifold in PC space. A second stage of PCA captures the variability in the shape of this manifold across individuals or gaits. This simple eigenspace-based analysis is capable of accurate classification across subjects.
{"title":"Gait recognition by two-stage principal component analysis","authors":"Sandhitsu R. Das, Robert C. Wilson, M. Lazarewicz, L. Finkel","doi":"10.1109/FGR.2006.56","DOIUrl":"https://doi.org/10.1109/FGR.2006.56","url":null,"abstract":"We describe a methodology for classification of gait (walk, run, jog, etc.) and recognition of individuals based on gait using two successive stages of principal component analysis (PCA) on kinematic data. In psychophysical studies, we have found that observers are sensitive to specific \"motion features\" that characterize human gait. These spatiotemporal motion features closely correspond to the first few principal components (PC) of the kinematic data. The first few PCs provide a representation of an individual gait as trajectory along a low-dimensional manifold in PC space. A second stage of PCA captures variability in the shape of this manifold across individuals or gaits. This simple eigenspace based analysis is capable of accurate classification across subjects","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133975333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In sign language recognition, one of the main problems is collecting enough training data; almost all statistical methods used in sign language recognition suffer from it. Inspired by the crossover operation of genetic algorithms, this paper presents a method to expand a Chinese sign language (CSL) database by re-sampling from existing sign samples. Two original samples of the same sign are regarded as parents, which can reproduce children by crossover. To verify the validity of the proposed method, experiments are carried out on a vocabulary of 2435 gestures in Chinese sign language, with 4 samples per gesture. Three samples serve as the original generation; these three samples and their offspring form the training set, and the remaining sample is used for testing. The experimental results show that the new samples generated by the proposed method are effective.
{"title":"Expanding Training Set for Chinese Sign Language Recognition","authors":"Chunli Wang, Xilin Chen, Wen Gao","doi":"10.1109/FGR.2006.39","DOIUrl":"https://doi.org/10.1109/FGR.2006.39","url":null,"abstract":"In sign language recognition, one of the problems is to collect enough training data. Almost all of the statistical methods used in sign language recognition suffer from this problem. Inspired by the crossover of genetic algorithms, this paper presents a method to expand Chinese sign language (CSL) database through re-sampling from existing sign samples. Two original samples of the same sign are regarded as parents. They can reproduce their children by crossover. To verify the validity of the proposed method, some experiments are carried out on a vocabulary of 2435 gestures in Chinese sign language. Each gesture has 4 samples. Three samples are used to be the original generation. These three original samples and their offspring are used to construct the training set, and the remaining sample is used for test. The experimental results show that the new samples generated by the proposed method are effective","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123677269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hand detection and tracking play important roles in human-computer interaction (HCI) applications, as well as in surveillance. We propose a self-initializing and self-correcting tracking technique that is robust to variations in skin color, illumination, and shadow. Self-initialization is achieved with a detector that has a relatively high false-positive rate. The detected hands are then tracked backward and forward in time using mean-shift trackers initialized at each hand, yielding candidate tracks for possible objects in the test sequence. The observed tracks are merged and weighted to find the real trajectories. Simple actions can be inferred by extracting each object from the scene and interpreting its location within each frame; extraction uses the color histograms of the objects built during the detection phase. We apply the technique to simple hand tracking with good results, without the need to train for skin color.
{"title":"Self correcting tracking for articulated objects","authors":"M. Caglar, N. Lobo","doi":"10.1109/FGR.2006.100","DOIUrl":"https://doi.org/10.1109/FGR.2006.100","url":null,"abstract":"Hand detection and tracking play important roles in human computer interaction (HCI) applications, as well as surveillance. We propose a self initializing and self correcting tracking technique that is robust to different skin color, illumination and shadow irregularities. Self initialization is achieved from a detector that has relatively high false positive rate. The detected hands are then tracked backwards and forward in time using mean shift trackers initialized at each hand to find the candidate tracks for possible objects in the test sequence. Observed tracks are merged and weighed to find the real trajectories. Simple actions can be inferred extracting each object from the scene and interpreting their locations within each frame. Extraction is possible using the color histograms of the objects built during the detection phase. We apply the technique here to simple hand tracking with good results, without the need for training for skin color","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128160985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a new face recognition method that uses projection-based 3D normalization and shading subspace orthogonalization under variation in facial pose and illumination. The proposed method does not need any reconstruction or re-illumination of a personalized 3D model, thus avoiding these troublesome steps and allowing the recognition process to run rapidly. Facial size and pose, including out-of-plane rotation, are normalized to a generic 3D model from a single still image, and the input subspace is generated from perturbed cropped patterns in order to absorb localization errors. Furthermore, by exploiting the fact that the normalized pattern is fitted to the generic 3D model, illumination-robust features are extracted through shading subspace orthogonalization. Evaluation experiments on several databases show the effectiveness of our method under various facial poses and illuminations.
{"title":"Face recognition by projection-based 3D normalization and shading subspace orthogonalization","authors":"Tatsuo Kozakaya, Osamu Yamaguchi","doi":"10.1109/FGR.2006.43","DOIUrl":"https://doi.org/10.1109/FGR.2006.43","url":null,"abstract":"This paper describes a new face recognition method using a projection-based 3D normalization and a shading subspace orthogonalization under variation in facial pose and illumination. The proposed method does not need any reconstruction and reillumination for a personalized 3D model, thus it can avoid these troublesome problems and the recognition process can be done rapidly. The facial size and pose including out of plane rotation can be normalized to a generic 3D model from one still image and the input subspace is generated by perturbed cropped patterns in order to absorb the localization errors. Furthermore, by exploiting the fact that a normalized pattern is fitted to the generic 3D model, illumination robust features are extracted through the shading subspace orthogonalization. Evaluation experiments are performed using several databases and the results show the effectiveness of our method under various facial poses and illuminations","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114426664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we present a method that simultaneously tracks 3D head movements and facial actions, such as lip and eyebrow movements, in a video sequence. In a baseline framework, an adaptive appearance model is estimated online from a monocular video sequence. This method uses a 3D model of the face and an adaptive facial texture model. We then consider and compare two improved models that increase robustness to occlusions. First, we use robust statistics to downweight hidden regions and outlier pixels. In a second approach, mixture models provide better integration of occlusions. Experiments demonstrate the benefit of the two robust models, which are compared under various occlusions.
{"title":"Head and facial action tracking: comparison of two robust approaches","authors":"R. Hérault, F. Davoine, Yves Grandvalet","doi":"10.1109/FGR.2006.63","DOIUrl":"https://doi.org/10.1109/FGR.2006.63","url":null,"abstract":"In this work, we address a method that is able to track simultaneously 3D head movements and facial actions like lip and eyebrow movements in a video sequence. In a baseline framework, an adaptive appearance model is estimated online by the knowledge of a monocular video sequence. This method uses a 3D model of the face and a facial adaptive texture model. Then, we consider and compare two improved models in order to increase robustness to occlusions. First, we use robust statistics in order to downweight the hidden regions or outlier pixels. In a second approach, mixture models provides better integration of occlusions. Experiments demonstrate the benefit of the two robust models. The latter are compared under various occlusions","PeriodicalId":109260,"journal":{"name":"7th International Conference on Automatic Face and Gesture Recognition (FGR06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130957840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}