Exploiting speech/gesture co-occurrence for improving continuous gesture recognition in weather narration
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840669
Rajeev Sharma, Jiongyu Cai, Srivatsan Chakravarthy, Indrajit Poddar, Y. Sethi
In order to incorporate naturalness into the design of human-computer interfaces (HCI), it is desirable to develop recognition techniques capable of handling continuous natural gesture and speech inputs. Although many researchers have reported high recognition rates for gesture recognition using hidden Markov models (HMM), the gestures used are mostly pre-defined and bound by syntactical and grammatical constraints, whereas natural gestures do not string together under such syntactical bindings. Moreover, a strict classification of natural gestures is not feasible. We have examined hand gestures made in a very natural domain: a weather person narrating in front of a weather map. The gestures made by the weather person are embedded in the narration, which provides abundant data from an uncontrolled environment for studying the interaction between speech and gesture in the context of a display. We hypothesize that this domain is very similar to that of a natural human-computer interface. We present an HMM-based framework for continuous gesture recognition and keyword spotting. To explore the relation between gesture and speech, we conducted a statistical co-occurrence analysis of different gestures with a selected set of spoken keywords. We then demonstrate how this co-occurrence analysis can be exploited to improve the performance of continuous gesture recognition.
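To make the co-occurrence idea concrete, here is a minimal sketch (not the authors' implementation; the gesture labels, keywords, and counts below are invented for illustration) of how per-segment HMM log-likelihoods could be rescored with a P(gesture | keyword) prior estimated from co-occurrence counts:

```python
# Minimal sketch: rescoring per-segment HMM gesture log-likelihoods with a
# speech/gesture co-occurrence prior estimated from keyword counts.
# All labels and numbers are made up for illustration.
import numpy as np

gestures = ["point", "contour", "circle"]          # hypothetical gesture classes
keywords = ["here", "region", "around"]            # hypothetical spotted keywords

# Co-occurrence counts: rows = keywords, cols = gestures (from annotated narration).
cooc = np.array([[30.,  5.,  2.],
                 [ 4., 25.,  6.],
                 [ 3.,  7., 20.]])
# Convert to P(gesture | keyword) with add-one smoothing.
p_gesture_given_kw = (cooc + 1.0) / (cooc + 1.0).sum(axis=1, keepdims=True)

def rescore(hmm_loglik, spotted_kw_indices, weight=1.0):
    """Combine HMM evidence with the co-occurrence prior in the log domain."""
    prior = np.ones(len(gestures)) / len(gestures)
    if spotted_kw_indices:                          # average prior over spotted keywords
        prior = p_gesture_given_kw[spotted_kw_indices].mean(axis=0)
    score = np.asarray(hmm_loglik) + weight * np.log(prior)
    return gestures[int(np.argmax(score))], score

# Example: the HMM alone slightly favours "contour", but the keyword "here" flips
# the decision toward the deictic "point" gesture.
hmm_loglik = [-41.0, -40.5, -44.0]
print(rescore(hmm_loglik, spotted_kw_indices=[0]))
```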
{"title":"Exploiting speech/gesture co-occurrence for improving continuous gesture recognition in weather narration","authors":"Rajeev Sharma, Jiongyu Cai, Srivatsan Chakravarthy, Indrajit Poddar, Y. Sethi","doi":"10.1109/AFGR.2000.840669","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840669","url":null,"abstract":"In order to incorporate naturalness in the design of human computer interfaces (HCI), it is desirable to develop recognition techniques capable of handling continuous natural gesture and speech inputs. Though many different researchers have reported high recognition rates for gesture recognition using hidden Markov models (HMM), the gestures used are mostly pre-defined and are bound with syntactical and grammatical constraints. But natural gestures do not string together in syntactical bindings. Moreover, strict classification of natural gestures is not feasible. We have examined hand gestures made in a very natural domain, that of a weather person narrating in front of a weather map. The gestures made by the weather person are embedded in a narration. This provides us with abundant data from an uncontrolled environment to study the interaction between speech and gesture in the context of a display. We hypothesize that this domain is very similar to that of a natural human-computer interface. We present an HMM architecture for continuous gesture recognition framework and keyword spotting. To explore the relation between gesture and speech, we conducted a statistical co-occurrence analysis of different gestures with a selected set of spoken keywords. We then demonstrate how this co-occurrence analysis can be exploited to improve the performance of continuous gesture recognition.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"345 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115290260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Segmenting hands of arbitrary color
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840673
Xiaojin Zhu, Jie Yang, A. Waibel
Hand segmentation is a prerequisite for many gesture recognition tasks. Color has been widely used for hand segmentation. However, many approaches rely on predefined skin color models. It is very difficult to predefine a color model in a mobile application where the lighting conditions may change dramatically over time. We propose a novel statistical approach to hand segmentation based on Bayes decision theory. The proposed method requires no predefined skin color model. Instead, it generates a hand color model and a background color model for a given image, and uses these models to classify each pixel in the image as either a hand pixel or a background pixel. The models are generated using a Gaussian mixture model with the restricted EM algorithm. Our method is capable of segmenting hands of arbitrary color in a complex scene. It performs well even when there is a significant overlap between hand and background colors, or when the user wears gloves. We show that the Bayes decision method is superior to a commonly used method by comparing their upper-bound performance. Experimental results demonstrate the feasibility of the proposed method.
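The per-pixel decision rule lends itself to a short sketch. The version below assumes labelled hand and background training pixels are available and fits each colour model with plain EM from scikit-learn; the paper's restricted EM variant is not reproduced here.

```python
# Sketch of the per-pixel Bayes decision rule: label a pixel "hand" iff
# P(hand | color) > P(background | color), with each class colour density
# modelled by a Gaussian mixture. Plain EM stands in for the restricted EM.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_color_model(pixels, n_components=3, seed=0):
    """Fit a Gaussian mixture to Nx3 training pixels (any colour space works)."""
    return GaussianMixture(n_components=n_components, covariance_type="full",
                           random_state=seed).fit(pixels)

def segment(image, hand_gmm, bg_gmm, p_hand=0.3):
    """Classify every pixel of an HxWx3 image as hand (True) or background."""
    h, w, _ = image.shape
    x = image.reshape(-1, 3).astype(float)
    log_hand = hand_gmm.score_samples(x) + np.log(p_hand)
    log_bg = bg_gmm.score_samples(x) + np.log(1.0 - p_hand)
    return (log_hand > log_bg).reshape(h, w)

# Toy demo with synthetic colours: reddish "hand" pixels vs. greenish background.
rng = np.random.default_rng(0)
hand_train = rng.normal([200, 120, 110], 15, size=(500, 3))
bg_train = rng.normal([90, 160, 90], 25, size=(500, 3))
hand_gmm, bg_gmm = fit_color_model(hand_train), fit_color_model(bg_train)
test_img = rng.normal([200, 120, 110], 15, size=(8, 8, 3))
print(segment(test_img, hand_gmm, bg_gmm).mean())   # fraction labelled as hand
```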
{"title":"Segmenting hands of arbitrary color","authors":"Xiaojin Zhu, Jie Yang, A. Waibel","doi":"10.1109/AFGR.2000.840673","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840673","url":null,"abstract":"Hand segmentation is a prerequisite for many gesture recognition tasks. Color has been widely used for hand segmentation. However, many approaches rely on predefined skin color models. It is very difficult to predefine a color model in a mobile application where the light condition may change dramatically over time. We propose a novel statistical approach to hand segmentation based on Bayes decision theory. The proposed method requires no predefined skin color model. Instead it generates a hand color model and a background color model for a given image, and uses these models to classify each pixel in the image as either a hand pixel or a background pixel. Models are generated using a Gaussian mixture model with the restricted EM algorithm. Our method is capable of segmenting hands of arbitrary color in a complex scene. It performs well even when there is a significant overlap between hand and background colors, or when the user wears gloves. We show that the Bayes decision method is superior to a commonly used method by comparing their upper bound performance. Experimental results demonstrate the feasibility of the proposed method.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115545730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dual-state parametric eye tracking
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840620
Ying-li Tian, T. Kanade, J. Cohn
Most eye trackers work well for open eyes; blinking, however, is a physiological necessity for humans. Moreover, for applications such as facial expression analysis and driver awareness systems, we need to do more than track the locations of a person's eyes; we also need a detailed description of them. We need to recover the state of the eyes (i.e., whether they are open or closed) and the parameters of an eye model (e.g., the location and radius of the iris, and the corners and height of the eye opening). We develop a dual-state model-based system for tracking eye features that uses convergent tracking techniques, and we show how it can be used to detect whether the eyes are open or closed and to recover the parameters of the eye model. Processing speed on a Pentium II 400 MHz PC is approximately 3 frames/second. In experimental tests on 500 image sequences from child and adult subjects with varying skin and eye colors, accurate tracking results were obtained in 98% of the sequences.
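As a rough illustration of the dual-state idea (not the paper's convergent tracker), the sketch below decides between an open and a closed eye from the presence of a dark iris blob in the eye region and, when the eye is open, recovers approximate iris parameters; the thresholds are arbitrary assumptions.

```python
# Hypothetical stand-in for the dual-state decision: an eye is "open" if a dark
# iris blob is visible in the (grayscale) eye region, otherwise "closed".
import numpy as np

def eye_state(eye_patch, dark_thresh=60, min_frac=0.03):
    """eye_patch: 2-D grayscale array (0-255). Returns state and iris parameters."""
    dark = eye_patch < dark_thresh
    if dark.mean() < min_frac:                    # no iris visible -> closed eye
        return "closed", None
    ys, xs = np.nonzero(dark)
    center = (float(xs.mean()), float(ys.mean()))
    radius = float(np.sqrt(dark.sum() / np.pi))   # radius of an equal-area disc
    return "open", {"iris_center": center, "iris_radius": radius}

# Toy example: a bright patch with a dark disc (open) vs. a uniformly bright patch.
patch = np.full((24, 32), 200.0)
yy, xx = np.mgrid[:24, :32]
patch[(xx - 16) ** 2 + (yy - 12) ** 2 < 25] = 30.0   # synthetic iris of radius 5
print(eye_state(patch))                               # -> open, centre near (16, 12)
print(eye_state(np.full((24, 32), 200.0)))            # -> closed
```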
{"title":"Dual-state parametric eye tracking","authors":"Ying-li Tian, T. Kanade, J. Cohn","doi":"10.1109/AFGR.2000.840620","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840620","url":null,"abstract":"Most eye trackers work well for open eyes. However blinking is a physiological necessity for humans. More over, for applications such as facial expression analysis and driver awareness systems, we need to do more than tracking of the locations of the person's eyes but obtain their detailed description. We need to recover the state of the eyes (i.e., whether they are open or closed), and the parameters of an eye model (e.g., the location and radius of the iris, and the corners and height of the eye opening). We develop a dual-state model-based system for tracking eye features that uses convergent tracking techniques and show how it can be used to detect whether the eyes are open or closed, and to recover the parameters of the eye model. Processing speed on a Pentium II 400 MHz PC is approximately 3 frames/second. In experimental tests on 500 image sequences from child and adult subjects with varying colors of skin and eye, accurate tracking results are obtained in 98% of image sequences.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129889515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wide-range, person- and illumination-insensitive head orientation estimation
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840632
Ying Wu, K. Toyama
We present an algorithm for estimation of head orientation, given cropped images of a subject's head from any viewpoint. Our algorithm handles dramatic changes in illumination, applies to many people without per-user initialization, and covers a wider range (e.g., side and back) of head orientations than previous algorithms. The algorithm builds an ellipsoidal model of the head, where points on the model maintain probabilistic information about surface edge density. To collect data for each point on the model, edge-density features are extracted from hand-annotated training images and projected into the model. Each model point learns a probability density function from the training observations. During pose estimation, features are extracted from input images; then, the maximum a posteriori pose is sought, given the current observation.
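A much-simplified sketch of the MAP pose selection follows. It assumes a discretised pose set and independent Gaussian densities over edge-density features per pose, and it omits the ellipsoidal projection step; the means and variances are random stand-ins for trained model-point PDFs.

```python
# Simplified MAP pose selection over a discretised pose set, with independent
# Gaussian densities over edge-density features learned per pose (stand-ins here).
import numpy as np

rng = np.random.default_rng(0)
poses = ["front", "left", "right", "back"]
n_feat = 16                                  # edge-density features per cropped head

# "Learned" per-pose means/stds (placeholders for the trained model-point PDFs).
means = rng.random((len(poses), n_feat))
stds = np.full((len(poses), n_feat), 0.1)

def log_gaussian(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

def map_pose(observed, prior=None):
    """Return the maximum a posteriori pose for an observed feature vector."""
    prior = np.full(len(poses), 1.0 / len(poses)) if prior is None else prior
    log_post = log_gaussian(observed, means, stds).sum(axis=1) + np.log(prior)
    return poses[int(np.argmax(log_post))]

# A test vector drawn near the "left" pose model should be classified as such.
observed = means[1] + rng.normal(0, 0.05, n_feat)
print(map_pose(observed))
```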
{"title":"Wide-range, person- and illumination-insensitive head orientation estimation","authors":"Ying Wu, K. Toyama","doi":"10.1109/AFGR.2000.840632","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840632","url":null,"abstract":"We present an algorithm for estimation of head orientation, given cropped images of a subject's head from any viewpoint. Our algorithm handles dramatic changes in illumination, applies to many people without per-user initialization, and covers a wider range (e.g., side and back) of head orientations than previous algorithms. The algorithm builds an ellipsoidal model of the head, where points on the model maintain probabilistic information about surface edge density. To collect data for each point on the model, edge-density features are extracted from hand-annotated training images and projected into the model. Each model point learns a probability density function from the training observations. During pose estimation, features are extracted from input images; then, the maximum a posteriori pose is sought, given the current observation.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"97 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128827498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracking interacting people
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840658
S. McKenna, S. Jabri, Zoran Duric, H. Wechsler
A computer vision system for tracking multiple people in relatively unconstrained environments is described. Tracking is performed at three levels of abstraction: regions, people and groups. A novel, adaptive background subtraction method that combines colour and gradient information is used to cope with shadows and unreliable colour cues. People are tracked through mutual occlusions as they form groups and part from one another. Strong use is made of colour information to disambiguate occlusions and to provide qualitative estimates of depth ordering and position during occlusion. Some simple interactions with objects can also be detected. The system is tested using indoor and outdoor sequences. It is robust and should provide a useful mechanism for bootstrapping and reinitialisation of tracking using more-specific but less-robust human models.
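The sketch below illustrates adaptive background subtraction that fuses colour and gradient cues in the spirit of the paper; the exact update rule, shadow test, and thresholds are guesses rather than the published ones.

```python
# Adaptive background subtraction combining colour and gradient cues. A cast
# shadow darkens a pixel roughly uniformly while leaving gradients intact, so
# darkened pixels with unchanged gradients are treated as background.
import numpy as np

def gradient_mag(gray):
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy)

class AdaptiveBackground:
    def __init__(self, first_frame, alpha=0.05, col_thresh=30.0, grad_thresh=10.0):
        self.bg_color = first_frame.astype(float)              # running colour model
        self.bg_grad = gradient_mag(first_frame.mean(axis=2))  # running gradient model
        self.alpha, self.ct, self.gt = alpha, col_thresh, grad_thresh

    def apply(self, frame):
        f = frame.astype(float)
        color_diff = np.abs(f - self.bg_color).max(axis=2)
        grad_diff = np.abs(gradient_mag(f.mean(axis=2)) - self.bg_grad)
        ratio = f.sum(axis=2) / (self.bg_color.sum(axis=2) + 1e-6)
        shadow = (ratio > 0.4) & (ratio < 0.95) & (grad_diff < self.gt)
        fg = (color_diff > self.ct) & ~shadow
        # Slowly adapt the models where no foreground was detected.
        upd = (~fg).astype(float) * self.alpha
        self.bg_color = (1 - upd[..., None]) * self.bg_color + upd[..., None] * f
        self.bg_grad = (1 - upd) * self.bg_grad + upd * gradient_mag(f.mean(axis=2))
        return fg

# Toy usage: a static scene plus a bright block that appears in the second frame.
rng = np.random.default_rng(0)
scene = rng.integers(0, 50, size=(60, 80, 3)).astype(float)
bg = AdaptiveBackground(scene)
frame2 = scene.copy()
frame2[20:35, 30:50] += 120.0
print(int(bg.apply(frame2).sum()), "foreground pixels")   # ~ the 300 block pixels
```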
{"title":"Tracking interacting people","authors":"S. McKenna, S. Jabri, Zoran Duric, H. Wechsler","doi":"10.1109/AFGR.2000.840658","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840658","url":null,"abstract":"A computer vision system for tracking multiple people in relatively unconstrained environments is described. Tracking is performed at three levels of abstraction: regions, people and groups. A novel, adaptive background subtraction method that combines colour and gradient information is used to cope with shadows and unreliable colour cues. People are tracked through mutual occlusions as they form groups and part from one another. Strong use is made of colour information to disambiguate occlusions and to provide qualitative estimates of depth ordering and position during occlusion. Some simple interactions with objects can also be detected. The system is tested using indoor and outdoor sequences. It is robust and should provide a useful mechanism for bootstrapping and reinitialisation of tracking using more-specific but less-robust human models.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125322560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward real-time human-computer interaction with continuous dynamic hand gestures
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840688
Yuanxin Zhu, Haibing Ren, Guangyou Xu, X. Lin
Aiming at real-time gesture-controlled interaction, this paper describes visual modeling, analysis, and recognition of continuous dynamic hand gestures. By hierarchically integrating multiple cues, a spatio-temporal appearance model and corresponding novel approaches are proposed for modeling and analyzing dynamic gestures. At the low level, a fusion of flesh chrominance analysis and coarse image motion detection is employed to detect and segment hand gestures; at the high level, the parameters of the spatio-temporal appearance model are recovered by combining robust parameterized image motion estimation with hand shape analysis. The approach therefore achieves real-time processing as well as high recognition rates. Without resorting to any special markers, twelve kinds of hand gestures can be recognized with an average accuracy of over 89%. A prototype system, a gesture-controlled panoramic map browser, has been designed and implemented to demonstrate the usability of gesture-controlled interaction.
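A minimal sketch of the low-level cue fusion is given below. It assumes normalised-rg chrominance thresholds for skin colour and simple frame differencing for coarse motion; the ranges and thresholds are illustrative, not the authors' values.

```python
# Low-level cue fusion sketch: candidate hand pixels must look skin-coloured in
# normalised-rg chrominance AND show coarse frame-to-frame motion.
import numpy as np

def chromaticity(img):
    """Normalised r,g chrominance, which discounts overall brightness."""
    s = img.sum(axis=2, keepdims=True) + 1e-6
    return img[..., :2] / s

def skin_mask(img, r_range=(0.4, 0.6), g_range=(0.25, 0.4)):
    r, g = chromaticity(img.astype(float)).transpose(2, 0, 1)
    return (r > r_range[0]) & (r < r_range[1]) & (g > g_range[0]) & (g < g_range[1])

def motion_mask(prev_gray, gray, thresh=15.0):
    return np.abs(gray.astype(float) - prev_gray.astype(float)) > thresh

def hand_candidates(prev_frame, frame):
    return skin_mask(frame) & motion_mask(prev_frame.mean(axis=2), frame.mean(axis=2))

# Toy frames: a skin-coloured patch that moves between two otherwise static frames.
f1 = np.full((40, 60, 3), 60.0)
f2 = f1.copy()
f1[10:20, 10:25] = [180.0, 120.0, 90.0]      # skin-ish patch at the old position
f2[10:20, 25:40] = [180.0, 120.0, 90.0]      # same patch shifted right
print(hand_candidates(f1, f2).sum(), "moving skin pixels")
```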
{"title":"Toward real-time human-computer interaction with continuous dynamic hand gestures","authors":"Yuanxin Zhu, Haibing Ren, Guangyou Xu, X. Lin","doi":"10.1109/AFGR.2000.840688","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840688","url":null,"abstract":"This paper, aiming at real-time gesture-controlled interaction, describes visual modeling, analysis, and recognition of continuous dynamic hand gestures. By hierarchically integrating multiple cues, a spatio-temporal appearance model and novel approaches are proposed for modeling and analysis of dynamic gestures respectively. At low level, fusion of flesh chrominance analysis and coarse image motion detection is employed to detect and segment hand gestures; at high level, parameters of the spatio-temporal appearance model are recovered by combining robust parameterized image motion estimation and hand shape analysis. The approach, therefore, fulfils real-time processing as well as high recognition rates. Without resorting to any special marks, twelve kinds of hand gestures can be recognized with average accuracy over 89%. A prototype system, gesture-controlled panoramic map browser is designed and implemented to demonstrate the usability of gesture-controlled interaction.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117111279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face analysis for the synthesis of photo-realistic talking heads
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840633
H. Graf, E. Cosatto, Tony Ezzat
This paper describes techniques for extracting bitmaps of facial parts from videos of a talking person. The goal is to synthesize photo-realistic talking heads of high quality that show picture-perfect appearance and realistic head movements with good lip-sound synchronization. For the synthesis of a talking head, bitmaps of facial parts are combined to form whole heads, and sequences of such images are then integrated with audio from a text-to-speech synthesizer. For a seamless integration of facial parts into an animation, their shape and visual appearance must be known with high accuracy. The recognition system has to find not only the locations of facial features, but must also determine the head's orientation and recognize facial expressions. Our face recognition proceeds in multiple steps, each with increased precision. Using motion, color, and shape information, the head's position and the locations of the main facial features are determined first. Then smaller areas are searched with matched filters in order to identify specific facial features with high precision. From this information the head's 3D orientation is calculated. Facial parts are cut from the image and, using the head's orientation, are warped into bitmaps with 'normalized' orientation and scale.
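The matched-filter refinement step can be sketched as normalised cross-correlation of a small feature template within a coarse search area; the template and search window below are synthetic rather than crops of real facial features.

```python
# Matched-filter localisation via normalised cross-correlation: slide a small
# template over a coarse search area and keep the best-scoring position.
import numpy as np

def normalized_xcorr(search, template):
    th, tw = template.shape
    t = template - template.mean()
    out = np.full((search.shape[0] - th + 1, search.shape[1] - tw + 1), -1.0)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = search[y:y + th, x:x + tw]
            wz = w - w.mean()
            denom = np.sqrt((wz ** 2).sum() * (t ** 2).sum())
            out[y, x] = (wz * t).sum() / denom if denom > 1e-9 else 0.0
    return out

def locate_feature(search, template):
    """Return the (row, col) of the best template match inside the search area."""
    ncc = normalized_xcorr(search, template)
    return np.unravel_index(np.argmax(ncc), ncc.shape)

# Toy example: plant a distinctive patch in a low-contrast search window and find it.
rng = np.random.default_rng(0)
template = rng.random((8, 8))
search = rng.random((40, 50)) * 0.2
search[12:20, 30:38] = template
print(locate_feature(search, template))   # -> (12, 30)
```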
{"title":"Face analysis for the synthesis of photo-realistic talking heads","authors":"H. Graf, E. Cosatto, Tony Ezzat","doi":"10.1109/AFGR.2000.840633","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840633","url":null,"abstract":"This paper describes techniques for extracting bitmaps of facial parts from videos of a talking person. The goal is to synthesize photo-realistic talking heads of high quality that show picture-perfect appearance and realistic head movements with good lip-sound synchronization. For the synthesis of a talking head, bitmaps of facial parts are combined to form whole heads and then sequences of such images are integrated with audio from a text-to-speech synthesizer. For a seamless integration of facial parts into an animation, their shape and visual appearance must be known with high accuracy. The recognition system has to find not only the locations of facial features, but must also be able to determine the head's orientation and recognize the facial expressions. Our face recognition proceeds in multiple steps, each with an increased precision. Using motion, color and shape information, the head's position and the location of the main facial features are determined first. Then smaller areas are searched with matched filters, in order to identify specific facial features with high precision. From this information a head's 3D orientation is calculated. Facial parts are cut from the image and, using the head's orientation, are warped into bitmaps with 'normalized' orientation and scale.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124452820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Audio-visual speaker detection using dynamic Bayesian networks
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840663
A. Garg, V. Pavlovic, James M. Rehg
The development of human-computer interfaces poses a challenging problem: actions and intentions of different users have to be inferred from sequences of noisy and ambiguous sensory data. Temporal fusion of multiple sensors can be efficiently formulated using dynamic Bayesian networks (DBN). The DBN framework allows the power of statistical inference and learning to be combined with contextual knowledge of the problem. We demonstrate the use of DBN in tackling the problem of audio/visual speaker detection. "Off-the-shelf" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors) are optimally fused along with contextual information in a DBN architecture that infers instances when an individual is speaking. Results obtained in the setup of an actual human-machine interaction system (Genie Casino Kiosk) demonstrate the superiority of our approach over a static, context-free fusion architecture.
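As a stripped-down stand-in for the DBN, the sketch below uses two hidden states (silent / speaking), a transition prior, and per-frame fusion of independent detector likelihoods in a forward filter; the detector reliabilities and transition probabilities are invented, and the real network contains richer structure and context nodes.

```python
# Temporal fusion of noisy detectors with a two-state forward filter, a minimal
# special case of the DBN idea. All probabilities below are illustrative.
import numpy as np

states = ["silent", "speaking"]
trans = np.array([[0.9, 0.1],      # P(next | current): the state changes slowly
                  [0.1, 0.9]])

# P(detector fires | state) for three illustrative detectors:
# frontal face found, mouth motion found, audio non-silence found.
sensor_model = np.array([[0.5, 0.2, 0.1],    # state = silent
                         [0.9, 0.8, 0.9]])   # state = speaking

def fuse(observations):
    """observations: T x 3 binary array of detector outputs. Returns P(speaking)."""
    belief = np.array([0.5, 0.5])
    out = []
    for obs in observations:
        # Likelihood of this frame's detector pattern under each state.
        lik = np.prod(np.where(obs, sensor_model, 1.0 - sensor_model), axis=1)
        belief = lik * (trans.T @ belief)          # predict, then correct
        belief /= belief.sum()
        out.append(belief[1])
    return np.array(out)

# Face visible throughout; mouth motion and audio switch on halfway through.
obs = np.array([[1, 0, 0]] * 5 + [[1, 1, 1]] * 5)
print(np.round(fuse(obs), 2))
```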
{"title":"Audio-visual speaker detection using dynamic Bayesian networks","authors":"A. Garg, V. Pavlovic, James M. Rehg","doi":"10.1109/AFGR.2000.840663","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840663","url":null,"abstract":"The development of human-computer interfaces poses a challenging problem: actions and intentions of different users have to be inferred from sequences of noisy and ambiguous sensory data. Temporal fusion of multiple sensors can be efficiently formulated using dynamic Bayesian networks (DBN). The DBN framework allows the power of statistical inference and learning to be combined with contextual knowledge of the problem. We demonstrate the use of DBN in tackling the problem of audio/visual speaker detection. \"Off-the-shelf\" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors) are optimally fused along with contextual information in a DBN architecture that infers instances when an individual is speaking. Results obtained in the setup of an actual human-machine interaction system (Genie Casino Kiosk) demonstrate superiority of our approach over that of static, context-free fusion architecture.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134132259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Relevant features for video-based continuous sign language recognition
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840672
Britta Bauer, Hermann Hienz
This paper describes the development of a video-based continuous sign language recognition system. The system is based on continuous-density hidden Markov models (HMM), with one model for each sign. Feature vectors reflecting manual sign parameters serve as input for training and recognition. To reduce computational complexity during recognition, beam search is employed. The system aims at automatic signer-dependent recognition of sign language sentences, based on a lexicon of 97 signs of German Sign Language (GSL). A colour video camera is used for image recording. Furthermore, the influence of different features, reflecting different manual sign parameters, on the recognition results is examined. Results are given for vocabularies of varying size. The system achieves an accuracy of 91.7% based on a lexicon of 97 signs.
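Beam pruning during Viterbi decoding, mentioned in the abstract, can be sketched on a toy left-to-right HMM with Gaussian emissions; the topology, parameters, and features below are illustrative and unrelated to the 97-sign lexicon.

```python
# Beam-pruned Viterbi decoding for a toy left-to-right HMM with 1-D Gaussian
# emissions. States whose score falls more than `beam` below the best score
# are pruned at every frame, which is the computational saving referred to above.
import numpy as np

def log_gauss(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def viterbi_beam(obs, log_trans, means, var=0.25, beam=5.0):
    """obs: 1-D feature sequence. Returns the best path log-score for this model."""
    n_states = len(means)
    score = np.full(n_states, -np.inf)
    score[0] = log_gauss(obs[0], means[0], var)           # start in the first state
    for x in obs[1:]:
        cand = score[:, None] + log_trans                 # all state-to-state moves
        score = cand.max(axis=0) + log_gauss(x, means, var)
        score[score < score.max() - beam] = -np.inf       # beam pruning
    return score.max()

# Left-to-right topology: stay in a state or advance one state.
n = 4
log_trans = np.full((n, n), -np.inf)
for i in range(n):
    log_trans[i, i] = np.log(0.6)
    if i + 1 < n:
        log_trans[i, i + 1] = np.log(0.4)
means = np.array([0.0, 1.0, 2.0, 3.0])                    # per-state feature means

obs = np.array([0.1, 0.0, 1.1, 2.0, 2.9, 3.1])            # a well-matching sequence
print(viterbi_beam(obs, log_trans, means))                 # higher = better match
```

In a sign lexicon, the same decoder would be run once per sign model and the sign whose HMM yields the highest score would be reported.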
{"title":"Relevant features for video-based continuous sign language recognition","authors":"Britta Bauer, Hermann Hienz","doi":"10.1109/AFGR.2000.840672","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840672","url":null,"abstract":"This paper describes the development of a video-based continuous sign language recognition system. The system is based on continuous density hidden Markov models (HMM) with one model for each sign. Feature vectors reflecting manual sign parameters serve as input for training and recognition. To reduce computational complexity during the recognition task beam search is employed. The system aims for an automatic signer-dependent recognition of sign language sentences, based on a lexicon of 97 signs of German sign language (GSL). A further colour video camera is used for image recording. Furthermore the influence of different features reflecting different manual sign parameters on the recognition results are examined. Results are given for varying sized vocabulary. The system achieves an accuracy of 91.7% based on a lexicon of 97 signs.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132686351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face recognition by support vector machines
Pub Date: 2000-03-26 | DOI: 10.1109/AFGR.2000.840634
G. Guo, S. Li, K. Chan
Support vector machines (SVMs) have recently been proposed as a new technique for pattern recognition. We use SVMs with a binary tree recognition strategy to tackle the face recognition problem. We illustrate the potential of SVMs on the Cambridge ORL face database, which consists of 400 images of 40 individuals and contains a high degree of variability in expression, pose, and facial details. We also present recognition experiments on a larger face database of 1079 images of 137 individuals. We compare the SVM-based recognition with the standard eigenface approach using the nearest center classification (NCC) criterion.
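The binary-tree (tournament) recognition strategy with pairwise SVMs can be sketched as follows, using scikit-learn on synthetic feature vectors in place of features extracted from the ORL face images.

```python
# Binary-tree (tournament) multi-class recognition with pairwise SVMs:
# candidates meet in pairs, the pairwise SVM's winner advances, and the last
# surviving identity is the prediction. Data here are synthetic clusters.
import itertools
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes, n_per_class, dim = 8, 20, 30
centers = rng.normal(0, 5, size=(n_classes, dim))
X = np.vstack([c + rng.normal(0, 1, size=(n_per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Train one binary SVM per pair of identities.
pairwise = {}
for a, b in itertools.combinations(range(n_classes), 2):
    mask = (y == a) | (y == b)
    pairwise[(a, b)] = SVC(kernel="linear").fit(X[mask], y[mask])

def tournament_predict(x):
    """Run the binary-tree tournament over all class identities for one sample."""
    candidates = list(range(n_classes))
    while len(candidates) > 1:
        survivors = []
        for i in range(0, len(candidates) - 1, 2):
            a, b = sorted((candidates[i], candidates[i + 1]))
            survivors.append(int(pairwise[(a, b)].predict(x.reshape(1, -1))[0]))
        if len(candidates) % 2:                    # odd candidate out gets a bye
            survivors.append(candidates[-1])
        candidates = survivors
    return candidates[0]

test = centers[3] + rng.normal(0, 1, dim)
print(tournament_predict(test))                    # expected: 3
```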
{"title":"Face recognition by support vector machines","authors":"G. Guo, S. Li, K. Chan","doi":"10.1109/AFGR.2000.840634","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840634","url":null,"abstract":"Support vector machines (SVM) have been recently proposed as a new technique for pattern recognition. SVM with a binary tree recognition strategy are used to tackle the face recognition problem. We illustrate the potential of SVM on the Cambridge ORL face database, which consists of 400 images of 40 individuals, containing quite a high degree of variability in expression, pose, and facial details. We also present the recognition experiment on a larger face database of 1079 images of 137 individuals. We compare the SVM-based recognition with the standard eigenface approach using the nearest center classification (NCC) criterion.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131917429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}