Fast hand gesture recognition for real-time teleconferencing applications
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938922
James Maclean
Work on real-time hand-gesture recognition for SAVI (stereo active vision interface) is presented. Based on the detection of frontal faces, image regions near the face are searched for skin-tone blobs. Each blob is evaluated to determine whether it is a hand held in a standard pose. A verification algorithm based on the responses of elongated oriented filters decides whether a hand is present. Once a hand is detected, gestures are given by varying the number of visible fingers. The hand is segmented using an algorithm that detects connected skin-tone blobs in the region of interest, and a medial axis transform (skeletonization) is applied. Analysis of the resulting skeleton yields the number of visible fingers, thus determining the gesture. The skeletonization is sensitive to strong shadows, which may alter the detected morphology of the hand. Experimental results indicate good performance of the algorithm.
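As a rough illustration of the skeleton-based finger count described above, the following minimal Python sketch counts skeleton endpoints as a proxy for visible fingers. It assumes a binary hand mask is already available and uses scikit-image's `skeletonize` in place of the paper's own medial axis transform; function names and the endpoint heuristic are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def count_fingers(hand_mask: np.ndarray) -> int:
    """Estimate the number of extended fingers from a binary hand mask.

    hand_mask: 2D boolean array, True where skin-tone pixels belong to the hand.
    Returns a skeleton-endpoint count; with the hand in a standard pose this
    roughly tracks the number of visible fingers (plus the wrist branch).
    """
    # Keep only the largest connected skin blob as the hand region.
    labels, n = ndimage.label(hand_mask)
    if n == 0:
        return 0
    sizes = ndimage.sum(hand_mask, labels, range(1, n + 1))
    hand = labels == (np.argmax(sizes) + 1)

    # Medial-axis-style thinning to a one-pixel-wide skeleton.
    skel = skeletonize(hand)

    # A skeleton endpoint has exactly one skeleton neighbour:
    # 3x3 window sum = itself + one neighbour = 2.
    neighbours = ndimage.convolve(skel.astype(int), np.ones((3, 3), int),
                                  mode="constant")
    endpoints = skel & (neighbours == 2)
    return int(endpoints.sum())
```

Note that a strong shadow splitting the blob changes the endpoint count, consistent with the sensitivity the abstract mentions.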
{"title":"Fast hand gesture recognition for real-time teleconferencing applications","authors":"James Maclean","doi":"10.1109/RATFG.2001.938922","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938922","url":null,"abstract":"Work on real-time hand-gesture recognition for SAVI (stereo active vision interface) is presented. Based on the detection of frontal faces, image regions near the face are searched for the existence of skin-tone blobs. Each blob is evaluated to determine if it it is a hand held in a standard pose. A verification algorithm based on the responses of elongated oriented filters is used to decide whether a hand is present or not. Once a hand is detected, gestures are given by varying the number of fingers visible. The hand is segmented using an algorithm which detects connected skin-tone blobs in the region of interest, and a medial axis transform (skeletonization) is applied. Analysis of the resulting skeleton allows detection of the number of fingers visible, thus determining the gesture. The skeletonization is sensitive to strong shadows which may alter the detected morphology of the hand. Experimental results are given indicating good performance of the algorithm.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128992995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Hand Mouse: GMM hand-color classification and mean shift tracking
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938920
T. Kurata, T. Okuma, M. Kourogi, K. Sakaue
This paper describes an algorithm to detect and track a hand in each image taken by a wearable camera. We primarily use color information; however, instead of pre-defined skin-color models, we dynamically construct hand- and background-color models by using a Gaussian mixture model (GMM) to approximate the color histogram. We use a spatial probability distribution of hand pixels both to obtain the estimated mean of hand color required by the restricted EM algorithm that estimates the GMM and to classify hand pixels according to Bayes decision theory. Since the static distribution is inadequate for the hand-tracking stage, we translate the distribution with the hand motion based on the mean shift algorithm. Using the proposed method, we implemented the Hand Mouse, which uses the wearer's hand as a pointing device, on our wearable vision system.
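A minimal sketch of the two ingredients named here, GMM colour classification and mean-shift localisation, is given below. It uses scikit-learn's `GaussianMixture` in place of the paper's restricted EM, and the assignment of mixture components to the hand class is a simplifying assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_color_model(pixels_rgb: np.ndarray, n_components: int = 5) -> GaussianMixture:
    """Approximate the colour histogram of an image region with a GMM.
    pixels_rgb: (N, 3) array of pixel colours."""
    return GaussianMixture(n_components=n_components, covariance_type="full",
                           random_state=0).fit(pixels_rgb)

def hand_posterior(gmm: GaussianMixture, pixels_rgb: np.ndarray,
                   hand_components: list) -> np.ndarray:
    """Probability that each pixel is hand-coloured: the total responsibility
    of the mixture components labelled as 'hand'."""
    resp = gmm.predict_proba(pixels_rgb)          # (N, K) responsibilities
    return resp[:, hand_components].sum(axis=1)

def mean_shift_step(prob_img: np.ndarray, cx: float, cy: float, r: int = 20):
    """One mean-shift update: move the window centre to the probability-weighted
    centroid of the pixels inside the current window."""
    h, w = prob_img.shape
    y0, y1 = max(0, int(cy) - r), min(h, int(cy) + r)
    x0, x1 = max(0, int(cx) - r), min(w, int(cx) + r)
    win = prob_img[y0:y1, x0:x1]
    if win.sum() == 0:
        return cx, cy
    ys, xs = np.mgrid[y0:y1, x0:x1]
    return float((xs * win).sum() / win.sum()), float((ys * win).sum() / win.sum())
```

Iterating `mean_shift_step` to convergence on each frame, seeded from the previous frame's hand position, is one plausible reading of how the probability distribution is translated with the hand motion.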
{"title":"The Hand Mouse: GMM hand-color classification and mean shift tracking","authors":"T. Kurata, T. Okuma, M. Kourogi, K. Sakaue","doi":"10.1109/RATFG.2001.938920","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938920","url":null,"abstract":"This paper describes an algorithm to detect and track a hand in each image taken by a wearable camera. We primarily use color information, however, instead of pre-defined skin-color models, we dynamically construct hand- and background-color models by using a Gaussian mixture model (GMM) to approximate the color histogram. Not only to obtain the estimated mean of hand color necessary for the restricted EM algorithm that estimates the GMM but also to classify hand pixels based on the Bayes decision theory, we use a spatial probability distribution of hand pixels. Since the static distribution is inadequate for the hand-tracking stage, we translate the distribution with the hand motion based on the mean shift algorithm. Using the proposed method, we implemented the Hand Mouse that uses the wearer's hand as a pointing device, on our wearable vision system.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129279715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Auto clustering for unsupervised learning of atomic gesture components using minimum description length
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938925
M. Walter, A. Psarrou, S. Gong
We present an approach to automatically segment and label a continuous observation sequence of hand gestures for completely unsupervised model acquisition. The method is based on the assumption that gestures can be viewed as repetitive sequences of atomic components, similar to phonemes in speech, governed by a high-level structure controlling the temporal sequence. We show that the generating process for the atomic components can be described in gesture space by a mixture of Gaussians, with each mixture component tied to one atomic behaviour. Mixture components are determined using a standard expectation-maximisation approach, while the number of components is chosen using an information criterion, the minimum description length.
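The model-selection step can be pictured as fitting mixtures of increasing size and keeping the one with the lowest description-length score. The sketch below is a stand-in, not the authors' formulation: it uses scikit-learn's `GaussianMixture.bic()`, which is closely related to (but not identical with) a minimum description length criterion.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_atomic_components(gesture_vectors: np.ndarray, max_k: int = 15):
    """Fit GMMs with 1..max_k components to points in gesture space and keep
    the model minimising an MDL-like criterion (BIC is used here as a proxy).

    gesture_vectors: (N, D) array of observations in gesture space.
    Returns (best_model, best_k)."""
    best_model, best_k, best_score = None, None, np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              n_init=3, random_state=0).fit(gesture_vectors)
        score = gmm.bic(gesture_vectors)   # -2 log-likelihood + (#params) log N
        if score < best_score:
            best_model, best_k, best_score = gmm, k, score
    return best_model, best_k
```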
{"title":"Auto clustering for unsupervised learning of atomic gesture components using minimum description length","authors":"M. Walter, A. Psarrou, S. Gong","doi":"10.1109/RATFG.2001.938925","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938925","url":null,"abstract":"We present an approach to automatically segment and label a continuous observation sequence of hand gestures for a complete unsupervised model acquisition. The method is based on the assumption that gestures can be viewed as repetitive sequences of atomic components, similar to phonemes in speech, governed by a high level structure controlling the temporal sequence. We show that the generating process for the atomic components can be described in gesture space by a mixture of Gaussian, with each mixture component tied to one atomic behaviour. Mixture components are determined using a standard expectation maximisation approach while the determination of the number of components is based on an information criteria, the minimum description length.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123377703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A vision-based microphone switch for speech intent detection
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938917
G. Iyengar, C. Neti
We present our system for speech intent detection. In traditional desktop speech applications, the user has to explicitly indicate intent-to-speak to the computer by turning the microphone on. This is to alleviate problems associated with an open microphone in an automatic speech recognition system. In this paper, we use cues derived from user pose, proximity and visual speech activity to detect speech intent and enable automatic control of the microphone. We achieve real-time performance using pre-attentive cues to eliminate redundant computation.
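The gating logic can be pictured as a simple conjunction of the three cues named in the abstract. In this sketch the cue extractors, field names, and thresholds are placeholders, not the system's actual values.

```python
from dataclasses import dataclass

@dataclass
class FrameCues:
    facing_camera: bool      # pose cue: is the user's face roughly frontal?
    face_height_px: int      # proximity cue: a larger face means the user is closer
    mouth_motion: float      # visual speech activity, e.g. mouth-region change

def microphone_open(cues: FrameCues,
                    min_face_height: int = 80,
                    min_mouth_motion: float = 0.2) -> bool:
    """Open the microphone only when all intent cues agree (hypothetical thresholds)."""
    return (cues.facing_camera
            and cues.face_height_px >= min_face_height
            and cues.mouth_motion >= min_mouth_motion)
```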
{"title":"A vision-based microphone switch for speech intent detection","authors":"G. Iyengar, C. Neti","doi":"10.1109/RATFG.2001.938917","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938917","url":null,"abstract":"We present our system for speech intent detection. In traditional desktop speech applications, the user has to explicitly indicate intent-to-speak to the computer by turning the microphone on. This is to alleviate problems associated with an open microphone in an automatic speech recognition system. In this paper, we use cues derived from user pose, proximity and visual speech activity to detect speech intent and enable automatic control of the microphone. We achieve real-time performance using pre-attentive cues to eliminate redundant computation.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121566160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial expression recognition using continuous dynamic programming
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938926
H. Zhang, Y. Guo
This paper describes an approach to facial expression recognition (FER). We represent facial expressions by a facial motion graph (FMG), which is based on feature points and muscle movements. FER is achieved by analyzing the similarity between an unknown expression's FMG and the FMG models of known expressions using continuous dynamic programming. Furthermore, we propose a method to evaluate edge weights in the FMG similarity calculation, and use these edge weights to achieve a more accurate and robust system. Experiments show the excellent performance of the system on our video database, which contains video data captured under various conditions with multiple motion patterns.
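Continuous dynamic programming differs from ordinary sequence alignment in that the model may begin at any frame of an unsegmented input stream. The sketch below shows that recurrence for generic feature vectors; the per-frame distance and the normalisation are placeholders, and the FMG edge weighting proposed in the paper is not reproduced.

```python
import numpy as np

def continuous_dp(stream: np.ndarray, model: np.ndarray) -> np.ndarray:
    """Continuous DP: best cumulative distance of the model sequence ending at
    each frame of an unsegmented input stream (the model may start anywhere).

    stream: (T, D) input feature vectors (e.g. FMG-derived features per frame).
    model:  (M, D) feature vectors of a known expression.
    Returns an array of length T; local minima below a threshold mark matches."""
    T, M = len(stream), len(model)
    d = np.linalg.norm(stream[:, None, :] - model[None, :, :], axis=2)  # (T, M)
    D = np.full((T, M), np.inf)
    D[:, 0] = d[:, 0]                    # the model may start at any stream frame
    for t in range(1, T):
        for m in range(1, M):
            D[t, m] = d[t, m] + min(D[t - 1, m],      # stream advances
                                    D[t - 1, m - 1],  # both advance
                                    D[t, m - 1])      # model advances
    return D[:, M - 1] / M               # normalised end-point scores
```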
{"title":"Facial expression recognition using continuous dynamic programming","authors":"H. Zhang, Y. Guo","doi":"10.1109/RATFG.2001.938926","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938926","url":null,"abstract":"Describes an approach to facial expression recognition (FER). We represent facial expressions by a facial motion graph (FMG), which is based on feature points and muscle movements. FER is achieved by analyzing the similarity between an unknown expression's FMG and FMG models of known expressions by employing continuous dynamic programming. Furthermore we propose a method to evaluate edge weights in FMG similarity calculation, and use these edge weights to achieve a more accurate and robust system. Experiments show the excellent performance of this system on our video database, which contains video data captured under various conditions with multiple motion patterns.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126282002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Boosting for fast face recognition
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938916
G. Guo, HongJiang Zhang
We propose to use the AdaBoost algorithm for face recognition. AdaBoost is a large-margin classifier and is efficient for online learning. To adapt AdaBoost to fast face recognition, the original algorithm, which uses all given features, is compared with a variant that boosts over feature dimensions. The comparable results justify using the latter, which is faster for classification. AdaBoost is inherently a two-class classifier. To solve the multi-class recognition problem, we propose a constrained majority voting strategy that largely reduces the number of pairwise comparisons without losing recognition accuracy. Experimental results on a large face database of 1079 faces of 137 individuals show the feasibility of our approach for fast face recognition.
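The pairwise decomposition can be sketched as follows. This is only an illustration under assumptions: scikit-learn's `AdaBoostClassifier` replaces the paper's boosted feature classifiers, and a simple knockout over classes stands in for the constrained majority voting strategy; both are my substitutions, not the authors' exact scheme.

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import AdaBoostClassifier

def train_pairwise(X: np.ndarray, y: np.ndarray) -> dict:
    """Train one two-class AdaBoost classifier per pair of identities."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        clf = AdaBoostClassifier(n_estimators=50, random_state=0)
        models[(a, b)] = clf.fit(X[mask], y[mask])
    return models

def recognise(models: dict, classes: list, x: np.ndarray):
    """Knockout over classes: the winner of each pairwise duel survives, so only
    len(classes) - 1 classifiers are evaluated instead of all pairs (a stand-in
    for the paper's constrained majority voting)."""
    champion = classes[0]
    for challenger in classes[1:]:
        pair = (champion, challenger) if (champion, challenger) in models \
               else (challenger, champion)
        champion = models[pair].predict(x.reshape(1, -1))[0]
    return champion
```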
{"title":"Boosting for fast face recognition","authors":"G. Guo, HongJiang Zhang","doi":"10.1109/RATFG.2001.938916","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938916","url":null,"abstract":"We propose to use the AdaBoost algorithm for face recognition. AdaBoost is a kind of large margin classifiers and is efficient for online learning. In order to adapt the AdaBoost algorithm to fast face recognition, the original AdaBoost which uses all given features is compared with the boosting feature dimensions. The comparable results assure the use of the latter, which is faster for classification. The AdaBoost is typically a classification between two classes. To solve the multi-class recognition problem, we propose to use a constrained majority voting strategy to largely reduce the number of pairwise comparisons, without losing the recognition accuracy. Experimental results on a large face database of 1079 faces of 137 individuals show the feasibility of our approach for fast face recognition.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115992999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An integrated approach to 3D face model reconstruction from video
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938905
Chia-Ming Cheng, S. Lai
We present an integrated system for reconstructing an individualized 3D head model from a video sequence. Our reconstruction algorithm is based on the adaptation of a generic 3D head model. 3D geometric constraints on the head model are computed from a robust bundle adjustment algorithm and a structure-from-silhouette method. These 3D constraints are integrated to adapt the generic head model via radial basis function interpolation. The texture map of the reconstructed 3D head model is then obtained by integrating all the images in the sequence through appropriate weighting. The proposed face model reconstruction method has the advantages of efficient computation as well as robustness against noise and outliers.
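The RBF adaptation step can be illustrated as interpolating sparse control-point displacements over the whole mesh. The sketch below uses SciPy's `RBFInterpolator`; the kernel, smoothing value, and the way constraints are packaged into control points are assumptions rather than the paper's choices.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def adapt_generic_head(generic_vertices: np.ndarray,
                       control_src: np.ndarray,
                       control_dst: np.ndarray) -> np.ndarray:
    """Deform a generic head mesh so that a sparse set of control points
    (e.g. reconstructed 3D feature points and silhouette constraints) are met,
    propagating their displacements to all vertices with an RBF.

    generic_vertices: (V, 3) vertices of the generic model.
    control_src:      (C, 3) positions of the control points on the generic model.
    control_dst:      (C, 3) their reconstructed target positions.
    """
    displacements = control_dst - control_src
    rbf = RBFInterpolator(control_src, displacements,
                          kernel="thin_plate_spline", smoothing=1e-6)
    return generic_vertices + rbf(generic_vertices)
```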
{"title":"An integrated approach to 3D face model reconstruction from video","authors":"Chia-Ming Cheng, S. Lai","doi":"10.1109/RATFG.2001.938905","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938905","url":null,"abstract":"We present an integrated system for reconstruction an individualized 3D head model from a video sequence. Our reconstruction algorithm is based on the adaptation of a generic 3D head model. 3D geometric constraints on the head model are computed from the robust bundle adjustment algorithm and the structure from silhouette method. These 3D constraints are integrated to adapt the generic head model via radial basis function interpolation. Then the texture map of the reconstructed 3D head model is obtained by integrating all the images in the sequence through appropriate weighting. The proposed face model reconstruction method has the advantages of efficient computation as well as robustness against noises and outliers.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124023640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eyes 'n ears: face detection utilizing audio and video cues
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938918
B. Kapralos, Michael Jenkin, E. Milios, John K. Tsotsos
This work investigates the development of a robust and portable teleconferencing system utilizing both audio and video cues. An omnidirectional video sensor provides a view of the entire visual hemisphere, thereby providing multiple dynamic views of the participants. Regions of skin are detected using simple statistical methods, along with histogram color models for both skin and non-skin color classes. Skin regions belonging to the same person are grouped together. Using simple geometrical properties, the location of each person's face in the "real world" is estimated and provided to the audio system as a possible sound source direction. Beamforming and sound detection techniques with a small, compact microphone array allow the audio system to detect and attend to the speech of each participant, thereby reducing unwanted noise and sounds emanating from other locations. The results of experiments conducted in normal, reverberant environments indicate the effectiveness of both the audio and video systems.
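A minimal sketch of histogram skin/non-skin colour models with a Bayes decision rule, as described above, is given below. Bin count, skin prior, and the RGB quantisation are assumptions for illustration, not values from the paper.

```python
import numpy as np

class HistogramSkinModel:
    """Histogram colour models for skin and non-skin classes with a Bayes
    decision rule: classify a pixel as skin when
    P(colour | skin) P(skin) > P(colour | non-skin) P(non-skin)."""

    def __init__(self, bins: int = 32, prior_skin: float = 0.3):
        self.bins = bins
        self.prior_skin = prior_skin
        self.h_skin = np.full((bins,) * 3, 1.0)   # Laplace-smoothed counts
        self.h_bg = np.full((bins,) * 3, 1.0)

    def _idx(self, rgb: np.ndarray):
        # Quantise 0..255 colour values into histogram bins.
        q = np.clip((rgb * self.bins / 256).astype(int), 0, self.bins - 1)
        return q[..., 0], q[..., 1], q[..., 2]

    def fit(self, skin_pixels: np.ndarray, bg_pixels: np.ndarray):
        np.add.at(self.h_skin, self._idx(skin_pixels), 1.0)
        np.add.at(self.h_bg, self._idx(bg_pixels), 1.0)
        self.h_skin /= self.h_skin.sum()
        self.h_bg /= self.h_bg.sum()
        return self

    def is_skin(self, pixels: np.ndarray) -> np.ndarray:
        i = self._idx(pixels)
        return (self.h_skin[i] * self.prior_skin >
                self.h_bg[i] * (1.0 - self.prior_skin))
```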
{"title":"Eyes 'n ears: face detection utilizing audio and video cues","authors":"B. Kapralos, Michael Jenkin, E. Milios, John K. Tsotsos","doi":"10.1109/RATFG.2001.938918","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938918","url":null,"abstract":"This work investigates the development of a robust and portable teleconferencing system utilizing both audio and video cues. An omnidirectional video sensor is used to provide a view of the entire visual hemisphere thereby providing multiple dynamic views of the participants. Regions of skin are detected using simple statistical methods, along with histogram color models for both skin and non-skin color classes. Skin regions belonging to the same person are grouped together. Using simple geometrical properties, the location of each person's face in the \"real world\" is estimated and provided to the audio system as a possible sound source direction. Beamforming and sound detection techniques with a small, compact microphone array allows the audio system to detect and attend to the speech of each participant, thereby reducing unwanted noise and sounds emanating from other locations. The results of experiments conducted in normal, reverberant environments indicate the effectiveness of both the audio and video systems.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132002195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid face recognition systems for profile views using the MUGSHOT database
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938924
F. Wallhoff, Stefan Müller, G. Rigoll
Face recognition has established itself as an important sub-branch of pattern recognition within computer science. Many state-of-the-art systems have focused on recognizing frontal views or images with only slight variations in head pose and facial expression. We concentrate on two approaches to recognizing profile views (90 degrees) given knowledge of only the frontal view, which is a challenging task even for human beings. The first system makes use of synthesized profile views, and the second uses a joint parameter estimation technique. Both systems combine artificial neural networks (NNs) with a modeling technique based on hidden Markov models (HMMs). One of the main ideas is to perform the recognition task without using any 3D information about heads and faces, such as a physical 3D model. Instead, we represent the rotation process by an NN trained with prior knowledge derived from image pairs showing the same person's frontal and profile views. Another important restriction of this task is that we use exactly one example frontal view to train the system to recognize the corresponding profile view of a previously unseen individual. The presented systems are tested on a subset of the MUGSHOT database.
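The "rotation process represented by a network" can be pictured as a regression from frontal-view features to profile-view features, learned from paired training views. The sketch below uses an MLP in feature space as a hypothetical stand-in for the paper's neural network and makes no attempt to reproduce the HMM modeling stage.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_view_mapping(frontal_feats: np.ndarray,
                       profile_feats: np.ndarray) -> MLPRegressor:
    """Learn a frontal-to-profile mapping in feature space from paired views
    of training identities (a stand-in for the paper's rotation network)."""
    net = MLPRegressor(hidden_layer_sizes=(256,), max_iter=2000, random_state=0)
    return net.fit(frontal_feats, profile_feats)

def enrol_unseen_person(net: MLPRegressor, frontal_feat: np.ndarray) -> np.ndarray:
    """Given the single available frontal view of a new individual, synthesise
    a profile-view feature vector for building that person's gallery model."""
    return net.predict(frontal_feat.reshape(1, -1))[0]
```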
{"title":"Hybrid face recognition systems for profile views using the MUGSHOT database","authors":"F. Wallhoff, Stefan Müller, G. Rigoll","doi":"10.1109/RATFG.2001.938924","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938924","url":null,"abstract":"Face recognition has established itself as an important sub-branch of pattern recognition within the field of computer science. Many state-of-the-art systems have focused on the task of recognizing frontal views or images with just slight variations in head pose and facial expression of people. We concentrate on two approaches to recognize profile views (90 degrees) with previous knowledge of only the frontal view, which is a challenging task even for human beings. The first presented system makes use of synthesized profile views and the second one uses a joint parameter estimation technique. The systems we present combine artificial neural networks (NN) and a modeling technique based on hidden Markov models (HMM). One of the main ideas of these systems is to perform the recognition task without the use of any 3D-information of heads and faces such as a physical 3D-models, for instance. Instead, we represent the rotation process by a NN, which has been trained with prior knowledge derived from image pairs showing the same person's frontal and profile view. Another important restriction to this task is that we use exactly one example frontal view to train the system to recognize the corresponding profile view for a previously unseen individual. The presented systems are tested with a sub-set of the MUGSHOT database.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124199782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic time warping for off-line recognition of a small gesture vocabulary
Pub Date: 2001-07-13  DOI: 10.1109/RATFG.2001.938914
A. Corradini
We focus on visual sensory information to recognize human activity in the form of hand-arm movements from a small, predefined vocabulary. We accomplish this task with a template-matching technique that determines the distance between the unknown input and a set of previously defined templates. A dynamic time warping algorithm performs the time alignment and normalization by computing a temporal transformation that allows the two signals to be matched. The system is trained with finite video sequences of single gesture performances whose start and end points are accurately known. Preliminary experiments, conducted off-line, result in a recognition accuracy of up to 92%.
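The core template matcher is standard DTW with fixed endpoints, as in the sketch below. The per-frame feature choice, the length normalisation, and nearest-template classification are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def dtw_distance(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """Dynamic time warping distance between two gesture trajectories.

    seq_a: (Ta, D) and seq_b: (Tb, D) feature sequences (e.g. hand positions
    per frame) whose start and end points are known.  Returns the
    length-normalised cumulative cost of the best temporal alignment."""
    Ta, Tb = len(seq_a), len(seq_b)
    cost = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=2)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j],     # insertion
                                               D[i, j - 1],     # deletion
                                               D[i - 1, j - 1]) # match
    return float(D[Ta, Tb] / (Ta + Tb))

def classify(unknown: np.ndarray, templates: dict) -> str:
    """Nearest-template classification over a small, predefined vocabulary."""
    return min(templates, key=lambda g: dtw_distance(unknown, templates[g]))
```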
{"title":"Dynamic time warping for off-line recognition of a small gesture vocabulary","authors":"A. Corradini","doi":"10.1109/RATFG.2001.938914","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938914","url":null,"abstract":"We focus on the visual sensory information to recognize human activity in form of hand-arm movements from a small, predefined vocabulary. We accomplish this task by means of a matching technique by determining the distance between the unknown input and a set of previously defined templates. A dynamic time warping algorithm is used to perform the time alignment and normalization by computing a temporal transformation allowing the two signals to be matched. The system is trained with finite video sequences of single gesture performances whose start and end-point are accurately known. Preliminary experiments are accomplished off-line and result in a recognition accuracy of up to 92%.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128817920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}