Recognition of Composite Human Activities through Context-Free Grammar Based Representation
M. Ryoo, J. Aggarwal. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.242

This paper describes a general methodology for automated recognition of complex human activities. The methodology uses a context-free grammar (CFG) based representation scheme to represent composite actions and interactions. The CFG-based representation enables us to formally define complex human activities based on simple actions or movements. Human activities are classified into three categories: atomic action, composite action, and interaction. Our system is not only able to represent complex human activities formally, but also able to recognize represented actions and interactions with high accuracy. Image sequences are processed to extract poses and gestures. Based on gestures, the system detects actions and interactions occurring in a sequence of image frames. Our results show that the system is able to represent composite actions and interactions naturally. The system was tested to represent and recognize eight types of interactions: approach, depart, point, shake-hands, hug, punch, kick, and push. The experiments show that the system can recognize sequences of represented composite actions and interactions with a high recognition rate.

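The idea of defining composite activities by grammar rules over atomic gestures can be sketched in a few lines. The productions below are illustrative placeholders, not the paper's actual grammar; `derives` checks whether a gesture sequence can be generated from a composite-activity symbol.

```python
# A minimal sketch of CFG-based activity recognition. The productions
# below are hypothetical; the paper's grammar is richer.
GRAMMAR = {
    "shake-hands": [["approach", "stretch", "withdraw"]],
    "greeting": [["shake-hands"], ["approach", "wave"]],
}

def derives(symbol, seq):
    """True if `symbol` can derive the tuple of atomic gestures `seq`."""
    if symbol not in GRAMMAR:                 # terminal: an atomic gesture
        return seq == (symbol,)
    return any(match(rhs, seq) for rhs in GRAMMAR[symbol])

def match(rhs, seq):
    """True if the production right-hand side `rhs` derives `seq`."""
    if not rhs:
        return len(seq) == 0
    # try every split point for the first symbol of the production
    return any(derives(rhs[0], seq[:i]) and match(rhs[1:], seq[i:])
               for i in range(len(seq) + 1))
```

For example, `derives("greeting", ("approach", "stretch", "withdraw"))` succeeds because the sequence matches the `shake-hands` production, which in turn is one expansion of `greeting`.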
Robust AAM Fitting by Fusion of Images and Disparity Data
Joerg Liebelt, Jing Xiao, Jie Yang. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.255

Active Appearance Models (AAMs) have been widely used to represent the appearance and shape variations of human faces. Fitting an AAM to images recovers the face pose as well as its deformable shape and varying appearance. Successful fitting requires that the AAM is sufficiently generic to cover all possible facial appearances and shapes in the images. Such a generic AAM is often difficult to obtain in practice, especially when the image quality is low or when occlusion occurs. To achieve robust AAM fitting under such circumstances, this paper proposes to incorporate the disparity data obtained from a stereo camera into the image fitting process. We develop an iterative multi-level algorithm that combines efficient AAM fitting to 2D images and robust 3D shape alignment to disparity data. Experiments on tracking faces in low-resolution images captured from meeting scenarios show that the proposed method achieves better performance than the original 2D AAM fitting algorithm. We also demonstrate an application of the proposed method to a facial expression recognition task.

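The "robust 3D shape alignment to disparity data" step can be illustrated with an iteratively reweighted rigid Procrustes alignment: fit a rotation and translation by weighted least squares, then downweight points with large residuals so that disparity outliers stop influencing the fit. This is a generic sketch of such a robust alignment, not the authors' algorithm.

```python
import numpy as np

def robust_rigid_align(X, Y, iters=10):
    """Iteratively reweighted rigid (Procrustes) alignment of model points X
    (N x 3) to disparity-derived points Y; a generic robust-alignment sketch."""
    w = np.ones(len(X))
    for _ in range(iters):
        wn = w / w.sum()
        mx, my = wn @ X, wn @ Y                       # weighted centroids
        Xc, Yc = X - mx, Y - my
        U, _, Vt = np.linalg.svd((wn[:, None] * Xc).T @ Yc)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                      # rule out reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = my - R @ mx
        r = np.linalg.norm(Y - X @ R.T - t, axis=1)   # per-point residuals
        w = 1.0 / (1.0 + (r / (np.median(r) + 1e-9)) ** 2)  # downweight outliers
    return R, t

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
Y = X @ R_true.T + t_true + rng.normal(scale=0.01, size=X.shape)
Y[:20] += rng.normal(scale=5.0, size=(20, 3))         # simulated gross outliers
R_est, t_est = robust_rigid_align(X, Y)
```

Despite 10% gross outliers, the reweighting recovers the true rotation and translation closely.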
Fast Variational Segmentation using Partial Extremal Initialization
J. E. Solem, N. C. Overgaard, Markus Persson, A. Heyden. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.120

In this paper we consider region-based variational segmentation of two- and three-dimensional images by the minimization of functionals whose fidelity term is the quotient of two integrals. Users often refrain from quotient functionals, even when they seem to be the most natural choice, probably because the corresponding gradient descent PDEs are nonlocal and hence require the computation of global properties. Here it is shown how this problem may be overcome by employing the structure of the Euler-Lagrange equation of the fidelity term to construct a good initialization for the gradient descent PDE, which will then converge rapidly to the desired (local) minimum. The initializer is found by making a one-dimensional search among the level sets of a function related to the fidelity term, picking the level set which minimizes the segmentation functional. This partial extremal initialization is tested on a medical segmentation problem with velocity and intensity data from MR images. In this particular application, the partial extremal initialization speeds up the segmentation by two orders of magnitude compared to straightforward gradient descent.

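The mechanics of the initializer, a one-dimensional search among the level sets of a function, picking the level whose region minimizes the segmentation energy, can be shown on a toy image. The quotient energy below is an illustrative stand-in, not the paper's functional.

```python
import numpy as np

rng = np.random.default_rng(1)

def partial_extremal_init(g, energy, levels=64):
    """One-dimensional search among the level sets {g > tau}: return the
    level whose region minimizes the given segmentation energy."""
    taus = np.linspace(g.min(), g.max(), levels + 2)[1:-1]  # skip empty/full
    tau = min(taus, key=lambda t: energy(g > t))
    return tau, g > tau

# Toy image: a bright disc on a dark background plus noise.
yy, xx = np.mgrid[0:64, 0:64]
disc = (yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2
img = disc + rng.normal(scale=0.15, size=(64, 64))

def quotient_energy(region):
    """Example quotient-style fidelity term (illustrative): within-region
    variances divided by the between-region contrast."""
    if region.sum() == 0 or region.all():
        return np.inf
    inside, outside = img[region], img[~region]
    return (inside.var() + outside.var()) / (abs(inside.mean() - outside.mean()) + 1e-9)

tau, region = partial_extremal_init(img, quotient_energy)
```

On this image the search lands near the threshold separating disc from background, so the initializer already overlaps the target region almost perfectly before any gradient descent.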
Multiscale Nonlinear Diffusion and Shock Filter for Ultrasound Image Enhancement
Fan Zhang, Y. Yoo, Yongmin Kim, Lichen Zhang, L. M. Koh. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.203

A new noise reduction and edge enhancement method, i.e., Laplacian pyramid-based nonlinear diffusion and shock filter (LPNDSF), is proposed for medical ultrasound imaging. In the proposed LPNDSF, a coupled nonlinear diffusion and shock filter process is applied in the Laplacian pyramid domain of an image to remove speckle and enhance edges simultaneously. The performance of the proposed method was evaluated on a phantom and a real ultrasound image. In the phantom study, we obtained an average gain of 0.55 and 1.11 in contrast-to-noise ratio compared to the speckle reducing anisotropic diffusion (SRAD) and nonlinear coherent diffusion (NCD), respectively. Also, the proposed LPNDSF showed clearer boundaries on the phantom and the real ultrasound image. These preliminary results indicate that the proposed LPNDSF can effectively reduce speckle noise while enhancing image edges for retaining subtle features.

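The edge-enhancing half of such a scheme is the classic shock filter, u_t = -sign(u_xx)|u_x|, which moves intensity toward edges until smoothed transitions become sharp steps. The sketch below is a simplified 1D version with an upwind gradient; it omits the Laplacian pyramid and the coupled diffusion of the actual LPNDSF.

```python
import numpy as np

def shock_filter_step(u, dt=0.4):
    """One explicit 1D shock-filter iteration, u_t = -sign(u_xx)|u_x|,
    with an upwind (Rouy-Tourin style) gradient approximation."""
    up = np.pad(u, 1, mode="edge")
    fwd = up[2:] - up[1:-1]                   # forward difference
    bwd = up[1:-1] - up[:-2]                  # backward difference
    uxx = up[2:] - 2 * up[1:-1] + up[:-2]     # second difference
    grad = np.where(uxx > 0,
                    np.minimum(fwd, 0) ** 2 + np.maximum(bwd, 0) ** 2,
                    np.maximum(fwd, 0) ** 2 + np.minimum(bwd, 0) ** 2) ** 0.5
    return u - dt * np.sign(uxx) * grad

x = np.linspace(-1, 1, 101)
blurred = 1.0 / (1.0 + np.exp(-x / 0.15))     # smoothed step edge
u = blurred.copy()
for _ in range(60):
    u = shock_filter_step(u)
```

After the iterations the smoothed sigmoid is re-sharpened toward a step, while the values stay within the original intensity range.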
A General Framework and New Alignment Criterion for Dense Optical Flow
Rami Ben-Ari, N. Sochen. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.25

The problem of dense optical flow computation is addressed from a variational viewpoint. A new geometric framework is introduced. It unifies previous art and yields new efficient methods. Along with the framework a new alignment criterion suggests itself. It is shown that the alignment between the gradients of the optical flow components and between the latter and the intensity gradients is an important measure of the flow’s quality. Adding this criterion as a requirement in the optimization process improves the resulting flow. This is demonstrated in synthetic and real sequences.

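A simple way to quantify gradient alignment between two fields is the mean squared sine of the angle between their gradients, which is zero when the gradients are everywhere parallel. This is an illustrative stand-in for the paper's criterion, not its exact functional.

```python
import numpy as np

def misalignment(p, q, eps=1e-9):
    """Mean squared sine of the angle between the gradient fields of p and q;
    0 where the gradients are parallel (illustrative alignment measure)."""
    py, px = np.gradient(p)
    qy, qx = np.gradient(q)
    cross = px * qy - py * qx                      # 2D cross product
    norm = (px ** 2 + py ** 2) * (qx ** 2 + qy ** 2) + eps
    return float(np.mean(cross ** 2 / norm))

rng = np.random.default_rng(2)
yy, xx = np.mgrid[0:32, 0:32].astype(float)
u = np.sin(xx / 5.0)                   # one flow component
v_aligned = 2.0 * u + 1.0              # gradients everywhere parallel to u's
v_random = rng.normal(size=u.shape)    # gradients with no preferred alignment
```

An affinely related field scores exactly zero, while a random field scores near the chance level of 0.5, so the measure can serve as a penalty term favoring mutually aligned flow components.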
Hierarchical Procrustes Matching for Shape Retrieval
Graham Mcneill, S. Vijayakumar. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.133

We introduce Hierarchical Procrustes Matching (HPM), a segment-based shape matching algorithm which avoids problems associated with purely global or local methods and performs well on benchmark shape retrieval tests. The simplicity of the shape representation leads to a powerful matching algorithm which incorporates intuitive ideas about the perceptual nature of shape while being computationally efficient. This includes the ability to match similar parts even when they occur at different scales or positions. While comparison of multiscale shape representations is typically based on specific features, HPM avoids the need to extract such features. The hierarchical structure of the algorithm captures the appealing notion that matching should proceed in a global to local direction.

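The core segment comparison underlying Procrustes matching is a similarity-invariant distance between ordered 2D point sequences, which has a closed form when points are treated as complex numbers. This sketches only the distance itself, without HPM's global-to-local hierarchy.

```python
import numpy as np

def procrustes_dist(a, b):
    """Full Procrustes distance between two ordered 2D point sequences:
    invariant to translation, rotation and scale."""
    z = a[:, 0] + 1j * a[:, 1]
    w = b[:, 0] + 1j * b[:, 1]
    z = z - z.mean()
    w = w - w.mean()
    z = z / np.linalg.norm(z)
    w = w / np.linalg.norm(w)
    corr = abs(np.vdot(z, w))          # optimal rotation/scale correlation
    return float(np.sqrt(max(0.0, 1.0 - corr ** 2)))

square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
# The same square, rotated, scaled and translated.
zc = (square[:, 0] + 1j * square[:, 1]) * (0.5 + 1.2j) + (3.0 - 2.0j)
moved = np.column_stack([zc.real, zc.imag])
other = np.array([[0.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
```

A rotated, scaled and translated copy is at distance zero, while a genuinely different shape is not, which is exactly the invariance that lets similar parts match at different scales and positions.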
An Adaptive Appearance Model Approach for Model-based Articulated Object Tracking
A. O. Balan, Michael J. Black. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.52

The detection and tracking of three-dimensional human body models have progressed rapidly, but successful approaches typically rely on accurate foreground silhouettes obtained using background segmentation. There are many practical applications where such information is imprecise. Here we develop a new image likelihood function based on the visual appearance of the subject being tracked. We propose a robust, adaptive appearance model based on the Wandering-Stable-Lost framework, extended to the case of articulated body parts. The method models appearance using a mixture model that includes an adaptive template, frame-to-frame matching and an outlier process. We employ an annealed particle filtering algorithm for inference and take advantage of the 3D body model to predict self-occlusion and improve pose estimation accuracy. Quantitative tracking results are presented for a walking sequence with a 180-degree turn, captured with four synchronized and calibrated cameras and containing significant appearance changes and self-occlusion in each view.

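Annealed particle filtering handles peaky, multimodal likelihoods by running several weighting/resampling layers with a progressively sharper likelihood and shrinking diffusion noise. The toy below works on a 1D "pose" with an assumed two-peak likelihood; the real tracker operates on a high-dimensional articulated pose, so this only sketches the annealing mechanics.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_like(x):
    """Toy image likelihood over a 1D pose: a strong peak at the true pose
    (2.0) plus a weaker spurious peak (assumed for illustration)."""
    good = -0.5 * ((x - 2.0) / 0.1) ** 2
    spurious = np.log(0.3) - 0.5 * ((x + 2.0) / 0.1) ** 2
    return np.logaddexp(good, spurious)

def annealed_particle_filter(n=500, layers=5):
    """One frame of annealed particle filtering: each layer sharpens the
    weighting exponent and shrinks the diffusion noise, concentrating
    particles around the dominant likelihood peak."""
    x = rng.uniform(-5.0, 5.0, size=n)
    for m in range(1, layers + 1):
        beta = m / layers                              # annealing schedule
        logw = beta * log_like(x)
        w = np.exp(logw - logw.max())
        x = x[rng.choice(n, size=n, p=w / w.sum())]    # resample
        x = x + rng.normal(scale=1.0 - 0.9 * m / layers, size=n)  # diffuse
    return x

particles = annealed_particle_filter()
```

With a flat weighting the spurious peak would retain a sizeable share of particles; the annealing layers progressively suppress it.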
Scale Variant Image Pyramids
J. Gluckman. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.265

Multi-scale representations are motivated by the scale invariant properties of natural images. While many low level statistical measures, such as the local mean and variance of intensity, behave in a scale invariant manner, there are many higher order deviations from scale invariance where zero-crossings merge and disappear. Such scale variant behavior is important information to represent because it is not easily predicted from lower resolution data. A scale variant image pyramid is a representation that separates this information from the more redundant and predictable scale invariant information.

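The scale invariance of low-level statistics is easy to check numerically: a field with a 1/f amplitude spectrum (a standard model of natural-image statistics) keeps nearly the same intensity variance after downsampling, whereas white noise does not. This is a sketch of the motivating statistic, not of the pyramid construction itself.

```python
import numpy as np

rng = np.random.default_rng(4)

def one_over_f_image(n=256):
    """Synthesize a field with a roughly 1/f amplitude spectrum, the
    scale-invariant statistic that motivates multi-scale representations."""
    f = np.fft.fftfreq(n)
    fx, fy = np.meshgrid(f, f)
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                        # avoid dividing by zero at DC
    spec = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / radius
    img = np.fft.ifft2(spec).real
    return (img - img.mean()) / img.std()

def downsample2(a):
    """2x2 block averaging."""
    return 0.25 * (a[::2, ::2] + a[1::2, ::2] + a[::2, 1::2] + a[1::2, 1::2])

natural = one_over_f_image()
white = rng.normal(size=(256, 256))
r_natural = downsample2(natural).std() / natural.std()  # stays near 1
r_white = downsample2(white).std() / white.std()        # drops toward 0.5
```

Each octave of a 1/f field carries roughly equal energy, so discarding the finest octave barely changes the variance; averaging four independent noise samples halves the standard deviation.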
An Integrated Segmentation and Classification Approach Applied to Multiple Sclerosis Analysis
A. Akselrod-Ballin, M. Galun, R. Basri, A. Brandt, M. Gomori, M. Filippi, P. Valsasina. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.55

We present a novel multiscale approach that combines segmentation with classification to detect abnormal brain structures in medical imagery, and demonstrate its utility in detecting multiple sclerosis lesions in 3D MRI data. Our method uses segmentation to obtain a hierarchical decomposition of a multi-channel, anisotropic MRI scan. It then produces a rich set of features describing the segments in terms of intensity, shape, location, and neighborhood relations. These features are then fed into a decision tree-based classifier, trained with data labeled by experts, enabling the detection of lesions at all scales. Unlike common approaches that use voxel-by-voxel analysis, our system can utilize regional properties that are often important for characterizing abnormal brain structures. We provide experiments showing successful detections of lesions in both simulated and real MR images.

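The segment-features-into-a-tree pipeline can be sketched end to end on synthetic data: compute simple intensity and shape statistics per segment, then train a classifier on them. The features and the depth-1 tree (stump) below are illustrative reductions of the richer feature set and full decision tree described.

```python
import numpy as np

rng = np.random.default_rng(5)

def segment_features(mask, image):
    """Describe a segment by simple intensity and shape statistics
    (mean, std, area, bounding-box solidity) - an illustrative subset."""
    ys, xs = np.nonzero(mask)
    vals = image[mask]
    area = mask.sum()
    bbox = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return np.array([vals.mean(), vals.std(), area, area / bbox])

def train_stump(X, y):
    """Depth-1 decision tree: best single-feature threshold split."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = X[:, j] > t
            for sign in (True, False):
                err = np.mean((pred if sign else ~pred) != y)
                if err < best_err:
                    best_err, best = err, (j, t, sign)
    return best

def stump_predict(model, X):
    j, t, sign = model
    pred = X[:, j] > t
    return pred if sign else ~pred

# Synthetic segment features: 'lesion' segments are brighter on average.
n = 200
y = np.arange(n) < n // 2
mean_int = np.where(y, 0.8, 0.4) + rng.normal(scale=0.08, size=n)
X = np.column_stack([mean_int, rng.normal(size=(n, 3))])
model = train_stump(X, y)
acc = np.mean(stump_predict(model, X) == y)

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True
feats = segment_features(mask, np.ones((8, 8)))
```

Region-level statistics like these are exactly what a voxel-by-voxel classifier cannot see, which is the point the abstract makes about regional properties.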
Human Carrying Status in Visual Surveillance
D. Tao, Xuelong Li, S. Maybank, Xindong Wu. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). doi:10.1109/CVPR.2006.138

A person’s gait changes when he or she is carrying an object such as a bag, suitcase or rucksack. As a result, human identification and tracking are made more difficult because the averaged gait image is too simple to represent the carrying status. Therefore, in this paper we first introduce a set of Gabor-based human gait appearance models, because Gabor functions are similar to the receptive field profiles of mammalian cortical simple cells. The very high dimensionality of the feature space makes training difficult. In order to solve this problem we propose a general tensor discriminant analysis (GTDA), which seamlessly incorporates the object (Gabor-based human gait appearance model) structure information as a natural constraint. GTDA differs from previous tensor-based discriminant analysis methods in that the training converges; existing methods fail to converge in the training stage, which makes them unsuitable for practical tasks. Experiments are carried out on the USF baseline data set to recognize a human’s ID from the gait silhouette. The proposed Gabor gait representation combined with GTDA is demonstrated to significantly outperform the existing appearance-based methods.

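The Gabor appearance models rest on oriented Gabor kernels, a Gaussian envelope times a sinusoid, whose orientation selectivity mirrors cortical simple cells. The sketch below builds one such kernel and shows the selectivity on a striped pattern; parameter values and the stripe image are illustrative, not the paper's filter bank.

```python
import numpy as np

def gabor_kernel(theta, sigma=3.0, lam=6.0, size=15):
    """Real (even) Gabor kernel at orientation theta: an isotropic Gaussian
    envelope modulated by a cosine along the rotated x axis."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lam)
    return g - g.mean()                       # remove the DC response

def correlate2d_valid(img, k):
    """Plain 'valid'-mode 2D correlation (loops; small inputs only)."""
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# Vertical stripes (a crude stand-in for a silhouette's limbs) excite the
# x-modulated kernel far more than the y-modulated one.
stripes = np.tile((np.arange(64) % 6 < 3).astype(float), (64, 1))
r0 = np.abs(correlate2d_valid(stripes, gabor_kernel(0.0))).mean()
r90 = np.abs(correlate2d_valid(stripes, gabor_kernel(np.pi / 2))).mean()
```

Stacking the responses of many such kernels over orientations and scales is what produces the very high-dimensional feature space that GTDA is then used to tame.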