Title: Image Segmentation Using a Spatially Correlated Mixture Model with Gaussian Process Priors
Authors: Kosei Kurisu, N. Suematsu, Kazunori Iwata, A. Hayashi
DOI: 10.1109/ACPR.2013.21 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: Finite mixture modeling has been widely used for image segmentation. However, since its standard form takes no account of the spatial correlation among pixels, its segmentation accuracy can be severely degraded by image noise. To improve segmentation accuracy in noisy images, the spatially variant finite mixture model has been proposed, in which a Markov Random Field (MRF) serves as the prior for the mixing proportions and the parameters are estimated with the Expectation-Maximization (EM) algorithm under the maximum a posteriori (MAP) criterion. In this paper, we propose a spatially correlated mixture model in which the mixing proportions are governed by a set of underlying functions whose common prior distribution is a Gaussian process. A Gaussian process expresses spatial correlation easily and flexibly. Given an image, the underlying functions are estimated with a quasi-EM algorithm and then used to segment the image. The effectiveness of the proposed technique is demonstrated in an experiment with synthetic images.
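The Gaussian-process construction for mixing proportions can be illustrated with a small sketch. This is not the paper's estimation algorithm; it only shows how GP samples over a pixel grid, pushed through a softmax link, yield spatially smooth mixing proportions. The RBF covariance, the softmax link, and all parameter values are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, length_scale=2.0):
    # Squared-exponential covariance between all pairs of pixel coordinates.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def gp_mixing_proportions(h, w, n_components, length_scale=2.0, seed=0):
    """Draw one latent function per mixture component from a GP over the
    h-by-w pixel grid, then softmax them into mixing proportions."""
    rng = np.random.default_rng(seed)
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    cov = rbf_kernel(coords, length_scale) + 1e-6 * np.eye(h * w)  # jitter
    f = np.linalg.cholesky(cov) @ rng.standard_normal((h * w, n_components))
    e = np.exp(f - f.max(axis=1, keepdims=True))  # numerically stable softmax
    pi = e / e.sum(axis=1, keepdims=True)
    return pi.reshape(h, w, n_components)
```

Because nearby pixels have high covariance under the RBF kernel, the resulting proportions vary smoothly across the image, which is exactly the property that makes such a prior robust to pixel-level noise.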
Title: A Maximum Correlation Feature Descriptor for Heterogeneous Face Recognition
Authors: Dihong Gong, J. Zheng
DOI: 10.1109/ACPR.2013.12 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: Heterogeneous Face Recognition (HFR) refers to matching probe face images against a gallery of face images taken from an alternate imaging modality, for example matching near-infrared (NIR) face images to photographs. Matching heterogeneous face images has important practical applications such as surveillance and forensics, yet it remains a challenging problem in the face recognition community because of the large within-class discrepancy incurred by modality differences. In this paper, a novel feature descriptor is proposed in which the features of both gallery and probe face images are extracted with an adaptive descriptor that maximizes the correlation of the encoded face images across modalities, thereby reducing within-class variation at the feature extraction stage. The effectiveness of the proposed approach is demonstrated in the scenario of matching NIR face images to photographs on a very large dataset consisting of 2,800 different persons.
Title: Multi-modal Subspace Learning with Joint Graph Regularization for Cross-Modal Retrieval
Authors: K. Wang, Wei Wang, R. He, Liang Wang, T. Tan
DOI: 10.1109/ACPR.2013.44 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: This paper investigates the problem of cross-modal retrieval, where users can search for results across various modalities by submitting a query of any modality. Since the query and its retrieved results can be of different modalities, measuring the content similarity between different modalities of data remains a challenge. To address this problem, we propose a joint graph regularized multi-modal subspace learning (JGRMSL) algorithm, which integrates inter-modality and intra-modality similarities into a joint graph regularization to better exploit the cross-modal correlation and the local manifold structure within each modality of data. To obtain good class separation, the idea of Linear Discriminant Analysis (LDA) is incorporated into the proposed method by maximizing the between-class covariance and minimizing the within-class covariance of all projected data. Experimental results on two public cross-modal datasets demonstrate the effectiveness of our algorithm.
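The LDA-style term in the objective can be made concrete by computing the two scatter (covariance) matrices it trades off. The sketch below is generic LDA bookkeeping, not the JGRMSL solver itself:

```python
import numpy as np

def lda_scatter(X, y):
    """Between-class scatter Sb and within-class scatter Sw for data X
    (one sample per row) with integer labels y. A discriminative
    projection W maximizes tr(W^T Sb W) while minimizing tr(W^T Sw W)."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)  # class mean vs. global mean
        Sw += (Xc - mc).T @ (Xc - mc)               # samples vs. class mean
    return Sb, Sw
```

A quick sanity check: the two matrices always satisfy Sb + Sw = St, the total scatter of the data.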
Title: Magic Mirror: An Intelligent Fashion Recommendation System
Authors: Si Liu, Luoqi Liu, Shuicheng Yan
DOI: 10.1109/ACPR.2013.212 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: This paper introduces the techniques required for a future system called Magic Mirror. Imagine that when you wake up in the morning and prepare for the coming day, the Magic Mirror automatically recommends the most appropriate styles of hair, makeup, and dress according to the events and activities on your calendar, to which it is linked, so that you can present yourself on these occasions with an elegant and suitable appearance. This work focuses on the mathematical models for these tasks, particularly on how to model the relations between low-level human body features, mid-level facial/body attributes, and high-level recommendations. Automation and intelligence are the two main characteristics of the system, and we demonstrate two prototype sub-systems related to the full Magic Mirror system.
Title: Correlation-Based Facade Parsing Using Shape Grammar
Authors: Runze Zhang, Ruiling Deng, Xin He, Gang Zeng, Rui Gan, H. Zha
DOI: 10.1109/ACPR.2013.81 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: Because it strongly indicates hierarchical and repetitive structures, semantic information has been widely used in dealing with urban scenes. In this paper, we present a superpixel-based facade parsing framework that combines top-down shape grammar splitting with bottom-up information aggregation: machine learning predicts prior classes, superpixels improve compactness, and boundary estimation divides the splitting into a coarse and a fine procedure, with the former providing a reasonable initial guess that helps the latter achieve better random walk optimization results. We also introduce correlation judging between floors, which balances degree-of-freedom reduction against style variety and flexibility, and incorporate it as an alignment constraint term in the probability energy. Experiments show that our method converges quickly and achieves state-of-the-art results across different styles. Further work on scene understanding and reconstruction that exploits these results is in progress.
Title: Rapid Mobile Object Recognition Using Fisher Vector
Authors: Yoshiyuki Kawano, Keiji Yanai
DOI: 10.1109/ACPR.2013.39 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: We propose a real-time object recognition method for smartphones, which consists of lightweight local features, the Fisher Vector, and a linear SVM. As light local descriptors, we adopt a HOG Patch descriptor and a Color Patch descriptor and sample them densely from an image. We then encode them with the Fisher Vector representation, which greatly reduces the required number of visual words. As a classifier, we use a linear SVM, whose computational cost is very low. In the experiments, we achieved a 79.2% classification rate among the top five category candidates on a 100-category food dataset, outperforming a conventional bag-of-features representation with a chi-square-RBF-kernel SVM. Moreover, food recognition takes only 0.065 seconds, four times faster than existing work.
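The encoding step can be sketched as a first-order (mean-gradient) Fisher Vector for a diagonal-covariance GMM. This is an illustrative reimplementation of the standard formulation, not the authors' code, and it assumes the dense HOG/Color Patch descriptors are already available as rows of a matrix:

```python
import numpy as np

def fisher_vector_mu(descriptors, weights, means, covs):
    """First-order (mean-gradient) Fisher Vector for a diagonal GMM.
    descriptors: (N, D); weights: (K,); means: (K, D); covs: (K, D)."""
    N, D = descriptors.shape
    # Posterior (soft assignment) of each descriptor to each Gaussian.
    log_p = -0.5 * (((descriptors[:, None, :] - means[None]) ** 2) / covs[None]
                    + np.log(2 * np.pi * covs[None])).sum(-1) + np.log(weights)[None]
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Normalized gradient with respect to the Gaussian means.
    diff = (descriptors[:, None, :] - means[None]) / np.sqrt(covs[None])
    fv = (gamma[:, :, None] * diff).sum(0) / (N * np.sqrt(weights)[:, None])
    return fv.ravel()  # length K * D
```

With K Gaussians and D-dimensional descriptors the encoding is only K*D-dimensional, far fewer parameters than a large bag-of-features vocabulary of comparable discriminative power, which is what makes a plain linear SVM sufficient on a phone.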
Title: Consistent Segmentation Based Color Correction for Coarsely Registered Images
Authors: Haoxing Wang, Longquan Dai, Xiaopeng Zhang
DOI: 10.1109/ACPR.2013.72 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: Local color correction methods transfer colors between corresponding regions. However, inconsistent segmentation between the source image and the target image tends to degrade the correction result. In this paper, we propose a local color correction technique for coarsely registered images. In the segmentation step, it enforces consistent segmentation on both source and target images to alleviate the inaccurate registration problem. In the color transfer step, it uses region confidences and bilateral-filter-like color influence maps to improve the correction result. Experiments show that the proposed method achieves better color correction than both global methods and recent local color correction methods.
Title: Planar Segmentation from Point Clouds via Graph Laplacian Regularized K-Planes
Authors: Wei Sui, Lingfeng Wang, Huai-Yu Wu, Chunhong Pan
DOI: 10.1109/ACPR.2013.15 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: Extracting planar surfaces from 3D point clouds is an important and challenging step in generating building models, as the acquired data are typically noisy, incomplete, and unorganized. In this paper, we present a novel graph Laplacian regularized K-planes method for segmenting piecewise planar surfaces of urban building point clouds. Our model rests on two ideas: 1) a linear projection model fits planar surfaces globally, and 2) a graph Laplacian regularization preserves the smoothness of each plane locally. The two terms are combined into an objective function, which is minimized via an iterative updating algorithm. Comparative experiments on both synthetic and real datasets demonstrate the effectiveness and efficiency of our method.
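The alternating minimization at the heart of a K-planes model can be sketched as follows. This toy version keeps only the global plane-fitting term; the paper's graph Laplacian smoothness regularizer is omitted, so it is a baseline, not the proposed method:

```python
import numpy as np

def fit_plane(pts):
    # Total-least-squares plane: centroid plus smallest right singular vector.
    c = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - c)
    n = Vt[-1]                      # unit normal
    return n, -n @ c                # plane: n . x + d = 0

def k_planes(P, K=2, iters=20, seed=0):
    """Alternate between assigning each point to its nearest plane and
    refitting each plane to its assigned points."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(K, size=len(P))
    for _ in range(iters):
        planes = []
        for k in range(K):
            pts = P[labels == k]
            if len(pts) < 3:        # reseed a degenerate or empty cluster
                pts = P[rng.choice(len(P), size=3, replace=False)]
            planes.append(fit_plane(pts))
        dists = np.stack([np.abs(P @ n + d) for n, d in planes], axis=1)
        labels = dists.argmin(axis=1)
    return labels, planes
```

The Laplacian term in the paper would add a penalty encouraging neighboring points (edges of a k-nearest-neighbor graph) to take the same label, which suppresses the speckled assignments this plain version produces on noisy data.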
Title: How Do Facial Expressions Contribute to Age Prediction?
Authors: Yingmei Piao, Mineichi Kudo
DOI: 10.1109/ACPR.2013.161 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: Human age estimation from facial images has many potential practical applications. However, current age estimation techniques are not yet mature. Most studies focus only on neutral, that is, expressionless faces. Some expressions, such as happiness, may help to improve prediction accuracy, while some recent work has reported that expressions can degrade accuracy. In this paper, we investigate the degree to which facial expressions affect age prediction, both subjectively and objectively. The results reveal that expressions contribute little to age prediction.
Title: Nuclear Norm Based 2DPCA
Authors: Fanlong Zhang, J. Qian, Jian Yang
DOI: 10.1109/ACPR.2013.10 (2013 2nd IAPR Asian Conference on Pattern Recognition, 2013-11-05)
Abstract: This paper presents a novel method, nuclear norm based 2DPCA (N-2DPCA), for image feature extraction. Unlike conventional 2DPCA, N-2DPCA uses a nuclear norm based reconstruction error criterion. The criterion is minimized by converting the nuclear norm based optimization problem into a series of F-norm based optimization problems. N-2DPCA is applied to face recognition and evaluated on the Extended Yale B and CMU PIE databases. Experimental results demonstrate that our method is more effective and robust than PCA, 2DPCA, and L1-norm based 2DPCA.
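The criterion's key ingredient, the nuclear norm, is easy to compute from a singular value decomposition. This is a minimal sketch of the norm itself (generic linear algebra), not the N-2DPCA optimization:

```python
import numpy as np

def nuclear_norm(A):
    """Nuclear norm ||A||_* = sum of the singular values of A. Unlike the
    squared Frobenius norm, it acts like an L1 penalty on the spectrum,
    which makes a reconstruction criterion built on it less sensitive to
    gross, structured errors in images."""
    return np.linalg.svd(A, compute_uv=False).sum()
```

For a rank-1 matrix u v^T the nuclear norm is simply ||u|| * ||v||, a handy sanity check.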