A large amount of labeled training data is required to develop robust and effective facial expression analysis methods. However, obtaining such data is typically a tedious and time-consuming task whose cost grows with the size of the database. Due to the rapid advance of Internet and Web technologies, it is now feasible to collect a tremendous number of images with potential label information at a low cost in human effort. This paper therefore proposes a framework for collecting realistic facial expression images from the web, so as to promote further research on robust facial expression recognition. Owing to the limitations of current commercial web search engines, a large fraction of the returned images is unrelated to the query keyword. We present an SVM-based active learning approach for selecting relevant images from noisy image search results. The resulting database is more diverse and contains more sample images than the well-established CK and JAFFE facial expression databases. Experimental results demonstrate that classifiers trained on our web-based database generalize better than those trained on the two existing databases. It is anticipated that further research on facial expression recognition, and affective computing more broadly, will no longer be restricted to the traditional seven categories.
{"title":"Harvesting Web Images for Realistic Facial Expression Recognition","authors":"Kaimin Yu, Zhiyong Wang, L. Zhuo, D. Feng","doi":"10.1109/DICTA.2010.93","DOIUrl":"https://doi.org/10.1109/DICTA.2010.93","url":null,"abstract":"Large amount of labeled training data is required to develop robust and effective facial expression analysis methods. However, obtaining such data is typically a tedious and time-consuming task that is proportional to the size of the database. Due to the rapid advance of Internet and Web technologies, it is now feasible to collect a tremendous number of images with potential label information at a low cost of human effort. Therefore, this paper proposes a framework to collect realistic facial expression images from the web so as to promote further research on robust facial expression recognition. Due to the limitation of current commercial web search engines, a large fraction of returned images is not related to the query keyword. We present a SVM based active learning approach to selecting relevant images from noisy image search results. The resulting database is more diverse with more sample images, compared with other well established facial expression databases CK and JAFFE. Experimental results demonstrate that the generalization of our web based database outperforms those two existing databases. It is anticipated that further research on facial expression recognition or even affective computing will not be restricted to traditional 7 categories only.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121693890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human detection is widely used in many applications, yet it remains a difficult problem with many open questions due to challenges such as variations in clothing and posture. After investigating several benchmark methods and frameworks in the literature, this paper proposes a novel method that applies the Real AdaBoost training procedure to multi-scale images, exposing object features at multiple levels. To further boost overall performance, a fusion scheme integrates the decision scores obtained at the various scales into a final decision. Unlike other score-based fusion methods, this paper formulates the fusion process as supervised learning, so our fusion approach can better distinguish subtle differences between human and non-human objects. Furthermore, our approach can use simpler weak features for boosting, alleviating the training complexity found in most AdaBoost training approaches. Encouraging results are obtained on a well-recognized benchmark database.
{"title":"Adaptive Stick-Like Features for Human Detection Based on Multi-scale Feature Fusion Scheme","authors":"Sheng Wang, Ruo Du, Qiang Wu, Xiangjian He","doi":"10.1109/DICTA.2010.70","DOIUrl":"https://doi.org/10.1109/DICTA.2010.70","url":null,"abstract":"Human detection has been widely used in many applications. In the meantime, it is still a difficult problem with many open questions due to challenges caused by various factors such as clothing, posture and etc. By investigating several benchmark methods and frameworks in the literature, this paper proposes a novel method which successfully implements the Real AdaBoost training procedure on multi-scale images. Various object features are exposed on multiple levels. To further boost the overall performance, a fusion scheme is established using scores obtained at various levels which integrates decision results with different scales to make the final decision. Unlike other score-based fusion methods, this paper re-formulates the fusion process through a supervised learning. Therefore, our fusion approach can better distinguish subtle difference between human objects and non-human objects. Furthermore, in our approach, we are able to use simpler weak features for boosting and hence alleviate the training complexity existed in most of AdaBoost training approaches. Encouraging results are obtained on a well recognized benchmark database.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115967849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a unified framework for recognizing human actions in video using human pose estimation. Due to the high variation of human appearance and noisy, cluttered backgrounds, accurate human pose analysis is hard to achieve and is rarely employed for action recognition. In our approach, we take advantage of the recent success of human detection and the view invariance of local feature-based approaches to design a pose-based action recognition system. We begin with a frame-wise human detection step to initialize the search space for local body parts, then integrate the detected parts into a human kinematic structure using a tree-structured graphical model. The resulting articulation configuration is used to infer the action class from both the behavior of each individual part and the variation of the overall structure. We also show that, even with imprecise pose estimation, accurate action recognition can still be achieved from the informative cues in the overall part configuration. Promising results on an action recognition benchmark show that the proposed framework is comparable to existing state-of-the-art action recognition algorithms.
{"title":"Human Action Recognition from Boosted Pose Estimation","authors":"Li Wang, Li Cheng, Tuan Hue Thi, Jian Zhang","doi":"10.1109/DICTA.2010.60","DOIUrl":"https://doi.org/10.1109/DICTA.2010.60","url":null,"abstract":"This paper presents a unified framework for recognizing human action in video using human pose estimation. Due to high variation of human appearance and noisy context background, accurate human pose analysis is hard to achieve and rarely employed for the task of action recognition. In our approach, we take advantage of the current success of human detection and view invariability of local feature-based approach to design a pose-based action recognition system. We begin with a frame-wise human detection step to initialize the search space for human local parts, then integrate the detected parts into human kinematic structure using a tree structural graphical model. The final human articulation configuration is eventually used to infer the action class being performed based on each single part behavior and the overall structure variation. In our work, we also show that even with imprecise pose estimation, accurate action recognition can still be achieved based on informative clues from the overall pose part configuration. The promising results obtained from action recognition benchmark have proven our proposed framework is comparable to the existing state-of-the-art action recognition algorithms.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123819106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human action recognition can be approached by combining an action-discriminative feature set with a classifier. However, the dimensionality of typical feature sets, compounded by the time dimension, often leads to a curse-of-dimensionality situation. Moreover, measurement of the feature set is subject to sometimes severe errors. This paper presents an approach to human action recognition based on robust dimensionality reduction. The observation probabilities of hidden Markov models (HMMs) are modelled by mixtures of probabilistic principal component analyzers and mixtures of t-distribution subspaces, and compared with conventional Gaussian mixture models. Experimental results on two datasets show that dimensionality reduction improves classification accuracy and that the heavier-tailed t-distribution helps reduce the impact of outliers generated by segmentation errors.
{"title":"Robust Dimensionality Reduction for Human Action Recognition","authors":"Óscar Pérez, R. Xu, M. Piccardi","doi":"10.1109/DICTA.2010.66","DOIUrl":"https://doi.org/10.1109/DICTA.2010.66","url":null,"abstract":"Human action recognition can be approached by combining an action-discriminative feature set with a classifier. However, the dimensionality of typical feature sets joint with that of the time dimension often leads to a curse-of-dimensionality situation. Moreover, the measurement of the feature set is subject to sometime severe errors. This paper presents an approach to human action recognition based on robust dimensionality reduction. The observation probabilities of hidden Markov models (HMM) are modelled by mixtures of probabilistic principal components analyzers and mixtures of $t$-distribution sub-spaces, and compared with conventional Gaussian mixture models. Experimental results on two datasets show that dimensionality reduction helps improve the classification accuracy and that the heavier-tailed $t$-distribution can help reduce the impact of outliers generated by segmentation errors.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121533582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel approach is presented for estimating a set of interdependent homography matrices linked together by latent variables. The approach enforces all underlying consistency constraints while accounting for the arbitrary scale of each individual matrix. The input data are assumed to be a set of homography matrices estimated from image data with the consistency constraints ignored, together with the error covariances associated with these estimates. A cost function is proposed for upgrading, via optimisation, the input data to a set of homography matrices satisfying the constraints. The function is invariant to changes in the individual scales of the input matrices. The proposed approach is applied to the particular problem of estimating the set of homography matrices induced by multiple planes in a 3D scene between two views. Experimental results demonstrate the effectiveness of the approach.
{"title":"Multiple Homography Estimation with Full Consistency Constraints","authors":"W. Chojnacki, Zygmunt L. Szpak, M. Brooks, A. Hengel","doi":"10.1109/DICTA.2010.87","DOIUrl":"https://doi.org/10.1109/DICTA.2010.87","url":null,"abstract":"A novel approach is presented to estimating a set of interdependent homography matrices linked together by latent variables. The approach allows enforcement of all underlying consistency constraints while accounting for the arbitrariness of the scale of each individual matrix. The input data is assumed to be in the form of a set of homography matrices obtained by estimation from image data with the consistency constraints ignored, appended by a set of error covariances associated with these matrix estimates. A cost function is proposed for upgrading, via optimisation, the input data to a set of homography matrices satisfying the constraints. The function is invariant to a change of any of the individual scales of the input matrices. The proposed approach is applied to the particular problem of estimating a set of homography matrices induced by multiple planes in the 3D scene between two views. Experimental results are given which demonstrate the effectiveness of the approach.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127535889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the major challenges in region-based image retrieval is identifying the Region of Interest (ROI) containing the queried objects. Automatically identifying the regions or objects of interest in a natural scene is very difficult because the content is complex and the objects can take any shape. In this paper, we present a novel unsupervised detection method that automatically and efficiently shrinks the ROI around the distinct object in an image. We apply an edge-based active contour model that draws upon edge information in local regions; its mathematical implementation is accomplished using a variational level set formulation. In addition, the mean-shift algorithm is used to reduce the sensitivity of the level set formulation to parameter changes. The results show that our method can overcome the difficulties of non-uniform sub-regions and intensity inhomogeneities in natural image segmentation.
{"title":"Unsupervised Detection for Minimizing a Region of Interest around Distinct Object in Natural Images","authors":"Anucha Tungkatsathan, W. Premchaiswadi, Nucharee Premchaiswadi","doi":"10.1109/DICTA.2010.45","DOIUrl":"https://doi.org/10.1109/DICTA.2010.45","url":null,"abstract":"One of the major challenges for region-based image retrieval is to identify the Region of Interest (ROI) that comprises object queries. However, automatically identifying the regions or objects of interest in a natural scene is a very difficult task because the content is complex and can be any shape. In this paper, we present a novel unsupervised detection method to automatically and efficiently minimize the ROI in the images. We applied an edge-based active contour model that drew upon edge information in local regions. The mathematical implementation of the proposed active contour model was accomplished using a variational level set formulation. In addition, the mean-shift algorithm was used to reduce the sensitivity of parameter change of level set formulation. The results show that our method can overcome the difficulties of non-uniform sub-region and intensity in homogeneities in natural image segmentation.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"2005 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128809448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper investigates the normalised information distance (NID) proposed by Bennett et al. (1998) as an approach to measuring the visual similarity (or dissimilarity) of images. Earlier studies suggest that compression-based approximations to the NID can yield dissimilarity measures that correlate well with visual comparisons. However, results also indicate that conventional feature-based dissimilarity measures often outperform those based on the NID. This paper proposes a theoretical decomposition of the NID that helps explain why NID-based dissimilarity measures may underperform feature-based approaches: the decomposition separates the information content that is perceptually relevant to image similarity from that which is irrelevant. We illustrate how NID-based dissimilarity measures can be improved by discarding the irrelevant information and applying the NID to only the relevant information.
{"title":"Focusing the Normalised Information Distance on the Relevant Information Content for Image Similarity","authors":"Joselíto J. Chua, P. Tischer","doi":"10.1109/DICTA.2010.10","DOIUrl":"https://doi.org/10.1109/DICTA.2010.10","url":null,"abstract":"This paper investigates the normalised information distance (NID) proposed by Bennet et~al~(1998) as an approach to measure the visual similarity (or dissimilarity) of images. Earlier studies suggest that compression-based approximations to the NID can yield dissimilarity measures that correlate well with visual comparisons. However, results also indicate that conventional feature-based dissimilarity measures often outperform those that are based on the NID. This paper proposes that a theoretical decomposition of the NID can help explain why the NID-based dissimilarity measures might not perform well compared to feature-based approaches. The theoretical decomposition considers the perceptually relevant and irrelevant information content for image similarity. We illustrate how the NID-based dissimilarity measures could be improved by discarding the irrelevant information, and applying the NID on only the relevant information.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132441246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyperspectral unmixing is a crucial preprocessing step for material classification and recognition. In the last decade, nonnegative matrix factorization (NMF) and its extensions have been studied intensively for unmixing hyperspectral imagery and recovering the material end-members. Sparsity, an important constraint, has been modeled using L1 or L2 regularizers; however, the full additivity constraint on material abundances is often overlooked, limiting the practical efficacy of these methods. In this paper, we extend the NMF algorithm with an L1/2 sparsity constraint. The resulting L1/2-NMF produces sparser and more accurate results than the other regularizers by considering the end-member additivity constraint explicitly in the optimisation process. Experiments on synthetic and real hyperspectral data validate the proposed algorithm.
{"title":"L1/2 Sparsity Constrained Nonnegative Matrix Factorization for Hyperspectral Unmixing","authors":"Y. Qian, Sen Jia, J. Zhou, A. Robles-Kelly","doi":"10.1109/DICTA.2010.82","DOIUrl":"https://doi.org/10.1109/DICTA.2010.82","url":null,"abstract":"Hyperspectral unmixing is a crucial preprocessing step for material classification and recognition. In the last decade, nonnegative matrix factorization (NMF) and its extensions have been intensively studied to unmix hyperspectral imagery and recover the material end-members. As an important constraint, sparsity has been modeled making use of L1 or L2 regularizers. However, the full additivity constraint of material abundances is often overlooked, hence, limiting the practical efficacy of these methods. In this paper, we extend the NMF algorithm by incorporating the L1/2 sparsity constraint. The L1/2-NMF provides more sparse and accurate results than the other regularizers by considering the end-member additivity constraint explicitly in the optimisation process. Experiments on the synthetic and real hyperspectral data validate the proposed algorithm.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132259929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a novel approach for fast 3D reconstruction of an object inside a scene using Inertial Measurement Unit (IMU) data. A network of cameras observes the scene. For each camera within the network, a virtual camera is derived using the concept of the infinite homography; such a virtual camera looks downward, with its optical axis parallel to the gravity vector. A set of virtual horizontal 3D planes is then considered for 3D reconstruction. The intersection of these parallel virtual planes with the object is computed using homographies, applying a 2D Bayesian occupancy grid for each plane. Experimental results validate both the feasibility and the effectiveness of the proposed method.
{"title":"IMU-Aided 3D Reconstruction Based on Multiple Virtual Planes","authors":"H. Aliakbarpour, J. Dias","doi":"10.1109/DICTA.2010.86","DOIUrl":"https://doi.org/10.1109/DICTA.2010.86","url":null,"abstract":"This paper proposes a novel approach for fast 3D reconstruction of an object inside a scene by using Inertial Measurement Unit (IMU) data. A network of cameras is used to observe the scene. For each camera within the network, a virtual camera is considered by using the concept of emph{infinite homography}. Such a virtual camera is downward and has optical axis parallel to the gravity vector. Then a set of virtual horizontal 3D planes are considered for the aim of 3D reconstruction. The intersection of these virtual parallel 3D planes with the object is computed using the concept of homography and by applying a 2D Bayesian occupancy grid for each plane. The experimental results validate both feasibility and effectiveness of the proposed method.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124197307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Relative motion between a camera and its subject introduces motion blur in captured images. Reconstruction of unblurred images is ill-posed due to the loss of spatial high frequencies. The flutter shutter preserves high frequencies by rapidly opening and closing the shutter during exposure, providing greatly improved reconstruction. We address two open problems in the reconstruction of unblurred images from flutter shutter images. Firstly, we propose a noise reduction technique that reduces reconstruction noise while preserving image detail. Secondly, we propose a semi-automatic technique for estimating the Point Spread Function of the motion blur. Together these techniques provide substantial improvement in reconstruction of flutter shutter images.
{"title":"Improved Reconstruction of Flutter Shutter Images for Motion Blur Reduction","authors":"A. Sarker, Len Hamey","doi":"10.1109/DICTA.2010.77","DOIUrl":"https://doi.org/10.1109/DICTA.2010.77","url":null,"abstract":"Relative motion between a camera and its subject introduces motion blur in captured images. Reconstruction of unblurred images is ill-posed due to the loss of spatial high frequencies. The flutter shutter preserves high frequencies by rapidly opening and closing the shutter during exposure, providing greatly improved reconstruction. We address two open problems in the reconstruction of unblurred images from flutter shutter images. Firstly, we propose a noise reduction technique that reduces reconstruction noise while preserving image detail. Secondly, we propose a semi-automatic technique for estimating the Point Spread Function of the motion blur. Together these techniques provide substantial improvement in reconstruction of flutter shutter images.","PeriodicalId":246460,"journal":{"name":"2010 International Conference on Digital Image Computing: Techniques and Applications","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121187278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}