Wavelet-domain image shrinkage using variance field diffusion
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166660
Zhenyu Liu, Jing Tian, Li Chen, Yongtao Wang
Wavelet shrinkage is an image denoising technique based on thresholding the wavelet coefficients. The key challenge of wavelet shrinkage is finding an appropriate threshold value, which is typically controlled by the signal variance. To tackle this challenge, this paper proposes a new image shrinkage approach that uses variance field diffusion to provide more accurate variance estimation. Experimental results demonstrate the superior performance of the proposed approach.
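The paper's variance field diffusion is not reproduced here, but the shrinkage pipeline it plugs into is standard. Below is a minimal BayesShrink-style sketch in Python (assuming the PyWavelets package), where each detail subband's threshold is driven by a crude per-subband signal-variance estimate; the diffusion-based variance field would replace that estimate.

```python
# Minimal variance-driven wavelet soft-thresholding (BayesShrink-style sketch).
# Illustrates the general shrinkage pipeline, not the authors' variance-field
# diffusion estimator.
import numpy as np
import pywt

def wavelet_shrink(image, wavelet="db8", level=3):
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # Estimate the noise sigma from the finest diagonal subband (robust MAD rule).
    sigma_n = np.median(np.abs(coeffs[-1][2])) / 0.6745
    out = [coeffs[0]]
    for cH, cV, cD in coeffs[1:]:
        bands = []
        for band in (cH, cV, cD):
            # Signal variance estimate: E[y^2] - sigma_n^2, floored at zero.
            sigma_x = np.sqrt(max(np.mean(band ** 2) - sigma_n ** 2, 1e-12))
            t = sigma_n ** 2 / sigma_x  # BayesShrink threshold
            bands.append(pywt.threshold(band, t, mode="soft"))
        out.append(tuple(bands))
    return pywt.waverec2(out, wavelet)
```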
{"title":"Wavelet-domain image shrinkage using variance field diffusion","authors":"Zhenyu Liu, Jing Tian, Li Chen, Yongtao Wang","doi":"10.1109/ACPR.2011.6166660","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166660","url":null,"abstract":"Wavelet shrinkage is an image denoising technique based on the concept of thresholding the wavelet coefficients. The key challenge of wavelet shrinkage is to find an appropriate threshold value, which is typically controlled by the signal variance. To tackle this challenge, a new image shrinkage approach is proposed in this paper by using a variance field diffusion, which can provide more accurate variance estimation. Experimental results are provided to demonstrate the superior performance of the proposed approach.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114696561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sparse Representations, Compressive Sensing and dictionaries for pattern recognition
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166711
Vishal M. Patel, R. Chellappa
In recent years, the theories of Compressive Sensing (CS), Sparse Representation (SR), and Dictionary Learning (DL) have emerged as powerful tools for efficiently processing data in non-traditional ways. One promising application area is object recognition. In this paper, we review the role of SR, CS, and DL in object recognition, along with algorithms that apply these theories. An important aspect of object recognition is feature extraction. Recent work in SR and CS has shown that if sparsity in the recognition problem is properly harnessed, the choice of features becomes less critical; what becomes critical is the number of features and the sparsity of the representation. This issue is discussed in detail.
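As a concrete instance of this family of methods, the sketch below implements sparse-representation-based classification (SRC) in the spirit of Wright et al.: a test sample is sparsely coded over a dictionary whose columns are training samples, then assigned to the class with the smallest class-restricted reconstruction residual. The helper name and parameters are illustrative, not from the review.

```python
# Sketch of sparse-representation-based classification (SRC).
# D: dictionary with training samples as unit-norm columns, grouped by class.
# labels: class label per column of D. y: test sample vector.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(D, labels, y, n_nonzero=10):
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
    omp.fit(D, y)  # sparse code x with y ~ D @ x
    x = omp.coef_
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get)  # class whose atoms decode y best
```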
{"title":"Sparse Representations, Compressive Sensing and dictionaries for pattern recognition","authors":"Vishal M. Patel, R. Chellappa","doi":"10.1109/ACPR.2011.6166711","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166711","url":null,"abstract":"In recent years, the theories of Compressive Sensing (CS), Sparse Representation (SR) and Dictionary Learning (DL) have emerged as powerful tools for efficiently processing data in non-traditional ways. An area of promise for these theories is object recognition. In this paper, we review the role of SR, CS and DL for object recognition. Algorithms to perform object recognition using these theories are reviewed. An important aspect in object recognition is feature extraction. Recent works in SR and CS have shown that if sparsity in the recognition problem is properly harnessed then the choice of features is less critical. What becomes critical, however, is the number of features and the sparsity of representation. This issue is discussed in detail.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124006093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic generation of training samples and a learning method based on advanced MILBoost for human detection
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166556
Yuji Yamauchi, H. Fujiyoshi
Statistical learning methods for human detection require large quantities of training samples and thus suffer from high sample-collection costs. Their detection performance also tends to drop when the training samples are collected in a different environment from the one in which the detection system must operate. In this paper we propose a generative learning method that combines the automatic generation of training samples from 3D models with an advanced MILBoost learning algorithm. We use a three-dimensional human model to automatically generate positive samples specialized to specific scenes. Negative training samples are collected by random automatic extraction from video streams, but some of these samples may be incorrectly labeled. When a classifier is trained by statistical learning on incorrectly labeled training samples, detection performance is impaired. Therefore, we use an improved version of MILBoost to perform generative learning that is robust to incorrectly labeled samples in the training set. In our evaluation, a classifier trained on samples generated from the 3D human model achieved better detection performance than one trained on hand-extracted samples. The proposed method also mitigates the degradation of detection performance when images of people are mixed in with the negative samples used for learning.
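For context, MILBoost treats training samples as bags of instances and scores a bag with a noisy-OR over instance probabilities, which is the source of its tolerance to label noise. A minimal sketch of that bag probability (the standard formulation due to Viola et al., not the improved variant proposed here):

```python
# Noisy-OR bag probability at the core of MILBoost: a bag is positive if at
# least one of its instances is. Standard formulation, not the paper's variant.
import numpy as np

def bag_probability(instance_scores):
    # Instance probabilities via the logistic function on boosting scores.
    p_inst = 1.0 / (1.0 + np.exp(-instance_scores))
    # Noisy-OR: P(bag positive) = 1 - prod_i (1 - p_i).
    return 1.0 - np.prod(1.0 - p_inst)
```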
{"title":"Automatic generation of training samples and a learning method based on advanced MILBoost for human detection","authors":"Yuji Yamauchi, H. Fujiyoshi","doi":"10.1109/ACPR.2011.6166556","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166556","url":null,"abstract":"Statistical learning methods for human detection require large quantities of training samples and thus suffer from high sample collection costs. Their detection performance is also liable to be lower when the training samples are collected in a different environment than the one in which the detection system must operate. In this paper we propose a generative learning method that uses the automatic generation of training samples from 3D models together with an advanced MILBoost learning algorithm. In this study, we use a three-dimensional human model to automatically generate positive samples for learning specialized to specific scenes. Negative training samples are collected by random automatic extraction from video stream, but some of these samples may be collected with incorrect labeling. When a classifier is trained by statistical learning using incorrectly labeled training samples, detection performance is impaired. Therefore, in this study an improved version of MILBoost is used to perform generative learning which is immune to the adverse effects of incorrectly labeled samples among the training samples. In evaluation, we found that a classifier trained using training samples generated from a 3D human model was capable of better detection performance than a classifier trained using training samples extracted by hand. The proposed method can also mitigate the degradation of detection performance when there are image of people mixed in with the negative samples used for learning.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116943553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical orthogonal matching pursuit for face recognition
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166530
Huaping Liu, F. Sun
This paper exploits joint group structure in the face recognition problem by using sparse representation with multiple features. We claim that the different feature vectors of one test face image share the same sparsity pattern at the higher group level, but not necessarily at the lower (within-group) level: they share the same active groups, but not necessarily the same active set. To this end, a hierarchical orthogonal matching pursuit algorithm is developed. The basic idea is straightforward: at each iteration, we first select the best group shared by all features, then select the best atoms within this group for each feature. The algorithm is very efficient and shows good performance on a standard face recognition dataset.
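A minimal sketch of the group-then-atom step described above, under illustrative assumptions (unit-norm dictionary columns, one dictionary and one residual per feature channel); the surrounding pursuit loop and residual updates are omitted.

```python
# One iteration of the shared-group, per-feature-atom selection.
# dicts[f]: dictionary for feature channel f (atoms as unit-norm columns)
# residuals[f]: current residual vector for feature channel f
# groups: list of integer index arrays, one per group of atoms
import numpy as np

def hierarchical_omp_step(dicts, residuals, groups):
    # 1) Group selection: the group with the largest correlation, summed over
    #    all feature channels, is shared by every channel.
    scores = []
    for g in groups:
        s = 0.0
        for D, r in zip(dicts, residuals):
            s += np.abs(D[:, g].T @ r).max()
        scores.append(s)
    best = groups[int(np.argmax(scores))]
    # 2) Atom selection: each feature channel picks its own best atom
    #    within the shared group.
    chosen = [best[int(np.argmax(np.abs(D[:, best].T @ r)))]
              for D, r in zip(dicts, residuals)]
    return best, chosen
```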
{"title":"Hierarchical orthogonal matching pursuit for face recognition","authors":"Huaping Liu, F. Sun","doi":"10.1109/ACPR.2011.6166530","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166530","url":null,"abstract":"This paper tries to exploit the joint group intrinsics in face recognition problem by using sparse representation with multiple features. We claim that different feature vectors of one test face image share the same sparsity pattern at the higher group level, but not necessarily at the lower (inside the group) level. This means that they share the same active groups, but not necessarily the same active set. To this end, a hierarchical orthogonal matching pursuit algorithm is developed. The basic idea of this approach is straightforward: At each iteration step, we first select the best group which is shared by different features, then we select the best atoms (within this group) for each feature. This algorithm is very efficient and shows good performance in standard face recognition dataset.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122565467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Universal no reference image quality assessment metrics based on local dependency
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166657
Fei Gao, Xinbo Gao, D. Tao, Xuelong Li, Lihuo He, Wen Lu
No-reference image quality assessment (NR-IQA) aims to evaluate image quality blindly, without the ground truth. Most emerging NR-IQA algorithms are effective only for specific distortions; universal metrics that work across various categories of distortion have hardly been explored, and the available algorithms fall short in performance. In this paper, we study the local dependency (LD) characteristic of natural images and propose two universal NR-IQA metrics: the LD global scheme (LD-GS) and the LD two-step scheme (LD-TS). We claim that the local dependency among wavelet coefficients is disturbed by various distortion processes, and that these disturbances are strongly correlated with image quality. Experimental results on the LIVE database II demonstrate that both proposed metrics are highly consistent with human perception and outperform state-of-the-art NR-IQA indexes, as well as some full-reference quality indicators, for diverse distortions and across the entire database.
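As an illustration of the kind of statistic such metrics build on, the sketch below (assuming PyWavelets) measures the dependency between neighbouring wavelet coefficient magnitudes; this is a simplified stand-in, not the authors' exact LD feature.

```python
# Correlation between a subband's coefficient magnitudes and those of their
# horizontal neighbours. Distortions tend to perturb this local dependency.
import numpy as np
import pywt

def local_dependency(image, wavelet="db2"):
    _, (cH, cV, cD) = pywt.dwt2(image, wavelet)
    feats = []
    for band in (cH, cV, cD):
        m = np.abs(band)
        # Pearson correlation between each coefficient and its right neighbour.
        a, b = m[:, :-1].ravel(), m[:, 1:].ravel()
        feats.append(np.corrcoef(a, b)[0, 1])
    return feats  # one dependency score per detail subband
```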
{"title":"Universal no reference image quality assessment metrics based on local dependency","authors":"Fei Gao, Xinbo Gao, D. Tao, Xuelong Li, Lihuo He, Wen Lu","doi":"10.1109/ACPR.2011.6166657","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166657","url":null,"abstract":"No reference image quality assessment (NR-IQA) is to evaluate image quality blindly without the ground truth. Most of the emerging NR-IQA algorithms are only effective for some specific distortion. Universal metrics that can work for various categories of distortions have hardly been explored, and the algorithms available are not fully adequate in performance. In this paper, we study the local dependency (LD) characteristic of natural images, and propose two universal NR-IQA metrics: LD global scheme (LD-GS) and LD two-step scheme (LD-TS). We claim that the local dependency characteristic among wavelet coefficients is disturbed by various distortion processes, and the disturbances are strongly correlated to image qualities. Experimental results on LIVE database II demonstrate that both the proposed metrics are highly consistent with the human perception and outpace the state-of-the-art NR-IQA indexes and some full reference quality indicators for diverse distortions and across the entire database.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123331150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust moving object segmentation with two-stage optimization
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166695
Jianwei Ding, Xin Zhao, Kaiqi Huang, T. Tan
Inspired by interactive segmentation algorithms, we propose an online, unsupervised technique to extract moving objects from videos captured by stationary cameras. Our method consists of two main optimization steps, moving from locally optimal extraction to globally optimal segmentation. In the first stage, reliable foreground and background pixels are extracted from the input image by modeling the foreground and background distributions with color and motion cues. These reliable pixels provide hard constraints for the segmentation step. In the second stage, a globally optimal segmentation of the moving object is computed by graph cuts. Experimental results on several challenging videos demonstrate the effectiveness and robustness of the proposed approach.
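The second-stage graph cut can be sketched with the PyMaxflow library, with the reliable pixels from stage one imposed as hard constraints through near-infinite terminal weights. The probability model and the smoothness weight `lam` are illustrative assumptions, not values from the paper.

```python
# Graph-cut segmentation with hard constraints, via PyMaxflow.
# fg_prob: per-pixel foreground probability from the stage-one models.
# reliable_fg / reliable_bg: boolean masks of reliable pixels.
import numpy as np
import maxflow

def graph_cut_segment(fg_prob, reliable_fg, reliable_bg, lam=2.0):
    eps = 1e-6
    src = -np.log(1.0 - fg_prob + eps)  # cost of labelling a pixel background
    snk = -np.log(fg_prob + eps)        # cost of labelling a pixel foreground
    src[reliable_fg] = 1e9              # hard constraint: must be foreground
    snk[reliable_fg] = 0.0
    src[reliable_bg] = 0.0              # hard constraint: must be background
    snk[reliable_bg] = 1e9
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(fg_prob.shape)
    g.add_grid_edges(nodes, lam)        # 4-connected smoothness term
    g.add_grid_tedges(nodes, src, snk)  # data terms / hard constraints
    g.maxflow()
    return ~g.get_grid_segments(nodes)  # True = foreground (source side)
```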
{"title":"Robust moving object segmentation with two-stage optimization","authors":"Jianwei Ding, Xin Zhao, Kaiqi Huang, T. Tan","doi":"10.1109/ACPR.2011.6166695","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166695","url":null,"abstract":"Inspired by interactive segmentation algorithms, we propose an online and unsupervised technique to extract moving objects from videos captured by stationary cameras. Our method consists of two main optimization steps, from local optimal extraction to global optimal segmentation. In the first stage, reliable foreground and background pixels are extracted from input image by modeling distributions of foreground and background with color and motion cues. These reliable pixels provide hard constraints for the next step of segmentation. Then global optimal segmentation of moving object is implemented by graph cuts in the second stage. Experimental results on several challenging videos demonstrate the effectiveness and robustness of the proposed approach.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123592048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saliency model based head pose estimation by sparse optical flow
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166668
Tao Xu, Chao Wang, Yunhong Wang, Zhaoxiang Zhang
Head pose plays an important role in human-computer interaction, and its estimation is a challenging problem compared with face detection and recognition in computer vision. In this paper, a novel and efficient method is proposed to estimate head pose in real-time video sequences. A saliency-model-based segmentation method is used not only to extract facial feature points, but also to update and rectify their locations when tracking is lost; this step also provides a benchmark for vector generation in pose estimation. In subsequent frames, feature points are tracked by a sparse optical flow method, and the head pose is determined from the vectors generated by feature points between successive frames. Via a voting scheme, these vectors, with their angles and lengths, give a robust estimate of the head pose. Unlike other methods, ours requires neither an annotated training dataset nor a training procedure. Initialization and re-initialization are performed automatically and are robust to profile head poses. Experimental results show efficient and robust estimation of the head pose.
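A sketch of the tracking-and-voting core using OpenCV's Lucas-Kanade sparse optical flow; the angle-length voting here is reduced to a length-weighted angular histogram, and all parameters are illustrative rather than the paper's.

```python
# Track sparse features between two frames and vote for the dominant motion
# direction, weighting each flow vector's angle vote by its length.
import cv2
import numpy as np

def dominant_motion(prev_gray, curr_gray):
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = status.ravel() == 1
    vecs = (nxt - pts).reshape(-1, 2)[good]
    angles = np.degrees(np.arctan2(vecs[:, 1], vecs[:, 0]))
    lengths = np.linalg.norm(vecs, axis=1)
    # Voting: histogram over angle bins, weighted by vector length.
    hist, edges = np.histogram(angles, bins=18, range=(-180, 180),
                               weights=lengths)
    best = np.argmax(hist)
    return 0.5 * (edges[best] + edges[best + 1])  # dominant flow direction
```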
{"title":"Saliency model based head pose estimation by sparse optical flow","authors":"Tao Xu, Chao Wang, Yunhong Wang, Zhaoxiang Zhang","doi":"10.1109/ACPR.2011.6166668","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166668","url":null,"abstract":"Head pose plays an important role in Human-Computer interaction, and its estimation is a challenge problem compared to face detection and recognition in computer vision. In this paper, a novel and efficient method is proposed to estimate head pose in real-time video sequences. A saliency model based segmentation method is used not only to extract feature points of face, but also to update and rectify the location of feature points when missing happened. This step also gives a benchmark for vector generation in pose estimation. In subsequent frames feature points will be tracked by sparse optical flow method and head pose can be determined from vectors generated by feature points between successive frames. Via a voting scheme, these vectors with angle and length can give a robust estimation of the head pose. Compared with other methods, annotated training data set and training procedure is not essential in our method. Initialization and re-initialization can be done automatically and are robust for profile head pose. Experimental results show an efficient and robust estimation of the head pose.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128992857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient coarse-to-fine scheme for text detection in videos
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166605
Liuan Wang, Lin-Lin Huang, Yang Wu
To achieve fast and accurate text detection in videos, we propose an efficient coarse-to-fine scheme comprising three stages: key frame extraction, candidate text line detection, and fine text detection. Key frames, which are assumed to carry text, are extracted based on the multi-threshold difference of color histogram (MDCH). From the key frames, candidate text lines are detected by morphological operations and connected-component analysis. Sliding-window classification is then performed on the candidate text lines to detect refined text lines. To improve classification accuracy, we use two types of features, histogram of oriented gradients (HOG) and local assembled binary (LAB), and two classifiers, Real AdaBoost and a polynomial neural network (PNN). The effectiveness of the proposed method is demonstrated by experimental results on a large video dataset, which also justify the benefits of key frame extraction and of combining multiple features and classifiers.
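The sliding-window stage can be sketched as follows, using HOG features from scikit-image and a stand-in classifier `clf` (the paper combines Real AdaBoost and a PNN; the window size and stride here are assumptions).

```python
# Score a candidate text line with a sliding window of HOG features.
# `clf` is any trained classifier exposing decision_function (a stand-in for
# the paper's Real AdaBoost / PNN combination).
import numpy as np
from skimage.feature import hog

def score_text_line(line_img, clf, win_w=48, stride=8):
    scores = []
    for x in range(0, line_img.shape[1] - win_w + 1, stride):
        window = line_img[:, x:x + win_w]
        feat = hog(window, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))
        scores.append(clf.decision_function(feat.reshape(1, -1))[0])
    return np.array(scores)  # per-window text/non-text confidences
```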
{"title":"An efficient coarse-to-fine scheme for text detection in videos","authors":"Liuan Wang, Lin-Lin Huang, Yang Wu","doi":"10.1109/ACPR.2011.6166605","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166605","url":null,"abstract":"To achieve fast and accurate text detection from videos, we propose an efficient coarse-to-fine scheme comprising three stages: key frame extraction, candidate text line detection and fine text detection. Key frames, which are assumed to carry texts, are extracted based on multi-threshold difference of color histogram (MDCH). From the key frames, candidate text lines are detected by morphological operations and connected component analysis. Sliding window classification is performed on the candidate text lines so as to detect refined text lines. We use two types of features: histogram of gradients (HOG) and local assembled binary (LAB), and two classifiers: Real Adaboost and polynomial neural network (PNN), for improving the classification accuracy. The effectiveness of the proposed method has been demonstrated by the experiment results on a large video dataset. Also, the benefits of key frame extraction and combining multiple features and classifiers have been justified.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126904256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current trend in natural disaster warning systems based on computer vision techniques
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166591
ByoungChul Ko, Joon-Young Kwak, June-Hyeok Hong, J. Nam
In this paper, a review of vision-based natural disaster warning methods is presented. Because natural disaster warning has received much attention in recent research, a comprehensive review of the disaster-warning techniques developed in recent years is needed. This paper surveys recent studies on warning systems for four types of natural disaster, i.e., wildfire smoke and flame detection, water level detection for flood prevention, and coastal zone monitoring, using computer vision and pattern recognition techniques. Finally, we conclude with some thoughts on future research directions.
{"title":"Current trend in natural disaster warning systems based on computer vision techniques","authors":"ByoungChul Ko, Joon-Young Kwak, June-Hyeok Hong, J. Nam","doi":"10.1109/ACPR.2011.6166591","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166591","url":null,"abstract":"In this paper, a review of vision-based natural disaster warning methods is presented. Because natural disaster warning is receiving a lot of attention in recent research, a comprehensive review of various disaster-warning techniques developed in recent years is needed. This paper surveys recent studies on warning systems four different types of natural disaster, i.e., wildfire smoke and flame detection, water level detection for flood prevention, and coastal zone monitoring, using computer vision and pattern-recognition techniques. Finally, we conclude with some thoughts about future research directions.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125999631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic image cropping using sparse coding
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166623
Jieying She, Duo-Chao Wang, Mingli Song
Image cropping is a technique that helps people improve the quality of their photos by discarding unnecessary parts. In this paper, we propose a new approach that crops photos for better composition by learning their structure. First, we classify photos into different categories. Then we extract the graph-based visual saliency map of these photos, based on which we build a dictionary for each category. Finally, by solving a sparse coding problem for each input photo over the dictionary, we find the cropped region that is best decoded by the dictionary. Experimental results demonstrate that our technique is applicable to a wide range of photos and produces more agreeable results.
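A sketch of the crop-selection idea under stated assumptions: candidate crops of the saliency map are sparsely encoded over a learned category dictionary (via scikit-learn's `sparse_encode`), and the crop with the lowest reconstruction error wins. The descriptor and the candidate enumeration are simplified stand-ins for the paper's formulation.

```python
# Pick the candidate crop whose saliency descriptor is best reconstructed by
# the category dictionary (rows of `dictionary` are learned atoms).
import numpy as np
from sklearn.decomposition import sparse_encode

def best_crop(saliency, candidates, dictionary, n_nonzero=5):
    # candidates: list of (y0, y1, x0, x1) crop windows over the saliency map
    errors = []
    for y0, y1, x0, x1 in candidates:
        patch = saliency[y0:y1, x0:x1]
        # Crude fixed-size descriptor: subsample the patch to 16x16 and flatten.
        ys = np.linspace(0, patch.shape[0] - 1, 16).astype(int)
        xs = np.linspace(0, patch.shape[1] - 1, 16).astype(int)
        x = patch[np.ix_(ys, xs)].ravel()[None, :]
        code = sparse_encode(x, dictionary, algorithm="omp",
                             n_nonzero_coefs=n_nonzero)
        errors.append(np.linalg.norm(x - code @ dictionary))
    return candidates[int(np.argmin(errors))]  # best-decoded crop wins
```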
{"title":"Automatic image cropping using sparse coding","authors":"Jieying She, Duo-Chao Wang, Mingli Song","doi":"10.1109/ACPR.2011.6166623","DOIUrl":"https://doi.org/10.1109/ACPR.2011.6166623","url":null,"abstract":"Image cropping is a technique to help people improve their taken photos' quality by discarding unnecessary parts of a photo. In this paper, we propose a new approach to crop the photo for better composition through learning the structure. Firstly, we classify photos into different categories. Then we extract the graph-based visual saliency map of these photos, based on which we build a dictionary for each categories. Finally, by solving the sparse coding problem of each input photo based on the dictionary, we find a cropped region that can be best decoded by this dictionary. The experimental results demonstrate that our technique is applicable to a wide range of photos and produce more agreeable resulting photos.","PeriodicalId":287232,"journal":{"name":"The First Asian Conference on Pattern Recognition","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127220822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}