Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166694
Ryusuke Furuhashi, K. Yamada
Pedestrian protection is an active area in the research field of advanced driver assistance systems. A pedestrian who intends to cross the road is more critical to the driver than one who has no such intention. This paper proposes a method for estimating the crossing intention of a pedestrian on a sidewalk from the pedestrian's posture, and its change across multiple frames, over a short video sequence. We evaluate the method in an indoor simulated environment and a real outdoor environment and report the results and performance.
Title: Estimation of street crossing intention from a pedestrian's posture on a sidewalk using multiple image frames
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166639
Yi Chen, Zhong Jin
Based on linear regression techniques, we present a new supervised learning algorithm for feature extraction called Class-oriented Regression Embedding (CRE). By minimizing the intra-class reconstruction error, CRE finds a low-dimensional subspace in which samples are best represented as combinations of samples from their own class. This characteristic significantly strengthens the recently proposed linear regression-based classification (LRC). Experimental results on the Extended Yale Face Database B (YaleB) and the CENPARMI handwritten numeral database show the effectiveness and robustness of CRE combined with LRC.
Title: Feature extraction using class-oriented regression embedding
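The LRC step that CRE feeds into can be stated compactly: represent the probe as a least-squares combination of each class's training samples and assign the class with the smallest reconstruction residual. A minimal NumPy sketch, assuming column-stacked training matrices per class (the function name and data layout are illustrative, not from the paper):

```python
import numpy as np

def lrc_predict(class_samples, y):
    """Linear regression-based classification (LRC): fit probe y as a
    linear combination of each class's training columns and return the
    label whose class subspace reconstructs y with minimal residual."""
    best_label, best_residual = None, np.inf
    for label, X in class_samples.items():
        # X: (d, n_i) matrix whose columns are the class-i training samples.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        residual = np.linalg.norm(y - X @ beta)
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label
```

With two toy classes spanning the x- and y-axes respectively, a probe lying near the x-axis is reconstructed almost exactly by class 0 and poorly by class 1, so LRC returns 0.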
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166603
Chun Dong, Timothy S Newman
A spin-off of the Extended Gaussian Image (EGI) [1] registration technique for volumetric datasets is presented. The technique directly allows recovery of the rotation (and, indirectly, may allow recovery of the translation) transformation that aligns one volumetric dataset to another. An extension of the basic EGI's orientation histogram to volumetric datasets is also described. This histogram, a volume gradient orientation histogram, enables the registration (i.e., alignment) of two instances of one subject. The technique can be useful for fully automated registration without extraction of higher-level features or markers. Results on multiple types of datasets are also reported.
Title: A volumetric spin-off EGI for registration of volume datasets
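As a rough illustration of the kind of histogram the paper extends, the sketch below bins per-voxel gradient azimuth weighted by gradient magnitude. The binning scheme here (azimuth only, 18 bins) is a simplifying assumption for illustration; the paper's actual construction over the full sphere of gradient directions may differ:

```python
import numpy as np

def orientation_histogram(vol, n_bins=18):
    """Toy volume gradient orientation histogram: bin the azimuth of each
    voxel's gradient, weighting each vote by the gradient magnitude, and
    normalise so the histogram sums to 1."""
    gz, gy, gx = np.gradient(vol.astype(np.float64))  # per-axis gradients
    mag = np.sqrt(gx**2 + gy**2 + gz**2)
    azim = np.arctan2(gy, gx)                          # azimuth in [-pi, pi]
    bins = ((azim + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())         # magnitude-weighted votes
    return hist / max(hist.sum(), 1e-12)
```

A linear ramp along one axis produces a single dominant bin, which is the property that makes such histograms useful for recovering relative rotation between two volumes.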
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166676
W. Liao, Jingye Wen, J. Kuo
In this research, we develop and integrate methods for real-time streaming audio classification based on psychoacoustic models of hearing as well as techniques from pattern recognition. Specifically, a framework for auditory event detection and signal description by means of a computer vision approach has been designed to enable real-time processing and classification of audio signals present in home environments. Local binary patterns are employed to describe the extracted sound blobs in the spectrogram. Experimental results show that the proposed approach is quite effective, achieving an overall recognition rate of 80–90% for 8 types of audio input. The performance degrades only slightly in the presence of noise and other interference.
Title: Streaming audio classification in Smart Home environments
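The local-binary-pattern description of spectrogram blobs can be illustrated with the basic 8-neighbour LBP operator; this is the textbook LBP, not necessarily the exact variant the paper uses:

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour local binary pattern for a 2-D array (e.g. a
    spectrogram region): each interior pixel is encoded as an 8-bit code
    by thresholding its 8 neighbours against the centre value."""
    # Neighbour offsets, one per bit of the LBP code.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    center = img[1:h-1, 1:w-1]
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for k, (dy, dx) in enumerate(offs):
        neigh = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        out |= ((neigh >= center).astype(np.uint8) << k)
    return out
```

A flat region yields the all-ones code 255, while a local maximum at the centre yields 0; histograms of these codes over a blob form the texture descriptor.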
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166671
Fang Zhang, Yunhong Wang, Zhaoxiang Zhang
Human action recognition has recently become a popular and important topic in computer vision. Beyond conventional problems such as noise and low resolution, view-invariant recognition is one of the most challenging. In this paper, we focus on solving multi-view action recognition from surveillance video. To detect moving objects against complicated backgrounds, we employ an improved Gaussian mixture model initialized with K-means clustering, which yields better motion detection results for surveillance videos. We demonstrate that the silhouette representation “Envelope Shape” can address the viewpoint problem in surveillance videos. Experimental results demonstrate that our human action recognition system is fast and efficient on the CASIA activity analysis database.
Title: View-invariant action recognition in surveillance videos
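The K-means initialization of a per-pixel Gaussian mixture background model can be sketched as: cluster a pixel's intensity history and seed the mixture with the cluster statistics. The sketch below is a hedged illustration of that idea only; the function name, the 1-D intensity assumption, and the seeding of weights/variances from cluster size and spread are ours, not the paper's:

```python
import numpy as np

def kmeans_init_gmm(samples, k=3, iters=10):
    """Initialise a per-pixel GMM background model from an intensity
    history: run 1-D K-means, then use cluster sizes as mixture weights
    and cluster means/variances as the initial Gaussian parameters."""
    samples = np.asarray(samples, dtype=np.float64)
    # Spread the initial centres across the observed intensity range.
    centers = np.linspace(samples.min(), samples.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(samples[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = samples[labels == j].mean()
    weights = np.array([(labels == j).mean() for j in range(k)])
    variances = np.array([samples[labels == j].var() if np.any(labels == j)
                          else 1.0 for j in range(k)])
    return weights, centers, variances
```

Seeding the mixture this way avoids the slow burn-in of randomly initialized per-pixel models, which is presumably why the paper reports better motion detection from it.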
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6305053
Shan Liang, Wenju Liu
Estimation of the ideal binary mask (IBM) has been set as the computational goal of computational auditory scene analysis (CASA), and much effort has been devoted to IBM estimation via statistical learning methods. Current Bayesian methods usually estimate the mask value of each time-frequency (T-F) unit independently, using only local auditory features. In this paper, we propose a new Bayesian approach. First, a set of pitch-based auditory features is assembled to exploit the inherent characteristics of reliable and unreliable T-F units, and a rough estimate is obtained according to the maximum likelihood (ML) rule. Then we propose a prior model, derived from onset/offset segmentation, to improve the estimate. Finally, an efficient Markov chain Monte Carlo (MCMC) procedure is applied to approach the maximum a posteriori (MAP) estimate. The proposed method is evaluated on Cooke's 100 mixtures and compared with a previous model; experiments show that our method performs better.
Title: Binary mask estimation for voiced speech segregation using Bayesian method
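The ideal binary mask itself, the computational goal referred to above, is simple to state: a T-F unit is retained when its local target-to-noise energy ratio exceeds a local criterion, commonly 0 dB. A minimal sketch of that definition (function name and array layout are illustrative):

```python
import numpy as np

def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Ideal binary mask: a time-frequency unit is kept (1) when its
    local target-to-noise ratio in dB exceeds the local criterion
    (LC, commonly 0 dB), and discarded (0) otherwise."""
    snr_db = 10.0 * np.log10(target_energy / noise_energy)
    return (snr_db > lc_db).astype(np.uint8)
```

In practice the target and noise energies are unknown, which is exactly why the paper's Bayesian machinery is needed to estimate the mask from mixture features.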
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166585
D. Prasad, Hiok Chai Quek, M. Leung, Siu-Yeung Cho
We prove that when a line is approximated by a digital line, the error in the slope of the digital line has a definite upper bound and depends strongly on the two pixels chosen to define the digital line. Thus, an analytical expression for the maximum deviation of the pixels from the digital line can be derived. Using this, conventional line fitting methods that use a maximum tolerable deviation as the optimization goal can be made independent of control parameters. The error bound can be used to make the most recent and sophisticated line fitting methods parameter-independent and more robust to digitization noise. To our knowledge, this is the first line fitting method completely devoid of control parameters. Such a control-parameter-independent line fitting algorithm retains the characteristics of the digital curve with sufficient reliability and precision and provides good dimensionality reduction in representing digital curves. Extensive results have been generated for 9 datasets comprising about a hundred thousand images. The proposed method shows robust and repeatable performance across all datasets, with low standard deviation in performance.
Title: A parameter independent line fitting method
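The quantity that maximum-tolerable-deviation line fitting thresholds is the largest perpendicular distance of the curve's pixels from the candidate line through the two chosen pixels; the paper's contribution is an analytical bound on that threshold. A minimal sketch of the deviation computation itself (names are illustrative; distinct endpoints are assumed):

```python
import math

def max_deviation(points, p1, p2):
    """Largest perpendicular distance of digital-curve points from the
    line through p1 and p2 -- the quantity that maximum-tolerable-
    deviation line-fitting methods compare against their threshold.
    Assumes p1 != p2."""
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    # Standard point-to-line distance via the 2-D cross product.
    return max(abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / length
               for x, y in points)
```

With an analytical upper bound on this deviation available, the comparison needs no user-supplied tolerance, which is the sense in which the method is parameter-free.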
There are two different people counting tasks: (1) counting people crossing a detecting line within a certain time duration and (2) estimating the total number of people in a region at a certain time instant. This paper presents a new approach to counting the number of people crossing a line of interest (LOI). First, foreground object silhouettes are extracted and described as blobs. Second, we generate the blob linkage based on the one-to-one or one-to-many correspondences between blobs in every two consecutive frames. Third, we label the number of objects in each blob by applying an ellipse detection technique. However, occlusion jeopardizes the labeling process, so we use forward/backward tracing to re-label the number of objects in an occluded blob. Experiments illustrate the effectiveness of our method.
Title: People counting using ellipse detection and forward/backward tracing
Authors: Chung-Lin Huang, Shih-Chung Hsu, I-Chung Tsao, Ben-Syuan Huang, Hau-Wei Wang, Hung-Wei Lin
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166629
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166705
Junliang Xing, H. Ai, S. Lao
In this paper, we propose a general video object segmentation framework that views object segmentation from a unified Bayesian perspective and optimizes the resulting MAP problem in a progressive manner. Based on object detection and tracking results, a three-level hierarchical video object segmentation approach is presented. At the first level, an offline-learned segmentor is applied to each object tracking result in the current frame to obtain a coarse segmentation. At the second level, the coarse segmentation is updated into an intermediate segmentation by a temporal model that propagates the fine segmentation of the previous frame to the current frame via a discriminative feature-point voting process. At the third level, the intermediate segmentation is refined by an iterative procedure that uses online-collected color and shape information to obtain the final result. We apply the approach to pedestrian segmentation on many challenging datasets, which demonstrates its effectiveness.
Title: Hierarchical video object segmentation
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166675
Dayi Gong, Shutao Li, Yin Xiang
This paper presents a face recognition method using the Weber Local Descriptor (WLD). The WLD consists of a differential excitation component and an orientation component, which together capture rich local texture information. In our method, we first divide face images into a set of sub-regions and extract WLD features from each; the Sobel operator is used to obtain the orientation component. Each sub-region of the probe image is then classified with a nearest-neighbor method, and the results are fused at the decision level through voting to yield the final recognition result. Experimental results on the ORL and Yale face databases verify the effectiveness of our method.
Title: Face recognition using the Weber Local Descriptor
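The differential excitation component of WLD has a standard closed form: the arctangent of the summed neighbour-minus-centre intensity differences, normalised by the centre intensity. A sketch of that formula (the epsilon guard against zero-valued pixels is our addition; the paper pairs this with a Sobel-based orientation component not shown here):

```python
import numpy as np

def differential_excitation(img):
    """Differential excitation component of the Weber Local Descriptor:
    for each interior pixel, arctan of the summed differences to its 8
    neighbours divided by the centre intensity."""
    f = img.astype(np.float64)
    h, w = f.shape
    center = f[1:h-1, 1:w-1]
    acc = np.zeros_like(center)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            acc += f[1+dy:h-1+dy, 1+dx:w-1+dx] - center
    # Epsilon guards against division by a zero-valued centre pixel.
    return np.arctan(acc / (center + 1e-9))
```

Flat regions map to 0, while pixels brighter than their surroundings map to negative values, so the response reflects relative (Weber-law) rather than absolute contrast.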