Geolocation based image annotation
Pub Date: 2011-12-01 | DOI: 10.1109/ACPR.2011.6166619
Atsushi Shimada, H. Nagahara, R. Taniguchi, V. Charvillat
The growth of photo-sharing websites such as Flickr and Picasa enables us to access billions of images easily. In recent years, many researchers have leveraged such photo-sharing sites to tackle the image annotation problem. The aim of image annotation is to assign a proper label to an unknown image. Generally, image features and label features are used to acquire the relationship between them. We use not only such image and label features but also geolocation, which indicates where the image was taken. We formulate the image annotation problem as two subproblems: image-based labeling and label-based localization. The former estimates a proper label from a given image; the latter estimates the location from the label. Our approach combines these two estimation strategies. We conducted experiments and found that our approach outperformed the traditional approach.
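As an illustration of how the two estimates might be fused, the following sketch combines a visual classifier's label posterior with a geolocation likelihood into a single score per label; all names and numbers are hypothetical, not the authors' implementation.

```python
import numpy as np

def combined_label_scores(p_label_given_image, p_location_given_label):
    """Fuse the two estimates: image-based labeling and label-based
    localization. Both inputs are arrays indexed by candidate label.

    p_label_given_image    -- P(label | image features), shape (n_labels,)
    p_location_given_label -- P(photo's geolocation | label), shape (n_labels,)
    """
    scores = p_label_given_image * p_location_given_label
    return scores / scores.sum()  # renormalize to a distribution

# Hypothetical example: 3 candidate labels for a photo taken near a coast.
p_img = np.array([0.5, 0.3, 0.2])   # from a visual classifier
p_loc = np.array([0.1, 0.8, 0.1])   # how well each label fits the geotag
print(combined_label_scores(p_img, p_loc))  # the coastal label wins
```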
Discriminant appearance weighting for action recognition
Pub Date: 2011-12-01 | DOI: 10.1109/ACPR.2011.6166599
Tetsu Matsukawa, Takio Kurita
Extending popular histogram representations of local motion patterns, we present a novel weighted integration method based on the assumption that the importance of a motion should depend on its appearance in order to obtain better recognition accuracy. The proposed integration of motion and appearance patterns can weight information about “what is moving” in a discriminant way. The discriminant weights can be learned efficiently and naturally using two-dimensional Fisher discriminant analysis (Fisher weight maps) of co-occurrence matrices. The original Fisher weight maps lose the shift invariance of histogram features, while the proposed method preserves it. Experimental results on the KTH human action dataset and the UT-Interaction dataset show the effectiveness of the proposed integration compared to naive integration of independent motion and appearance features and to other state-of-the-art methods.
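The discriminant weights come out of a generalized eigenvalue problem. The sketch below shows only the generic Fisher-criterion step on vectorized co-occurrence features; the data, sizes, and regularization are assumptions for illustration, and the shift-invariance-preserving structure of the paper's method is not reproduced.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_weight_map(X, y, reg=1e-6):
    """Learn discriminant weights over feature cells (a minimal sketch of a
    Fisher weight map). X has shape (n_samples, n_cells) -- here each row
    would be a vectorized motion/appearance co-occurrence matrix -- and y
    holds class labels. Returns the weight vector maximizing the Fisher
    criterion w^T Sb w / w^T Sw w via a generalized eigenvalue problem."""
    mean_all = X.mean(axis=0)
    n_cells = X.shape[1]
    Sb = np.zeros((n_cells, n_cells))
    Sw = np.zeros((n_cells, n_cells))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * diff @ diff.T          # between-class scatter
        Sw += (Xc - mc).T @ (Xc - mc)          # within-class scatter
    Sw += reg * np.eye(n_cells)                # regularize for invertibility
    vals, vecs = eigh(Sb, Sw)                  # generalized eigendecomposition
    return vecs[:, -1]                         # eigenvector of largest value

# Hypothetical toy data: 40 samples, 4x5 co-occurrence matrices flattened.
rng = np.random.default_rng(0)
X = rng.random((40, 20))
y = np.repeat([0, 1], 20)
w = fisher_weight_map(X, y)
```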
Tree crown detection in high resolution optical images during the early growth stages of Eucalyptus plantations in Brazil
Pub Date: 2011-11-29 | DOI: 10.1109/ACPR.2011.6166666
Jia Zhou, C. Proisy, P. Couteron, X. Descombes, J. Zerubia, G. Maire, Y. Nouvellon
Individual tree detection methods are increasingly used, and are improving, in forestry and silviculture as satellite imagery of metric resolution becomes more widely available [2–7]. Automatic detection on these very high spatial resolution images aims to determine tree positions and crown sizes. In this paper, we use a mathematical model based on marked point processes, which has shown advantages over several individual tree detection algorithms for plantations [2], to analyze a Eucalyptus plantation in Brazil using two optical images acquired by the WorldView-2 satellite. Detection on two images of different dates (multi-date) was tested simultaneously for the first time, which allows the variation of individual tree crowns between the dates to be estimated, whereas most current detection methods estimate only the static state of tree crowns at the moment of a single image's acquisition. The relevance of the detection is discussed in terms of performance on tree localization and crown size. Tree crown growth is then deduced from the detection results and compared with the expected dynamics of the corresponding populations.
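For intuition, a marked point process for crown detection typically scores each candidate tree, a point marked with a crown radius, by the contrast between the crown disc and a surrounding ring. The sketch below implements only such a data term on a toy image; the prior penalizing overlapping crowns and the global optimization used in marked-point-process models are omitted, and all values are hypothetical.

```python
import numpy as np

def crown_contrast(image, x, y, r):
    """Data term for one candidate tree, modeled as a marked point (x, y, r):
    the difference between the mean intensity inside the crown disc and in a
    surrounding background ring. A bright crown on darker soil scores high.
    (A minimal sketch; a full marked point process adds an overlap prior and
    optimizes the whole configuration globally.)"""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    d2 = (xx - x) ** 2 + (yy - y) ** 2
    disc = d2 <= r ** 2
    ring = (d2 > r ** 2) & (d2 <= (1.5 * r) ** 2)
    if not disc.any() or not ring.any():
        return -np.inf
    return image[disc].mean() - image[ring].mean()

# Hypothetical toy image with one bright "crown" at (20, 20), radius 5.
img = np.zeros((40, 40))
yy, xx = np.mgrid[0:40, 0:40]
img[(xx - 20) ** 2 + (yy - 20) ** 2 <= 25] = 1.0
print(crown_contrast(img, 20, 20, 5))   # high contrast: likely a tree
print(crown_contrast(img, 5, 5, 5))     # flat background: near zero
```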
Video Object Segmentation by Hierarchical Localized Classification of Regions
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166545
Chenguang Zhang, H. Ai
Video Object Segmentation (VOS) aims to cut out a selected object from video sequences; the main difficulties are shape deformation, appearance variation, and background clutter. To cope with these difficulties, we propose a novel method named Hierarchical Localized Classification of Regions (HLCR). We argue that appearance models, together with spatial and temporal coherence between frames, are the keys to breaking through this bottleneck. Locally, to identify foreground regions, we use hierarchical localized classifiers, which organize regional features as decision trees. Globally, we adopt Gaussian Mixture color Models (GMMs). After integrating the local and global results into a probability mask, we obtain the final segmentation by graph cut. Experiments on various challenging video sequences demonstrate the efficiency and adaptability of the proposed method.
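As a rough illustration of the local/global fusion step, the sketch below trains a decision tree on region features and two GMM color models, then blends their foreground probabilities; the fusion weight, features, and data are assumptions, and the hierarchy of local classifiers and the final graph cut are omitted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
fg_colors = rng.normal(0.7, 0.1, (200, 3))      # hypothetical foreground RGB
bg_colors = rng.normal(0.3, 0.1, (200, 3))      # hypothetical background RGB

# Global appearance: one GMM per class over pixel colors.
gmm_fg = GaussianMixture(n_components=2, random_state=0).fit(fg_colors)
gmm_bg = GaussianMixture(n_components=2, random_state=0).fit(bg_colors)

# Local appearance: a decision tree on per-region features (here just color).
X = np.vstack([fg_colors, bg_colors])
y = np.hstack([np.ones(200), np.zeros(200)])
local_clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

def foreground_probability(region_features, alpha=0.5):
    """Fuse local and global estimates into a probability-mask entry."""
    p_local = local_clf.predict_proba(region_features)[:, 1]
    log_fg = gmm_fg.score_samples(region_features)
    log_bg = gmm_bg.score_samples(region_features)
    p_global = 1.0 / (1.0 + np.exp(log_bg - log_fg))  # two-class posterior
    return alpha * p_local + (1 - alpha) * p_global

print(foreground_probability(rng.normal(0.7, 0.1, (3, 3))))  # near 1
```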
Structure-constrained distribution matching using quadratic programming and its application to pronunciation evaluation
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166673
Y. Qiao, Masayuki Suzuki, N. Minematsu, K. Hirose
In previous work we proposed a structural representation of speech that is robust to speaker differences due to its transformation-invariant property, in which two speech structures were compared by calculating the distance between two structural vectors, each composed of the lengths of a structure's edges. However, this distance cannot yield matching scores directly related to the individual events (nodes) of the two structures. Instead of comparing structural vectors directly, this paper uses the structures as constraints for optimal pattern matching. We derive the objective and constraint functions of the optimization. Under the assumptions of Gaussian distributions with shared covariance matrices, we show that this optimization reduces to a quadratically constrained quadratic programming (QCQP) problem. To relieve the problem of overly strong invariance, we use a subspace decomposition method and perform the optimization in each subspace. We evaluate the proposed method on the task of assessing the goodness of students' English pronunciation. Experimental results show that the proposed method achieves higher correlations with teachers' manual scores than the compared methods.
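To make the structure-as-constraints idea concrete, the sketch below phrases a small matching problem as a QCQP: a quadratic matching cost minimized subject to quadratic edge-length constraints. It uses a generic SLSQP solver on hypothetical 2-D points rather than the paper's Gaussian-derived formulation and subspace decomposition.

```python
import numpy as np
from scipy.optimize import minimize

# Move points y_1..y_n close to observed targets while preserving the
# pairwise squared distances (edge lengths) of a reference structure.
ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])      # reference structure
target = ref + 0.1 * np.random.default_rng(1).normal(size=ref.shape)

n = len(ref)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
d2 = {p: np.sum((ref[p[0]] - ref[p[1]]) ** 2) for p in pairs}

def objective(flat):
    y = flat.reshape(n, 2)
    return np.sum((y - target) ** 2)          # quadratic matching cost

constraints = [                               # quadratic edge-length constraints
    {"type": "eq",
     "fun": (lambda flat, i=i, j=j:
             np.sum((flat.reshape(n, 2)[i] - flat.reshape(n, 2)[j]) ** 2)
             - d2[(i, j)])}
    for i, j in pairs
]

res = minimize(objective, ref.ravel(), constraints=constraints, method="SLSQP")
print(res.x.reshape(n, 2))   # matched points honoring the edge lengths
```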
Effective image representation based on bi-layer visual codebook
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166534
Yan Song, Jinhui Tang, Xia Li, Q. Tian, Lirong Dai
Recently, Bag-of-Visual-Words (BoW) image representations have drawn much attention in image categorization and retrieval applications. Visual codebook construction and the related quantization methods play important roles in the BoW model. Traditionally, the visual codebook is generated by clustering local features into groups, and each feature is hard-quantized to its nearest center. The resulting quantization error may degrade the effectiveness of the BoW representation. To address this problem, several soft quantization methods have been proposed in the literature, but their effectiveness is still unsatisfactory. In this paper, we propose a novel and effective image representation based on a bi-layer codebook. We first construct the bi-layer codebook to explicitly reduce the quantization error. Then, inspired by the locality-constrained linear coding method [18], we propose a ridge regression based quantization that assigns multiple visual words to each local feature. Furthermore, a k-nearest-neighbor strategy is integrated to improve the efficiency of quantization. To evaluate the proposed representation, we compare it with existing image representations on two benchmark datasets in image classification experiments. The experimental results demonstrate its superiority over state-of-the-art techniques.
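Ridge regression quantization admits a closed form. Assuming the LLC-style formulation the abstract points to, the sketch below codes a feature over its k nearest codewords by solving the regularized least-squares problem; the codebook and parameters are hypothetical.

```python
import numpy as np

def ridge_knn_coding(x, codebook, k=5, lam=1e-3):
    """Assign a local feature to multiple visual words (a minimal sketch of
    ridge-regression-based quantization in the spirit of LLC [18]): restrict
    coding to the k nearest codewords and solve
        min_c ||x - B_k^T c||^2 + lam * ||c||^2 ,
    whose closed form is c = (B_k B_k^T + lam I)^{-1} B_k x."""
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]               # k-NN constraint for efficiency
    Bk = codebook[idx]                        # shape (k, dim)
    G = Bk @ Bk.T + lam * np.eye(k)           # regularized Gram matrix
    c = np.linalg.solve(G, Bk @ x)            # ridge solution
    code = np.zeros(len(codebook))
    code[idx] = c
    return code

# Hypothetical 64-word codebook over 16-d descriptors.
rng = np.random.default_rng(0)
B = rng.random((64, 16))
feature = rng.random(16)
print(ridge_knn_coding(feature, B).nonzero()[0])   # the 5 active words
```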
Low resolution facial image recognition via multiple kernel criterion
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166709
Chuan-Xian Ren, D. Dai, Hong Yan
Practical face recognition systems are sometimes confronted with low-resolution (LR) images. Most existing feature extraction algorithms aim to preserve the relational structure among objects of the input space in a linear embedding space. However, there is a consensus that such complex visual learning tasks are better solved by adopting multiple descriptors to characterize the data more precisely. In this paper, we address the problem of matching LR and high-resolution images, which is difficult for conventional methods in practice due to the lack of an efficient similarity measure, and propose a multiple kernel criterion (MKC) for LR face recognition without any super-resolution (SR) preprocessing. Different image descriptors, including RsL2, LBP, Gradientface, and IMED, are used as the multiple kernel generators, and the Gaussian function is exploited as the distance-induced kernel. MKC minimizes the inconsistency between the similarities captured by the multiple kernels, and the nonlinear objective function can be minimized alternately by a constrained eigenvalue decomposition. Experiments on benchmark databases show that our MKC method indeed improves recognition performance.
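As a small illustration of the kernel construction, the sketch below turns per-descriptor distance matrices into Gaussian (distance-induced) kernels and combines them with placeholder uniform weights; the learned MKC weights and the constrained eigenvalue decomposition are not reproduced.

```python
import numpy as np

def gaussian_kernels(distance_mats, sigmas):
    """Turn per-descriptor distance matrices into Gaussian (distance-induced)
    kernels: K = exp(-D^2 / (2 sigma^2)). A minimal sketch -- the paper's MKC
    then learns a combination minimizing the inconsistency between these
    kernels via a constrained eigenvalue decomposition, omitted here."""
    return [np.exp(-(D ** 2) / (2.0 * s ** 2))
            for D, s in zip(distance_mats, sigmas)]

# Hypothetical distances from two descriptors (e.g. LBP and Gradientface)
# between 4 low-resolution probe images and 4 gallery images.
rng = np.random.default_rng(0)
D_lbp, D_grad = rng.random((4, 4)), rng.random((4, 4))
K_lbp, K_grad = gaussian_kernels([D_lbp, D_grad], sigmas=[0.5, 0.5])
K_combined = 0.5 * K_lbp + 0.5 * K_grad     # uniform weights as placeholders
print(K_combined.round(2))
```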
Counting pedestrians in crowded scenes with efficient sparse learning
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166650
M. Shimosaka, S. Masuda, R. Fukui, Taketoshi Mori, Tomomasa Sato
Counting pedestrians in crowded scenes provides powerful cues for several applications such as traffic, safety, and advertising analysis in urban areas. Recent research has shown that directly mapping image statistics (e.g., area or texture histograms of people regions) to the number of pedestrians, also known as counting by regression, is a promising way to achieve robust pedestrian counting. While leveraging arbitrary image features is encouraged in counting by regression to improve accuracy, it raises the risk of over-fitting. Furthermore, most image statistics are sensitive to the foreground region segmentation. Hence, a careful selection process on both the segmentation and feature levels is needed. This paper presents an efficient sparse training method via LARS (Least Angle Regression) that performs selection on both levels, providing the sparsity of both Lasso and Group Lasso. Experimental results on synthetic data and a pedestrian counting dataset show that our method provides robust performance with reasonable training cost compared with state-of-the-art pedestrian counting methods.
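As a minimal illustration of counting by regression with LARS-based sparsity, the sketch below fits scikit-learn's LassoLars, which runs the LARS algorithm with an L1 penalty, from synthetic region statistics to counts; the paper's group-sparse, two-level selection is not reproduced.

```python
import numpy as np
from sklearn.linear_model import LassoLars

# Synthetic stand-in for per-frame image statistics (area, edge, texture
# features of foreground regions) mapped to pedestrian counts.
rng = np.random.default_rng(0)
n_frames, n_feats = 200, 30
X = rng.random((n_frames, n_feats))
true_w = np.zeros(n_feats)
true_w[:3] = [5.0, 3.0, 2.0]                 # only 3 features really matter
counts = X @ true_w + rng.normal(0, 0.1, n_frames)

model = LassoLars(alpha=0.01).fit(X, counts) # LARS with L1 sparsity
print("selected features:", np.nonzero(model.coef_)[0])
print("predicted count:", model.predict(X[:1])[0].round(1))
```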
Probability of a unique crypto key generation based on finger's different images with two scanners
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166527
Guoqiang Ma, Juan Liu, Bin Ni
In practice, one finger is often scanned with different scanners at different times, which produces different fingerprint images. The key issue for a fingerprint-based cryptosystem is how to derive a unique crypto key from fingerprint feature data that may differ from scanner to scanner. In this paper, we study the main difference between fingerprint key-generation cryptosystems and fingerprint-based recognition systems, and propose a crypto key generation scheme based on a single fingerprint image. The experimental results showed that about forty percent of fingers can generate the same crypto key even when the images come from different scanners. That is, a unique crypto key can be generated from different fingerprint images of one finger even when those images were scanned with different scanners. This suggests that future work should focus on image quality checking and the key generation scheme.
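For intuition only, the sketch below shows one hypothetical way a stable key could be derived from a noisy feature vector: quantize coarsely so inter-scanner variation falls into the same bins, then hash the bin indices. This is not the paper's scheme; it merely illustrates the stability-versus-entropy trade-off such schemes face.

```python
import hashlib
import numpy as np

def crypto_key_from_features(feature_vector, step=0.25):
    """Hypothetical key generation from one fingerprint image: coarsely
    quantize the (scanner-dependent, noisy) feature vector so that small
    inter-scanner variations fall into the same bins, then hash the bin
    indices into a fixed-length key. Real schemes add error correction;
    coarser bins raise the chance two scanners yield the same key, at the
    cost of key entropy."""
    bins = np.floor(np.asarray(feature_vector) / step).astype(int)
    return hashlib.sha256(bins.tobytes()).hexdigest()

# The same finger measured by two hypothetical scanners (slightly different
# feature values): with coarse enough quantization the keys still agree.
scan_a = [0.31, 1.12, 0.58, 2.04]
scan_b = [0.29, 1.10, 0.61, 2.01]
print(crypto_key_from_features(scan_a) == crypto_key_from_features(scan_b))
```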
Global and local training for moving object classification in surveillance-oriented scene
Pub Date: 2011-11-01 | DOI: 10.1109/ACPR.2011.6166561
Xin Zhao, Jianwei Ding, Kaiqi Huang, T. Tan
This paper presents a new training framework for multi-class moving object classification in surveillance-oriented scenes. In many practical multi-class classification tasks, instances with similar features are close to each other in the input feature space yet may have different class labels. Since moving objects vary in view and shape, this phenomenon is common in multi-class moving object classification. In our framework, the input feature space is first divided into several local clusters. Then, global training and local training are carried out sequentially with an efficient online-learning-based algorithm. The induced global classifier assigns candidate instances to the most reliable clusters, while the trained local classifiers within those clusters determine which classes the candidate instances belong to. Our experimental results illustrate the effectiveness of our method for moving object classification in surveillance-oriented scenes.
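A minimal sketch of the route-then-classify structure, with scikit-learn stand-ins and synthetic data: k-means defines the local clusters, a global classifier assigns an instance to a cluster, and a per-cluster local classifier predicts the class. The paper's specific online learner and reliability criterion are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 8))                       # e.g. shape/motion features
y = rng.integers(0, 3, 300)                    # 3 object classes (synthetic)

# Divide the feature space into local clusters.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Global training: learn to route an instance to its cluster.
global_router = SGDClassifier(random_state=0).fit(X, clusters.labels_)

# Local training: one classifier per cluster predicts the object class.
local_clfs = {}
for c in range(4):
    mask = clusters.labels_ == c
    local_clfs[c] = SGDClassifier(random_state=0).fit(X[mask], y[mask])

def classify(x):
    """Route to a cluster with the global model, then classify locally."""
    c = int(global_router.predict(x.reshape(1, -1))[0])
    return int(local_clfs[c].predict(x.reshape(1, -1))[0])

print(classify(X[0]), "vs true", y[0])
```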