Learning low-dimensional feature representations is a crucial task in machine learning and computer vision. Recently, the impressive breakthroughs in general object recognition made by large-scale convolutional networks have shown that convolutional networks can extract discriminative hierarchical features in large-scale object classification tasks. However, for vision tasks other than end-to-end classification, such as K Nearest Neighbor (kNN) classification, the learned intermediate features are not necessarily optimal for the specific problem. In this paper, we aim to exploit the power of deep convolutional networks and optimize the output feature layer with respect to the task of kNN classification. By directly optimizing the kNN classification error on training data, we in effect learn convolutional nonlinear features in a data-driven and task-driven way. Experimental results on standard image classification benchmarks show that the proposed method learns better feature representations for kNN classification than other general end-to-end classification methods.
Weiqiang Ren, Yinan Yu, Junge Zhang, Kaiqi Huang, "Learning Convolutional Nonlinear Features for K Nearest Neighbor Image Classification," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.746
Visual object tracking is a challenging task because designing an effective and efficient appearance model is difficult. Current online tracking algorithms treat tracking as a classification task and use labeled samples to update the appearance model. However, it is unclear how to evaluate the confidence that an instance belongs to the object. In this paper, we propose a simple and efficient tracking algorithm with a deformable structure appearance model. In our method, the model is updated with continuously labeled samples obtained by dense sampling. To improve accuracy, we introduce a couple-layer regression model that, unlike traditional classification, prevents negative background samples from corrupting model learning. The proposed DSR tracker runs in real time and performs favorably against state-of-the-art trackers on various challenging sequences.
Xian Yang, Quan Xiao, Shoujue Wang, Peizhong Liu, "Real-Time Tracking via Deformable Structure Regression Learning," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.379
This paper focuses on road sign classification for creating accurate and up-to-date inventories of traffic signs, which is important for road safety and maintenance. This is a challenging multi-class classification task, as a large number of different sign types exist which differ only in minor details. Moreover, changes in viewpoint, capturing conditions and partial occlusions result in large intra-class variations. Ideally, road sign classification systems should be robust against these variations while having an acceptable computational load. This paper presents a classification approach based on the popular Bag of Words (BOW) framework, which we optimize towards the best trade-off between performance and execution time. We analyze the performance aspects of PCA-based dimensionality reduction, soft and hard assignment for BOW codebook matching, and the codebook size. Furthermore, we provide an efficient implementation scheme. We compare these techniques to design a fast yet accurate BOW-based classification scheme. This BOW approach is compared against structural classification, and we show that their combination outperforms both individual methods. This combination, exploiting both BOW and structural information, attains high classification scores (96.25% to 98%) on our challenging real-world datasets.
L. Hazelhoff, Ivo M. Creusen, P. D. With, "Optimal Performance-Efficiency Trade-off for Bag of Words Classification of Road Signs," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.517
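The soft-versus-hard codebook assignment that the paper analyzes can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `bow_histogram` and the Gaussian-kernel soft assignment are assumptions; in practice the codebook would come from k-means over training descriptors.

```python
import numpy as np

def bow_histogram(descriptors, codebook, sigma=None):
    """Build a normalized BOW histogram from local descriptors.

    descriptors: (n, d) array; codebook: (k, d) array of visual words.
    Hard assignment (each descriptor votes for its nearest word) when
    sigma is None; otherwise Gaussian-kernel soft assignment, where each
    descriptor spreads its vote over all words by similarity.
    """
    # Squared distances from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    if sigma is None:
        votes = d2.argmin(axis=1)                       # nearest word per descriptor
        hist = np.bincount(votes, minlength=len(codebook)).astype(float)
    else:
        w = np.exp(-d2 / (2 * sigma ** 2))              # similarity weights
        w /= w.sum(axis=1, keepdims=True)               # each descriptor votes mass 1
        hist = w.sum(axis=0)
    return hist / hist.sum()
```

Soft assignment trades extra distance computations for robustness when a descriptor lies between codewords, which is exactly the performance-versus-accuracy axis the paper measures.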
This paper deals with the interactive design of generic classifiers for aerial images. In many real-life cases, working object detectors are not available, due to a new geographical context or the need for a previously unseen type of object. We propose an approach for on-line learning of such detectors from user interactions. Variants of gradient boosting and support-vector machine classification are proposed to cope with the problems raised by interactivity: unbalanced and partially mislabeled training data. We assess our framework on various visual classes (buildings, vegetation, cars, visual changes) using challenging data corresponding to several applications (SAR or optical sensors at various resolutions). We show that our model and algorithms outperform several state-of-the-art baselines for feature extraction and learning in remote sensing.
B. L. Saux, "Interactive Design of Object Classifiers in Remote Sensing," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.444
This paper describes a novel RGB-D-based visual target tracking method for person-following robots. We enhance a single-object tracker that combines RGB and depth information by exploiting two different types of distracters. The first set of distracters includes objects near the target, and the second set contains objects that look similar to the target. The proposed algorithm reduces tracking drift and wrong target re-identification by exploiting these distracters. Experiments on real-world video sequences of a person-following scenario show a significant improvement over both the same method without distracter tracking and state-of-the-art RGB-based trackers. A mobile robot following a person is also tested in a real environment.
Youngwoo Yoon, Woo-han Yun, H. Yoon, Jaehong Kim, "Real-Time Visual Target Tracking in RGB-D Data for Person-Following Robots," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.387
Advanced vehicle safety is an emerging issue, driven by the rapidly growing population of car owners. A growing number of driver assistance systems have been designed to warn drivers of noteworthy hazards by analyzing the surrounding environment with sensors and/or cameras. Among hazardous road conditions, road bumps not only damage vehicles but also cause serious danger, especially at night or under poor lighting conditions. In this paper, we propose a vision-based road bump detection system using a front-mounted car camcorder, a device that is becoming widely deployed. First, the input video is transformed into a time-sliced image, a condensed video representation. We then estimate the vertical motion of the vehicle from the time-sliced image and infer the existence of road bumps. Once a bump is detected, the location fix obtained from GPS is reported to a central server, so that other vehicles can receive warnings when approaching the detected bumpy regions. Encouraging experimental results show that the proposed system detects road bumps efficiently and effectively. Traffic safety can thus be promoted through a mutually beneficial mechanism: a driver willing to report the bumps he or she encounters receives warnings issued by others as well.
Hua-Tsung Chen, Chun-Yu Lai, Chun-Chieh Hsu, Suh-Yin Lee, B. Lin, Chien-Peng Ho, "Vision-Based Road Bump Detection Using a Front-Mounted Car Camcorder," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.776
This paper presents a novel full-image guided filtering method based on eight-connected weight propagation for dense stereo matching. The proposed method has three main features: first, the eight-connected weight propagation yields a closer approximation than the previous approach; second, the filtering draws on all pixels in the image rather than being constrained to a fixed window; and third, the computational complexity per pixel at each disparity level is O(1), and the filter can be efficiently parallelized on hardware platforms. Performance evaluation on the Middlebury datasets shows that the proposed method is among the best local algorithms in terms of both accuracy and speed.
Xiaoming Huang, Guoqin Cui, Yundong Zhang, "An Improved Filtering for Fast Stereo Matching," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.423
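The abstract does not detail the eight-connected propagation itself, but the O(1)-per-pixel property it claims can be illustrated with the standard integral-image (summed-area table) trick used in fast cost aggregation. This is a simpler box filter, not the paper's guided filter; the function name `box_filter_o1` is hypothetical.

```python
import numpy as np

def box_filter_o1(img, r):
    """Mean filter over a (2r+1)x(2r+1) window in O(1) work per pixel.

    A summed-area table ii lets any rectangular sum be read off with
    four lookups, independent of the window size r.
    """
    H, W = img.shape
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    out = np.empty((H, W), dtype=float)
    for y in range(H):
        y0, y1 = max(0, y - r), min(H, y + r + 1)
        for x in range(W):
            x0, x1 = max(0, x - r), min(W, x + r + 1)
            # Sum of img[y0:y1, x0:x1] from four corner lookups.
            s = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
            out[y, x] = s / ((y1 - y0) * (x1 - x0))
    return out
```

In stereo matching this kind of constant-time aggregation is applied per disparity level, which is why overall cost grows with the number of disparities but not with the support-window size.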
One of the challenges in medical imaging is to increase the resolution of 3D MRI (Magnetic Resonance Imaging) signals. This is a problem of 3D signal reconstruction under very low sampling rates. Based on compressive sensing theory, the Direct Volume Reconstruction (DVR) method is proposed to reconstruct the 3D signal volume-by-volume using a learned dictionary. DVR is a general method, applicable to any 3D signal that can be sparsely represented. To exploit the nature of the 3D MRI system, the Progressive Volume Reconstruction (PVR) method is further proposed to improve on DVR. In PVR, local reconstruction is used to reconstruct in-plane slices, and the output is then forwarded to global reconstruction, in which both the initially sampled and locally reconstructed signals are used together to reconstruct the whole 3D signal. Two separate dictionaries, rather than one, are trained in PVR; in this way, more prior knowledge from the training data is exploited. Experiments on a head MRI dataset demonstrate that DVR achieves much better performance than conventional tricubic interpolation and that PVR considerably improves DVR performance in terms of both PSNR and visual quality.
Meiqing Zhang, Huirao Nie, Yang Pei, L. Tao, "Volume Reconstruction for MRI," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.577
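Reconstruction over a learned dictionary, as in DVR/PVR, requires a sparse-coding step at inference time. The abstract does not say which pursuit algorithm the authors use, so the following is only a sketch of Orthogonal Matching Pursuit, a standard choice for this role; `omp` is a hypothetical name.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: find a k-sparse code x with D @ x ~ y.

    D: (m, n) dictionary with (ideally unit-norm) atom columns; y: (m,) signal.
    Greedily picks the atom most correlated with the residual, then
    re-fits all selected coefficients by least squares.
    """
    resid = y.astype(float).copy()
    idx = []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ resid))))   # best-matching atom
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        resid = y - D[:, idx] @ coef                      # orthogonalized residual
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x
```

In a DVR-style pipeline, each sampled patch or volume would be coded against the learned dictionary this way, and the missing samples read off from the reconstruction `D @ x`.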
This paper visualizes surveillance video content in a 2D temporal image for indexing and retrieving large video databases. As an intermediate representation, this work extracts a temporal profile from video that conveys accurate temporal information while keeping certain spatial characteristics for recognition. Different from spatial video indexing such as video synopsis, which montages video frames for summarization, our temporal indexing is achieved by observing the video from the side of the video volume. A series of sampling lines is placed over the field of view along the principal directions of targets to probe target motion, and position-varying temporal slices are obtained from the video volume. These slices are blended into the temporal profile with transparencies related to their spatial locations in the video. Static background and targets with little motion are further embedded into the temporal profile. Our goal is to provide less deformed shapes, more spatial information, and precise event occurrence in the temporal profile for video indexing and browsing.
S. Bagheri, J. Zheng, "Temporal Mapping of Surveillance Video," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.707
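The core operations described, sampling temporal slices along lines through the video volume and alpha-blending them into one profile, can be sketched in a few lines. This is a simplified illustration assuming vertical sampling lines on a grayscale volume; the name `temporal_profile` and the uniform blend weights are assumptions, not the paper's exact scheme.

```python
import numpy as np

def temporal_profile(video, lines, alphas):
    """Blend temporal slices of a video volume into one 2D temporal image.

    video: (T, H, W) grayscale volume; lines: column indices of vertical
    sampling lines; alphas: per-line transparency weights. Each slice
    video[:, :, x] is an (T, H) image of everything crossing column x
    over time; blending them approximates the paper's temporal profile.
    """
    slices = [video[:, :, x].astype(float) for x in lines]
    out = np.zeros_like(slices[0])
    for s, a in zip(slices, alphas):
        out += a * s                       # transparency-weighted blend
    return (out / sum(alphas)).T           # (H, T): rows spatial, columns temporal
```

A target moving horizontally through the field of view appears in such a profile as a silhouette swept along the time axis, which is what makes the representation usable for browsing and indexing events.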
In this work, a coordinate solver for elastic net regularized logistic regression is proposed. In particular, a method based on majorization-maximization using a cubic function is derived. This allows the objective function to be optimized reliably and accurately at each step without resorting to a line search. Experiments show that the proposed solver is comparable to, or improves on, state-of-the-art solvers. The proposed method is simpler, in the sense that no line search is needed, and can be applied directly to small- to large-scale learning problems with elastic net regularization.
M. Nilsson, "Elastic Net Regularized Logistic Regression Using Cubic Majorization," 2014 22nd International Conference on Pattern Recognition (ICPR), Dec. 2014. doi:10.1109/ICPR.2014.593
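For readers unfamiliar with the objective being solved, a minimal baseline helps: the sketch below fits elastic net regularized logistic regression with plain proximal gradient steps. This is emphatically not the paper's cubic-majorization coordinate solver, just a reference implementation of the same objective; the names `elastic_net_logreg` and `soft_threshold` and the fixed step size are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_logreg(X, y, lam=0.05, alpha=0.5, lr=0.1, iters=1000):
    """Proximal-gradient solver for elastic net logistic regression.

    Minimizes mean logistic loss + lam * (alpha*||w||_1 + (1-alpha)/2*||w||_2^2),
    with labels y in {0, 1}. The smooth part (loss + ridge term) takes a
    gradient step; the l1 part is handled by soft-thresholding.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))                  # predicted probabilities
        grad = X.T @ (p - y) / n + lam * (1.0 - alpha) * w
        w = soft_threshold(w - lr * grad, lr * lam * alpha)
    return w
```

The l1 component drives irrelevant coefficients exactly to zero while the l2 component stabilizes correlated features, which is the trade-off the elastic net penalty is designed for; the paper's contribution is replacing the gradient step above with a cubic majorizer per coordinate so no step size or line search is needed.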