Binocular dance pose recognition and body orientation estimation via multilinear analysis
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4562970
Bo Peng, G. Qian
In this paper, we propose a novel approach to dance pose recognition and body orientation estimation using multilinear analysis. By performing tensor decomposition and projection on silhouette images obtained from wide-baseline binocular cameras, low-dimensional pose and body orientation coefficient vectors can be extracted. Unlike traditional tensor-based recognition methods, the proposed approach uses the pose coefficient vectors as features to train a family of support vector machines as pose classifiers. Using the body orientation coefficient vectors, a one-dimensional orientation manifold is learned and then used to estimate body orientation. Experimental results on both synthetic and real image data show the efficacy of the proposed approach and that it outperforms the traditional tensor-based approach in a comparative test.
{"title":"Binocular dance pose recognition and body orientation estimation via multilinear analysis","authors":"Bo Peng, G. Qian","doi":"10.1109/CVPRW.2008.4562970","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4562970","url":null,"abstract":"In this paper, we propose a novel approach to dance pose recognition and body orientation estimation using multilinear analysis. By performing tensor decomposition and projection using silhouette images obtained from wide base-line binocular cameras, low dimensional pose and body orientation coefficient vectors can be extracted. Different from traditional tensor-based recognition methods, the proposed approach takes the pose coefficient vector as features to train a family of support vector machines as pose classifiers. Using the body orientation coefficient vectors, a one-dimensional orientation manifold is learned and further used for the estimation of body orientation. Experiment results obtained using both synthetic and real image data showed the efficacy of the proposed approach, and that our approach outperformed the traditional tensor-based approach in the comparative test.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126884091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Circular generalized cylinder fitting for 3D reconstruction in endoscopic imaging based on MRF
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563010
Jin Zhou, Ananya Das, Feng Li, Baoxin Li
Endoscopy has become an established procedure for the diagnosis and therapy of various gastrointestinal (GI) ailments, and has also emerged as a commonly used technique for minimally invasive surgery. Most existing endoscopes are monocular, and stereo-endoscopy faces practical difficulties, preventing physicians and surgeons from obtaining a realistic 3D view. Traditional monocular 3D reconstruction approaches (e.g., structure from motion) face extraordinary challenges in this application due to noisy data, a lack of texture to support robust feature matching, non-rigid objects, and glare artifacts from the imaging process. In this paper, we propose a method to automatically reconstruct 3D structure from a monocular endoscopic video. Our approach addresses these challenges by incorporating a circular generalized cylinder (CGC) model into the 3D reconstruction. The CGC model is decomposed into a series of 3D circles. To reconstruct this model, we formulate the problem as maximum a posteriori estimation within a Markov random field framework, so as to enforce the smoothness constraints of the CGC model and to support robust search for the optimal solution, which is achieved by a two-stage heuristic search scheme. Both simulated and real data experiments demonstrate the effectiveness of the proposed approach.
{"title":"Circular generalized cylinder fitting for 3D reconstruction in endoscopic imaging based on MRF","authors":"Jin Zhou, Ananya Das, Feng Li, Baoxin Li","doi":"10.1109/CVPRW.2008.4563010","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563010","url":null,"abstract":"Endoscopy has become an established procedure for the diagnosis and therapy of various gastrointestinal (GI) ailments, and has also emerged as a commonly-used technique for minimally-invasive surgery. Most existing endoscopes are monocular, with stereo-endoscopy facing practical difficulties, preventing the physicians/surgeons from having a desired, realistic 3D view. Traditional monocular 3D reconstruction approaches (e.g., structure from motion) face extraordinary challenges for this application due to issues including noisy data, lack of textures supporting robust feature matching, nonrigidity of the objects, and glare artifacts from the imaging process, etc. In this paper, we propose a method to automatically reconstruct 3D structure from a monocular endoscopic video. Our approach attempts to address the above challenges by incorporating a circular generalized cylinder (CGC) model in 3D reconstruction. The CGC model is decomposed as a series of 3D circles. To reconstruct this model, we formulate the problem as one of maximum a posteriori estimation within a Markov random field framework, so as to ensure the smoothness constraints of the CGC model and to support robust search for the optimal solution, which is achieved by a two-stage heuristic search scheme. Both simulated and real data experiments demonstrate the effectiveness of the proposed approach.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"40 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113976041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A GPU-based implementation of motion detection from a moving platform
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563096
Qian Yu, G. Medioni
We describe a GPU-based implementation of motion detection from a moving platform. Motion detection from a moving platform is inherently difficult, as the moving camera induces a 2D motion field over the entire image. A step compensating for camera motion is therefore required prior to estimating the background model. Due to inevitable registration errors, the background model is estimated over a sliding window of frames, so that erroneous registration cannot degrade detection quality for the whole sequence. However, this approach has several characteristics that put a heavy burden on a real-time CPU implementation. We exploit the GPU to achieve significant acceleration over standard CPU implementations. Our GPU-based implementation can build the background model and detect motion regions at around 18 fps on 320×240 videos captured from a moving camera.
{"title":"A GPU-based implementation of motion detection from a moving platform","authors":"Qian Yu, G. Medioni","doi":"10.1109/CVPRW.2008.4563096","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563096","url":null,"abstract":"We describe a GPU-based implementation of motion detection from a moving platform. Motion detection from a moving platform is inherently difficult as the moving camera induces 2D motion field in the entire image. A step compensating for camera motion is required prior to estimating of the background model. Due to inevitable registration errors, the background model is estimated according to a sliding window of frames to avoid the case where erroneous registration influences the quality of the detection for the whole sequence. However, this approach involves several characteristics that put a heavy burden on real-time CPU implementation. We exploit GPU to achieve significant acceleration over standard CPU implementations. Our GPU-based implementation can build the background model and detect motion regions at around 18 fps on 320times240 videos that are captured for a moving camera.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"48 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114020286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving the selection and detection of visual landmarks through object tracking
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563133
P. Espinace, A. Soto
The unsupervised selection and subsequent recognition of visual landmarks is a highly valuable perceptual capability for a mobile robot. Recently, we proposed a system that aims to achieve this capability by combining a bottom-up, data-driven approach with top-down feedback provided by high-level semantic representations. The bottom-up approach is based on three main mechanisms: visual attention, area segmentation, and landmark characterization. The top-down feedback is based on two information sources: i) an estimate of the robot position that reduces the search scope for potential matches with previously selected landmarks, and ii) a set of weights that, according to the results of previous recognitions, controls the influence of different segmentation algorithms on the recognition of each landmark. In this paper we explore the benefits of extending our previous work with a visual tracking step for each selected landmark. Our intuition is that a tracking step can improve the model of each landmark by associating and selecting information from its most significant views, and can also help avoid the selection of spurious landmarks. Our results confirm these intuitions by showing that the tracking step produces a significant increase in the recall rate of landmark recognition.
{"title":"Improving the selection and detection of visual landmarks through object tracking","authors":"P. Espinace, A. Soto","doi":"10.1109/CVPRW.2008.4563133","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563133","url":null,"abstract":"The unsupervised selection and posterior recognition of visual landmarks is a highly valuable perceptual capability for a mobile robot. Recently, we proposed a system that aims to achieve this capability by combining a bottom-up data driven approach with top-down feedback provided by high level semantic representations. The bottom-up approach is based on three main mechanisms: visual attention, area segmentation, and landmark characterization. The top-down feedback is based on two information sources: i) An estimation of the robot position that reduces the searching scope for potential matches with previously selected landmarks, ii) A set of weights that, according to the results of previous recognitions, controls the influence of different segmentation algorithms in the recognition of each landmark. In this paper we explore the benefits of extending our previous work by including a visual tracking step for each of the selected landmarks. Our intuition is that the inclusion of a tracking step can help to improve the model of each landmark by associating and selecting information from its most significant views. Furthermore, it can also help to avoid problems related to the selection of spurious landmarks. Our results confirm these intuitions by showing that the inclusion of the tracking step produces a significant increase in the recall rate for landmark recognition.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"11 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120807545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and calibration of a multi-view TOF sensor fusion system
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563160
Y. Kim, Derek Chan, C. Theobalt, S. Thrun
This paper describes the design and calibration of a system that enables simultaneous recording of dynamic scenes with multiple high-resolution video cameras and low-resolution Swissranger time-of-flight (TOF) depth cameras. The system serves as a testbed for the development of new algorithms for high-quality multi-view dynamic scene reconstruction and 3D video. The paper also provides a detailed analysis of random and systematic depth camera noise, which is important for reliable fusion of video and depth data. Finally, the paper describes how to compensate for systematic depth errors and calibrate all dynamic depth and video data into a common frame.
{"title":"Design and calibration of a multi-view TOF sensor fusion system","authors":"Y. Kim, Derek Chan, C. Theobalt, S. Thrun","doi":"10.1109/CVPRW.2008.4563160","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563160","url":null,"abstract":"This paper describes the design and calibration of a system that enables simultaneous recording of dynamic scenes with multiple high-resolution video and low-resolution Swissranger time-of-flight (TOF) depth cameras. The system shall serve as a testbed for the development of new algorithms for high-quality multi-view dynamic scene reconstruction and 3D video. The paper also provides a detailed analysis of random and systematic depth camera noise which is important for reliable fusion of video and depth data. Finally, the paper describes how to compensate systematic depth errors and calibrate all dynamic depth and video data into a common frame.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125811959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating how and when perceptual organization cues improve boundary detection in natural images
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4562974
Leandro A. Loss, G. Bebis, M. Nicolescu, A. Skurikhin
Boundary detection in natural images is an important but challenging problem in computer vision. Motivated by psychophysical studies suggesting that humans use multiple cues for segmentation, several promising methods perform boundary detection by optimally combining local image measurements such as color, texture, and brightness, and strong results have been reported on challenging datasets such as the Berkeley segmentation benchmark. Although combining different cues has been shown to outperform single-cue methods, results can be further improved by integrating perceptual organization cues into the boundary detection process. The main goal of this study is to investigate how and when perceptual organization cues improve boundary detection in natural images. In this context, we investigate integrating segmentation with iterative multi-scale tensor voting (IMSTV), a variant of tensor voting (TV) that performs perceptual grouping by analyzing information at multiple scales and iteratively removing background clutter while preserving salient, organized structures. The key idea is to use IMSTV to post-process the boundary posterior probability (PB) map produced by segmentation algorithms. Detailed analysis of our experimental results reveals how and when perceptual organization cues are likely to improve or degrade boundary detection. In particular, we show that using perceptual grouping as a post-processing step improves boundary detection in 84% of the grayscale test images in the Berkeley segmentation dataset.
{"title":"Investigating how and when perceptual organization cues improve boundary detection in natural images","authors":"Leandro A. Loss, G. Bebis, M. Nicolescu, A. Skurikhin","doi":"10.1109/CVPRW.2008.4562974","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4562974","url":null,"abstract":"Boundary detection in natural images represents an important but also challenging problem in computer vision. Motivated by studies in psychophysics claiming that humans use multiple cues for segmentation, several promising methods have been proposed which perform boundary detection by optimally combining local image measurements such as color, texture, and brightness. Very interesting results have been reported by applying these methods on challenging datasets such as the Berkeley segmentation benchmark. Although combining different cues for boundary detection has been shown to outperform methods using a single cue, results can be further improved by integrating perceptual organization cues with the boundary detection process. The main goal of this study is to investigate how and when perceptual organization cues improve boundary detection in natural images. In this context, we investigate the idea of integrating with segmentation the iterative multi-scale tensor voting (IMSTV), a variant of tensor voting (TV) that performs perceptual grouping by analyzing information at multiple-scales and removing background clutter in an iterative fashion, preserving salient, organized structures. The key idea is to use IMSTV to post-process the boundary posterior probability (PB) map produced by segmentation algorithms. Detailed analysis of our experimental results reveals how and when perceptual organization cues are likely to improve or degrade boundary detection. In particular, we show that using perceptual grouping as a post-processing step improves boundary detection in 84% of the grayscale test images in the Berkeley segmentation dataset.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"54 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131470794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New insights into the calibration of ToF-sensors
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563172
Marvin Lindner, A. Kolb, T. Ringbeck
Time-of-flight (ToF) sensors have become an alternative to conventional distance sensing techniques like laser scanners or image-based stereo. ToF sensors provide full-range distance information at high frame rates and thus have a significant impact on current research in areas like online object recognition, collision prevention, and scene reconstruction. However, ToF cameras like the photonic mixer device (PMD) still exhibit a number of challenges regarding static and dynamic effects, e.g. systematic distance errors and motion artefacts, respectively. Sensor calibration techniques that reduce static system errors have been proposed and show promising results, but current calibration techniques generally need a large set of reference data to determine the parameters of the calibration model. This paper introduces a new calibration approach that combines different demodulation techniques for the ToF camera's reference signal. Examples show that the resulting combined demodulation technique yields improved distance values based on only two required reference data sets.
{"title":"New insights into the calibration of ToF-sensors","authors":"Marvin Lindner, A. Kolb, T. Ringbeck","doi":"10.1109/CVPRW.2008.4563172","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563172","url":null,"abstract":"Time-of-flight (ToF) sensors have become an alternative to conventional distance sensing techniques like laser scanners or image based stereo. ToF sensors provide full range distance information at high frame-rates and thus have a significant impact onto current research in areas like online object recognition, collision prevention or scene reconstruction. However, ToF cameras like the photonic mixer device (PMD) still exhibit a number of challenges regarding static and dynamic effects, e.g. systematic distance errors and motion artefacts, respectively. Sensor calibration techniques reducing static system errors have been proposed and show promising results. However, current calibration techniques in general need a large set of reference data in order to determine the corresponding parameters for the calibration model. This paper introduces a new calibration approach which combines different demodulation techniques for the ToF- camera 's reference signal. Examples show, that the resulting combined demodulation technique yields improved distance values based on only two required reference data sets.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"29 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113933669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A probabilistic representation of LiDAR range data for efficient 3D object detection
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563033
Theodore C. Yapo, C. Stewart, R. Radke
We present a novel approach to 3D object detection in scenes scanned by LiDAR sensors, based on a probabilistic representation of free, occupied, and hidden space that extends the concept of occupancy grids from robot mapping algorithms. This scene representation naturally handles LiDAR sampling issues, can be used to fuse multiple LiDAR data sets, and captures the inherent uncertainty of the data due to occlusions and clutter. Using this model, we formulate a hypothesis testing methodology to determine the probability that given 3D objects are present in the scene. By propagating uncertainty in the original sample points, we are able to measure confidence in the detection results in a principled way. We demonstrate the approach in examples of detecting objects that are partially occluded by scene clutter such as camouflage netting.
{"title":"A probabilistic representation of LiDAR range data for efficient 3D object detection","authors":"Theodore C. Yapo, C. Stewart, R. Radke","doi":"10.1109/CVPRW.2008.4563033","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563033","url":null,"abstract":"We present a novel approach to 3D object detection in scenes scanned by LiDAR sensors, based on a probabilistic representation of free, occupied, and hidden space that extends the concept of occupancy grids from robot mapping algorithms. This scene representation naturally handles LiDAR sampling issues, can be used to fuse multiple LiDAR data sets, and captures the inherent uncertainty of the data due to occlusions and clutter. Using this model, we formulate a hypothesis testing methodology to determine the probability that given 3D objects are present in the scene. By propagating uncertainty in the original sample points, we are able to measure confidence in the detection results in a principled way. We demonstrate the approach in examples of detecting objects that are partially occluded by scene clutter such as camouflage netting.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123875887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Verifying liveness by multiple experts in face biometrics
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563115
K. Kollreider, H. Fronthaler, J. Bigün
Resisting spoofing attempts via photographs and video playbacks is a vital issue for the success of face biometrics, yet the "liveness" topic has only been partially studied in the past. In this paper we suggest a holistic liveness detection paradigm that collaborates with standard techniques in 2D face biometrics. The experiments show that many attacks can be averted via a combination of anti-spoofing measures. We have investigated the topic using real-time techniques and applied them to real-life spoofing scenarios in an indoor, yet uncontrolled, environment.
{"title":"Verifying liveness by multiple experts in face biometrics","authors":"K. Kollreider, H. Fronthaler, J. Bigün","doi":"10.1109/CVPRW.2008.4563115","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563115","url":null,"abstract":"Resisting spoofing attempts via photographs and video playbacks is a vital issue for the success of face biometrics. Yet, the ldquolivenessrdquo topic has only been partially studied in the past. In this paper we are suggesting a holistic liveness detection paradigm that collaborates with standard techniques in 2D face biometrics. The experiments show that many attacks are avertible via a combination of anti-spoofing measures. We have investigated the topic using real-time techniques and applied them to real-life spoofing scenarios in an indoor, yet uncontrolled environment.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127709336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable classifiers for Internet vision tasks
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4562958
Tom Yeh, John J. Lee, Trevor Darrell
Object recognition systems designed for Internet applications typically need to adapt to users' needs in a flexible fashion and scale up to very large data sets. In this paper, we analyze the complexity of several multiclass SVM-based algorithms and highlight the computational bottleneck they suffer at test time: comparing the input image to every training image. We propose an algorithm that overcomes this bottleneck: it offers the efficiency of a simple nearest-neighbor classifier, voting on class labels based on the k nearest neighbors quickly determined by a vocabulary tree, while achieving recognition accuracy comparable to that of a complex SVM classifier by incorporating SVM parameters into voting scores incrementally accumulated from individual image features. Empirical results demonstrate that adjusting votes by relevant support vector weights can improve the recognition accuracy of a nearest-neighbor classifier without sacrificing speed. Compared to existing methods, our algorithm achieves a ten-fold speed increase while incurring an acceptable accuracy loss that can easily be offset by showing about two more labels in the result. The speed, scalability, and adaptability of our algorithm make it suitable for Internet vision applications.
{"title":"Scalable classifiers for Internet vision tasks","authors":"Tom Yeh, John J. Lee, Trevor Darrell","doi":"10.1109/CVPRW.2008.4562958","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4562958","url":null,"abstract":"Object recognition systems designed for Internet applications typically need to adapt to userspsila needs in a flexible fashion and scale up to very large data sets. In this paper, we analyze the complexity of several multiclass SVM-based algorithms and highlight the computational bottleneck they suffer at test time: comparing the input image to every training image. We propose an algorithm that overcomes this bottleneck; it offers not only the efficiency of a simple nearest-neighbor classifier, by voting on class labels based on the k nearest neighbors quickly determined by a vocabulary tree, but also the recognition accuracy comparable to that of a complex SVM classifier, by incorporating SVM parameters into the voting scores incrementally accumulated from individual image features. Empirical results demonstrate that adjusting votes by relevant support vector weights can improve the recognition accuracy of a nearest-neighbor classifier without sacrificing speed. Compared to existing methods, our algorithm achieves a ten-fold speed increase while incurring an acceptable accuracy loss that can be easily offset by showing about two more labels in the result. The speed, scalability, and adaptability of our algorithm makes it suitable for Internet vision applications.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127797960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}