Devinder Kumar, H. Neher, Arun Das, David A. Clausi, Steven L. Waslander
Robust place recognition systems are essential for long-term localization and autonomy. Such systems should recognize scenes under both conditional and viewpoint changes. In this paper, we present a deep-learning-based planar omni-directional place recognition approach that can simultaneously cope with conditional and viewpoint variations, including large viewpoint changes, which current methods do not address. We evaluate the proposed method on two real-world datasets covering, respectively, illumination and seasonal/weather changes, and changes that occurred in the environment over a period of one year. We provide both quantitative (recall at 100% precision) and qualitative (confusion matrices) comparisons of the basic place recognition pipeline for the omni-directional approach against single-view and side-view camera approaches. The results demonstrate the efficacy of the proposed omni-directional deep learning method over the single-view and side-view cameras in dealing with both conditional and large viewpoint changes.
{"title":"Condition and Viewpoint Invariant Omni-Directional Place Recognition Using CNN","authors":"Devinder Kumar, H. Neher, Arun Das, David A Clausi, Steven L. Waslander","doi":"10.1109/CRV.2017.26","DOIUrl":"https://doi.org/10.1109/CRV.2017.26","url":null,"abstract":"Robust place recognition systems are essential for long term localization and autonomy. Such systems should recognize scenes with both conditional and viewpoint changes. In this paper, we present a deep learning based planar omni-directional place recognition approach that can simultaneously cope with conditional and viewpoint variations, including large viewpoint changes, which current methods do not address. We evaluate the proposed method on two real world datasets dealing with illumination, seasonal/weather changes and changes occurred in the environment across a period of 1 year, respectively. We provide both quantitative (recall at 100% precision) and qualitative (confusion matrices) comparison of the basic pipeline of place recognition for the omni-directional approach with single-view and side-view camera approaches. The results prove the efficacy of the proposed omnidirectional deep learning method over the single-view and side-view cameras in dealing with both conditional and large viewpoint changes.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115273719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hoang Le, Carl S. Marshall, T. Doan, Long Mai, Feng Liu
Today's projectors are widely used for information and media display in stationary setups. There is also a growing effort to deploy projectors creatively, such as using a mobile projector to display visual content on an arbitrary surface. However, the quality of projected content is often limited by the quality of the projection surface, the environment lighting, and non-optimal projector settings. This paper presents a visual quality assessment method for projected content. Our method assesses the quality of the projected image by analyzing the projected image captured by a camera. The key challenge is that the quality of the captured image often differs from the quality perceived by a viewer, who "sees" the projected image differently than the camera does. To address this problem, our method employs a data-driven approach that learns from labeled data to bridge this gap. Our method integrates both manually crafted features and deep learning features and formulates projection quality assessment as a regression problem. Our experiments on a wide range of projection content, projection surfaces, and environment lighting show that our method can reliably score the quality of projected visual content in a way that is consistent with human perception.
{"title":"Visual Quality Assessment for Projected Content","authors":"Hoang Le, Carl S. Marshall, T. Doan, Long Mai, Feng Liu","doi":"10.1109/CRV.2017.47","DOIUrl":"https://doi.org/10.1109/CRV.2017.47","url":null,"abstract":"Today's projectors are widely used for information and media display in a stationary setup. There is also a growing effort to deploy projectors creatively, such as using a mobile projector to display visual content on an arbitrary surface. However, the quality of projected content is often limited by the quality of projection surface, environment lighting, and non-optimal projector settings. This paper presents a visual quality assessment method for projected content. Our method assesses the quality of the projected image by analyzing the projected image captured by a camera. The key challenge is that the quality of the captured image is often different from the perceived quality by a viewer as she \"sees\" the projected image differently than the camera. To address this problem, our method employs a data-driven approach that learns from the labeled data to bridge this gap. Our method integrates both manually crafted features and deep learning features and formulates projection quality assessment as a regression problem. Our experiments on a wide range of projection content, projection surfaces, and environment lighting show that our method can reliably score the quality of projected visual content in a way that is consistent with the human perception.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123045613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most methods for automatic estimation of external camera parameters (e.g., tilt angle) from deployed cameras are based on vanishing points. This requires that specific static scene features, e.g., sets of parallel lines, be present and reliably detected, and this is not always possible. An alternative is to use properties of the motion field computed over multiple frames. However, methods reported to date make strong assumptions about the nature of objects and motions in the scene, and often depend on feature tracking, which can be computationally intensive and unreliable. In this paper, we propose a novel motion-based approach for recovering camera tilt that does not require tracking. Our method assumes that motion statistics in the scene are stationary over the ground plane, so that statistical variation in image speed with vertical position in the image can be attributed to projection. The tilt angle is then estimated iteratively by nulling the variance in rectified speed explained by the vertical image coordinate. The method does not require tracking or learning and can therefore be applied without modification to diverse scene conditions. The algorithm is evaluated on four diverse datasets and found to outperform three alternative state-of-the-art methods.
{"title":"Estimating Camera Tilt from Motion without Tracking","authors":"Nada Elassal, J. Elder","doi":"10.1109/CRV.2017.36","DOIUrl":"https://doi.org/10.1109/CRV.2017.36","url":null,"abstract":"Most methods for automatic estimation of external camera parameters (e.g., tilt angle) from deployed cameras are based on vanishing points. This requires that specific static scene features, e.g., sets of parallel lines, be present and reliably detected, and this is not always possible. An alternative is to use properties of the motion field computed over multiple frames. However, methods reported to date make strong assumptions about the nature of objects and motions in the scene, and often depend on feature tracking, which can be computationally intensive and unreliable. In this paper, we propose a novel motion-based approach for recovering camera tilt that does not require tracking. Our method assumes that motion statistics in the scene are stationary over the ground plane, so that statistical variation in image speed with vertical position in the image can be attributed to projection. The tilt angle is then estimated iteratively by nulling the variance in rectified speed explained by the vertical image coordinate. The method does not require tracking or learning and can therefore be applied without modification to diverse scene conditions. The algorithm is evaluated on four diverse datasets and found to outperform three alternative state-of-the-art methods.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116156146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Precise landing of multirotor unmanned aerial vehicles (UAVs) in confined, GPS-denied, and vision-compromised environments presents a challenge to common autopilot systems. In this work, we outline an autonomous infrared (IR) landing system using a ground-based IR radiator, a UAV-mounted IR camera, and an image processing computer. Previous work has focused on UAV-mounted IR sources for UAV localization, or on systems using multiple distributed ground-based IR sources to estimate UAV pose. We experimented with the use of a single ground-based IR radiator to determine the UAV's relative location in three-dimensional space. Our approach significantly simplifies the landing zone setup by requiring only a single IR source, and increases operational flexibility, as the vision-based system adapts to changes in landing zone position. The system is especially useful in vision-compromised applications such as nighttime operations or the smoky environments encountered during forest fires. We also evaluated a high-power IR radiator for future research on outdoor autonomous point-to-point navigation between IR sources where GPS is unavailable.
{"title":"Development of a Plug-and-Play Infrared Landing System for Multirotor Unmanned Aerial Vehicles","authors":"Ephraim Nowak, Kashish Gupta, H. Najjaran","doi":"10.1109/CRV.2017.23","DOIUrl":"https://doi.org/10.1109/CRV.2017.23","url":null,"abstract":"Precise landing of multirotor unmanned aerial vehicles (UAVs) in confined, GPS-denied and vision-compromised environments presents a challenge to common autopilot systems. In this work we outline an autonomous infrared (IR) landing system using a ground-based IR radiator, UAV-mounted IR camera, and image processing computer. Previous work has focused on UAV-mounted IR sources for UAV localization, or systems using multiple distributed ground-based IR sources to estimate UAV pose. We experimented with the use of a single ground-based IR radiator to determine the UAV's relative location in three-dimensional space. The outcome of our research significantly simplifies the landing zone setup by requiring only a single IR source, and increases operational flexibility, as the vision-based system adapts to changes in landing zone position. The usefulness of our system is especially demonstrated in vision-compromised applications such as nighttime operations, or in smoky environments observed during forest fires. We also evaluated a high-power IR radiator for future research in the field of outdoor autonomous point-to-point navigation between IR sources where GPS is unavailable.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126951844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing video description approaches advocated in the literature rely on capturing the semantic relationships among concepts and visual features from training data specific to various datasets. Naturally, their success at generalizing video descriptions for a domain depends closely on the availability, representativeness, size, and annotation quality of the training data. Common issues are overfitting, the amount of training data required, and the computational time needed to train the model. To overcome these issues, we propose to alleviate the learning of semantic knowledge from domain-specific datasets by leveraging general human knowledge sources such as ConceptNet. We propose using ConceptNet as the source of knowledge for generating video descriptions within Grenander's pattern theory formalism. Instead of relying on training data to estimate the semantic compatibility of two concepts, we use the weights in ConceptNet, which quantify the degree of validity of an assertion between two concepts based on the underlying knowledge sources. We test and compare this idea on the task of generating semantically coherent descriptions for videos from the Breakfast Actions and Carnegie Mellon Multimodal Activities datasets. In comparison with other approaches, the proposed method achieves accuracy comparable to state-of-the-art methods based on HMMs and CFGs and generates semantically coherent descriptions even when presented with inconsistent action and object labels. We also show that the proposed approach performs comparably to models trained on domain-specific data.
{"title":"Towards a Knowledge-Based Approach for Generating Video Descriptions","authors":"Sathyanarayanan N. Aakur, F. Souza, Sudeep Sarkar","doi":"10.1109/CRV.2017.51","DOIUrl":"https://doi.org/10.1109/CRV.2017.51","url":null,"abstract":"Existent video description approaches advocated in the literature rely on capturing the semantic relationships among concepts and visual features from training data specific to various datasets. Naturally, their success at generalizing the video descriptions for the domain is closely dependent on the availability, representativeness, size and annotation quality of the training data. Common issues are overfitting, the amount of training data and computational time required for the model. To overcome these issues, we propose to alleviate the learning of semantic knowledge from domain-specific datasets by leveraging general human knowledge sources such as ConceptNet. We propose the use of ConceptNet as the source of knowledge for generating video descriptions using Grenander's pattern theory formalism. Instead of relying on training data to estimate semantic compatibility of two concepts, we use weights in the ConceptNet that determines the degree of validity of the assertion between two concepts based on the knowledge sources. We test and compare this idea on the task of generating semantically coherent descriptions for videos from the Breakfast Actions and Carnegie Mellon's Multimodal activities dataset. In comparison with other approaches, the proposed method achieves comparable accuracy against state-of-the-art methods based on HMMs and CFGs and generate semantically coherent descriptions even when presented with inconsistent action and object labels. We are also able to show that the proposed approach performs comparably with models trained on domain-specific data.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126644590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modern security cameras are capable of capturing high-resolution HD or 4K video and support embedded analytics that can automatically track objects such as people and cars moving through the scene. However, due to a lack of computational power on these cameras, the embedded video analytics cannot utilize the full available video resolution, severely limiting the range at which they can detect objects. We present a technique for scale correction, leveraging approximate camera calibration information, that uses higher image resolution in parts of the frame that are far from the camera and lower image resolution in parts of the frame that are closer to the camera. Existing background models can run on the proposed scale-normalized high-resolution (1280x720) video frame at a computational cost similar to that of an unnormalized 640x360 frame. Our proposed scale correction technique also improves object-level precision and recall.
{"title":"Scale-Corrected Background Modeling","authors":"P. Siva, Michael Jamieson","doi":"10.1109/CRV.2017.31","DOIUrl":"https://doi.org/10.1109/CRV.2017.31","url":null,"abstract":"Modern security cameras are capable of capturing high-resolution HD or 4K videos and support embedded analytics capable of automatically tracking objects such as people and cars moving through the scene. However, due to a lack of computational power on these cameras, the embedded video analytics cannot utilize the full available video resolution, severely limiting the range at which they can detect objects. We present a technique for scale correction, leveraging approximate camera calibration information, that uses high image resolutions in parts of the frame that are far from the camera and lower image resolution in parts of the frame that are closer to the camera. Existing background models can run on the proposed scale-normalized high-resolution (1280x720) video frame for a similar computational cost as an unnormalized 640x360 frame. Our proposed scale correction technique also improves object-level precision and recall.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127444768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nail biting, or onychophagia, is a body-focused repetitive behavior that is especially prevalent among children and adolescents. The behavior produces negative physical and psychological effects on individuals who exhibit it. Therapy for nail biting requires awareness from the subject, which in turn demands constant effort from third parties. This research project combined a commercial robotic toy with a machine vision system based on image processing to deliver a new strategy for preventing nail biting. The machine vision system recognizes nail biting from a computer webcam and signals the toy robot to alert the subject. The implementation is validated with a user case study showing a reduction in both the occurrence and duration of episodes.
{"title":"Combined Strategy of Machine Vision with a Robotic Assistant for Nail Biting Prevention","authors":"Jonathan Camargo, Aaron J. Young","doi":"10.1109/CRV.2017.57","DOIUrl":"https://doi.org/10.1109/CRV.2017.57","url":null,"abstract":"Nail biting or onychophagia is a body-focused repetitive behavior that is especially prevalent in the younger population of children and adolescents. The behavior produces negative physical and psychological effects on individuals who exhibit onychophagia. Therapy for nail biting involves awareness from the subject which requires constant effort from third parties. This research project utilized a commercial robotic toy in combination with a machine vision system based on image processing to deliver a new strategy to prevent nail biting. The machine vision system recognized nail biting using a webcam in the computer which communicated to the toy robot to alert the subject. The implementation is validated with a user case study obtaining reduction in episode occurrences and duration.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122346008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider image classification in a weakly supervised scenario where the training data are annotated at different levels of abstraction. A subset of the training data is annotated with coarse labels (e.g., wolf, dog), while the rest is annotated with fine labels (e.g., breeds of wolves and dogs). Each coarse label corresponds to a superclass of several fine labels. Our goal is to learn a model that can classify a new image into one of the fine classes. We investigate how the coarsely labeled data can help improve fine-label classification. Since it is usually much easier to collect data with coarse labels than with fine labels, the problem setup considered in this paper can benefit a wide range of real-world applications. We propose a model based on convolutional neural networks (CNNs) to address this problem. We demonstrate the effectiveness of the proposed model on several benchmark datasets. Our model significantly outperforms the naive approach that discards the extra coarsely labeled data.
{"title":"Weakly Supervised Image Classification with Coarse and Fine Labels","authors":"Jie Lei, Zhenyu Guo, Yang Wang","doi":"10.1109/CRV.2017.21","DOIUrl":"https://doi.org/10.1109/CRV.2017.21","url":null,"abstract":"We consider image classification in a weakly supervised scenario where the training data are annotated at different levels of abstractions. A subset of the training data are annotated with coarse labels (e.g. wolf, dog), while the rest of the training data are annotated with fine labels (e.g. breeds of wolves and dogs). Each coarse label corresponds to a superclass of several fine labels. Our goal is to learn a model that can classify a new image into one of the fine classes. We investigate how the coarsely labeled data can help improve the fine label classification. Since it is usually much easier to collect data with coarse labels than those with fine labels, the problem setup considered in this paper can benefit a wide range of real-world applications. We propose a model based on convolutional neural networks (CNNs) to address this problem. We demonstrate the effectiveness of the proposed model on several benchmark datasets. Our model significantly outperforms the naive approach that discards the extra coarsely labeled data.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131997614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anatomical landmarks on 3-D human body scans play key roles in shape-essential applications, including consistent parameterization, body measurement extraction, segmentation, and mesh re-targeting. Manually locating landmarks is tedious and time-consuming for large-scale 3-D anthropometric surveys. To automate the landmarking process, we propose a data-driven approach that learns from landmark locations known on a dataset of 3-D scans and predicts their locations on new scans. More specifically, we adopt a coarse-to-fine approach: we train a deep regression neural network to compute the locations of all landmarks and then, for each landmark, train an individual deep classification neural network to refine its location. As input to the neural networks, we compute from a frontal view three types of image renderings for comparison: gray-scale appearance images, range (depth) images, and curvature-mapped images. Among these, curvature-mapped images yield the best empirical accuracy from the deep regression network, whereas depth images lead to higher accuracy for locating most landmarks using the deep classification networks. The proposed approach performs better than the state of the art on locating most landmarks, and this simple yet effective approach can be extended to automatically locate landmarks in large-scale 3-D scan datasets.
{"title":"Localizing 3-D Anatomical Landmarks Using Deep Convolutional Neural Networks","authors":"P. Xi, Chang Shu, R. Goubran","doi":"10.1109/CRV.2017.11","DOIUrl":"https://doi.org/10.1109/CRV.2017.11","url":null,"abstract":"Anatomical landmarks on 3-D human body scans play key roles in shape-essential applications, including consistent parameterization, body measurement extraction, segmentation, and mesh re-targeting. Manually locating landmarks is tedious and time-consuming for large-scale 3-D anthropometric surveys. To automate the landmarking process, we propose a data-driven approach, which learns from landmark locations known on a dataset of 3-D scans and predicts their locations on new scans. More specifically, we adopt a coarse-to-fine approach by training a deep regression neural network to compute the locations of all landmarks and then for each landmark training an individual deep classification neural network to improve its accuracy. In regards to input images being fed into the neural networks, we compute from a frontal view three types of image renderings for comparison, i.e., gray-scale appearance images, range depth images, and curvature mapped images. Among these, curvature mapped images result in the best empirical accuracy from the deep regression network, whereas depth images lead to higher accuracy for locating most landmarks using the deep classification networks. In conclusion, the proposed approach performs better than state of the art on locating most landmarks. The simple yet effective approach can be extended to automatically locate landmarks in large scale 3-D scan datasets.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127173229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ankit Pensia, G. Sharma, Gaurav Pandey, J. McBride
In this paper, we report a novel algorithm for localization of autonomous vehicles in an urban environment using an orthographic ground reflectivity map created with a three-dimensional (3D) laser scanner. Road paint (lane markings, zebra crossings, traffic signs, etc.) constitutes the distinctive features in the surface reflectivity map, and these features are generally sparse compared to the uninformative asphalt and off-road portions of the map. Therefore, we propose to project the reflectivity map to a lower-dimensional space that captures the useful features of the map, and then use these projected feature maps for localization. We use a discriminative metric learning technique to obtain this lower-dimensional space of feature maps. Experimental evaluation of the proposed method on real data shows that it is more accurate than standard image matching techniques. Moreover, the proposed method is computationally fast and can run in real time (10 Hz) on a standard CPU.
{"title":"Fast Localization of Autonomous Vehicles Using Discriminative Metric Learning","authors":"Ankit Pensia, G. Sharma, Gaurav Pandey, J. McBride","doi":"10.1109/CRV.2017.56","DOIUrl":"https://doi.org/10.1109/CRV.2017.56","url":null,"abstract":"In this paper, we report a novel algorithm for localization of autonomous vehicles in an urban environment using orthographic ground reflectivity map created with a three-dimensional (3D) laser scanner. It should be noted that the road paint (lane markings, zebra crossing, traffic signs etc.) constitute the distinctive features in the surface reflectivity map which are generally sparse as compared to the non-interesting asphalt and the off-road portion of the map. Therefore, we propose to project the reflectivity map to a lower dimensional space, that captures the useful features of the map, and then use these projected feature maps for localization. We use discriminative metric learning technique to obtain this lower dimensional space of feature maps. Experimental evaluation of the proposed method on real data shows that it is better than the standard image matching techniques in terms of accuracy. Moreover, the proposed method is computationally fast and can be executed at real-time (10 Hz) on a standard CPU.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127660472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}