Yongjie Shi, Xin Tong, Jingsi Wen, He Zhao, Xianghua Ying, H. Zha
{"title":"3D Orientation Estimation and Vanishing Point Extraction from Single Panoramas Using Convolutional Neural Network","authors":"Yongjie Shi, Xin Tong, Jingsi Wen, He Zhao, Xianghua Ying, H. Zha","doi":"10.1109/ICRA40945.2020.9196966","DOIUrl":null,"url":null,"abstract":"3D orientation estimation is a key component of many important computer vision tasks such as autonomous navigation and 3D scene understanding. This paper presents a new CNN architecture to estimate the 3D orientation of an omnidirectional camera with respect to the world coordinate system from a single spherical panorama. To train the proposed architecture, we leverage a dataset of panoramas named VOP60K from Google Street View with labeled 3D orientation, including 50 thousand panoramas for training and 10 thousand panoramas for testing. Previous approaches usually estimate 3D orientation under pinhole cameras. However, for a panorama, due to its larger field of view, previous approaches cannot be suitable. In this paper, we propose an edge extractor layer to utilize the low-level and geometric information of panorama, an attention module to fuse different features generated by previous layers. A regression loss for two column vectors of the rotation matrix and classification loss for the position of vanishing points are added to optimize our network simultaneously. The proposed algorithm is validated on our benchmark, and experimental results clearly demonstrate that it outperforms previous methods.","PeriodicalId":6859,"journal":{"name":"2020 IEEE International Conference on Robotics and Automation (ICRA)","volume":"58 1","pages":"596-602"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Robotics and Automation (ICRA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRA40945.2020.9196966","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
3D orientation estimation is a key component of many important computer vision tasks such as autonomous navigation and 3D scene understanding. This paper presents a new CNN architecture to estimate the 3D orientation of an omnidirectional camera with respect to the world coordinate system from a single spherical panorama. To train the proposed architecture, we leverage a dataset of panoramas named VOP60K from Google Street View with labeled 3D orientation, including 50 thousand panoramas for training and 10 thousand panoramas for testing. Previous approaches usually estimate 3D orientation under pinhole cameras. However, for a panorama, due to its larger field of view, previous approaches cannot be suitable. In this paper, we propose an edge extractor layer to utilize the low-level and geometric information of panorama, an attention module to fuse different features generated by previous layers. A regression loss for two column vectors of the rotation matrix and classification loss for the position of vanishing points are added to optimize our network simultaneously. The proposed algorithm is validated on our benchmark, and experimental results clearly demonstrate that it outperforms previous methods.