Semantic Segmentation of RGB-D Images Using 3D and Local Neighbouring Features
Authors: F. Fooladgar, S. Kasaei
Venue: 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA)
Publication date: 2015-11-01
DOI: 10.1109/DICTA.2015.7371307 (https://doi.org/10.1109/DICTA.2015.7371307)
Citations: 5
Abstract
3D scene understanding is one of the most important problems in computer vision. Although considerable attention has been devoted to 2D scene understanding over the past decades, with the development of depth sensors (such as Microsoft Kinect), 3D scene understanding has now become a very challenging task. Traditionally, scene understanding has been treated as the semantic labeling of each image pixel. Semantic labeling of RGB-D images has not attained success comparable to that of RGB semantic labeling, owing to the lack of a challenging dataset. The introduction of the NYU-V2 RGB-D dataset made it possible to propose novel methods that improve labeling accuracy. This paper presents a semantic segmentation algorithm for RGB-D images. The proposed algorithm focuses on the feature description and classification steps. In the feature description step, the more discriminative features from the RGB images and the 3D point cloud data are grouped with local neighboring features to incorporate their context into the classification step. In the classification step, a pairwise multi-class conditional random field framework is used, in which the unary potential function is taken to be the probabilistic output of a random forest classifier. The proposed algorithm is evaluated on the NYU-V2 dataset, and its performance is compared with that of other methods in the literature. The proposed algorithm achieves state-of-the-art results on the NYU-V2 dataset.
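The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the pairwise term or the inference method, so the Potts penalty, the `weight` value, and the iterated conditional modes (ICM) solver used here are assumptions chosen for brevity. The class probabilities are hypothetical stand-ins for the output of a random forest classifier, and all function names are illustrative.

```python
import math

def unary_potentials(probs, eps=1e-9):
    """Negative log of each node's per-class probability (assumed classifier output)."""
    return [[-math.log(max(p, eps)) for p in node] for node in probs]

def energy(labels, unary, edges, weight):
    """Sum of unary costs plus a Potts penalty for every edge whose ends disagree."""
    total = sum(unary[i][label] for i, label in enumerate(labels))
    total += sum(weight for i, j in edges if labels[i] != labels[j])
    return total

def icm(unary, edges, weight, max_sweeps=10):
    """Iterated conditional modes: greedy per-node minimisation of the energy.

    ICM is an assumed, simple inference scheme; it only finds a local minimum.
    """
    n, k = len(unary), len(unary[0])
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Start from the labels the unary term alone would choose.
    labels = [min(range(k), key=lambda c: unary[i][c]) for i in range(n)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def cost(c):
                disagreements = sum(labels[j] != c for j in neighbors[i])
                return unary[i][c] + weight * disagreements
            best = min(range(k), key=cost)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

# Hypothetical probabilities for four neighbouring regions and two classes.
probs = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.9, 0.1]]
edges = [(0, 1), (1, 2), (2, 3)]  # chain-shaped neighbourhood graph
unary = unary_potentials(probs)
unary_only = [min(range(2), key=lambda c: unary[i][c]) for i in range(len(unary))]
labels = icm(unary, edges, weight=0.5)
```

In this toy example the third region weakly prefers class 1 on its own, but the pairwise term pulls it to class 0 because both of its neighbours confidently take that label, which lowers the total energy.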