Semantic Segmentation of RGB-D Images Using 3D and Local Neighbouring Features

2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA). Pub Date: 2015-11-01. DOI: 10.1109/DICTA.2015.7371307

F. Fooladgar, S. Kasaei


Citations: 5

Abstract

3D scene understanding is one of the most important problems in computer vision. Over the past decades, considerable attention has been devoted to 2D scene understanding; with the development of depth sensors (such as the Microsoft Kinect), 3D scene understanding has become a very challenging task. Traditionally, scene understanding has been treated as the semantic labeling of each image pixel. Semantic labeling of RGB-D images has not attained success comparable to that of RGB semantic labeling, owing to the lack of a challenging dataset. The introduction of the NYU-V2 RGB-D dataset made it possible to propose a novel method that improves labeling accuracy. This paper presents a semantic segmentation algorithm for RGB-D images. The proposed algorithm focuses on the feature description and classification steps. In the feature description step, the more discriminative features from the RGB images and the 3D point cloud data are grouped with local neighboring features to incorporate their context into the classification step. In the classification step, a pairwise multi-class conditional random field framework is used, in which the unary potential function is the probabilistic output of a random forest classifier. The proposed algorithm is evaluated on the NYU-V2 dataset, and its performance is compared with that of other methods in the literature. The proposed algorithm achieves state-of-the-art results on the NYU-V2 dataset.
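The two steps named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how neighbouring features are grouped, which pairwise potential is used, or how inference is performed. The sketch assumes that each segment's feature vector is concatenated with the mean of its neighbours' features, that the pairwise term is a Potts penalty, and that labels are refined with iterated conditional modes (ICM). The function names (`build_neighbours`, `add_neighbour_context`, `icm_potts`) and the parameter `pairwise_weight` are hypothetical. The class probabilities are taken as given; in the paper's setting they would come from a random forest classifier.

```python
import math


def build_neighbours(n, edges):
    """Turn a list of undirected (i, j) edges into adjacency lists."""
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    return neighbours


def add_neighbour_context(features, neighbours):
    """Append the mean of the neighbours' features to each feature vector.

    Assumption: mean pooling is one simple way to add local context;
    the abstract does not specify the grouping rule.
    """
    out = []
    for i, f in enumerate(features):
        nbrs = neighbours[i]
        if nbrs:
            mean = [sum(features[j][k] for j in nbrs) / len(nbrs)
                    for k in range(len(f))]
        else:
            mean = list(f)  # isolated segment: reuse its own features
        out.append(list(f) + mean)
    return out


def icm_potts(probs, edges, pairwise_weight=1.0, n_iters=10):
    """Approximately minimise a pairwise CRF energy with ICM.

    Energy: sum_i -log probs[i][y_i] + pairwise_weight * sum_(i,j) [y_i != y_j]
    probs: per-segment class probabilities (assumed to be classifier output).
    """
    n, n_classes = len(probs), len(probs[0])
    eps = 1e-9  # avoids log(0)
    unary = [[-math.log(p + eps) for p in row] for row in probs]
    neighbours = build_neighbours(n, edges)

    # Start from the unary-only (classifier) decision.
    labels = [min(range(n_classes), key=lambda c: unary[i][c])
              for i in range(n)]

    for _ in range(n_iters):
        changed = False
        for i in range(n):
            def local_energy(c):
                disagreements = sum(1 for j in neighbours[i] if labels[j] != c)
                return unary[i][c] + pairwise_weight * disagreements

            best = min(range(n_classes), key=local_energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    # Three segments in a chain; the middle one is weakly misclassified.
    probs = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]
    edges = [(0, 1), (1, 2)]
    print(icm_potts(probs, edges, pairwise_weight=0.0))
    print(icm_potts(probs, edges, pairwise_weight=1.0))
```

With `pairwise_weight=0.0` the result is just the per-segment classifier decision; raising the weight lets confident neighbours override a weakly classified segment, which is the role of the pairwise term in a CRF of this kind. ICM only finds a local minimum, so a different inference method may give different labels.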
