Semantic Image Segmentation Guided By Scene Geometry

2021 IEEE International Conference on Autonomous Systems (ICAS). Pub Date: 2021-08-11. DOI: 10.1109/ICAS49788.2021.9551117

Sotirios Papadopoulos, Ioannis Mademlis, I. Pitas

Citations: 5

Semantic Image Segmentation Guided By Scene Geometry

Semantic image segmentation is an important functionality in various applications, such as robotic vision for autonomous cars, drones, etc. Modern Convolutional Neural Networks (CNNs) process input RGB images and predict per-pixel semantic classes. Depth maps have been successfully utilized to increase accuracy over RGB-only input. They can be used as an additional input channel complementing the RGB image, or they may be estimated by an extra neural branch under a multitask training setting. Contrary to these approaches, in this paper we explore a novel regularizer that penalizes differences between semantic and self-supervised depth predictions on presumed object boundaries during CNN training. The proposed method does not resort to multitask training (which may require a more complex CNN backbone to avoid underfitting), does not rely on RGB-D or stereoscopic 3D training data and does not require known or estimated depth maps during inference. Quantitative evaluation on a public scene parsing video dataset for autonomous driving indicates enhanced semantic segmentation accuracy with zero inference runtime overhead.
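
The abstract does not give the regularizer's formula, so the following is only a minimal sketch of the general idea, under stated assumptions: for each pair of adjacent pixels, it compares a semantic edge strength with a depth edge indicator and averages their disagreement over presumed object boundaries. The function names, the `depth_threshold` parameter, the use of total variation distance as the semantic edge measure, and the rule for deciding which pixel pairs count as boundaries are all hypothetical and are not taken from the paper. Plain Python lists are used for readability; an actual training loss would need a differentiable framework.

```python
def neighbor_pairs(height, width):
    """Yield coordinate pairs of horizontally and vertically adjacent pixels."""
    for r in range(height):
        for c in range(width):
            if c + 1 < width:
                yield (r, c), (r, c + 1)
            if r + 1 < height:
                yield (r, c), (r + 1, c)


def boundary_consistency_penalty(class_probs, depth, depth_threshold=0.5):
    """Hypothetical boundary regularizer (illustrative, not the paper's formula).

    class_probs: H x W x C nested lists of per-pixel class probabilities.
    depth:       H x W nested lists of predicted depth values.
    Returns the mean disagreement between semantic and depth edges over
    pixel pairs presumed to lie on an object boundary.
    """
    height, width = len(depth), len(depth[0])
    total, count = 0.0, 0
    for (r0, c0), (r1, c1) in neighbor_pairs(height, width):
        # Semantic edge strength in [0, 1]: total variation distance
        # between the class distributions of the two neighbors.
        sem_edge = 0.5 * sum(
            abs(p - q)
            for p, q in zip(class_probs[r0][c0], class_probs[r1][c1])
        )
        # Depth edge indicator: 1 when the depth jump exceeds the threshold.
        depth_jump = abs(depth[r0][c0] - depth[r1][c1])
        depth_edge = 1.0 if depth_jump > depth_threshold else 0.0
        # Assumed boundary rule: either prediction signals an edge here.
        if depth_edge == 1.0 or sem_edge > 0.5:
            total += abs(sem_edge - depth_edge)
            count += 1
    return total / count if count else 0.0


# Two-pixel image, two classes: a depth jump with and without a class change.
agree = boundary_consistency_penalty([[[1.0, 0.0], [0.0, 1.0]]], [[1.0, 5.0]])
disagree = boundary_consistency_penalty([[[1.0, 0.0], [1.0, 0.0]]], [[1.0, 5.0]])
```

In this sketch the penalty is zero when a depth discontinuity coincides with a change of semantic class and grows when only one of the two predictions shows an edge. Because such a term would be used only in the training loss, it is consistent with the abstract's claim of zero inference runtime overhead: the segmentation network is unchanged at inference and needs no depth input.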
