Monocular Depth Estimation Using Encoder-Decoder Architecture and Transfer Learning from Single RGB Image

2020 IEEE 7th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)
Published: 2020-11-27
DOI: 10.1109/UPCON50219.2020.9376365

Hritam Basak, Sagnik Ghosal, Mainak Sarkar, Mayukhmali Das, Soham Chattopadhyay


Citations: 6

Abstract

Depth estimation from a single RGB image has become one of the most important research topics in recent years because of its applications in self-driving (autonomous) cars, image reconstruction, and scene segmentation. Estimating depth from a single monocular image is more challenging than from stereo images, because a single frame lacks the spatio-temporal features that make 3D depth perception easier. Existing monocular depth estimation models often produce low-resolution, blurry depth maps and frequently fail to identify the boundaries of small objects. In this paper, we propose a simple encoder-decoder network that uses transfer learning to predict high-quality depth maps from single RGB images. We reuse features extracted from pre-trained networks: the encoder is initialized from a pre-trained network and fine-tuned with suitable augmentation strategies, after which the decoder computes high-quality depth maps. Although the network has fewer trainable parameters and requires fewer training iterations, it outperforms existing state-of-the-art methods and captures accurate object boundaries when evaluated on two standard datasets, KITTI and NYU Depth V2.
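The abstract reports that the network outperforms prior methods on KITTI and NYU Depth V2 but does not name the metrics. Results on these benchmarks are conventionally reported with the error measures (absolute relative error, squared relative error, RMSE, log RMSE, log10 error) and threshold accuracies (delta < 1.25, 1.25², 1.25³) popularized by Eigen et al. The sketch below computes those measures in pure Python, assuming flattened prediction and ground-truth depth maps in the same units. It is not the authors' evaluation code: the function name `depth_metrics` and the `min_depth` validity threshold are hypothetical, and dataset-specific steps (commonly a 10 m cap for NYU Depth V2, an 80 m cap and evaluation crop for KITTI) are omitted.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float],
                  min_depth: float = 1e-3) -> Dict[str, float]:
    """Standard monocular-depth metrics (hypothetical helper, not from the paper).

    pred, gt: flattened depth maps with the same length and units (e.g. metres).
    Pixels whose ground truth is <= min_depth are treated as missing and ignored.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of pixels")

    # Keep only valid ground-truth pixels; clamp predictions to min_depth so
    # that the ratio and logarithm terms below are always defined.
    pairs = [(max(p, min_depth), g) for p, g in zip(pred, gt) if g > min_depth]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Error measures (lower is better).
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n

    # Threshold accuracy (higher is better): fraction of pixels whose
    # ratio max(p/g, g/p) is strictly below 1.25**k for k = 1, 2, 3.
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {f"delta{k}": sum(r < 1.25 ** k for r in ratios) / n
              for k in (1, 2, 3)}

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
            "rmse_log": rmse_log, "log10": log10, **deltas}
```

For example, `depth_metrics([2.0, 4.0, 9.0], [2.0, 5.0, 10.0])` gives an absolute relative error of 0.1 and a delta1 accuracy of 2/3, since the pixel predicted at 4.0 against a ground truth of 5.0 has a ratio of exactly 1.25, which does not pass the strict `< 1.25` threshold. In practice these values are accumulated over every image in the test split before averaging.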
