SEB-Net: Revisiting Deep Encoder-Decoder Networks for Scene Understanding

P. K. Gadosey, Yujian Li, Ting Zhang, Zhaoying Liu, Edna Chebet Too, Firdaous Essaf
{"title":"SEB-Net: Revisiting Deep Encoder-Decoder Networks for Scene Understanding","authors":"P. K. Gadosey, Yuijan Li, Ting Zhang, Zhaoying Liu, Edna Chebet Too, Firdaous Essaf","doi":"10.1145/3404555.3404629","DOIUrl":null,"url":null,"abstract":"As a research area of computer vision and deep learning, scene understanding has attracted a lot of attention in recent years. One major challenge encountered is obtaining high levels of segmentation accuracy while dealing with the computational cost and time associated with training or inference. Most current algorithms compromise one metric for the other depending on the intended devices. To address this problem, this paper proposes a novel deep neural network architecture called Segmentation Efficient Blocks Network (SEB-Net) that seeks to achieve the best possible balance between accuracy and computational costs as well as real-time inference speed. The model is composed of both an encoder path and a decoder path in a symmetric structure. The encoder path consists of 16 convolution layers identical to a VGG-19 model, and the decoder path includes what we call E-blocks (Efficient Blocks) inspired by the widely popular ENet architecture's bottleneck module with slight modifications. One advantage of this model is that the max-unpooling in the decoder path is employed for expansion and projection convolutions in the E-Blocks, allowing for less learnable parameters and efficient computation (10.1 frames per second (fps) for a 480x320 input, 11x fewer parameters than DeconvNet, 52.4 GFLOPs for a 640x360 input on a TESLA K40 GPU device). Experimental results on two outdoor scene datasets; Cambridge-driving Labeled Video Database (CamVid) and Cityscapes, indicate that SEB-Net can achieve higher performance compared to Fully Convolutional Networks (FCN), SegNet, DeepLabV, and Dilation8 in most cases. What's more, SEB-Net outperforms efficient architectures like ENet and LinkNet by 16.1 and 11.6 respectively in terms of Instance-level intersection over Union (iLoU). SEB-Net also shows better performance when further evaluated on the SUNRGB-D, an indoor scene dataset","PeriodicalId":220526,"journal":{"name":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","volume":"156 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3404555.3404629","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

As a research area of computer vision and deep learning, scene understanding has attracted considerable attention in recent years. One major challenge is obtaining high segmentation accuracy while managing the computational cost and time associated with training and inference. Most current algorithms trade one of these metrics off against the other, depending on the target device. To address this problem, this paper proposes a novel deep neural network architecture called the Segmentation Efficient Blocks Network (SEB-Net), which seeks the best possible balance between accuracy, computational cost, and real-time inference speed. The model is composed of an encoder path and a decoder path in a symmetric structure. The encoder path consists of 16 convolutional layers identical to those of the VGG-19 model, and the decoder path is built from what we call E-blocks (Efficient Blocks), inspired, with slight modifications, by the bottleneck module of the widely popular ENet architecture. One advantage of this model is that max-unpooling in the decoder path is employed alongside the expansion and projection convolutions in the E-blocks, allowing for fewer learnable parameters and efficient computation (10.1 frames per second (fps) for a 480×320 input, 11× fewer parameters than DeconvNet, and 52.4 GFLOPs for a 640×360 input on a Tesla K40 GPU). Experimental results on two outdoor scene datasets, the Cambridge-driving Labeled Video Database (CamVid) and Cityscapes, indicate that SEB-Net achieves higher performance than Fully Convolutional Networks (FCN), SegNet, DeepLab, and Dilation8 in most cases. Moreover, SEB-Net outperforms efficient architectures such as ENet and LinkNet by 16.1 and 11.6, respectively, in terms of instance-level Intersection over Union (iIoU). SEB-Net also shows better performance when further evaluated on SUN RGB-D, an indoor scene dataset.
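The abstract does not include an implementation, but the decoder design it describes (a channel-reducing projection, parameter-free max-unpooling driven by the encoder's pooling indices, and a channel-restoring expansion) can be sketched concisely. The PyTorch snippet below is a hypothetical reconstruction, not the authors' code: the class name `EBlock`, the reduction ratio of 4, the 3×3 middle convolution, and the BatchNorm/PReLU placement are assumptions modeled on ENet's bottleneck module.

```python
# Minimal sketch of an E-block-style decoder unit, assuming an
# ENet-like bottleneck layout; not the authors' exact configuration.
import torch
import torch.nn as nn

class EBlock(nn.Module):
    """Hypothetical decoder unit: 1x1 projection -> max-unpool -> 3x3 conv -> 1x1 expansion."""

    def __init__(self, in_ch: int, out_ch: int, reduction: int = 4):
        super().__init__()
        mid = in_ch // reduction
        # 1x1 projection shrinks the channel count so later ops stay cheap
        self.project = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.PReLU(),
        )
        # Max-unpooling reuses the argmax indices saved by the encoder's
        # pooling layers, so upsampling itself adds no learnable parameters.
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.PReLU(),
        )
        # 1x1 expansion restores the target channel count
        self.expand = nn.Sequential(
            nn.Conv2d(mid, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.PReLU(),
        )

    def forward(self, x, pool_indices, output_size):
        # pool_indices must match the projected tensor's shape, i.e. the
        # corresponding encoder stage must have pooled at `mid` channels.
        x = self.project(x)
        x = self.unpool(x, pool_indices, output_size=output_size)
        x = self.conv(x)
        return self.expand(x)

# Usage sketch with made-up shapes:
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
enc_feat = torch.randn(1, 64, 44, 80)         # hypothetical encoder feature map
pooled, idx = pool(enc_feat)                  # idx: (1, 64, 22, 40)
block = EBlock(in_ch=256, out_ch=64)          # mid = 256 // 4 = 64, matches idx
dec_in = torch.randn(1, 256, 22, 40)          # decoder input at pooled resolution
out = block(dec_in, idx, output_size=enc_feat.size())  # -> (1, 64, 44, 80)
```

Because the upsampling step only scatters values back to the positions recorded by the encoder's pooling indices, it contributes zero parameters, which is consistent with the abstract's claim of fewer learnable parameters than deconvolution-based decoders such as DeconvNet.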