Semantic Segmentation: A Zoology of Deep Architectures

Image Processing On Line (IF 0.7, Q4, COMPUTER SCIENCE, SOFTWARE ENGINEERING) | Pub Date: 2023-06-07 | DOI: 10.5201/ipol.2023.447

Aitor Artola


Citations: 0

Abstract

In this paper we review the evolution of deep architectures for semantic segmentation. The first successful model was the fully convolutional network (FCN), published at CVPR in 2015. Since then, the subject has become very popular and many methods have been published, mainly proposing improvements to FCN. In addition to FCN, we describe in detail the Pyramid Scene Parsing Network (PSPNet) and DeepLabV3, which provide a multi-scale description and increase the resolution of the segmentation. In recent years, convolutional architectures have reached a bottleneck and have been surpassed by transformers from natural language processing (NLP), even though these models are generally larger and slower. We have chosen to discuss the Segmentation Transformer (SETR), a first architecture with a transformer backbone. We also discuss SegFormer, which includes a multi-scale interpretation and tricks to decrease the size and inference time of the network. The networks presented in the demo come from the MM-Segmentation library, an open source semantic segmentation toolbox based on PyTorch. We propose to compare these methods qualitatively
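As a reading aid, the final step that all the architectures named above share can be sketched in a few lines: a segmentation network produces one score map per class, and each pixel is assigned the class with the highest score. The following is a minimal illustration in plain Python, not code from the paper or from the MM-Segmentation library; the function name `scores_to_label_map` and the example values are hypothetical.

```python
from typing import List

# Score maps indexed as [class][row][column]; an assumed layout for this sketch.
ScoreMaps = List[List[List[float]]]


def scores_to_label_map(scores: ScoreMaps) -> List[List[int]]:
    """Assign each pixel the index of its highest-scoring class."""
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    label_map = []
    for y in range(height):
        row = []
        for x in range(width):
            # Argmax over the class axis; ties go to the lowest class index.
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        label_map.append(row)
    return label_map


# Hypothetical 2x3 image with 3 classes (values invented for illustration).
example_scores = [
    [[0.9, 0.2, 0.1], [0.8, 0.1, 0.3]],    # class 0
    [[0.05, 0.7, 0.2], [0.1, 0.6, 0.3]],   # class 1
    [[0.05, 0.1, 0.7], [0.1, 0.3, 0.4]],   # class 2
]
print(scores_to_label_map(example_scores))  # [[0, 1, 2], [0, 1, 2]]
```

In practice the score maps would come from a network's output tensor and the argmax would be a single tensor operation; the nested loops here only make the per-pixel decision explicit.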


Source journal

Image Processing On Line (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

CiteScore: 1.90

Self-citation rate: 0.00%

Articles published: not listed

Review time: 16 weeks

Journal description: IPOL publishes relevant image processing and image analysis algorithms, emphasizing the role of mathematics as a source for algorithm design. The publication is as precise and comprehensive as possible. To this aim, the publication of each algorithm is fourfold and includes: a manuscript containing the detailed description of the published algorithm and its bibliography, along with commented examples and a failure case analysis; a software implementation of the algorithm in C, C++ or Matlab; an online demo, where the algorithm can be tested on data sets uploaded by the users; and an archive containing extensive online experiments. The restricted goal of IPOL is to make the algorithms accessible to the scientific community in their most explicit form. The publication of an algorithm by IPOL is different from, and complementary to, a classic journal publication.

Latest articles from the journal

Accelerating NeRF with the Visual Hull

A Brief Evaluation of InSAR Phase Denoising and Coherence Estimation with Phi-Net

Survival Forest for Left-Truncated Right-Censored Data

A Short Analysis of BigColor for Image Colorization

A Brief Analysis of iColoriT for Interactive Image Colorization