Semantic Segmentation: A Zoology of Deep Architectures
Author: Aitor Artola
Journal: Image Processing On Line (IPOL)
Published: 2023-06-07
DOI: https://doi.org/10.5201/ipol.2023.447
In this paper we review the evolution of deep architectures for semantic segmentation. The first successful model was the fully convolutional network (FCN), published at CVPR in 2015. Since then, the subject has become very popular and many methods have been published, most of them proposing improvements to FCN. In addition to FCN, we describe in detail the Pyramid Scene Parsing Network (PSPNet) and DeepLabV3, which provide a multi-scale description and increase the resolution of the segmentation. In recent years, convolutional architectures have reached a bottleneck and have been surpassed by transformers, which originate in natural language processing (NLP), even though these models are generally larger and slower. We have chosen to discuss the Segmentation Transformer (SETR), a first architecture with a transformer backbone. We also discuss SegFormer, which includes a multi-scale interpretation and tricks to decrease the size and inference time of the network. The networks presented in the demo come from the MM-Segmentation library, an open source semantic segmentation toolbox based on PyTorch. We propose to compare these methods qualitatively
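As a rough illustration of what these networks output, the sketch below mimics the last stage of a segmentation model: coarse per-class score maps are upsampled to a finer resolution, and each pixel receives the label of its highest-scoring class. This is a toy under stated assumptions, not the method of any network named above. The score values are invented, the helper names `upsample_nearest` and `segment` are hypothetical, and nearest-neighbour repetition stands in for the learned or bilinear upsampling that real architectures typically use.

```python
from typing import List

ScoreMap = List[List[float]]  # one H x W grid of scores for a single class


def upsample_nearest(score_map: ScoreMap, factor: int) -> ScoreMap:
    """Enlarge a score map by repeating each value `factor` times per axis."""
    out = []
    for row in score_map:
        wide_row = [value for value in row for _ in range(factor)]
        out.extend(list(wide_row) for _ in range(factor))
    return out


def segment(class_scores: List[ScoreMap], factor: int) -> List[List[int]]:
    """Upsample each class score map, then pick the best class per pixel."""
    upsampled = [upsample_nearest(m, factor) for m in class_scores]
    height, width = len(upsampled[0]), len(upsampled[0][0])
    return [
        [max(range(len(upsampled)), key=lambda c: upsampled[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 2 x 2 score maps for two classes (0 = background, 1 = object).
coarse_scores = [
    [[0.9, 0.2], [0.8, 0.7]],  # class 0
    [[0.1, 0.8], [0.2, 0.3]],  # class 1
]
label_map = segment(coarse_scores, factor=2)  # 4 x 4 grid of class indices
```

The result is a label map with one class index per pixel, which is the kind of output a qualitative comparison between methods would inspect.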
About the journal:
IPOL publishes relevant image processing and image analysis algorithms, emphasizing the role of mathematics as a source for algorithm design. The publication is as precise and comprehensive as possible. To this end, the publication of each algorithm is fourfold and includes: a manuscript containing a detailed description of the published algorithm and its bibliography, along with commented examples and a failure case analysis; a software implementation of the algorithm in C, C++ or Matlab; an online demo, where the algorithm can be tested on data sets uploaded by users; and an archive containing extensive online experiments. The restricted goal of IPOL is to make the algorithms accessible to the scientific community in their most explicit form. The publication of an algorithm by IPOL is different from, and complementary to, a classic journal publication.