{"title":"用于语义分割的增强特征金字塔网络","authors":"Van Toan Quyen, Jong Hyuk Lee, Min Young Kim","doi":"10.1109/ICAIIC57133.2023.10067062","DOIUrl":null,"url":null,"abstract":"Semantic segmentation is a complicated topic when they require strictly the object boundary accuracy. For autonomous driving applications, they have to face a long range of objective sizes in the street scenes, so a single field of views is not suitable to extract input features. Feature pyramid network (FPN) is an effective method for computer vision tasks such as object detection and semantic segmentation. The architecture of this approach composes of a bottom-up pathway and a top-down pathway. Based on the structure, we can obtain rich spatial information from the largest layer and extract rich segmentation information from lower-scale features. The traditional FPN efficiently captures different objective sizes by using multiple receptive fields and then predicts the outputs from the concatenated features. The final feature combination is not optimistic when they burden the hardware with huge computation and reduce the semantic information. In this paper, we propose multiple predictions for semantic segmentation. Instead of combining four-feature scales together, the proposed method processes separately three lower scales as the contextual contributor and the largest features as the coarser-information branch. Each contextual feature is concatenated with the coarse branch to generate an individual prediction. By deploying this architecture, a single prediction effectively segments specific objective sizes. Finally, score maps are fused together in order to gather the prominent weights from the different predictions. A series of experiments is implemented to validate the efficiency on various open data sets. 
We have achieved good results 76.4% $m$IoU at 52 FPS on Cityscapes and 43.6% $m$IoU on Mapillary Vistas.","PeriodicalId":105769,"journal":{"name":"2023 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhanced-feature pyramid network for semantic segmentation\",\"authors\":\"Van Toan Quyen, Jong Hyuk Lee, Min Young Kim\",\"doi\":\"10.1109/ICAIIC57133.2023.10067062\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Semantic segmentation is a complicated topic when they require strictly the object boundary accuracy. For autonomous driving applications, they have to face a long range of objective sizes in the street scenes, so a single field of views is not suitable to extract input features. Feature pyramid network (FPN) is an effective method for computer vision tasks such as object detection and semantic segmentation. The architecture of this approach composes of a bottom-up pathway and a top-down pathway. Based on the structure, we can obtain rich spatial information from the largest layer and extract rich segmentation information from lower-scale features. The traditional FPN efficiently captures different objective sizes by using multiple receptive fields and then predicts the outputs from the concatenated features. The final feature combination is not optimistic when they burden the hardware with huge computation and reduce the semantic information. In this paper, we propose multiple predictions for semantic segmentation. Instead of combining four-feature scales together, the proposed method processes separately three lower scales as the contextual contributor and the largest features as the coarser-information branch. 
Each contextual feature is concatenated with the coarse branch to generate an individual prediction. By deploying this architecture, a single prediction effectively segments specific objective sizes. Finally, score maps are fused together in order to gather the prominent weights from the different predictions. A series of experiments is implemented to validate the efficiency on various open data sets. We have achieved good results 76.4% $m$IoU at 52 FPS on Cityscapes and 43.6% $m$IoU on Mapillary Vistas.\",\"PeriodicalId\":105769,\"journal\":{\"name\":\"2023 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAIIC57133.2023.10067062\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAIIC57133.2023.10067062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
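The multi-branch head described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes nearest-neighbour upsampling to align scales, hypothetical 1x1-convolution heads with random weights standing in for trained classifiers, and simple averaging as the score-map fusion operator (the abstract does not specify the exact fusion rule).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=0):
    # Numerically stable softmax over the class axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def upsample(feat, size):
    # Nearest-neighbour upsampling of a (C, h, w) map to (C, *size).
    c, h, w = feat.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return feat[:, rows[:, None], cols[None, :]]

num_classes, H, W = 5, 8, 8
# Largest-scale (coarse-information) branch and three lower-scale
# contextual features, as in the proposed four-level pyramid.
coarse = rng.standard_normal((16, H, W))
contextual = [rng.standard_normal((16, H // s, W // s)) for s in (2, 4, 8)]

# One hypothetical 1x1-conv head per branch (random weights, illustration only).
heads = [rng.standard_normal((num_classes, 32)) for _ in contextual]

score_maps = []
for ctx, w_head in zip(contextual, heads):
    ctx_up = upsample(ctx, (H, W))                         # align spatial size
    feat = np.concatenate([coarse, ctx_up], axis=0)        # (32, H, W)
    logits = np.einsum('kc,chw->khw', w_head, feat)        # 1x1 convolution
    score_maps.append(softmax(logits, axis=0))             # per-branch prediction

# Fuse the per-branch score maps (here: averaging), then take the label map.
fused = np.mean(score_maps, axis=0)
pred = fused.argmax(axis=0)
```

Each branch thus produces its own score map from the coarse features plus one contextual scale, and the fusion step combines the branches' confidences into a single segmentation.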