Dexin Ren, Minxian Li, Shidong Wang, Mingwu Ren, Haofeng Zhang
DOI: 10.1016/j.imavis.2024.105318
Journal: Image and Vision Computing, Volume 152, Article 105318 (Impact Factor 4.2, JCR Q2, Computer Science, Artificial Intelligence)
Published: 2024-11-04 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0262885624004232
SAFENet: Semantic-Aware Feature Enhancement Network for unsupervised cross-domain road scene segmentation
Unsupervised cross-domain road scene segmentation has attracted substantial interest because of its capability to perform segmentation on new, unlabeled domains, thereby reducing the dependence on expensive manual annotations. This is achieved by leveraging networks trained on labeled source domains to classify images from unlabeled target domains. Conventional techniques usually use adversarial networks to align inputs from the source and the target in one of the two domains. However, these approaches often fall short in effectively integrating information from both domains, because alignment in either single space usually introduces bias during feature learning. To overcome these limitations and enhance cross-domain interaction while mitigating overfitting to the source domain, we introduce a novel framework called Semantic-Aware Feature Enhancement Network (SAFENet) for unsupervised cross-domain road scene segmentation. SAFENet incorporates a Semantic-Aware Enhancement (SAE) module to amplify the importance of class information in segmentation tasks, and uses the semantic space as a new domain to guide the alignment of the source and target domains. Additionally, we integrate Adaptive Instance Normalization with Momentum (AdaIN-M), which converts the source domain image style to the target domain image style, thereby reducing the adverse effects of source domain overfitting on target domain segmentation performance. Moreover, SAFENet employs a Knowledge Transfer (KT) module to optimize the network architecture, enhancing computational efficiency during testing while maintaining the robust inference capabilities developed during training. To further improve segmentation performance, we employ Curriculum Learning, a self-training mechanism that uses pseudo-labels derived from the target domain to iteratively refine the network.
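The abstract describes AdaIN-M only at a high level, and the paper's exact formulation is not reproduced here. As a rough, stdlib-only sketch of the underlying idea — re-normalizing per-channel source-domain statistics to momentum-smoothed target-domain statistics — with the function names, the list-based feature layout, and the momentum value all being illustrative assumptions rather than the authors' implementation:

```python
from statistics import mean, pstdev

def adain_momentum(source, target_stats, eps=1e-5):
    """Re-style source features toward running target-domain statistics.

    source: list of channels, each channel a flat list of feature values.
    target_stats: per-channel (running_mean, running_std) pairs.
    """
    styled = []
    for c, channel in enumerate(source):
        mu_s, sigma_s = mean(channel), pstdev(channel)
        mu_t, sigma_t = target_stats[c]
        # Normalize with source statistics, then re-scale with target ones:
        # this is the core AdaIN operation.
        styled.append([(x - mu_s) / (sigma_s + eps) * sigma_t + mu_t
                       for x in channel])
    return styled

def update_target_stats(target_stats, target_batch, momentum=0.9):
    """Momentum (EMA) update of the running target-domain statistics,
    so style conversion does not jitter with every target batch."""
    for c, channel in enumerate(target_batch):
        mu, sigma = mean(channel), pstdev(channel)
        m_old, s_old = target_stats[c]
        target_stats[c] = (momentum * m_old + (1 - momentum) * mu,
                           momentum * s_old + (1 - momentum) * sigma)
```

In an actual network these statistics would be computed per feature map inside the encoder; the momentum term is what distinguishes AdaIN-M from plain AdaIN, which restyles against the statistics of a single target image.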
Comprehensive experiments on two benchmark tasks built from three well-known datasets, “Synthia→Cityscapes” and “GTA5→Cityscapes”, demonstrate the superior performance of our method. In-depth examinations and ablation studies verify the efficacy of each module within the proposed method.
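The curriculum self-training step is likewise described only in outline. A minimal sketch of confidence-thresholded pseudo-labeling with a gradually relaxed threshold, assuming a per-pixel class-probability representation — the function names and the linear schedule are illustrative assumptions, not the authors' design:

```python
def pseudo_labels(probs, threshold):
    """Assign a pseudo-label only where the model is confident enough.

    probs: list of per-pixel class-probability lists.
    Returns per-pixel class indices, with -1 marking ignored pixels.
    """
    labels = []
    for p in probs:
        conf = max(p)
        labels.append(p.index(conf) if conf >= threshold else -1)
    return labels

def curriculum_thresholds(rounds, start=0.9, end=0.6):
    """Linearly relax the confidence threshold across self-training rounds,
    so later rounds admit harder pixels once the network has improved."""
    if rounds == 1:
        return [start]
    step = (start - end) / (rounds - 1)
    return [start - i * step for i in range(rounds)]
```

Each round, pixels labeled -1 would simply be excluded from the segmentation loss, and the refined network produces the probabilities for the next round.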
Journal introduction:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.