Dual-branch teacher-student with noise-tolerant learning for domain adaptive nighttime segmentation

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Image and Vision Computing Pub Date : 2024-08-10 DOI:10.1016/j.imavis.2024.105211

Ruiying Chen , Yunan Liu , Yuming Bo , Mingyu Lu

{"title":"Dual-branch teacher-student with noise-tolerant learning for domain adaptive nighttime segmentation","authors":"Ruiying Chen , Yunan Liu , Yuming Bo , Mingyu Lu","doi":"10.1016/j.imavis.2024.105211","DOIUrl":null,"url":null,"abstract":"<div><p>While significant progress has been achieved in the field of image semantic segmentation, the majority of research has been primarily concentrated on daytime scenes. Semantic segmentation of nighttime images is equally critical for autonomous driving; however, this task presents greater challenges due to inadequate lighting and difficulties associated with obtaining accurate manual annotations. In this paper, we introduce a novel method called the Dual-Branch Teacher-Student (DBTS) framework for unsupervised nighttime semantic segmentation. Our approach combines domain alignment and knowledge distillation in a mutually reinforcing manner. Firstly, we employ a photometric alignment module to dynamically generate target-like latent images, bridging the appearance gap between the source domain (daytime) and the target domain (nighttime). Secondly, we establish a dual-branch framework, where each branch enhances collaboration between the teacher and student networks. The student network utilizes adversarial learning to align the target domain with another domain (i.e., source or latent domain), while the teacher network generates reliable pseudo-labels by distilling knowledge from the latent domain. Furthermore, recognizing the potential noise present in pseudo-labels, we propose a noise-tolerant learning method to mitigate the risks associated with overreliance on pseudo-labels during domain adaptation. When evaluated on benchmark datasets, the proposed DBTS achieves state-of-the-art performance. Specifically, DBTS, using different backbones, outperforms established baseline models by approximately 25% in mIoU on the Zurich dataset and by over 26% in mIoU on the ACDC dataset, demonstrating the effectiveness of our method in addressing the challenges of domain-adaptive nighttime segmentation.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105211"},"PeriodicalIF":4.2000,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624003160","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

While significant progress has been achieved in the field of image semantic segmentation, the majority of research has been primarily concentrated on daytime scenes. Semantic segmentation of nighttime images is equally critical for autonomous driving; however, this task presents greater challenges due to inadequate lighting and difficulties associated with obtaining accurate manual annotations. In this paper, we introduce a novel method called the Dual-Branch Teacher-Student (DBTS) framework for unsupervised nighttime semantic segmentation. Our approach combines domain alignment and knowledge distillation in a mutually reinforcing manner. Firstly, we employ a photometric alignment module to dynamically generate target-like latent images, bridging the appearance gap between the source domain (daytime) and the target domain (nighttime). Secondly, we establish a dual-branch framework, where each branch enhances collaboration between the teacher and student networks. The student network utilizes adversarial learning to align the target domain with another domain (i.e., source or latent domain), while the teacher network generates reliable pseudo-labels by distilling knowledge from the latent domain. Furthermore, recognizing the potential noise present in pseudo-labels, we propose a noise-tolerant learning method to mitigate the risks associated with overreliance on pseudo-labels during domain adaptation. When evaluated on benchmark datasets, the proposed DBTS achieves state-of-the-art performance. Specifically, DBTS, using different backbones, outperforms established baseline models by approximately 25% in mIoU on the Zurich dataset and by over 26% in mIoU on the ACDC dataset, demonstrating the effectiveness of our method in addressing the challenges of domain-adaptive nighttime segmentation.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

用于域自适应夜间分段的具有容噪学习功能的双分支师生系统

虽然图像语义分割领域取得了重大进展，但大部分研究主要集中在白天场景。夜间图像的语义分割对于自动驾驶同样至关重要；然而，由于照明不足以及难以获得准确的人工注释，这项任务面临着更大的挑战。在本文中，我们介绍了一种用于无监督夜间语义分割的新型方法，即双支师生（DBTS）框架。我们的方法以相辅相成的方式将领域对齐和知识提炼结合在一起。首先，我们采用光度对齐模块动态生成类似目标的潜在图像，弥合源域（白天）和目标域（夜间）之间的外观差距。其次，我们建立了一个双分支框架，每个分支都加强了教师网络和学生网络之间的协作。学生网络利用对抗学习将目标域与另一个域（即源域或潜域）对齐，而教师网络则通过从潜域中提炼知识生成可靠的伪标签。此外，考虑到伪标签中可能存在的噪声，我们提出了一种噪声容忍学习方法，以降低在域适应过程中过度依赖伪标签所带来的风险。在基准数据集上进行评估时，所提出的 DBTS 达到了最先进的性能。具体来说，使用不同骨干网的 DBTS 在苏黎世数据集上的 mIoU 优于已建立的基线模型约 25%，在 ACDC 数据集上的 mIoU 优于已建立的基线模型 26%，这证明了我们的方法在应对域自适应夜间分割挑战方面的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Image and Vision Computing 工程技术-工程：电子与电气

CiteScore

8.50

自引率

8.50%

发文量

143

审稿时长

7.8 months

期刊介绍： Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.