Multi-scale large kernel convolution and hybrid attention network for remote sensing image dehazing

IF 4.2 | CAS Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Image and Vision Computing | Pub Date: 2024-08-14 | DOI: 10.1016/j.imavis.2024.105212

Hang Su, Lina Liu, Zenghui Wang, Mingliang Gao

Image and Vision Computing, volume 150, Article 105212. Journal article, published 2024-08-14. https://www.sciencedirect.com/science/article/pii/S0262885624003172

Citations: 0

Abstract

Remote sensing (RS) image dehazing is important for improving the quality of RS imagery and the information that can be extracted from it. Dehazing quality has advanced steadily alongside the evolution of convolutional neural networks (CNNs). However, because CNNs have a fixed receptive field, they make insufficient use of the contextual information of haze features in multi-scale RS images, and they fail to adequately extract both the local and global information of haze features. To address these problems, we propose an RS image dehazing network based on multi-scale large kernel convolution and hybrid attention (MKHANet). The network is mainly composed of a multi-scale large kernel convolution (MSLKC) module, a hybrid attention (HA) module, and a feature fusion attention (FFA) module. The MSLKC module fuses the multi-scale information of features while enlarging the effective receptive field of the network through multiple parallel large kernel convolutions. To alleviate the problem of unevenly distributed haze and to effectively extract the global and local information of haze features, the HA module focuses on the importance of haze pixels at the channel level. The FFA module boosts the interaction of feature information between the network's deep and shallow layers. Subjective and objective experimental results on multiple RS hazy image datasets show that MKHANet surpasses existing state-of-the-art (SOTA) approaches. The source code is available at https://github.com/tohang98/MKHA_Net.
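
To make the two central ideas of the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation (which is in the repository linked above). It illustrates (1) parallel convolution branches with different kernel sizes applied to the same input and then fused, and (2) a channel-level gate that rescales each feature channel. Everything specific here is an assumption made for illustration: the kernel sizes 3, 5 and 7, the averaging kernels that stand in for learned weights, the fusion by simple mean, and the sigmoid gate without learned layers. The function names are hypothetical.

```python
import math


def conv2d_same(image, kernel):
    """Single-channel 2-D cross-correlation with zero padding (output keeps input size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)  # square, odd-sized kernel assumed
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def box_kernel(k):
    """Assumption: a k x k averaging kernel used in place of learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale_branches(image, kernel_sizes=(3, 5, 7)):
    """Run parallel branches with different kernel sizes and fuse them by their mean."""
    branches = [conv2d_same(image, box_kernel(k)) for k in kernel_sizes]
    h, w = len(image), len(image[0])
    return [
        [sum(b[y][x] for b in branches) / len(branches) for x in range(w)]
        for y in range(h)
    ]


def channel_attention(channels):
    """Rescale each channel by a sigmoid gate of its global average (no learned layers)."""
    out = []
    for ch in channels:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = 1.0 / (1.0 + math.exp(-mean))
        out.append([[v * gate for v in row] for row in ch])
    return out


# Impulse response: a single bright pixel shows how far each branch "sees".
impulse = [[0.0] * 9 for _ in range(9)]
impulse[4][4] = 1.0
fused = multi_scale_branches(impulse)
```

In this toy setting, the fused response to the impulse is non-zero up to three pixels from the centre, which is the reach of the largest (7 x 7) branch, while the smaller branches add more weight close to the centre. That is the sense in which parallel kernels of different sizes combine local and wider context in one output.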

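Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months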

About the journal: The primary aim of Image and Vision Computing is to provide an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.