A novel single-stage network for accurate image restoration

Hu Gao, Jing Yang, Ying Zhang, Ning Wang, Jingfan Yang, Depeng Dang
{"title":"A novel single-stage network for accurate image restoration","authors":"Hu Gao, Jing Yang, Ying Zhang, Ning Wang, Jingfan Yang, Depeng Dang","doi":"10.1007/s00371-024-03599-6","DOIUrl":null,"url":null,"abstract":"<p>Image restoration is the task of aiming to obtain a high-quality image from a corrupt input image, such as deblurring and deraining. In image restoration, it is typically necessary to maintain a complex balance between spatial details and contextual information. Although a multi-stage network can optimally balance these competing goals and achieve significant performance, this also increases the system’s complexity. In this paper, we propose a mountain-shaped single-stage design, which achieves the performance of multi-stage networks through a plug-and-play feature fusion middleware. Specifically, we propose a plug-and-play feature fusion middleware mechanism as an information exchange component between the encoder-decoder architectural levels. It seamlessly integrates upper-layer information into the adjacent lower layer, sequentially down to the lowest layer. Finally, all information is fused into the original image resolution manipulation level. This preserves spatial details and integrates contextual information, ensuring high-quality image restoration. Simultaneously, we propose a multi-head attention middle block as a bridge between the encoder and decoder to capture more global information and surpass the limitations of the receptive field of CNNs. In order to achieve low system complexity, we removes or replaces unnecessary nonlinear activation functions. Extensive experiments demonstrate that our approach, named as M3SNet, outperforms previous state-of-the-art models while using less than half the computational costs, for several image restoration tasks, such as image deraining and deblurring. 
The code and the pre-trained models will be released at https://github.com/Tombs98/M3SNet.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03599-6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Image restoration aims to recover a high-quality image from a corrupted input, as in deblurring and deraining. It typically requires maintaining a delicate balance between spatial details and contextual information. Although a multi-stage network can balance these competing goals and achieve significant performance, it also increases the system's complexity. In this paper, we propose a mountain-shaped single-stage design that matches the performance of multi-stage networks through a plug-and-play feature fusion middleware. Specifically, we propose this middleware as an information exchange component between the levels of the encoder-decoder architecture. It seamlessly integrates upper-layer information into the adjacent lower layer, sequentially down to the lowest layer; finally, all information is fused at the original-image-resolution manipulation level. This preserves spatial details while integrating contextual information, ensuring high-quality image restoration. In addition, we propose a multi-head attention middle block as a bridge between the encoder and decoder to capture more global information and overcome the limited receptive field of CNNs. To keep system complexity low, we remove or replace unnecessary nonlinear activation functions. Extensive experiments demonstrate that our approach, named M3SNet, outperforms previous state-of-the-art models on several image restoration tasks, such as image deraining and deblurring, while using less than half the computational cost. The code and the pre-trained models will be released at https://github.com/Tombs98/M3SNet.
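The abstract gives no implementation details, but the top-down fusion flow it describes (upper-layer features merged into each adjacent lower layer, then everything lifted back to the original resolution) can be sketched roughly as follows. This is an illustrative toy only: the function names, the 2x2 average-pooling downsampling, the additive fusion, and the nearest-neighbour upsampling are all assumptions, not the paper's actual learned operators.

```python
import numpy as np

def downsample(x):
    """Halve spatial resolution via 2x2 average pooling (an assumed
    stand-in for whatever learned downsampling the paper uses)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Double spatial resolution via nearest-neighbour repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_top_down(levels):
    """Sequentially inject each upper level into the adjacent lower one,
    then lift the accumulated features back to the original resolution.

    `levels[0]` is the full-resolution feature map; each later entry is
    half the spatial size of the previous one (as in an encoder pyramid).
    """
    acc = levels[0]
    for lower in levels[1:]:
        # merge upper-layer information into the adjacent lower layer
        acc = downsample(acc) + lower
    # fuse everything back at the original-image-resolution level
    for _ in range(len(levels) - 1):
        acc = upsample(acc)
    return acc

# Toy pyramid: 8x8, 4x4, 2x2 feature maps of ones.
levels = [np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))]
fused = fuse_top_down(levels)
print(fused.shape)  # (8, 8): output lives at the original resolution
```

The additive fusion here is only a placeholder for the middleware's actual information exchange; the point is the ordering, where information flows level by level toward the lowest layer before being fused at full resolution.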

