Exploring high-quality image deraining Transformer via effective large kernel attention

Haobo Dong, Tianyu Song, Xuanyu Qi, Jiyu Jin, Guiyue Jin, Lei Fan
{"title":"Exploring high-quality image deraining Transformer via effective large kernel attention","authors":"Haobo Dong, Tianyu Song, Xuanyu Qi, Jiyu Jin, Guiyue Jin, Lei Fan","doi":"10.1007/s00371-024-03551-8","DOIUrl":null,"url":null,"abstract":"<p>In recent years, Transformer has demonstrated significant performance in single image deraining tasks. However, the standard self-attention in the Transformer makes it difficult to model local features of images effectively. To alleviate the above problem, this paper proposes a high-quality deraining Transformer with <b>e</b>ffective <b>l</b>arge <b>k</b>ernel <b>a</b>ttention, named as ELKAformer. The network employs the Transformer-Style Effective Large Kernel Conv-Block (ELKB), which contains 3 key designs: Large Kernel Attention Block (LKAB), Dynamical Enhancement Feed-forward Network (DEFN), and Edge Squeeze Recovery Block (ESRB) to guide the extraction of rich features. To be specific, LKAB introduces convolutional modulation to substitute vanilla self-attention and achieve better local representations. The designed DEFN refines the most valuable attention values in LKAB, allowing the overall design to better preserve pixel-wise information. Additionally, we develop ESRB to obtain long-range dependencies of different positional information. Massive experimental results demonstrate that this method achieves favorable effects while effectively saving computational costs. Our code is available at github</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"29 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03551-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, Transformers have demonstrated strong performance on single-image deraining tasks. However, the standard self-attention in the Transformer struggles to model local image features effectively. To alleviate this problem, this paper proposes a high-quality deraining Transformer with effective large kernel attention, named ELKAformer. The network employs the Transformer-style Effective Large Kernel Conv-Block (ELKB), which comprises three key designs: a Large Kernel Attention Block (LKAB), a Dynamical Enhancement Feed-forward Network (DEFN), and an Edge Squeeze Recovery Block (ESRB), which together guide the extraction of rich features. Specifically, LKAB substitutes convolutional modulation for vanilla self-attention to achieve better local representations. The DEFN refines the most valuable attention values in LKAB, allowing the overall design to better preserve pixel-wise information. Additionally, we develop ESRB to capture long-range dependencies across different positions. Extensive experimental results demonstrate that the method achieves favorable results while effectively reducing computational cost. Our code is available on GitHub.
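The abstract does not give implementation details of LKAB, so the following is a minimal sketch of the general idea it names: convolutional modulation with a large effective kernel, in the style of Large Kernel Attention (Guo et al., Visual Attention Network). The decomposition shown (a 5x5 depthwise conv, a 7x7 depthwise conv with dilation 3, and a 1x1 conv, approximating a roughly 21x21 kernel) and all names and hyperparameters are assumptions for illustration, not the authors' code.

```python
# Hedged sketch of large-kernel attention via convolutional modulation
# (assumed LKA-style decomposition; NOT the paper's actual LKAB).
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # 5x5 depthwise convolution captures local context.
        self.dw_conv = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)
        # 7x7 depthwise dilated convolution (dilation 3) cheaply expands
        # the receptive field toward a ~21x21 kernel.
        self.dw_dilated = nn.Conv2d(dim, dim, kernel_size=7, padding=9,
                                    dilation=3, groups=dim)
        # 1x1 convolution mixes information across channels.
        self.pw_conv = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw_conv(self.dw_dilated(self.dw_conv(x)))
        # Convolutional modulation: the computed map re-weights the input
        # element-wise, replacing softmax-based self-attention.
        return attn * x

if __name__ == "__main__":
    x = torch.randn(1, 64, 128, 128)        # (batch, channels, H, W)
    out = LargeKernelAttention(64)(x)
    print(out.shape)                         # torch.Size([1, 64, 128, 128])
```

Because every operator here is a depthwise or pointwise convolution, the cost grows linearly with spatial size rather than quadratically as in vanilla self-attention, which is consistent with the computational savings the abstract claims.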
