Hybrid attention transformer with re-parameterized large kernel convolution for image super-resolution

Image and Vision Computing · IF 4.2 · CAS Tier 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-07-05 · DOI: 10.1016/j.imavis.2024.105162
Zhicheng Ma , Zhaoxiang Liu , Kai Wang , Shiguo Lian
Citation count: 0

Abstract

Single image super-resolution (SISR) is a well-established low-level vision task that aims to reconstruct high-resolution images from low-resolution inputs. Transformer-based methods have shown remarkable success and achieved outstanding performance on SISR. While the Transformer models global information effectively, it is less effective at capturing high-frequency details, such as stripes, that mainly carry local information; its capture of global information also leaves room for improvement. To address this, we propose a novel Large Kernel Hybrid Attention Transformer (LKHAT) built on re-parameterization. It combines re-parameterized convolution layers of different kernel sizes and strides with Transformer blocks to capture global and local information jointly, learning comprehensive features that span both low-frequency and high-frequency content. Moreover, to avoid the artifacts that batch normalization introduces in SISR, we propose a new training strategy that fuses each convolution layer with its batch normalization layer after a certain number of training epochs. This strategy retains the convergence-accelerating benefit of batch normalization during training while eliminating artifacts at inference. For the re-parameterization of multiple parallel convolution branches, the same strategy further reduces training computation. By coupling these core improvements, LKHAT achieves state-of-the-art performance on the single image super-resolution task.
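The conv–BN fusion the abstract refers to is a standard re-parameterization identity: a batch normalization layer applied after a convolution can be folded into that convolution's weights and bias, yielding a single equivalent convolution for inference. A minimal NumPy sketch of this folding (the function name and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding convolution.

    weight: (C_out, C_in, k, k) convolution kernel
    bias:   (C_out,) convolution bias
    gamma, beta, mean, var: (C_out,) BN affine parameters and running statistics

    Returns (fused_weight, fused_bias) such that, for any input x,
    conv(x, fused_weight, fused_bias) == BN(conv(x, weight, bias)).
    """
    std = np.sqrt(var + eps)
    scale = gamma / std  # per-output-channel scale applied by BN
    fused_weight = weight * scale[:, None, None, None]
    fused_bias = beta + (bias - mean) * scale
    return fused_weight, fused_bias
```

Because BN is a per-channel affine map y = gamma * (x - mean) / sqrt(var + eps) + beta, composing it with the linear convolution gives another convolution, which is why fusing mid-training (as the paper proposes) removes BN from the inference graph without changing the network's function at the moment of fusion.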


Source journal

Image and Vision Computing (Engineering/Technology — Engineering: Electrical & Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles per year: 143
Review time: 7.8 months
Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to strengthen a deeper understanding of the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.