Residual SwinV2 transformer coordinate attention network for image super resolution

IF 1.4 · Zone 4 (Computer Science) · JCR Q4 (Computer Science, Artificial Intelligence) · AI Communications · Pub Date: 2024-04-09 · DOI: 10.3233/aic-230340

Yushi Lei, Zhengwei Zhu, Yilin Qin, Chenyang Zhu, Yanping Zhu

Citations: 0

Abstract

Swin Transformers have been designed for and used in various image super-resolution (SR) applications. One recent image restoration method, RSTCANet, combines the Swin Transformer with channel attention. However, when some image channels carry little useful information or mostly noise, channel attention cannot automatically learn that these channels are insignificant. Instead, it tries to enhance their expressive capability by adjusting their weights, which may lead to excessive focus on noise while neglecting more essential features. In this paper, we propose a new image SR method, RSVTCANet, based on an extension of Swin2SR. Specifically, to gather global information across image channels effectively, we modify the Residual SwinV2 Transformer blocks in Swin2SR in two ways: we introduce coordinate attention for every two successive SwinV2 Transformer Layers (S2TL), and we replace Multi-head Self-Attention (MSA) with Efficient Multi-head Self-Attention version 2 (EMSAv2). The resulting residual SwinV2 Transformer coordinate attention blocks (RSVTCABs) are used for feature extraction. Additionally, to improve the generalization of RSVTCANet, we apply an optimized RandAugment for data augmentation on the training dataset. Extensive experimental results show that RSVTCANet outperforms recent image SR methods in visual quality and in measures such as PSNR and SSIM.
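
The contrast the abstract draws between channel attention and coordinate attention can be illustrated with a minimal sketch. It assumes the commonly described formulations: channel attention summarizes each channel with a single global average, while coordinate attention keeps one average per row and one per column, so positional information survives. The function names, the nested-list `[channel][row][column]` layout, and the toy feature map are hypothetical, and the learned convolutions that a real block would apply to the pooled values are omitted. This is not the RSVTCANet implementation.

```python
import math


def sigmoid(v):
    """Logistic function, used here as the attention gate."""
    return 1.0 / (1.0 + math.exp(-v))


def channel_pool(x):
    """Global average pooling: one scalar per channel (channel-attention style)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def directional_pool(x):
    """Per-channel averages along each spatial axis (coordinate-attention style).

    Returns (row_means, col_means), where row_means[c][i] averages row i
    over the width and col_means[c][j] averages column j over the height.
    """
    row_means = [[sum(row) / len(row) for row in ch] for ch in x]
    col_means = [[sum(col) / len(col) for col in zip(*ch)] for ch in x]
    return row_means, col_means


def coordinate_reweight(x):
    """Scale each value by a row gate and a column gate.

    Assumption: the gates are plain sigmoids of the pooled means; the
    learned transforms of a real coordinate attention block are left out.
    """
    row_means, col_means = directional_pool(x)
    return [
        [[v * sigmoid(rm[i]) * sigmoid(cm[j]) for j, v in enumerate(row)]
         for i, row in enumerate(ch)]
        for ch, rm, cm in zip(x, row_means, col_means)
    ]


# Toy feature map with 1 channel, 2 rows, 2 columns (hypothetical values).
feat = [[[1.0, 3.0], [5.0, 7.0]]]
print(channel_pool(feat))      # [4.0]
print(directional_pool(feat))  # ([[2.0, 6.0]], [[3.0, 5.0]])
```

In this toy case the global pool collapses the channel to the single value 4.0, whereas the directional pools still distinguish the brighter second row (6.0) from the first (2.0), which is the kind of positional cue a per-channel scalar cannot carry.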

Source journal

AI Communications (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 2.30

Self-citation rate: 12.50%

Review time: 4.5 months

Journal description: AI Communications is a journal on artificial intelligence (AI) that has a close relationship with EurAI (European Association for Artificial Intelligence, formerly ECCAI). It covers the whole AI community: scientific institutions as well as commercial and industrial companies. AI Communications aims to enhance contacts and information exchange between AI researchers and developers, and to provide supranational information to those concerned with AI and advanced information processing. AI Communications publishes refereed articles concerning scientific and technical AI procedures, provided they are of sufficient interest to a large readership with both scientific and practical backgrounds. In addition, it contains high-level background material, both at the technical level and at the level of opinions, policies and news.