Lightweight image super-resolution with sliding Proxy Attention Network
Zhenyu Hu, Wanjie Sun, Zhenzhong Chen
DOI: 10.1016/j.sigpro.2024.109704
Signal Processing, Volume 227, Article 109704
Publication date: 2024-09-11 (Journal Article)
Impact Factor: 3.4 · JCR: Q2 (Engineering, Electrical & Electronic)
Full text: https://www.sciencedirect.com/science/article/pii/S0165168424003244
Code: https://github.com/zononhzy/SPAN
Citations: 0
Abstract
Recently, image super-resolution (SR) models using window-based Transformers have demonstrated superior performance compared to SR models based on convolutional neural networks. Nevertheless, Transformer-based SR models often entail high computational demands. This is due to the adoption of shifted window self-attention following the window self-attention layer to model long-range relationships, resulting in additional computational overhead. Moreover, extracting local image features with the self-attention mechanism alone is insufficient to reconstruct rich high-frequency image content. To overcome these challenges, we propose the Sliding Proxy Attention Network (SPAN), capable of recovering high-quality High-Resolution (HR) images from Low-Resolution (LR) inputs with substantially fewer model parameters and computational operations. The primary innovation of SPAN lies in the Sliding Proxy Transformer Block (SPTB), which integrates the local detail sensitivity of convolution with the long-range dependency modeling of the self-attention mechanism. Key components within SPTB include the Enhanced Local Feature Extraction Block (ELFEB) and the Sliding Proxy Attention Block (SPAB). ELFEB is designed to enhance the local receptive field with lightweight parameters for high-frequency detail compensation. SPAB optimizes computational efficiency by implementing intra-window and cross-window attention in a single operation, leveraging window overlap. Experimental results demonstrate that SPAN can produce high-quality SR images while effectively managing computational complexity. The code is publicly available at: https://github.com/zononhzy/SPAN.
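The core idea behind SPAB, attending within overlapping (sliding) windows so that intra-window and cross-window interactions are captured in a single pass rather than through a separate shifted-window layer, can be illustrated with a minimal 1D sketch. This is not the authors' implementation: the function name, the banded-mask formulation, and the toy tensor shapes are illustrative assumptions, and a real SR model would operate on 2D feature maps with learned query/key/value projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sliding_local_attention(x, radius=2):
    """Each token attends only to tokens within `radius` positions.

    Because neighborhoods of adjacent tokens overlap, information
    crosses window boundaries in one attention pass -- a simplified
    stand-in for overlapping-window attention (hypothetical sketch).
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                      # (n, n) dot-product scores
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= radius
    scores = np.where(mask, scores, -np.inf)           # mask pairs outside the window
    return softmax(scores, axis=-1) @ x                # weighted sum of in-window tokens

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))   # 8 tokens, 4 channels
y = sliding_local_attention(x, radius=2)
print(y.shape)                    # (8, 4)
```

In this toy form the banded mask plays the role of window overlap: token 3 attends to tokens 1..5 while token 5 attends to 3..7, so their receptive fields chain across the sequence without a second shifted pass.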
About the journal:
Signal Processing incorporates all aspects of the theory and practice of signal processing. It features original research work, tutorial and review articles, and accounts of practical developments. It is intended for the rapid dissemination of knowledge and experience to engineers and scientists working in the research, development, or practical application of signal processing.
Subject areas covered by the journal include: Signal Theory; Stochastic Processes; Detection and Estimation; Spectral Analysis; Filtering; Signal Processing Systems; Software Developments; Image Processing; Pattern Recognition; Optical Signal Processing; Digital Signal Processing; Multi-dimensional Signal Processing; Communication Signal Processing; Biomedical Signal Processing; Geophysical and Astrophysical Signal Processing; Earth Resources Signal Processing; Acoustic and Vibration Signal Processing; Data Processing; Remote Sensing; Signal Processing Technology; Radar Signal Processing; Sonar Signal Processing; Industrial Applications; New Applications.