Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution

Ao Li, Le Zhang, Yun Liu, Ce Zhu

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 4, pp. 3141-3158
DOI: 10.1109/TPAMI.2025.3529927
Published online: 2025-01-24

Abstract

Transformer-based methods have exhibited remarkable potential in single image super-resolution (SISR) by effectively extracting long-range dependencies. However, most of the current research in this area has prioritized the design of transformer blocks to capture global information, while overlooking the importance of incorporating high-frequency priors, which we believe could be beneficial. In our study, we conducted a series of experiments and found that transformer structures are more adept at capturing low-frequency information, but have limited capacity in constructing high-frequency representations when compared to their convolutional counterparts. Our proposed solution, the cross-refinement adaptive feature modulation transformer (CRAFT), integrates the strengths of both convolutional and transformer structures. It comprises three key components: the high-frequency enhancement residual block (HFERB) for extracting high-frequency information, the shift rectangle window attention block (SRWAB) for capturing global information, and the hybrid fusion block (HFB) for refining the global representation. To tackle the inherent intricacies of transformer structures, we introduce a frequency-guided post-training quantization (PTQ) method aimed at enhancing CRAFT's efficiency. This strategy incorporates adaptive dual clipping and boundary refinement. To further amplify the versatility of our proposed approach, we extend our PTQ strategy to function as a general quantization method for transformer-based SISR techniques. Our experimental findings showcase CRAFT's superiority over current state-of-the-art methods, both in full-precision and quantization scenarios. These results underscore the efficacy and universality of our PTQ strategy.
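The abstract does not spell out how the adaptive dual clipping in the PTQ strategy works. As a generic sketch of the underlying idea only — choosing a clipping threshold for symmetric uniform post-training quantization by minimizing reconstruction error on calibration data — the following NumPy snippet may help. The function names (`quantize`, `search_clip`) and the MSE-based grid search are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def quantize(x, clip, bits=8):
    """Symmetric uniform quantization: clip x to [-clip, clip],
    round to 2^bits - 1 integer levels, then de-quantize."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = clip / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def search_clip(x, bits=8, grid=100):
    """Pick the clipping threshold that minimizes the MSE between
    the calibration tensor x and its quantized reconstruction."""
    best_clip, best_err = None, np.inf
    for c in np.linspace(x.std(), np.abs(x).max(), grid):
        err = np.mean((x - quantize(x, c, bits)) ** 2)
        if err < best_err:
            best_clip, best_err = c, err
    return best_clip

rng = np.random.default_rng(0)
acts = rng.standard_normal(10_000)        # stand-in for calibration activations
clip = search_clip(acts)
err = np.mean((acts - quantize(acts, clip)) ** 2)
```

The trade-off the search navigates is the usual one in PTQ: a smaller threshold truncates outliers (clipping error) but gives finer resolution to the bulk of the distribution (lower rounding error). The paper's "dual clipping" and "boundary refinement" presumably refine this choice per tensor; the details are in the full text.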