Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution

Ao Li, Le Zhang, Yun Liu, Ce Zhu

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 4, pp. 3141-3158
DOI: 10.1109/TPAMI.2025.3529927
Published online: 2025-01-24

Abstract

Transformer-based methods have exhibited remarkable potential in single image super-resolution (SISR) by effectively extracting long-range dependencies. However, most of the current research in this area has prioritized the design of transformer blocks to capture global information, while overlooking the importance of incorporating high-frequency priors, which we believe could be beneficial. In our study, we conducted a series of experiments and found that transformer structures are more adept at capturing low-frequency information, but have limited capacity in constructing high-frequency representations when compared to their convolutional counterparts. Our proposed solution, the cross-refinement adaptive feature modulation transformer (CRAFT), integrates the strengths of both convolutional and transformer structures. It comprises three key components: the high-frequency enhancement residual block (HFERB) for extracting high-frequency information, the shift rectangle window attention block (SRWAB) for capturing global information, and the hybrid fusion block (HFB) for refining the global representation. To tackle the inherent intricacies of transformer structures, we introduce a frequency-guided post-training quantization (PTQ) method aimed at enhancing CRAFT's efficiency. This strategy incorporates adaptive dual clipping and boundary refinement. To further amplify the versatility of our proposed approach, we extend our PTQ strategy to function as a general quantization method for transformer-based SISR techniques. Our experimental findings showcase CRAFT's superiority over current state-of-the-art methods, both in full-precision and quantization scenarios. These results underscore the efficacy and universality of our PTQ strategy.
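The abstract does not spell out how the adaptive dual clipping in the PTQ strategy works. As a generic sketch of the underlying idea only — choosing a clipping threshold for symmetric uniform post-training quantization by minimizing reconstruction error on calibration data — the following NumPy snippet may help. The function names (`quantize`, `search_clip`) and the MSE-based grid search are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def quantize(x, clip, bits=8):
    """Symmetric uniform quantization: clip x to [-clip, clip],
    round to 2^bits - 1 integer levels, then de-quantize."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = clip / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def search_clip(x, bits=8, grid=100):
    """Pick the clipping threshold that minimizes the MSE between
    the calibration tensor x and its quantized reconstruction."""
    best_clip, best_err = None, np.inf
    for c in np.linspace(x.std(), np.abs(x).max(), grid):
        err = np.mean((x - quantize(x, c, bits)) ** 2)
        if err < best_err:
            best_clip, best_err = c, err
    return best_clip

rng = np.random.default_rng(0)
acts = rng.standard_normal(10_000)        # stand-in for calibration activations
clip = search_clip(acts)
err = np.mean((acts - quantize(acts, clip)) ** 2)
```

The trade-off the search navigates is the usual one in PTQ: a smaller threshold truncates outliers (clipping error) but gives finer resolution to the bulk of the distribution (lower rounding error). The paper's "dual clipping" and "boundary refinement" presumably refine this choice per tensor; the details are in the full text.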