{"title":"IDP: Image Denoising Using PoolFormer","authors":"Shou-Kai Yin, Jenhui Chen","doi":"10.1109/IS3C57901.2023.00019","DOIUrl":null,"url":null,"abstract":"Recently, transformer-based models have achieved significant success in various computer vision tasks, with the attention-based token mixer module commonly believed to be the key factor. However, further research has shown that the attention-based token mixer module in transformers can be replaced by other methods, such as spatial multilayer perceptrons (MLPs) or Fourier transforms, to mix information between different tokens without sacrificing performance. Therefore, some have raised whether the success of transformers and its variants is not solely due to the attention-based token mixer module but rather to other factors. In a recent paper titled “PoolFormer” the authors demonstrated that using a simple spatial pooling operation instead of the attention module in transformers can achieve competitive performance in object detection vision tasks. Based on this finding, we propose a low-computation model for image denoising based on the PoolFormer and an MLP + CNN Transformer decoder for image restoration. By reducing the computational complexity brought by the token mixer, the model still achieves a good peak signal-to-noise ratio (PSNR) in grayscale as well as in color image denoising. This suggests that, in low-level vision tasks such as denoising, simple attention modules can also achieve good results, particularly in grayscale image denoising.","PeriodicalId":142483,"journal":{"name":"2023 Sixth International Symposium on Computer, Consumer and Control (IS3C)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 Sixth International Symposium on Computer, Consumer and Control (IS3C)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IS3C57901.2023.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Recently, transformer-based models have achieved significant success in various computer vision tasks, with the attention-based token mixer module commonly believed to be the key factor. However, further research has shown that the attention-based token mixer in transformers can be replaced by other methods, such as spatial multilayer perceptrons (MLPs) or Fourier transforms, which mix information between tokens without sacrificing performance. This has led some to question whether the success of transformers and their variants stems from the attention-based token mixer alone or rather from other factors. In a recent paper titled "PoolFormer," the authors demonstrated that replacing the attention module in transformers with a simple spatial pooling operation can achieve competitive performance in vision tasks such as object detection. Building on this finding, we propose a low-computation image denoising model based on PoolFormer, with an MLP + CNN Transformer decoder for image restoration. By reducing the computational complexity introduced by the token mixer, the model still achieves a good peak signal-to-noise ratio (PSNR) in both grayscale and color image denoising. This suggests that, in low-level vision tasks such as denoising, simple token-mixing modules such as pooling can also achieve good results, particularly for grayscale images.
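The abstract does not give implementation details, so the sketch below is only a minimal, hypothetical illustration of the two ideas it mentions: a PoolFormer-style block in which average pooling replaces self-attention as the token mixer (followed by a channel MLP), and the PSNR metric used for evaluation. All module names, hyperparameters (pooling size, MLP ratio, channel width), and the demo shapes are assumptions for illustration, not the authors' code.

```python
# Hypothetical PoolFormer-style block and PSNR helper (PyTorch); an
# illustrative sketch, not the paper's actual IDP architecture.
import torch
import torch.nn as nn


class PoolingTokenMixer(nn.Module):
    """Token mixer that replaces self-attention with spatial average pooling.

    Following the PoolFormer idea, tokens exchange information through a
    simple average pool; subtracting the input keeps the residual branch
    zero-centered.
    """

    def __init__(self, pool_size: int = 3):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size, stride=1,
                                 padding=pool_size // 2,
                                 count_include_pad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(x) - x


class PoolFormerBlock(nn.Module):
    """One PoolFormer-style block: pooling token mixer + channel MLP,
    each wrapped in a pre-norm residual connection."""

    def __init__(self, dim: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)       # normalization over channels
        self.token_mixer = PoolingTokenMixer()
        self.norm2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(               # channel MLP as 1x1 convolutions
            nn.Conv2d(dim, dim * mlp_ratio, 1),
            nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


def psnr(clean: torch.Tensor, denoised: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = torch.mean((clean - denoised) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))


if __name__ == "__main__":
    block = PoolFormerBlock(dim=64)
    feats = torch.randn(1, 64, 32, 32)   # (batch, channels, height, width)
    out = block(feats)
    print(out.shape)                      # torch.Size([1, 64, 32, 32])
```

Compared with self-attention, the pooling mixer has no learnable parameters and linear cost in the number of tokens, which is the source of the computational savings the abstract refers to; a full denoising network would stack such blocks in an encoder and pair them with a decoder for restoration.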