High-fidelity stereo image super-resolution (SR) is paramount for 3D vision applications. However, existing Transformer-based methods often suffer from high computational complexity and limited effectiveness in capturing long-range cross-view dependencies. To address these issues, we propose CPFSSR, a combined permuted self-attention and fast Fourier transform-based network for stereo image SR, which pairs a permuted Swin Fourier Transformer block (PSFTB) with a deep cross-attention module (DCAM) to tackle these dual challenges. The PSFTB employs a permuted self-attention mechanism and fast Fourier convolution to achieve global receptive fields with linear computational complexity while capturing intra-view contextual details. For cross-view fusion, the DCAM enables adaptive hierarchical interaction between views. In addition, we propose a spatial frequency reinforcement block (SFRB) that uses fast Fourier convolution to enhance the extraction of complex frequency information. Rigorous evaluation on benchmarks shows that CPFSSR sets a new state of the art, outperforming existing methods on average across the Flickr1024, Middlebury, KITTI2012, and KITTI2015 datasets. Visual assessments also confirm its superiority in reconstructing fine natural textures with minimal artifacts. The proposed method achieves a favorable trade-off between parameter count and stereo image SR performance and is suitable for accurate high-resolution image reconstruction. The source code is available at https://github.com/Flt-Flag/CPFSSR.
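The global-receptive-field property attributed to fast Fourier convolution can be illustrated with a minimal NumPy sketch: transforming a feature map to the frequency domain, modulating each frequency bin with a learnable weight, and transforming back mixes information from every spatial location in a single layer. All names and shapes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fourier_conv2d(x, w_freq):
    """Sketch of a fast-Fourier-convolution style spectral layer.

    x:      (H, W) real feature map
    w_freq: (H, W//2 + 1) complex per-frequency weights (hypothetical shape,
            matching NumPy's real-FFT output layout)
    Each frequency bin aggregates all spatial positions, so one layer has a
    global receptive field at O(HW log HW) cost instead of quadratic attention.
    """
    X = np.fft.rfft2(x)                  # real 2-D FFT -> complex spectrum
    Y = X * w_freq                       # pointwise modulation per frequency
    return np.fft.irfft2(Y, s=x.shape)   # back to the spatial domain

# toy usage: all-ones frequency weights act as the identity
x = np.random.rand(8, 8)
w = np.ones((8, 5), dtype=complex)
y = fourier_conv2d(x, w)
assert np.allclose(x, y)
```

In a full network such a spectral branch would typically be paired with a local convolutional branch and learned weights; the sketch only demonstrates the frequency-domain mechanism.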
