Omnidirectional images are integral to virtual reality (VR) applications, yet their high resolution and spatial complexity pose unique challenges for quality assessment. Existing omnidirectional image quality assessment (OIQA) techniques still struggle to extract multi-perceptual features and to model the interrelationships across consecutive viewports, which makes it difficult to replicate the subjective perception of the human eye. In response, this work proposes an omnidirectional image quality assessment approach based on multi-perceptual feature aggregation. The method transforms the equirectangular projection (ERP) omnidirectional image into a sequence of viewports, forming a pseudo-temporal input that simulates a user's multi-viewport browsing behavior. To strengthen frequency-domain feature extraction, the backbone network combines a convolutional neural network with 2D wavelet transform convolution (WTConv); this module decomposes the signal in the frequency domain while preserving spatial information, making it easier to identify high-frequency details and structural defects in images. To better capture the continuous relationship between viewports, a temporal shift module (TSM) is added, which dynamically shifts viewport features along the channel dimension and thereby improves the model's perception of viewpoint continuity and spatial consistency. In addition, a self-channel attention (SCA) mechanism merges the various perceptual characteristics and amplifies salient feature expression, further improving the perception of important distortion regions. Experiments on the OIQA and CVIQD standard datasets show that the proposed model achieves excellent performance compared with existing full-reference and no-reference methods.
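To make the viewport-continuity idea concrete, the sketch below illustrates a standard temporal shift operation in the spirit of TSM: a fraction of channels is exchanged between adjacent viewport features so that each viewport "sees" part of its neighbors. The tensor shapes, shift fraction, and function name are illustrative assumptions, not the paper's implementation.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Exchange a fraction of channels between adjacent viewports.

    x: viewport features of shape (batch, num_viewports, channels, H, W).
    shift_div: 1/shift_div of the channels is shifted in each direction.
    """
    b, t, c, h, w = x.shape
    fold = c // shift_div
    out = torch.zeros_like(x)
    # Channels [0, fold): take features from the next viewport (shift backward).
    out[:, :-1, :fold] = x[:, 1:, :fold]
    # Channels [fold, 2*fold): take features from the previous viewport (shift forward).
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
    # Remaining channels stay in place.
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    return out

# Example: 8 viewports extracted from one ERP image, 64-channel feature maps.
feats = torch.randn(2, 8, 64, 56, 56)
shifted = temporal_shift(feats)
print(shifted.shape)  # torch.Size([2, 8, 64, 56, 56])
```

Because the shift is a pure re-indexing of channels, it adds no learnable parameters while still letting subsequent convolutions mix information across neighboring viewports.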
