Omnidirectional Video Quality Assessment With Causal Intervention
Authors: Zongyao Hu; Lixiong Liu; Qingbing Sang
Journal: IEEE Transactions on Broadcasting, vol. 70, no. 1, pp. 238-250
Publication date: 2024-01-03
DOI: 10.1109/TBC.2023.3342707
URL: https://ieeexplore.ieee.org/document/10380455/
Impact factor: 3.2 (JCR Q2, Engineering, Electrical & Electronic)
Citations: 0
Abstract
The spherical signals of omnidirectional videos must be projected onto a 2D plane for transmission or storage. This projection introduces geometrical deformation that affects the feature representations that Convolutional Neural Networks (CNNs) form when perceiving omnidirectional videos. Existing omnidirectional video quality assessment (OVQA) methods use viewport images or spherical CNNs to circumvent the geometrical deformation. However, viewport-based methods neglect the interaction between viewport images, and there are not enough pre-training samples to make a spherical CNN an efficient backbone for an OVQA model. In this paper, we alleviate the influence of geometrical deformation from a causal perspective. We adopt a structural causal model to analyze the underlying reason why geometrical deformation disturbs quality representation, and we find that the latitude factor confounds the feature representation and the distorted content. Based on this evidence, we propose a Causal Intervention-based Quality prediction Network (CIQNet) to alleviate the causal effect of the confounder. The framework first segments the video content into sub-areas and trains feature encoders to obtain latitude-invariant representations, which removes the relationship between latitude and feature representation. The features of each sub-area are then aggregated with estimated weights in a backdoor adjustment module, which removes the relationship between latitude and video content. Finally, the temporal dependencies of the aggregated features are modeled to predict quality. We evaluate CIQNet on three publicly available OVQA databases. The experimental results show that CIQNet achieves competitive performance against state-of-the-art methods. The source code of CIQNet is available at: https://github.com/Aca4peop/CIQNet
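The sub-area segmentation and weighted aggregation steps described in the abstract can be illustrated with a minimal sketch. This is not the CIQNet implementation (the authors' code is in the repository linked above); it only mirrors the general backdoor adjustment form P(Y | do(X)) = sum over z of P(Y | X, z) P(z), with z taken to be a latitude band. Everything named here is an assumption made for illustration: the frame is treated as an equirectangular image stored as a list of pixel rows, the bands have equal height, `band_feature` is a placeholder for a learned encoder, and `scores` stands in for the weights that the paper says are estimated by the network.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_into_latitude_bands(frame, num_bands):
    """Split a frame (list of pixel rows) into horizontal bands.

    Assumption: rows map to latitude, as in an equirectangular projection.
    """
    rows = len(frame)
    if not 1 <= num_bands <= rows:
        raise ValueError("num_bands must be between 1 and the number of rows")
    bounds = [i * rows // num_bands for i in range(num_bands + 1)]
    return [frame[bounds[i]:bounds[i + 1]] for i in range(num_bands)]


def band_feature(band):
    """Hypothetical stand-in for a learned encoder: the band's mean pixel value."""
    values = [v for row in band for v in row]
    return sum(values) / len(values)


def aggregate(features, scores):
    """Weighted sum of per-band features, with weights from softmax(scores)."""
    weights = softmax(scores)
    return sum(w * f for w, f in zip(weights, features))


# Toy 4x2 "frame" split into two latitude bands.
frame = [[0, 0], [2, 2], [4, 4], [6, 6]]
bands = split_into_latitude_bands(frame, num_bands=2)
features = [band_feature(b) for b in bands]

# Equal scores give equal weights, so the result is the plain average.
pooled = aggregate(features, scores=[0.0, 0.0])
print(features, pooled)
```

With equal scores the aggregation reduces to a simple mean over bands; unequal scores shift the pooled feature toward the bands with higher weight, which is the role the estimated weights play in the description above.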
About the journal:
The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”