Abstract
Spherical signals of omnidirectional videos need to be projected onto a 2D plane for transmission or storage. The projection introduces geometrical deformation that affects the feature representations Convolutional Neural Networks (CNNs) learn when perceiving omnidirectional videos. Existing omnidirectional video quality assessment (OVQA) methods rely on viewport images or spherical CNNs to circumvent the geometrical deformation. However, viewport-based methods neglect the interaction between viewport images, and there are too few pre-training samples to make a spherical CNN an efficient backbone for an OVQA model. In this paper, we alleviate the influence of geometrical deformation from a causal perspective. A structural causal model is adopted to analyze the underlying reason why geometrical deformation disturbs quality representation, and we find that the latitude factor confounds the feature representation and the distorted content. Based on this evidence, we propose a Causal Intervention-based Quality prediction Network (CIQNet) to alleviate the causal effect of the confounder. The resulting framework first segments the video content into sub-areas and trains feature encoders to obtain a latitude-invariant representation, which removes the relationship between the latitude and the feature representation. Then the features of each sub-area are aggregated using estimated weights in a backdoor adjustment module to remove the relationship between the latitude and the video content. Finally, the temporal dependencies of the aggregated features are modeled to predict quality. We evaluate the performance of CIQNet on three publicly available OVQA databases. The experimental results show that CIQNet achieves competitive performance against state-of-the-art methods. The source code of CIQNet is available at: https://github.com/Aca4peop/CIQNet.
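The backdoor adjustment mentioned in the abstract follows the general form P(Y | do(X)) = Σ_z P(Y | X, z) P(z), where z here would be the latitude stratum. The sketch below is only a toy illustration of that weighted aggregation over latitude bands, not the CIQNet implementation: the function names, the `toy_encoder` stand-in, and the choice of each band's share of the sphere's area as P(z) are all assumptions made for the example, whereas the abstract states that CIQNet aggregates with estimated weights.

```python
import numpy as np

def latitude_band_weights(num_bands):
    """Assumed prior P(z): each band's share of the sphere's surface area."""
    # Band edges in latitude, from the north pole (+pi/2) to the south pole (-pi/2).
    edges = np.linspace(np.pi / 2, -np.pi / 2, num_bands + 1)
    # The area of a spherical zone is proportional to the difference of sines.
    areas = np.sin(edges[:-1]) - np.sin(edges[1:])
    return areas / areas.sum()

def split_into_bands(frame, num_bands):
    """Split an equirectangular frame (rows = latitude) into horizontal bands."""
    return np.array_split(frame, num_bands, axis=0)

def toy_encoder(band):
    """Hypothetical stand-in for a learned per-band feature encoder."""
    return np.array([band.mean(), band.std()])

def backdoor_aggregate(band_features, weights):
    """Weighted sum over strata: sum_z f(X, z) * P(z)."""
    return np.tensordot(weights, band_features, axes=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 128))  # toy grayscale equirectangular frame
    bands = split_into_bands(frame, 4)
    feats = np.stack([toy_encoder(b) for b in bands])
    weights = latitude_band_weights(4)
    print(backdoor_aggregate(feats, weights))
```

With area-based weights, the bands near the equator contribute more than the polar bands, which are heavily stretched in an equirectangular projection.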
Original language | English |
---|---|
Pages (from-to) | 238-250 |
Number of pages | 13 |
Journal | IEEE Transactions on Broadcasting |
Volume | 70 |
Issue | 1 |
DOI | |
Publication status | Published - 1 Mar 2024 |