Omnidirectional Video Quality Assessment With Causal Intervention

Zongyao Hu; Lixiong Liu; Qingbing Sang

doi:10.1109/TBC.2023.3342707

Zongyao Hu, Lixiong Liu^*, Qingbing Sang

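^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract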

Spherical signals of omnidirectional videos need to be projected onto a 2D plane for transmission or storage. The projection produces geometrical deformation that affects the feature representations that Convolutional Neural Networks (CNNs) learn for the perception of omnidirectional videos. Current omnidirectional video quality assessment (OVQA) methods leverage viewport images or spherical CNNs to circumvent the geometrical deformation. However, viewport-based methods neglect the interaction between viewport images, while there are not enough pre-training samples to make a spherical CNN an efficient backbone in an OVQA model. In this paper, we alleviate the influence of geometrical deformation from a causal perspective. A structural causal model is adopted to analyze the implicit reason why geometrical deformation disturbs quality representation, and we find that the latitude factor confounds the feature representation and the distorted contents. Based on this evidence, we propose a Causal Intervention-based Quality prediction Network (CIQNet) to alleviate the causal effect of the confounder. The resulting framework first segments the video content into sub-areas and trains feature encoders to obtain latitude-invariant representations, removing the relationship between the latitude and the feature representation. Then the features of each sub-area are aggregated by estimated weights in a backdoor adjustment module to remove the relationship between the latitude and the video contents. Finally, the temporal dependencies of the aggregated features are modeled to predict quality. We evaluate the performance of CIQNet on three publicly available OVQA databases. The experimental results show that CIQNet achieves competitive performance against state-of-the-art methods. The source code of CIQNet is available at: https://github.com/Aca4peop/CIQNet.
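
For reference, the backdoor adjustment named in the abstract has a standard textbook form, sketched below. This is a generic illustration, not the paper's own formulation: the symbols C (distorted content), F (feature representation), and \ell (latitude sub-area) are names assumed here, and the abstract does not say how CIQNet estimates its weights.

```latex
% Generic backdoor adjustment over a discrete confounder (assumed notation):
%   C    = distorted video content
%   F    = feature representation
%   \ell = latitude sub-area, playing the role of the confounder
P\bigl(F \mid \mathrm{do}(C)\bigr) \;=\; \sum_{\ell} P\bigl(F \mid C, \ell\bigr)\, P(\ell)
```

Under this reading, which is only one plausible interpretation, aggregating the per-sub-area features with estimated weights plays the role of the weighted sum over \ell, with the weights standing in for P(\ell).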

Original language	English
Pages (from-to)	238-250
Number of pages	13
Journal	IEEE Transactions on Broadcasting
Volume	70
Issue number	1
DOI	https://doi.org/10.1109/TBC.2023.3342707
Publication status	Published - 1 Mar 2024

Cite this

@article{af6ed8a472164f81b52def24b6578fe9,

title = "Omnidirectional Video Quality Assessment With Causal Intervention",

abstract = "Spherical signals of omnidirectional videos need to be projected to a 2D plane for transmission or storage. The projection will produce geometrical deformation that affects the feature representation of Convolutional Neural Networks (CNN) on the perception of omnidirectional videos. Currently developed omnidirectional video quality assessment (OVQA) methods leverage viewport images or spherical CNN to circumvent the geometrical deformation. However, the viewport-based methods neglect the interaction between viewport images while there lacks sufficient pre-training samples for taking spherical CNN as an efficient backbone in OVQA model. In this paper, we alleviate the influence of geometrical deformation from a causal perspective. A structural causal model is adopted to analyze the implicit reason for the disturbance of geometrical deformation on quality representation and we find the latitude factor confounds the feature representation and distorted contents. Based on this evidence, we propose a Causal Intervention-based Quality prediction Network (CIQNet) to alleviate the causal effect of the confounder. The resulting framework first segments the video content into sub-areas and trains feature encoders to obtain latitude-invariant representation for removing the relationship between the latitude and feature representation. Then the features of each sub-area are aggregated by estimated weights in a backdoor adjustment module to remove the relationship between the latitude and video contents. Finally, the temporal dependencies of aggregated features are modeled to implement the quality prediction. We evaluate the performance of CIQNet on three publicly available OVQA databases. The experimental results show CIQNet achieves competitive performance against state-of-art methods. The source code of CIQNet is available at: https://github.com/Aca4peop/CIQNet.",

keywords = "Omnidirectional video quality assessment, backdoor adjustment, causal intervention, latitude-invariant representation",

author = "Zongyao Hu and Lixiong Liu and Qingbing Sang",

note = "Publisher Copyright: {\textcopyright} 1963-2012 IEEE.",

year = "2024",

month = mar,

day = "1",

doi = "10.1109/TBC.2023.3342707",

language = "English",

volume = "70",

pages = "238--250",

journal = "IEEE Transactions on Broadcasting",

issn = "0018-9316",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "1",

}

TY - JOUR

T1 - Omnidirectional Video Quality Assessment With Causal Intervention

AU - Hu, Zongyao

AU - Liu, Lixiong

AU - Sang, Qingbing

PY - 2024/3/1

Y1 - 2024/3/1

N2 - Spherical signals of omnidirectional videos need to be projected to a 2D plane for transmission or storage. The projection will produce geometrical deformation that affects the feature representation of Convolutional Neural Networks (CNN) on the perception of omnidirectional videos. Currently developed omnidirectional video quality assessment (OVQA) methods leverage viewport images or spherical CNN to circumvent the geometrical deformation. However, the viewport-based methods neglect the interaction between viewport images while there lacks sufficient pre-training samples for taking spherical CNN as an efficient backbone in OVQA model. In this paper, we alleviate the influence of geometrical deformation from a causal perspective. A structural causal model is adopted to analyze the implicit reason for the disturbance of geometrical deformation on quality representation and we find the latitude factor confounds the feature representation and distorted contents. Based on this evidence, we propose a Causal Intervention-based Quality prediction Network (CIQNet) to alleviate the causal effect of the confounder. The resulting framework first segments the video content into sub-areas and trains feature encoders to obtain latitude-invariant representation for removing the relationship between the latitude and feature representation. Then the features of each sub-area are aggregated by estimated weights in a backdoor adjustment module to remove the relationship between the latitude and video contents. Finally, the temporal dependencies of aggregated features are modeled to implement the quality prediction. We evaluate the performance of CIQNet on three publicly available OVQA databases. The experimental results show CIQNet achieves competitive performance against state-of-art methods. The source code of CIQNet is available at: https://github.com/Aca4peop/CIQNet.

AB - Spherical signals of omnidirectional videos need to be projected to a 2D plane for transmission or storage. The projection will produce geometrical deformation that affects the feature representation of Convolutional Neural Networks (CNN) on the perception of omnidirectional videos. Currently developed omnidirectional video quality assessment (OVQA) methods leverage viewport images or spherical CNN to circumvent the geometrical deformation. However, the viewport-based methods neglect the interaction between viewport images while there lacks sufficient pre-training samples for taking spherical CNN as an efficient backbone in OVQA model. In this paper, we alleviate the influence of geometrical deformation from a causal perspective. A structural causal model is adopted to analyze the implicit reason for the disturbance of geometrical deformation on quality representation and we find the latitude factor confounds the feature representation and distorted contents. Based on this evidence, we propose a Causal Intervention-based Quality prediction Network (CIQNet) to alleviate the causal effect of the confounder. The resulting framework first segments the video content into sub-areas and trains feature encoders to obtain latitude-invariant representation for removing the relationship between the latitude and feature representation. Then the features of each sub-area are aggregated by estimated weights in a backdoor adjustment module to remove the relationship between the latitude and video contents. Finally, the temporal dependencies of aggregated features are modeled to implement the quality prediction. We evaluate the performance of CIQNet on three publicly available OVQA databases. The experimental results show CIQNet achieves competitive performance against state-of-art methods. The source code of CIQNet is available at: https://github.com/Aca4peop/CIQNet.

KW - Omnidirectional video quality assessment

KW - backdoor adjustment

KW - causal intervention

KW - latitude-invariant representation

UR - http://www.scopus.com/inward/record.url?scp=85181572388&partnerID=8YFLogxK

U2 - 10.1109/TBC.2023.3342707

DO - 10.1109/TBC.2023.3342707

M3 - Article

AN - SCOPUS:85181572388

SN - 0018-9316

VL - 70

SP - 238

EP - 250

JO - IEEE Transactions on Broadcasting

JF - IEEE Transactions on Broadcasting

IS - 1

ER -
