Self-Supervised Representation Learning for Video Quality Assessment

Shaojie Jiang, Qingbing Sang, Zongyao Hu, Lixiong Liu*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

11 Citations (Scopus)

Abstract

No-reference (NR) video quality assessment (VQA) is a challenging problem because the scarcity of annotated samples makes model training difficult. Previous work commonly uses transfer learning to directly migrate models pre-trained on image databases, which suffers from poor domain adaptation. Recently, self-supervised representation learning has attracted attention because it does not depend on large-scale labeled data. However, existing self-supervised representation learning methods consider only the distortion types and contents of videos; the intrinsic properties of videos still need to be investigated for the VQA task. To address this, we propose a novel multi-task self-supervised representation learning framework to pre-train a video quality assessment model. Specifically, we consider the effects of distortion degrees, distortion types, and frame rates on the perceived quality of videos, and use them as guidance to generate self-supervised samples and labels. Then, using three pretext tasks, we optimize the ability of the VQA model to capture spatio-temporal differences between the original video and its distorted version. The resulting framework not only eases the requirements on the quality of the original video but also benefits from the self-supervised labels as well as the Siamese network. In addition, we propose a Transformer-based VQA model, in which short-term spatio-temporal dependencies of videos are modeled by a 3D-CNN and a 2D-CNN, and long-term spatio-temporal dependencies are then modeled by a Transformer because of its excellent long-term modeling capability. We evaluated the proposed method on four public video quality assessment databases and found that it is competitive with all compared VQA algorithms.
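
As a rough illustration of how self-supervised samples and labels could be generated from distortion types, distortion degrees, and frame rates, the Python sketch below builds one reference/distorted pair with three labels, one per pretext task. It is not the authors' implementation. The distortion types (Gaussian noise, box blur), the three degree levels, the frame strides, and all function and variable names are hypothetical choices made for this example, since the abstract does not specify them. Subsampling both clips with the same stride is likewise an assumption, made so the pair stays frame-aligned for a Siamese-style comparison.

```python
import numpy as np

# Hypothetical label spaces; the abstract does not list the actual values.
DISTORTION_TYPES = ("gaussian_noise", "box_blur")
DISTORTION_DEGREES = (1, 2, 3)
FRAME_STRIDES = (1, 2, 4)  # stride 2 keeps every 2nd frame, i.e. halves the frame rate


def box_blur(clip, radius):
    """Blur each frame by averaging shifted copies (wraps around at the borders)."""
    acc = np.zeros_like(clip, dtype=np.float64)
    count = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            acc += np.roll(clip, shift=(dy, dx), axis=(1, 2))
            count += 1
    return acc / count


def make_pretext_sample(clip, rng):
    """Return (reference, distorted, labels) for a clip of shape (T, H, W) in [0, 1]."""
    type_id = int(rng.integers(len(DISTORTION_TYPES)))
    degree_id = int(rng.integers(len(DISTORTION_DEGREES)))
    stride_id = int(rng.integers(len(FRAME_STRIDES)))

    degree = DISTORTION_DEGREES[degree_id]
    stride = FRAME_STRIDES[stride_id]

    # Temporal subsampling simulates a lower frame rate (assumption: applied to both clips).
    reference = clip[::stride].astype(np.float64)

    if DISTORTION_TYPES[type_id] == "gaussian_noise":
        distorted = reference + rng.normal(0.0, 0.05 * degree, size=reference.shape)
    else:
        distorted = box_blur(reference, radius=degree)
    distorted = np.clip(distorted, 0.0, 1.0)

    # One label per pretext task; no human quality annotation is involved.
    labels = {"type": type_id, "degree": degree_id, "frame_rate": stride_id}
    return reference, distorted, labels


rng = np.random.default_rng(0)
clip = rng.random((16, 8, 8))  # toy grayscale clip: 16 frames of 8x8 pixels
reference, distorted, labels = make_pretext_sample(clip, rng)
```

In a setup like this, a Siamese network would encode `reference` and `distorted` with shared weights, and three classification heads would predict the entries of `labels` from the difference between the two embeddings. That pairing is one plausible reading of the abstract, not a confirmed description of the paper's architecture.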

Original language: English
Pages (from-to): 118-129
Number of pages: 12
Journal: IEEE Transactions on Broadcasting
Volume: 69
Issue number: 1
Publication status: Published - 1 Mar 2023
