Distortion-Sensitive Masked Autoencoder for Omnidirectional Video Quality Assessment

  • Zongyao Hu
  • Lixiong Liu*
  • Ke Gu
  • Leida Li
  • Alan Conrad Bovik

  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Omnidirectional Video Quality Assessment (OVQA) is a challenging task because too few training samples are available for learning representations of distortions in omnidirectional videos. The recently proposed masked autoencoder (MAE) has shown promising performance in learning local and global representations in a self-supervised way, and can help mitigate the shortage of annotated samples needed to train omnidirectional video quality prediction models. However, the reconstruction tasks that MAE models are designed for do not pertain to predicting diverse perceptual distortions, especially those relevant to OVQA. To overcome this limitation and apply the MAE concept to the OVQA problem, we create a Distortion-Sensitive Masked AutoEncoder (DS-MAE) that is able to represent perceptual distortions in omnidirectional videos. DS-MAE extracts viewports from omnidirectional videos and employs a masked autoencoding module (MAM) and a knowledge replay module (KRM) to learn representations on each viewport. In the MAM, distorted patches from omnidirectional videos are masked by replacing them with their undistorted counterparts. The autoencoder is trained to reconstruct the masked distortions, imbuing it with the ability to represent diverse video degradations. The KRM extracts and stores content representations, which are then “replayed” to mitigate potential catastrophic forgetting of content during training of the DS-MAE. Finally, a simple OVQA model is constructed using the pre-trained DS-MAE across all viewports. The new model, called OmniVQA, was tested on three public OVQA datasets. The experimental results show that OmniVQA delivers competitive performance against all compared models.
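
The masking step described for the MAM can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the patch size, the mask ratio, the uniform random choice of patches, and the mean-squared-error objective are all assumptions made for illustration, since the abstract specifies none of them. The sketch only shows the idea of hiding distortion by swapping selected patches for their undistorted counterparts and then scoring a reconstruction on those patches alone.

```python
import numpy as np


def mask_distorted_patches(distorted, reference, patch=16, mask_ratio=0.5, seed=0):
    """Hide distortion in a random subset of patches (hypothetical helper).

    Selected patches of `distorted` are replaced by the co-located patches of
    `reference`. Returns the masked frame and a boolean patch-level mask
    (True marks a replaced patch).
    """
    if distorted.shape != reference.shape:
        raise ValueError("distorted and reference must have the same shape")
    h, w = distorted.shape[:2]
    if h % patch or w % patch:
        raise ValueError("height and width must be multiples of the patch size")

    gh, gw = h // patch, w // patch
    rng = np.random.default_rng(seed)
    n_masked = int(round(mask_ratio * gh * gw))

    # Assumption: patches are chosen uniformly at random without replacement.
    chosen = rng.choice(gh * gw, size=n_masked, replace=False)
    mask = np.zeros(gh * gw, dtype=bool)
    mask[chosen] = True
    mask = mask.reshape(gh, gw)

    # Expand the patch-level mask to pixel level, then add trailing axes so it
    # broadcasts over any channel dimension.
    pixel_mask = np.repeat(np.repeat(mask, patch, axis=0), patch, axis=1)
    pixel_mask = pixel_mask.reshape(pixel_mask.shape + (1,) * (distorted.ndim - 2))

    masked = np.where(pixel_mask, reference, distorted)
    return masked, mask


def masked_reconstruction_loss(prediction, distorted, mask, patch=16):
    """Mean squared error over the masked patches only (assumed objective).

    Expects at least one masked patch; an empty mask would yield NaN.
    """
    pixel_mask = np.repeat(np.repeat(mask, patch, axis=0), patch, axis=1)
    squared_error = (prediction - distorted) ** 2
    return float(squared_error[pixel_mask].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.random((64, 64, 3))                          # stand-in pristine viewport frame
    distorted = reference + 0.05 * rng.standard_normal((64, 64, 3))  # stand-in distortion
    masked, mask = mask_distorted_patches(distorted, reference)
    # Using the masked input itself as the "prediction" gives the loss of a
    # model that has not yet learned to restore any distortion.
    print(mask.sum(), masked_reconstruction_loss(masked, distorted, mask))
```

In this sketch the reconstruction target is the distorted frame, so a model trained against it would have to reproduce the distortion in the replaced patches, which is the behaviour the abstract attributes to the MAM.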

Original language: English
Journal: IEEE Transactions on Multimedia
Publication status: Accepted/In press - 2026
Externally published: Yes

Keywords

  • Masked autoencoder
  • omnidirectional video quality assessment
  • self-supervised representation learning
