TY - JOUR
T1 - Distortion-Sensitive Masked Autoencoder for Omnidirectional Video Quality Assessment
AU - Hu, Zongyao
AU - Liu, Lixiong
AU - Gu, Ke
AU - Li, Leida
AU - Bovik, Alan Conrad
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Omnidirectional Video Quality Assessment (OVQA) is a challenging task due to the limited availability of adequate numbers of training samples for learning representations of distortions on omnidirectional videos. The recent masked autoencoder (MAE) has shown promising performance in learning local and global representations in a self-supervised way, and can be used to attempt to mitigate the difficulty of having insufficient annotated samples to adequately train omnidirectional video quality prediction models. But the reconstruction tasks that MAE models are designed for do not pertain to predicting diverse perceptual distortions, especially those relevant to the task of OVQA. We have attempted to overcome these limitations to harness and apply the power of the MAE concept to the OVQA problem. Towards this purpose, we create a Distortion-Sensitive Masked AutoEncoder (DS-MAE) that is able to represent perceptual distortions on omnidirectional videos. DS-MAE extracts viewports from omnidirectional videos and employs a masked autoencoding module (MAM) and a knowledge replay module (KRM) to learn representations on each viewport. In the MAM, distorted patches from omnidirectional videos are masked by replacing them with undistorted counterparts. The autoencoder is trained to reconstruct the masked distortions, imbuing it with the ability to represent diverse video degradations. The KRM extracts and stores content representations, which are then “replayed” to mitigate potential catastrophic forgetting of content during training of the DS-MAE. Finally, a simple OVQA model is constructed using the pre-trained DS-MAE across all viewports. The new model, called OmniVQA, was tested on three public OVQA datasets. The experimental results show that OmniVQA delivers competitive performance against all compared models.
AB - Omnidirectional Video Quality Assessment (OVQA) is a challenging task due to the limited availability of adequate numbers of training samples for learning representations of distortions on omnidirectional videos. The recent masked autoencoder (MAE) has shown promising performance in learning local and global representations in a self-supervised way, and can be used to attempt to mitigate the difficulty of having insufficient annotated samples to adequately train omnidirectional video quality prediction models. But the reconstruction tasks that MAE models are designed for do not pertain to predicting diverse perceptual distortions, especially those relevant to the task of OVQA. We have attempted to overcome these limitations to harness and apply the power of the MAE concept to the OVQA problem. Towards this purpose, we create a Distortion-Sensitive Masked AutoEncoder (DS-MAE) that is able to represent perceptual distortions on omnidirectional videos. DS-MAE extracts viewports from omnidirectional videos and employs a masked autoencoding module (MAM) and a knowledge replay module (KRM) to learn representations on each viewport. In the MAM, distorted patches from omnidirectional videos are masked by replacing them with undistorted counterparts. The autoencoder is trained to reconstruct the masked distortions, imbuing it with the ability to represent diverse video degradations. The KRM extracts and stores content representations, which are then “replayed” to mitigate potential catastrophic forgetting of content during training of the DS-MAE. Finally, a simple OVQA model is constructed using the pre-trained DS-MAE across all viewports. The new model, called OmniVQA, was tested on three public OVQA datasets. The experimental results show that OmniVQA delivers competitive performance against all compared models.
KW - Masked autoencoder
KW - omnidirectional video quality assessment
KW - self-supervised representation learning
UR - https://www.scopus.com/pages/publications/105027342544
U2 - 10.1109/TMM.2026.3651128
DO - 10.1109/TMM.2026.3651128
M3 - Article
AN - SCOPUS:105027342544
SN - 1520-9210
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -