TY - JOUR
T1 - Multi-user panoramic video viewport prediction based on attention mechanism
AU - Zhang, Hanqi
AU - Huang, Congyu
AU - Wang, Jing
AU - Li, Zhiyu
AU - Yang, Lidong
N1 - Publisher Copyright:
© 2025 Editorial Board of Journal of Signal Processing. All rights reserved.
PY - 2025/2
Y1 - 2025/2
N2 - Recently, with the development of immersive technologies such as virtual reality, the application prospects of panoramic video technology have gradually expanded. While offering realistic experiences, panoramic videos strain network bandwidth. Therefore, reducing the transmission bandwidth has become a research focus, with viewport prediction emerging as a popular topic in the field. Currently, mainstream solutions for viewport prediction often utilize viewpoint trajectories and scene content, combined with neural network outputs for evaluation. Most existing methods perform poorly in long-term prediction and do not fully utilize the information available in multi-user scenarios. This paper proposes a viewport prediction method inspired by Transformer networks. Because of the similarity in viewpoint trajectories of different users watching the same video, this paper first proposes a scheme to compare multi-user viewport trajectory similarity, which uses the target user’s and historical users’ viewport trajectory data to predict the target user’s future viewport trajectory data. Owing to the discontinuity of the panoramic video viewport trajectory, this paper maps the discontinuous trajectory to solve the problem of discontinuous single prediction trajectory data. In an experiment, this method was used to process a dataset, and promising results were achieved. Finally, experimental comparisons with similar algorithms from recent years show a reduction in error across metrics such as the mean absolute error, Manhattan distance, and angle distance error proposed in this paper, with some metrics reduced by more than 10%. This indicates that the proposed solution can achieve higher accuracy in long-term viewport prediction, and that the introduction of an attention mechanism and multi-user similarity comparison can help improve model performance.
AB - Recently, with the development of immersive technologies such as virtual reality, the application prospects of panoramic video technology have gradually expanded. While offering realistic experiences, panoramic videos strain network bandwidth. Therefore, reducing the transmission bandwidth has become a research focus, with viewport prediction emerging as a popular topic in the field. Currently, mainstream solutions for viewport prediction often utilize viewpoint trajectories and scene content, combined with neural network outputs for evaluation. Most existing methods perform poorly in long-term prediction and do not fully utilize the information available in multi-user scenarios. This paper proposes a viewport prediction method inspired by Transformer networks. Because of the similarity in viewpoint trajectories of different users watching the same video, this paper first proposes a scheme to compare multi-user viewport trajectory similarity, which uses the target user’s and historical users’ viewport trajectory data to predict the target user’s future viewport trajectory data. Owing to the discontinuity of the panoramic video viewport trajectory, this paper maps the discontinuous trajectory to solve the problem of discontinuous single prediction trajectory data. In an experiment, this method was used to process a dataset, and promising results were achieved. Finally, experimental comparisons with similar algorithms from recent years show a reduction in error across metrics such as the mean absolute error, Manhattan distance, and angle distance error proposed in this paper, with some metrics reduced by more than 10%. This indicates that the proposed solution can achieve higher accuracy in long-term viewport prediction, and that the introduction of an attention mechanism and multi-user similarity comparison can help improve model performance.
KW - attention mechanism
KW - neural network
KW - panoramic video
KW - saliency map
KW - viewport prediction
KW - virtual reality
UR - http://www.scopus.com/inward/record.url?scp=85218807467&partnerID=8YFLogxK
U2 - 10.12466/xhcl.2025.02.009
DO - 10.12466/xhcl.2025.02.009
M3 - Article
AN - SCOPUS:85218807467
SN - 1003-0530
VL - 41
SP - 302
EP - 311
JO - Journal of Signal Processing
JF - Journal of Signal Processing
IS - 2
ER -