TY - GEN
T1 - Preserving Global and Local Temporal Consistency for Arbitrary Video Style Transfer
AU - Wu, Xinxiao
AU - Chen, Jialu
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/10/12
Y1 - 2020/10/12
N2 - Video style transfer is a challenging task that requires not only stylizing video frames but also preserving temporal consistency among them. Many existing methods resort to optical flow to maintain temporal consistency in stylized videos. However, optical flow is sensitive to occlusions and rapid motions, and its estimation is quite slow, which makes it less practical in real-world applications. In this paper, we propose a novel fast method that explores both global and local temporal consistency for video style transfer without estimating optical flow. To preserve the temporal consistency of the entire video (i.e., global consistency), we use the structural similarity index instead of optical flow and propose a self-similarity loss to ensure temporal structural similarity between the stylized video and the source video. Furthermore, to enhance the coherence between adjacent frames (i.e., local consistency), a self-attention mechanism is designed to attend to the previous stylized frame when synthesizing the current frame.
KW - self-attention
KW - self-similarity
KW - video style transfer
UR - http://www.scopus.com/inward/record.url?scp=85106882811&partnerID=8YFLogxK
U2 - 10.1145/3394171.3413872
DO - 10.1145/3394171.3413872
M3 - Conference contribution
AN - SCOPUS:85106882811
T3 - MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
SP - 1791
EP - 1799
BT - MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 28th ACM International Conference on Multimedia, MM 2020
Y2 - 12 October 2020 through 16 October 2020
ER -