基于 Transformer 的深度条件视频压缩

Translated title of the contribution: A Transformer-based deep conditional video compression

Guo Lu, Tianxiong Zhong, Jing Geng*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Most recent learning-based video compression algorithms are built on convolutional neural networks (CNNs) and adopt residual coding with motion compensation architectures. However, a typical CNN can only exploit local correlations, and the prediction residual is sparse, which makes it difficult to attain optimal compression performance. To address these problems, this paper proposes a Transformer-based deep conditional video compression algorithm that achieves better compression performance. The proposed algorithm uses deformable convolution to obtain the predicted frame feature from the motion information between adjacent frames. The predicted frame feature then serves as conditional information for encoding the original input frame feature, which avoids directly encoding the sparse residual signal. The algorithm further exploits the non-local correlations among features through a Transformer-based autoencoder architecture that implements both motion coding and conditional coding, further improving compression performance. Experiments show that the proposed Transformer-based deep conditional video compression algorithm surpasses current mainstream learning-based video compression algorithms on both the HEVC and UVG datasets.
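To make the two key ideas in the abstract concrete, the following is a minimal PyTorch sketch of (a) deformable-convolution feature prediction from motion information and (b) conditional coding, where the predicted feature is fed as a condition instead of subtracting it to form a residual. All module names, channel counts, and the rounding stand-in for quantization are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class FeaturePredictor(nn.Module):
    """Predicts the current-frame feature by warping the previous decoded
    feature with deformable convolution; sampling offsets are derived from
    a (decoded) motion feature. Hypothetical configuration."""

    def __init__(self, ch=64, k=3):
        super().__init__()
        # 2 * k * k offset channels (one deformable offset group)
        self.to_offset = nn.Conv2d(ch, 2 * k * k, 3, padding=1)
        self.warp = DeformConv2d(ch, ch, k, padding=k // 2)

    def forward(self, prev_feat, motion_feat):
        offset = self.to_offset(motion_feat)
        return self.warp(prev_feat, offset)


class ConditionalTransformerEncoder(nn.Module):
    """Conditional coding: rather than encoding the sparse residual
    x - x_pred, concatenate the predicted feature as a condition and let
    Transformer blocks exploit non-local correlations before quantization."""

    def __init__(self, ch=64, latent=96, depth=2, heads=4):
        super().__init__()
        self.fuse = nn.Conv2d(2 * ch, latent, 3, stride=2, padding=1)
        block = nn.TransformerEncoderLayer(
            d_model=latent, nhead=heads,
            dim_feedforward=2 * latent, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=depth)

    def forward(self, x_feat, pred_feat):
        y = self.fuse(torch.cat([x_feat, pred_feat], dim=1))
        b, c, h, w = y.shape
        tokens = y.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        tokens = self.blocks(tokens)            # non-local mixing
        y = tokens.transpose(1, 2).reshape(b, c, h, w)
        return torch.round(y)                   # stand-in for quantization


if __name__ == "__main__":
    prev_feat = torch.randn(1, 64, 32, 32)    # decoded reference-frame feature
    motion_feat = torch.randn(1, 64, 32, 32)  # decoded motion feature
    x_feat = torch.randn(1, 64, 32, 32)       # current-frame feature
    pred = FeaturePredictor()(prev_feat, motion_feat)
    latent = ConditionalTransformerEncoder()(x_feat, pred)
    print(latent.shape)  # torch.Size([1, 96, 16, 16])
```

In a full codec this latent would pass through an entropy model for arithmetic coding, and a matching conditional decoder would reconstruct the frame feature; those stages are omitted here for brevity.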

Translated title of the contribution: A Transformer-based deep conditional video compression
Original language: Chinese (Traditional)
Pages (from-to): 442-448
Number of pages: 7
Journal: Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics
Volume: 50
Issue number: 2
DOIs
Publication status: Published - Feb 2024
