Transformer-based deep conditional video compression

  • Guo Lu
  • Tianxiong Zhong
  • Jing Geng*
  • *Corresponding author for this work
  • Beijing Institute of Technology

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Most recent learning-based video compression algorithms are built on convolutional neural networks (CNNs) and adopt a motion-compensation and residual-coding architecture. Because a typical CNN can exploit only local correlations, and because the prediction residual is sparse, it is difficult for such designs to attain the best compression performance. To address these problems, this paper proposes a Transformer-based deep conditional video compression algorithm that achieves better compression performance. The proposed algorithm uses deformable convolution to obtain the predicted frame feature from the motion information between adjacent frames. The predicted frame feature then serves as conditional information for encoding the feature of the original input frame, which avoids directly encoding sparse residual signals. The algorithm also exploits the non-local correlation between features: it proposes a Transformer-based autoencoder architecture for both motion coding and conditional coding, which further improves compression performance. Experiments show that the proposed algorithm surpasses current mainstream learning-based video compression algorithms on both the HEVC and UVG datasets.
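
The difference between residual coding and conditional coding described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the dimensions, and the random dense layers are all hypothetical stand-ins, and the paper's Transformer-based autoencoder and deformable-convolution prediction are not modeled here. The sketch only shows where the predicted feature enters each scheme.

```python
import random


def linear(weights, x):
    """Apply a dense layer (matrix-vector product) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Hypothetical placeholder for learned weights; a real codec trains these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def residual_encode(x, x_pred):
    """Residual coding: the signal passed to the codec is x - x_pred."""
    return [a - b for a, b in zip(x, x_pred)]


def residual_decode(residual, x_pred):
    """Residual decoding: add the prediction back to the decoded residual."""
    return [r + b for r, b in zip(residual, x_pred)]


def conditional_encode(x, x_pred, enc_weights):
    """Conditional coding: the encoder sees x and x_pred concatenated."""
    return linear(enc_weights, x + x_pred)


def conditional_decode(latent, x_pred, dec_weights):
    """The decoder also receives x_pred as a condition next to the latent."""
    return linear(dec_weights, latent + x_pred)


rng = random.Random(0)
dim, latent_dim = 8, 4  # assumed toy sizes

# Toy "current frame feature" and a prediction that is close to it.
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
x_pred = [v + rng.uniform(-0.05, 0.05) for v in x]

# Residual path: subtraction is the only way the prediction is used.
residual = residual_encode(x, x_pred)
x_rec = residual_decode(residual, x_pred)

# Conditional path: untrained stand-in layers decide how to use the prediction.
enc_w = random_matrix(latent_dim, 2 * dim, rng)
dec_w = random_matrix(dim, latent_dim + dim, rng)
latent = conditional_encode(x, x_pred, enc_w)
x_hat = conditional_decode(latent, x_pred, dec_w)
```

With untrained weights `x_hat` is not a meaningful reconstruction. The point of the sketch is structural: in the conditional path the prediction is an input to both encoder and decoder, so a trained model is free to combine it with the input in ways other than plain subtraction.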

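Translated title of the contribution: A Transformer based deep conditional video compression
Original language: Chinese (Traditional)
Pages (from-to): 442-448
Number of pages: 7
Journal: Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics
Volume: 50
Issue number: 2
DOI
Publication status: Published - Feb 2024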

Keywords

  • Transformer
  • compression algorithm
  • deep learning
  • neural network
  • video compression
