Alternating attention Transformer for single image deraining

Dawei Yang, Xin He*, Ruiheng Zhang

*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

7 Citations (Scopus)

Abstract

Recently, Transformer-based network architectures have achieved significant improvements over convolutional neural networks (CNNs) in single image deraining, owing to their powerful ability to model non-local information. However, these approaches aggregate global features from all token similarities between queries and keys under a dense self-attention mechanism, which may fail to focus on the most relevant information and can induce blurring from irrelevant representations. To alleviate these issues, we propose an effective alternating attention Transformer (called AAT) for boosting image deraining performance. Specifically, we select only the most useful similarity values via an approximate top-k calculation to achieve sparse self-attention. In our framework, the representational capability of the Transformer is significantly improved by alternately applying dense and sparse self-attention blocks. In addition, we insert a multi-dilconv feed-forward network into the proposed AAT in place of the native MLP, in order to better characterize the multi-scale distribution of rain streaks. To compensate for the Transformer backbone's limited modeling of local features, we introduce a local feature refinement block to achieve high-quality derained results. Extensive experiments on benchmark datasets demonstrate the effectiveness of our proposed method. The source code will be released.
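The abstract's central mechanism is top-k sparse self-attention: for each query, keep only the k largest query-key similarities and mask the rest before the softmax. Below is a minimal PyTorch sketch of that idea, assuming a standard multi-head tensor layout; the function name `topk_sparse_attention` and the `k_ratio` knob are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of top-k sparse self-attention as described in the
# abstract. The masking strategy and parameter names are illustrative
# assumptions; the paper's actual AAT implementation may differ.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_ratio=0.5):
    """Scaled dot-product attention keeping only the top-k similarity
    values per query and masking out the rest.

    q, k, v: (batch, heads, tokens, dim) tensors.
    k_ratio: fraction of key tokens retained per query (assumed knob).
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (B, H, N, N)
    n_keep = max(1, int(scores.size(-1) * k_ratio))
    # Values of the n_keep largest similarities for each query.
    topk_vals, _ = scores.topk(n_keep, dim=-1)
    threshold = topk_vals[..., -1:]                    # smallest kept value
    # Mask everything below the per-query threshold before the softmax,
    # so irrelevant tokens contribute zero attention weight.
    sparse_scores = scores.masked_fill(scores < threshold, float('-inf'))
    attn = F.softmax(sparse_scores, dim=-1)
    return attn @ v
```

An alternating attention stack in the spirit of the paper would then interleave this sparse variant with ordinary dense attention (e.g. `F.scaled_dot_product_attention`) in successive blocks, so the network sees both the full global context and the pruned, most-relevant similarities.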

Original language: English
Article number: 104144
Journal: Digital Signal Processing: A Review Journal
Volume: 141
DOIs
Publication status: Published - Sept 2023
Externally published: Yes

Keywords

  • Dense self-attention
  • Image restoration
  • Rain removal
  • Single image deraining
  • Sparse self-attention
  • Vision Transformers
