HMDA: A Hybrid Model with Multi-scale Deformable Attention for Medical Image Segmentation

Mengmeng Wu, Tiantian Liu, Xin Dai, Chuyang Ye, Jinglong Wu, Shintaro Funahashi, Tianyi Yan*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

Abstract

Transformers have been applied to medical image segmentation tasks owing to their excellent longrange modeling capability, compensating for the failure of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity for precise concentration on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this architecture, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention(HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference within the multi-scale features, to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale transformer and local features through channelwise cross attention enriching feature synthesis.

Original languageEnglish
JournalIEEE Journal of Biomedical and Health Informatics
DOIs
Publication statusAccepted/In press - 2024

Keywords

  • cross attention bridge
  • hybrid model
  • medical image segmentation
  • multi-scale deformable attention

Fingerprint

Dive into the research topics of 'HMDA: A Hybrid Model with Multi-scale Deformable Attention for Medical Image Segmentation'. Together they form a unique fingerprint.

Cite this