TY - JOUR
T1 - HMDA
T2 - A Hybrid Model with Multi-scale Deformable Attention for Medical Image Segmentation
AU - Wu, Mengmeng
AU - Liu, Tiantian
AU - Dai, Xin
AU - Ye, Chuyang
AU - Wu, Jinglong
AU - Funahashi, Shintaro
AU - Yan, Tianyi
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2024
Y1 - 2024
N2 - Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the inability of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity to concentrate precisely on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this paper, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention (HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference point within the multi-scale features to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale Transformer and local features through channel-wise cross attention, enriching feature synthesis.
AB - Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the inability of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity to concentrate precisely on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this paper, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention (HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference point within the multi-scale features to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale Transformer and local features through channel-wise cross attention, enriching feature synthesis.
KW - cross attention bridge
KW - hybrid model
KW - medical image segmentation
KW - multi-scale deformable attention
UR - http://www.scopus.com/inward/record.url?scp=85207143347&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2024.3469230
DO - 10.1109/JBHI.2024.3469230
M3 - Article
C2 - 39374270
AN - SCOPUS:85207143347
SN - 2168-2194
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
ER -