HMDA: A Hybrid Model with Multi-Scale Deformable Attention for Medical Image Segmentation

Mengmeng Wu; Tiantian Liu; Xin Dai; Chuyang Ye; Jinglong Wu; Shintaro Funahashi; Tianyi Yan

doi:10.1109/JBHI.2024.3469230

Mengmeng Wu, Tiantian Liu^*, Xin Dai, Chuyang Ye, Jinglong Wu, Shintaro Funahashi, Tianyi Yan^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the failure of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity for precise concentration on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this work, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention (HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference within the multi-scale features, to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale transformer and local features through channel-wise cross attention, enriching feature synthesis. HMDA is validated on multiple datasets, and the results demonstrate the effectiveness of our approach, which achieves competitive results compared with previous methods.
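
The abstract describes attention that looks only at a small set of sampling points around a reference location across multi-scale features. The sketch below illustrates that general idea for a single query in NumPy. It is not the authors' MSADA implementation: the function names, the linear projections `W_off` and `W_attn`, the single-head layout, and the bilinear sampling scheme are all assumptions made for illustration, loosely following the generic multi-scale deformable attention pattern.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample feat (H, W, C) at a continuous pixel location (x, y).
    Coordinates are clamped to the map, then bilinearly interpolated."""
    h, w, _ = feat.shape
    x = float(np.clip(x, 0, w - 1))
    y = float(np.clip(y, 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bottom = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return (1 - dy) * top + dy * bottom

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def deformable_attention(query, feats, ref, W_off, W_attn):
    """Hypothetical single-head, single-query multi-scale deformable attention.

    query : (C,) query feature
    feats : list of L feature maps, each of shape (H_l, W_l, C)
    ref   : (rx, ry) reference point, normalized to [0, 1]
    W_off : (L, K, 2, C) assumed projection giving K pixel offsets per scale
    W_attn: (L * K, C) assumed projection giving one logit per sampling point
    """
    L, K = W_off.shape[:2]
    # One attention weight per sampling point, normalized over all L * K points.
    weights = softmax(W_attn @ query).reshape(L, K)
    out = np.zeros(feats[0].shape[-1])
    for l, feat in enumerate(feats):
        h, w, _ = feat.shape
        for k in range(K):
            off = W_off[l, k] @ query            # (2,) offset predicted from the query
            x = ref[0] * (w - 1) + off[0]        # reference point rescaled to this level
            y = ref[1] * (h - 1) + off[1]
            out += weights[l, k] * bilinear_sample(feat, x, y)
    return out
```

With L scales and K points per scale, each query reads only L * K feature vectors instead of every position in every map, which is the source of the reduced redundancy that this family of mechanisms targets.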

Original language	English
Pages (from-to)	1243-1255
Number of pages	13
Journal	IEEE Journal of Biomedical and Health Informatics
Volume	29
Issue number	2
DOIs	https://doi.org/10.1109/JBHI.2024.3469230
Publication status	Published - 2025

Keywords

Medical image segmentation
cross attention bridge
hybrid model
multi-scale deformable attention

Cite this

Wu, M., Liu, T., Dai, X., Ye, C., Wu, J., Funahashi, S., & Yan, T. (2025). HMDA: A Hybrid Model with Multi-Scale Deformable Attention for Medical Image Segmentation. IEEE Journal of Biomedical and Health Informatics, 29(2), 1243-1255. https://doi.org/10.1109/JBHI.2024.3469230

@article{dcb3a8cae3144caf86d7a1e6d431569e,

title = "HMDA: A Hybrid Model with Multi-Scale Deformable Attention for Medical Image Segmentation",

abstract = "Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the failure of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity for precise concentration on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this architecture, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention(HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference within the multi-scale features, to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale transformer and local features through channel-wise cross attention enriching feature synthesis. HMDA is validated on multiple datasets, and the results demonstrate the effectiveness of our approach, which achieves competitive results compared to the previous methods.",

keywords = "Medical image segmentation, cross attention bridge, hybrid model, multi-scale deformable attention",

author = "Mengmeng Wu and Tiantian Liu and Xin Dai and Chuyang Ye and Jinglong Wu and Shintaro Funahashi and Tianyi Yan",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2025",

doi = "10.1109/JBHI.2024.3469230",

language = "English",

volume = "29",

pages = "1243--1255",

journal = "IEEE Journal of Biomedical and Health Informatics",

issn = "2168-2194",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "2",

}

TY - JOUR

T1 - HMDA

T2 - A Hybrid Model with Multi-Scale Deformable Attention for Medical Image Segmentation

AU - Wu, Mengmeng

AU - Liu, Tiantian

AU - Dai, Xin

AU - Ye, Chuyang

AU - Wu, Jinglong

AU - Funahashi, Shintaro

AU - Yan, Tianyi

PY - 2025

Y1 - 2025

N2 - Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the failure of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity for precise concentration on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this architecture, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention(HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference within the multi-scale features, to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale transformer and local features through channel-wise cross attention enriching feature synthesis. HMDA is validated on multiple datasets, and the results demonstrate the effectiveness of our approach, which achieves competitive results compared to the previous methods.

AB - Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the failure of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity for precise concentration on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this architecture, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention(HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference within the multi-scale features, to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale transformer and local features through channel-wise cross attention enriching feature synthesis. HMDA is validated on multiple datasets, and the results demonstrate the effectiveness of our approach, which achieves competitive results compared to the previous methods.

KW - Medical image segmentation

KW - cross attention bridge

KW - hybrid model

KW - multi-scale deformable attention

UR - http://www.scopus.com/inward/record.url?scp=85207143347&partnerID=8YFLogxK

U2 - 10.1109/JBHI.2024.3469230

DO - 10.1109/JBHI.2024.3469230

M3 - Article

C2 - 39374270

AN - SCOPUS:85207143347

SN - 2168-2194

VL - 29

SP - 1243

EP - 1255

JO - IEEE Journal of Biomedical and Health Informatics

JF - IEEE Journal of Biomedical and Health Informatics

IS - 2

ER -
