HMDA: A Hybrid Model with Multi-scale Deformable Attention for Medical Image Segmentation

Mengmeng Wu; Tiantian Liu; Xin Dai; Chuyang Ye; Jinglong Wu; Shintaro Funahashi; Tianyi Yan

doi:10.1109/JBHI.2024.3469230

Corresponding author: Tianyi Yan

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the failure of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity for precise concentration on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this work, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention (HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference within the multi-scale features, to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale transformer and local features through channel-wise cross attention, enriching feature synthesis.
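The idea of attending to a small set of sampling points around a reference across multi-scale features can be illustrated with a minimal sketch. This is not the authors' MSADA implementation: the function names (`bilinear_sample`, `softmax`, `deformable_attention`) are hypothetical, the feature maps are single-channel grids, and the offsets and attention logits are passed in directly, whereas a trained model would presumably predict them from the query features. Normalizing the weights jointly over all scales and points is likewise an assumption, taken from the general deformable-attention formulation rather than from the paper.

```python
import math


def bilinear_sample(feat, x, y):
    """Bilinearly interpolate a 2D grid (list of rows) at continuous (x, y).

    Coordinates outside the grid are clamped to its border.
    """
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def softmax(scores):
    """Numerically stable softmax over a flat list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(feature_maps, ref, offsets, scores):
    """Weighted sum of a few sampled values per scale around one reference.

    feature_maps: list of 2D grids, one per scale (assumed single-channel).
    ref:          (x, y) reference location, normalized to [0, 1].
    offsets[l]:   list of (dx, dy) pixel offsets for scale l (assumed given).
    scores[l]:    list of attention logits, one per sampling point at scale l.
    """
    # Assumption: weights are normalized jointly over all scales and points.
    weights = softmax([s for level in scores for s in level])
    out, k = 0.0, 0
    for feat, level_offsets in zip(feature_maps, offsets):
        h, w = len(feat), len(feat[0])
        # Map the normalized reference onto this scale's pixel grid.
        cx, cy = ref[0] * (w - 1), ref[1] * (h - 1)
        for dx, dy in level_offsets:
            out += weights[k] * bilinear_sample(feat, cx + dx, cy + dy)
            k += 1
    return out


if __name__ == "__main__":
    fine = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
    coarse = [[0.0, 1.0], [2.0, 3.0]]
    value = deformable_attention(
        [fine, coarse],
        ref=(0.5, 0.5),
        offsets=[[(-0.5, 0.0), (0.5, 0.0)], [(0.0, 0.0)]],
        scores=[[0.2, 0.1], [0.0]],
    )
    print(value)
```

The cost of one query here scales with the number of sampling points rather than with the size of the feature maps, which is the redundancy argument the abstract makes against uniform self-attention.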

Original language	English
Journal	IEEE Journal of Biomedical and Health Informatics
DOI	https://doi.org/10.1109/JBHI.2024.3469230
Publication status	Accepted/In press - 2024

Cite this

@article{dcb3a8cae3144caf86d7a1e6d431569e,
title = "HMDA: A Hybrid Model with Multi-scale Deformable Attention for Medical Image Segmentation",
abstract = "Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the failure of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity for precise concentration on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this work, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention (HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference within the multi-scale features, to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale transformer and local features through channel-wise cross attention, enriching feature synthesis.",
keywords = "cross attention bridge, hybrid model, medical image segmentation, multi-scale deformable attention",
author = "Mengmeng Wu and Tiantian Liu and Xin Dai and Chuyang Ye and Jinglong Wu and Shintaro Funahashi and Tianyi Yan",
note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",
year = "2024",
doi = "10.1109/JBHI.2024.3469230",
language = "English",
journal = "IEEE Journal of Biomedical and Health Informatics",
issn = "2168-2194",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - JOUR
T1  - HMDA
T2  - A Hybrid Model with Multi-scale Deformable Attention for Medical Image Segmentation
AU  - Wu, Mengmeng
AU  - Liu, Tiantian
AU  - Dai, Xin
AU  - Ye, Chuyang
AU  - Wu, Jinglong
AU  - Funahashi, Shintaro
AU  - Yan, Tianyi
PY  - 2024
Y1  - 2024
N2  - Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the failure of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity for precise concentration on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this work, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention (HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference within the multi-scale features, to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale transformer and local features through channel-wise cross attention, enriching feature synthesis.
AB  - Transformers have been applied to medical image segmentation tasks owing to their excellent long-range modeling capability, compensating for the failure of Convolutional Neural Networks (CNNs) to extract global features. However, the standardized self-attention modules in Transformers, characterized by a uniform and inflexible pattern of attention distribution, frequently lead to unnecessary computational redundancy with high-dimensional data, consequently impeding the model's capacity for precise concentration on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this work, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention (HMDA), designed to address the aforementioned issues effectively. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference within the multi-scale features, to achieve better performance. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale transformer and local features through channel-wise cross attention, enriching feature synthesis.
KW  - cross attention bridge
KW  - hybrid model
KW  - medical image segmentation
KW  - multi-scale deformable attention
UR  - http://www.scopus.com/inward/record.url?scp=85207143347&partnerID=8YFLogxK
U2  - 10.1109/JBHI.2024.3469230
DO  - 10.1109/JBHI.2024.3469230
M3  - Article
C2  - 39374270
AN  - SCOPUS:85207143347
SN  - 2168-2194
JO  - IEEE Journal of Biomedical and Health Informatics
JF  - IEEE Journal of Biomedical and Health Informatics
ER  -
