TY - GEN
T1 - Dual Encoders Neural Network for Medical Image Segmentation
AU - Wang, Xin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In the field of medical image segmentation, UNet has emerged as a widely utilized backbone network architecture. The emergence of deep learning techniques such as convolutional neural networks (CNNs), attention mechanisms, and Transformers has provided a foundation for building newer and more powerful versions of UNet. The pure CNN-based UNet has demonstrated excellent performance in medical image segmentation, and recently, the pure Transformer-based UNet has achieved even better segmentation results. Owing to their local inductive bias, CNNs excel at capturing local features and generate fine but potentially incomplete results, whereas Transformers excel at capturing global context and generate complete but less detailed results. Recently, some studies have explored the integration of CNNs and Transformers, achieving promising performance. In this paper, we introduce a novel dual-encoder architecture that combines the Swin Transformer and CNNs. Unlike prior methods, our architecture comprises two distinct sets of encoders: one leveraging the Swin Transformer and the other utilizing CNNs. Furthermore, a spatial-channel attention-based fusion (SCAF) module is designed to effectively fuse their outputs. These designs enable our network to capture both global context and local textural detail, thereby enhancing the performance of medical image segmentation. Experimental results demonstrate that the proposed method outperforms previous state-of-the-art methods on both the Synapse multi-organ CT dataset and the ACDC dataset.
AB - In the field of medical image segmentation, UNet has emerged as a widely utilized backbone network architecture. The emergence of deep learning techniques such as convolutional neural networks (CNNs), attention mechanisms, and Transformers has provided a foundation for building newer and more powerful versions of UNet. The pure CNN-based UNet has demonstrated excellent performance in medical image segmentation, and recently, the pure Transformer-based UNet has achieved even better segmentation results. Owing to their local inductive bias, CNNs excel at capturing local features and generate fine but potentially incomplete results, whereas Transformers excel at capturing global context and generate complete but less detailed results. Recently, some studies have explored the integration of CNNs and Transformers, achieving promising performance. In this paper, we introduce a novel dual-encoder architecture that combines the Swin Transformer and CNNs. Unlike prior methods, our architecture comprises two distinct sets of encoders: one leveraging the Swin Transformer and the other utilizing CNNs. Furthermore, a spatial-channel attention-based fusion (SCAF) module is designed to effectively fuse their outputs. These designs enable our network to capture both global context and local textural detail, thereby enhancing the performance of medical image segmentation. Experimental results demonstrate that the proposed method outperforms previous state-of-the-art methods on both the Synapse multi-organ CT dataset and the ACDC dataset.
KW - convolutional neural network
KW - deep learning
KW - medical image segmentation
KW - transformer
KW - unet
UR - http://www.scopus.com/inward/record.url?scp=85201155063&partnerID=8YFLogxK
U2 - 10.1109/ICCEA62105.2024.10603676
DO - 10.1109/ICCEA62105.2024.10603676
M3 - Conference contribution
AN - SCOPUS:85201155063
T3 - 2024 5th International Conference on Computer Engineering and Application, ICCEA 2024
SP - 905
EP - 909
BT - 2024 5th International Conference on Computer Engineering and Application, ICCEA 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th International Conference on Computer Engineering and Application, ICCEA 2024
Y2 - 12 April 2024 through 14 April 2024
ER -