TY - GEN
T1 - SwinUnet with Multi-task Learning for Image Segmentation
AU - Wang, Nan
AU - Zeng, Zhifan
AU - Qiu, Xin
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Image segmentation finds extensive application in many scenarios. Although transformer-based models have significantly improved image segmentation performance, the stability of model training remains a concern. To address this challenge, this study introduces the Multi-Task SwinUnet (MSwinUnet) framework, which achieves multi-task learning by adding a Mask Reconstruction Segmentation (MaskRSeg) task alongside the original Image Segmentation (ImgSeg) task. Our approach integrates seamlessly with Swin-Unet and enhances its segmentation performance. Extensive experimental results demonstrate that MSwinUnet surpasses baseline models including UNet, TransUNet, and Swin-Unet, achieving a DSC of 89.53% and an MIoU of 0.8176 on the ACDC benchmark dataset. Model stability is best when the task ratio of ImgSeg to MaskRSeg is 8:2. We also thoroughly analyze how different mask rate and mask patch size parameters affect the MaskRSeg task; among these settings, a mask rate of 45% and a mask patch size of 4 yield the best segmentation results. The training approach proposed in this paper should help further improve image segmentation accuracy for a wider range of Transformer variant models.
AB - Image segmentation finds extensive application in many scenarios. Although transformer-based models have significantly improved image segmentation performance, the stability of model training remains a concern. To address this challenge, this study introduces the Multi-Task SwinUnet (MSwinUnet) framework, which achieves multi-task learning by adding a Mask Reconstruction Segmentation (MaskRSeg) task alongside the original Image Segmentation (ImgSeg) task. Our approach integrates seamlessly with Swin-Unet and enhances its segmentation performance. Extensive experimental results demonstrate that MSwinUnet surpasses baseline models including UNet, TransUNet, and Swin-Unet, achieving a DSC of 89.53% and an MIoU of 0.8176 on the ACDC benchmark dataset. Model stability is best when the task ratio of ImgSeg to MaskRSeg is 8:2. We also thoroughly analyze how different mask rate and mask patch size parameters affect the MaskRSeg task; among these settings, a mask rate of 45% and a mask patch size of 4 yield the best segmentation results. The training approach proposed in this paper should help further improve image segmentation accuracy for a wider range of Transformer variant models.
KW - Image Segmentation
KW - Mask reconstruction
KW - MSwinUnet
KW - Multi-task learning
KW - Swin-Unet
UR - https://www.scopus.com/pages/publications/85187311846
U2 - 10.1109/ICISCAE59047.2023.10393666
DO - 10.1109/ICISCAE59047.2023.10393666
M3 - Conference contribution
AN - SCOPUS:85187311846
T3 - 2023 IEEE 6th International Conference on Information Systems and Computer Aided Education, ICISCAE 2023
SP - 602
EP - 607
BT - 2023 IEEE 6th International Conference on Information Systems and Computer Aided Education, ICISCAE 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE 6th International Conference on Information Systems and Computer Aided Education, ICISCAE 2023
Y2 - 23 September 2023 through 25 September 2023
ER -