
PFormer: An efficient CNN-Transformer hybrid network with content-driven P-attention for 3D medical image segmentation

  • Yueyang Gao
  • Jinhui Zhang*
  • Siyi Wei
  • Zheng Li

*Corresponding author for this work
Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Medical imaging, and medical image segmentation in particular, is pivotal in modern medicine. Transformer architectures have attracted significant attention in this field: both pure transformer architectures and hybrid architectures combining transformers with CNNs have been extensively proposed and have demonstrated impressive segmentation performance. However, whether they embed the transformer in the higher layers of a CNN or enhance the attention mechanism itself, these approaches are inherently limited in fully harnessing the potential of the transformer under the constraint of low computational cost. This paper introduces an efficient hybrid CNN-Transformer network for 3D medical image segmentation. The network uses depth-wise separable convolutions (DWconvs) to extract local fine-grained details and introduces a content-driven attention mechanism, P-attention, that replaces self-attention within the transformer. P-attention enables global feature modeling in both the channel and spatial dimensions. By integrating local information with global dependencies, the network achieves more precise segmentation results at a reduced computational burden. Extensive experiments on the Synapse and Automated Cardiac Diagnosis Challenge (ACDC) datasets were conducted to assess the performance of PFormer, using evaluation metrics including the Dice similarity coefficient (DSC), 95th percentile Hausdorff distance (HD95), average symmetric surface distance (ASSD), precision (PRE), and sensitivity (SE). The experimental results demonstrate that, compared to other state-of-the-art methods, PFormer achieves superior segmentation performance with fewer parameters and floating point operations (FLOPs), providing strong evidence of the effectiveness of our method. Code and models for PFormer are available at https://github.com/BitGyy/PFormer.
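The abstract credits depth-wise separable convolutions (DWconvs) for part of PFormer's reduced parameter and FLOP budget. As a minimal illustration of why (not taken from the paper; the function names, channel sizes, and kernel size below are arbitrary assumptions), the standard parameter-count formulas for a dense 3D convolution versus a depth-wise + point-wise factorization can be compared directly:

```python
# Illustrative sketch: weight counts (bias ignored) for a dense 3D conv
# versus a depth-wise separable conv. Formulas are the standard ones;
# the concrete channel/kernel numbers are assumptions for illustration.

def standard_conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Dense 3D convolution: every output channel mixes all inputs."""
    return k ** 3 * c_in * c_out

def dwconv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Depth-wise 3D conv (one k*k*k filter per input channel)
    followed by a 1x1x1 point-wise conv that mixes channels."""
    return k ** 3 * c_in + c_in * c_out

c_in, c_out, k = 64, 64, 3
dense = standard_conv3d_params(c_in, c_out, k)  # 110592
dw = dwconv3d_params(c_in, c_out, k)            # 5824
print(f"dense: {dense}, separable: {dw}, ratio: {dense / dw:.1f}x")
```

With these (assumed) sizes the factorization uses roughly 19x fewer weights per block, which is the kind of saving that lets a hybrid network stay within a low-FLOP budget.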

Original language: English
Article number: 107154
Journal: Biomedical Signal Processing and Control
Volume: 101
DOIs
Publication status: Published - Mar 2025

Keywords

  • 3D medical image segmentation
  • CNN-Transformer hybrid architecture
  • Low computational cost
  • P-attention
  • Superior segmentation performance
  • Various evaluation metrics

