The auto segmentation for cardiac structures using a dual-input deep learning network based on vision saliency and transformer

Jing Wang; Shuyu Wang; Wei Liang; Nan Zhang; Yan Zhang

doi:10.1002/acm2.13597

Corresponding author: Yan Zhang

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Purpose: Accurate segmentation of cardiac structures on coronary CT angiography (CCTA) images is crucial for morphological analysis, measurement, and functional evaluation. In this study, we achieve accurate automatic segmentation of cardiac structures on CCTA images with an innovative deep learning method based on a visual attention mechanism and a transformer network, and we discuss its practical application value. Methods: We developed a dual-input deep learning network based on visual saliency and transformer (VST), which incorporates a self-attention mechanism, for cardiac structure segmentation. CCTA scans from sixty patients were randomly selected as a development set and manually annotated by an experienced technician. The proposed visual attention and transformer model was trained on the patients' CCTA images, with binary masks derived from the manual contours used as the training targets. We also used a deep supervision strategy by adding auxiliary losses. The loss function of our model was the sum of the Dice loss and the cross-entropy loss. To quantitatively evaluate the segmentation results, we calculated the Dice similarity coefficient (DSC) and the Hausdorff distance (HD). We also compared the volumes obtained by automatic and manual segmentation to test for statistically significant differences. Results: Fivefold cross-validation was used to benchmark the segmentation method. The DSC was 0.87 for the left ventricular myocardium (LVM), 0.94 for the left ventricle (LV), 0.90 for the left atrium (LA), 0.92 for the right ventricle (RV), 0.91 for the right atrium (RA), and 0.96 for the aorta (AO). The average DSC was 0.92, and the HD was 7.2 ± 2.1 mm. In the volume comparison, no structure showed a statistically significant difference except the LVM and LA (p < 0.05).
The structures segmented by the proposed method fit the true contours of the cardiac substructures well, and the model predictions were close to the manual annotations. Conclusions: The dual-input transformer architecture based on visual saliency shows high sensitivity and specificity for cardiac structure segmentation and can noticeably improve the accuracy of automatic substructure segmentation. This is of gr.
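The abstract names the training objective (Dice loss plus cross-entropy, with auxiliary deep-supervision losses) and the evaluation metrics (DSC and HD) but gives no formulas. The NumPy sketch below shows one common way to compute each. It is not the authors' VST code. Several choices are assumptions not stated in the source. Dice is averaged over all classes, including background. Auxiliary outputs are assumed to be at label resolution and combined with hypothetical per-output `weights`. HD is taken over all foreground voxel coordinates scaled by a voxel `spacing`, although surface-based HD is another common convention. All function names are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dice_ce_loss(logits: np.ndarray, labels: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice loss plus cross-entropy for one image or volume.

    logits: (C, ...) raw class scores; labels: (...) integer class map.
    Dice is averaged over all C classes, background included (an assumption).
    """
    num_classes = logits.shape[0]
    probs = softmax(logits, axis=0)
    # One-hot target with the class axis first, matching `probs`.
    one_hot = np.moveaxis(np.eye(num_classes)[labels], -1, 0)
    spatial_axes = tuple(range(1, probs.ndim))
    intersection = (probs * one_hot).sum(axis=spatial_axes)
    denom = probs.sum(axis=spatial_axes) + one_hot.sum(axis=spatial_axes)
    dice_loss = 1.0 - np.mean((2.0 * intersection + eps) / (denom + eps))
    # Cross-entropy of the true class at each voxel, averaged over voxels.
    log_probs = np.log(np.clip(probs, eps, 1.0))
    ce_loss = -np.mean(np.sum(one_hot * log_probs, axis=0))
    return float(dice_loss + ce_loss)


def deep_supervision_loss(outputs, labels, weights) -> float:
    """Weighted sum of Dice+CE over the main output and auxiliary outputs.

    Hypothetical helper: assumes every output is already at label resolution.
    """
    return float(sum(w * dice_ce_loss(o, labels) for w, o in zip(weights, outputs)))


def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """DSC = 2 * |A & B| / (|A| + |B|) for two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / total)


def hausdorff_distance(pred: np.ndarray, truth: np.ndarray, spacing) -> float:
    """Symmetric Hausdorff distance, in the units of `spacing` (e.g. mm)."""
    scale = np.asarray(spacing, dtype=float)
    a = np.argwhere(pred) * scale  # foreground voxel coordinates of pred
    b = np.argwhere(truth) * scale  # foreground voxel coordinates of truth
    if len(a) == 0 or len(b) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # All pairwise Euclidean distances: O(|A| * |B|) memory, fine for a sketch.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Directed distances in both directions; HD is the larger of the two.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


if __name__ == "__main__":
    truth = np.zeros((8, 8), dtype=bool)
    truth[2:6, 2:6] = True
    pred = np.zeros((8, 8), dtype=bool)
    pred[2:6, 3:7] = True  # same square shifted one voxel to the right
    print(dice_coefficient(pred, truth))  # 0.75
    print(hausdorff_distance(pred, truth, spacing=(0.5, 0.5)))  # 0.5
```

The pairwise-distance HD is quadratic in memory. For full CCTA volumes, a distance-transform or KD-tree approach would be needed in practice.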

Original language	English
Article number	e13597
Journal	Journal of Applied Clinical Medical Physics
Volume	23
Issue number	5
DOI	https://doi.org/10.1002/acm2.13597
Publication status	Published - May 2022
Externally published	Yes

Keywords

coronary CT angiography
deep learning
self-attention
transformers
visual attention mechanism

Cite this

@article{4d268b2e6cbb4d3ba486c5aa2274e337,

title = "The auto segmentation for cardiac structures using a dual-input deep learning network based on vision saliency and transformer",

keywords = "coronary CT angiography, deep learning, self-attention, transformers, visual attention mechanism",

author = "Jing Wang and Shuyu Wang and Wei Liang and Nan Zhang and Yan Zhang",

note = "Publisher Copyright: {\textcopyright} 2022 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, LLC on behalf of The American Association of Physicists in Medicine.",

year = "2022",

month = may,

doi = "10.1002/acm2.13597",

language = "English",

volume = "23",

journal = "Journal of Applied Clinical Medical Physics",

issn = "1526-9914",

publisher = "Wiley Periodicals, LLC",

number = "5",

}

TY - JOUR

T1 - The auto segmentation for cardiac structures using a dual-input deep learning network based on vision saliency and transformer

AU - Wang, Jing

AU - Wang, Shuyu

AU - Liang, Wei

AU - Zhang, Nan

AU - Zhang, Yan

PY - 2022/5

Y1 - 2022/5

KW - coronary CT angiography

KW - deep learning

KW - self-attention

KW - transformers

KW - visual attention mechanism

UR - http://www.scopus.com/inward/record.url?scp=85127452919&partnerID=8YFLogxK

U2 - 10.1002/acm2.13597

DO - 10.1002/acm2.13597

M3 - Article

C2 - 35363415

AN - SCOPUS:85127452919

SN - 1526-9914

VL - 23

JO - Journal of Applied Clinical Medical Physics

JF - Journal of Applied Clinical Medical Physics

IS - 5

M1 - e13597

ER -
