SA-TF-UNet：基于空间注意力机制和Transformer 的 MRI 海马体分割

Yuxuan Ou; Min Gao; Di Zhao; Jun Liu

doi:10.11834/jig.220567

Translated title of the contribution: SA-TF-UNet：a Transformer and spatial attention mechanisms based hippocampus segmentation network

Yuxuan Ou, Min Gao, Di Zhao^*, Jun Liu

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: Early diagnosis of and intervention in Alzheimer's disease (AD) have high clinical and social value. The hippocampus is one of the earliest brain regions affected in AD, and its dysfunction underlies a core feature of the disease, memory impairment. Assessing AD from magnetic resonance imaging (MRI) scans by hand is labor-intensive and time-consuming. Emerging artificial intelligence (AI) techniques can segment the hippocampus on MRI scans accurately and efficiently. In AI-based algorithms for AD diagnosis, deep learning methods based on convolutional neural networks (CNNs) are commonly used for hippocampus segmentation. In the encoder, convolutions with various kernel sizes downsample the image and extract image features. The decoder then expands the encoded feature map back to the original spatial size of the input image using transposed convolutions or bilinear interpolation. However, a convolution integrates context information only within its receptive field: pixels outside the receptive field are ignored even when they are correlated with pixels inside it, and redundant information is produced. To optimize the hippocampus segmentation network, we focus on the natural characteristics of the hippocampus and on clinical segmentation practice. Two characteristics matter: the shape of the hippocampus is irregular, and its size is very small, occupying only 0.0002 of all pixels in an MRI scan. Regarding the first, convolutions struggle to extract features from irregularly shaped targets because they capture local features only. Regarding the second, an encoder may contain many feature extraction layers, so information about the hippocampus is easily lost because it occupies so few pixels in the original image. A common way to segment small objects is to add a detection network that first locates the hippocampus-relevant region of interest, and then to apply the semantic segmentation network only inside the resulting bounding box. However, the two networks still learn the same kind of features, so redundant use of computing resources is inevitable.

Method: To extract features from irregularly shaped targets effectively and to highlight the target areas automatically, we treat medical image segmentation as a sequence-to-sequence prediction task. We develop a U-shaped network based on self-attention and spatial attention mechanisms, called SA-TF-UNet. SA-TF-UNet has an encoder-decoder architecture in which the encoder is built purely from Transformer blocks, whose self-attention mechanisms enable global modeling. Attention gates (AGs) are adopted to refine the concatenation in the U-Net skip connections: each AG takes its gating signal from deep Transformer layers and automatically assigns larger weights to the target areas. To validate the effectiveness of AGs, we run experiments in which the network contains only one AG, and comparative experiments in which AGs are applied to all four layers. To study how the gating signal of each AG should be chosen, we further present two variant structures, in which the gating signals are taken from the outputs of two and of three deep Transformer blocks, respectively.

Result: The proposed models are tested on a dataset of 54 clinical MRI scans from AD patients. The dataset is randomly divided into training and testing data at a ratio of 8:1. Three independent experiments are carried out and their results are averaged to reduce the effect of chance. Averaged over the three experiments, SA-TF-UNet reaches Dice coefficients of 0.9001 for the left hippocampus and 0.9091 for the right hippocampus, improvements of 2.82% and 3.37%, respectively. The two fine-tuned variant structures also reach Dice coefficients above 0.88.

Conclusion: Integrating self-attention and spatial attention improves the precision of hippocampus segmentation. Using the output of only one deep Transformer block as the gating signal of an AG is effective.
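The Dice coefficient reported in the results can be illustrated with a minimal sketch. The code below is not the authors' evaluation code; the function names, the flattened-list mask representation, and the toy masks are all assumptions made for illustration. It uses the standard definition, Dice = 2|P ∩ T| / (|P| + |T|), and a toy mask in which the foreground is 0.0002 of all voxels, to show why an overlap metric is more informative than pixel accuracy for such a small target.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention chosen here (an assumption): two empty masks match perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def pixel_accuracy(pred: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of voxels whose predicted label equals the reference label."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    correct = sum(1 for p, t in zip(pred, target) if bool(p) == bool(t))
    return correct / len(target)


# Hypothetical toy data: 10,000 voxels, 2 of them foreground (a 0.0002 fraction).
target = [0] * 10_000
target[100] = target[101] = 1

empty_pred = [0] * 10_000            # predicts background everywhere
half_pred = [0] * 10_000
half_pred[101] = half_pred[102] = 1  # one voxel correct, one voxel wrong

for name, pred in (("empty", empty_pred), ("half", half_pred)):
    print(name, dice_coefficient(pred, target), pixel_accuracy(pred, target))
```

In this toy case both predictions have the same pixel accuracy, because background voxels dominate, while their Dice values differ. That is the practical reason overlap metrics are preferred for very small structures.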

Translated title of the contribution	SA-TF-UNet：a Transformer and spatial attention mechanisms based hippocampus segmentation network
Original language	Chinese (Traditional)
Pages (from-to)	3191-3202
Number of pages	12
Journal	Journal of Image and Graphics
Volume	22
Issue number	8
DOIs	https://doi.org/10.11834/jig.220567
Publication status	Published - Oct 2023
Externally published	Yes

Access to Document

10.11834/jig.220567

Cite this

@article{7ceba03bc51248e8b93e865644c8484c,

title = "SA-TF-UNet：基于空间注意力机制和Transformer 的 MRI 海马体分割",

keywords = "Transformer, hippocampus, medical image processing, semantic segmentation, spatial attention",

author = "Yuxuan Ou and Min Gao and Di Zhao and Jun Liu",

year = "2023",

month = oct,

doi = "10.11834/jig.220567",

language = "Chinese (Traditional)",

volume = "22",

pages = "3191--3202",

journal = "Journal of Image and Graphics",

issn = "1006-8961",

publisher = "Editorial and Publishing Board of JIG",

number = "8",

}

TY - JOUR

T1 - SA-TF-UNet：基于空间注意力机制和Transformer 的 MRI 海马体分割

AU - Ou, Yuxuan

AU - Gao, Min

AU - Zhao, Di

AU - Liu, Jun

PY - 2023/10

Y1 - 2023/10

KW - Transformer

KW - hippocampus

KW - medical image processing

KW - semantic segmentation

KW - spatial attention

UR - http://www.scopus.com/inward/record.url?scp=85178180021&partnerID=8YFLogxK

U2 - 10.11834/jig.220567

DO - 10.11834/jig.220567

M3 - Article

AN - SCOPUS:85178180021

SN - 1006-8961

VL - 22

SP - 3191

EP - 3202

JO - Journal of Image and Graphics

JF - Journal of Image and Graphics

IS - 8

ER -
