TY - JOUR
T1 - Token labeling-guided multi-scale medical image classification
AU - Yan, Fangyuan
AU - Yan, Bin
AU - Liang, Wei
AU - Pei, Mingtao
N1 - Publisher Copyright: © 2023 Elsevier B.V.
PY - 2024/2
Y1 - 2024/2
N2 - Vision transformers have been widely used in medical image analysis. However, most current methods consider only the class token during training, leaving the information in the output patch tokens underused. To tackle this problem, we propose a two-stage, token-labeling-guided multi-scale model for medical image classification. In the first stage, we pre-train a classification model to extract critical areas, which serve as token labels. In the second stage, we adopt coarse and fine branches to encode visual features, which lets the model adapt to the varied lesions in medical images. The class tokens output by the two branches are then fused for classification. The token labels supervise the representation learning of the patch tokens, integrating local information into the learning process. Experimental results on the Laryngoscope8, ISIC 2018, and REFUGE data sets show that, after adding token labeling, the dual-branch classification model achieves significantly better performance than the model trained with only the class token loss, demonstrating the effectiveness of our method for medical image classification tasks.
AB - Vision transformers have been widely used in medical image analysis. However, most current methods consider only the class token during training, leaving the information in the output patch tokens underused. To tackle this problem, we propose a two-stage, token-labeling-guided multi-scale model for medical image classification. In the first stage, we pre-train a classification model to extract critical areas, which serve as token labels. In the second stage, we adopt coarse and fine branches to encode visual features, which lets the model adapt to the varied lesions in medical images. The class tokens output by the two branches are then fused for classification. The token labels supervise the representation learning of the patch tokens, integrating local information into the learning process. Experimental results on the Laryngoscope8, ISIC 2018, and REFUGE data sets show that, after adding token labeling, the dual-branch classification model achieves significantly better performance than the model trained with only the class token loss, demonstrating the effectiveness of our method for medical image classification tasks.
KW - Medical image classification
KW - Token labeling
KW - Vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85181112991&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2023.12.018
DO - 10.1016/j.patrec.2023.12.018
M3 - Article
AN - SCOPUS:85181112991
SN - 0167-8655
VL - 178
SP - 28
EP - 34
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -