Dual Transformer Encoder Model for Medical Image Classification

Fangyuan Yan; Bin Yan; Mingtao Pei

doi:10.1109/ICIP49359.2023.10222303

Fangyuan Yan*, Bin Yan, Mingtao Pei

*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Compared with convolutional neural networks, the vision transformer, with its strong global modeling ability, has achieved promising results in natural image classification and has been applied to medical image analysis. The vision transformer divides the input image into a token sequence with a fixed hidden size and keeps that hidden size constant during training. However, no single fixed size suits all medical images. To address this issue, we propose a new dual transformer encoder model consisting of two transformer encoders with different hidden sizes, so that the model can be trained on two token sequences of different sizes. In addition, the vision transformer considers only the class token output by the last encoder layer when predicting the category, ignoring the information in the other layers. We use a Layer-wise Class token Attention (LCA) classification module that leverages the class tokens from all encoder layers to predict categories. Extensive experiments show that our proposed model outperforms other transformer-based methods, demonstrating its effectiveness.
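This record gives no implementation details beyond the abstract, so the following is only an illustrative sketch of the general idea of attending over per-layer class tokens from two encoders with different hidden sizes. It is not the authors' LCA module. The scaled dot-product scoring, the per-encoder query vectors, the concatenation of the two pooled features, and all names (`attend_over_layers`, `tokens_a`, `query_a`, and so on) are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_layers(class_tokens, query):
    """Pool one class token per encoder layer into a single vector.

    class_tokens: list of per-layer class tokens, each a list of floats
    query: hypothetical query vector of the same hidden size (an assumption)
    """
    dim = len(query)
    # Score each layer's class token against the query (scaled dot product).
    scores = [
        sum(q * t for q, t in zip(query, token)) / math.sqrt(dim)
        for token in class_tokens
    ]
    weights = softmax(scores)
    # Weighted sum of the class tokens across layers.
    pooled = [
        sum(w * token[i] for w, token in zip(weights, class_tokens))
        for i in range(dim)
    ]
    return pooled, weights


# Toy class tokens: encoder A has hidden size 2, encoder B has hidden size 3,
# and each encoder has three layers. The values are made up.
tokens_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens_b = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
query_a = [0.3, -0.1]
query_b = [0.2, 0.0, -0.4]

pooled_a, weights_a = attend_over_layers(tokens_a, query_a)
pooled_b, weights_b = attend_over_layers(tokens_b, query_b)

# Assumed fusion step: concatenate the two pooled vectors before a classifier.
features = pooled_a + pooled_b
print(len(features), weights_a, weights_b)
```

In this sketch each encoder keeps its own hidden size, and the two are combined only after pooling, so neither token sequence has to be resized to match the other. A real model would learn the queries and feed `features` to a classification layer.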

Original language	English
Title of host publication	2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
Publisher	IEEE Computer Society
Pages	690-694
Number of pages	5
ISBN (Electronic)	9781728198354
DOIs	https://doi.org/10.1109/ICIP49359.2023.10222303
Publication status	Published - 2023
Event	30th IEEE International Conference on Image Processing, ICIP 2023 - Kuala Lumpur, Malaysia; Duration: 8 Oct 2023 → 11 Oct 2023

Publication series

Name	Proceedings - International Conference on Image Processing, ICIP
ISSN (Print)	1522-4880

Conference

Conference	30th IEEE International Conference on Image Processing, ICIP 2023
Country/Territory	Malaysia
City	Kuala Lumpur
Period	8/10/23 → 11/10/23

Keywords

dual-encoder model
medical image classification
vision transformer

Access to Document

10.1109/ICIP49359.2023.10222303

Cite this

Yan, F., Yan, B., & Pei, M. (2023). Dual Transformer Encoder Model for Medical Image Classification. In 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings (pp. 690-694). (Proceedings - International Conference on Image Processing, ICIP). IEEE Computer Society. https://doi.org/10.1109/ICIP49359.2023.10222303

@inproceedings{7e6e912d317347f6b78d0b4c911c45d3,

title = "Dual Transformer Encoder Model for Medical Image Classification",

abstract = "Compared with convolutional neural networks, vision transformer with powerful global modeling abilities has achieved promising results in natural image classification and has been applied in the field of medical image analysis. Vision transformer divides the input image into a token sequence of fixed hidden size and keeps the hidden size constant during training. However, a fixed size is unsuitable for all medical images. To address the above issue, we propose a new dual transformer encoder model which consists of two transformer encoders with different hidden sizes so that the model can be trained with two token sequences with different sizes. In addition, the vision transformer only considers the class token output by the last layer in the encoders when predicting the category, ignoring the information of other layers. We use a Layer-wise Class token Attention (LCA) classification module that leverages class tokens from all layers of encoders to predict categories. Extensive experiments show that our proposed model obtains better performance than other transformer-based methods, which proves the effectiveness of our model.",

keywords = "dual-encoder model, medical image classification, vision transformer",

author = "Fangyuan Yan and Bin Yan and Mingtao Pei",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 30th IEEE International Conference on Image Processing, ICIP 2023 ; Conference date: 08-10-2023 Through 11-10-2023",

year = "2023",

doi = "10.1109/ICIP49359.2023.10222303",

language = "English",

series = "Proceedings - International Conference on Image Processing, ICIP",

publisher = "IEEE Computer Society",

pages = "690--694",

booktitle = "2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings",

address = "United States",

}

Yan, F, Yan, B & Pei, M 2023, Dual Transformer Encoder Model for Medical Image Classification. in 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings. Proceedings - International Conference on Image Processing, ICIP, IEEE Computer Society, pp. 690-694, 30th IEEE International Conference on Image Processing, ICIP 2023, Kuala Lumpur, Malaysia, 8/10/23. https://doi.org/10.1109/ICIP49359.2023.10222303

Dual Transformer Encoder Model for Medical Image Classification. / Yan, Fangyuan; Yan, Bin; Pei, Mingtao.
2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings. IEEE Computer Society, 2023. p. 690-694 (Proceedings - International Conference on Image Processing, ICIP).

TY - GEN

T1 - Dual Transformer Encoder Model for Medical Image Classification

AU - Yan, Fangyuan

AU - Yan, Bin

AU - Pei, Mingtao

PY - 2023

Y1 - 2023

N2 - Compared with convolutional neural networks, vision transformer with powerful global modeling abilities has achieved promising results in natural image classification and has been applied in the field of medical image analysis. Vision transformer divides the input image into a token sequence of fixed hidden size and keeps the hidden size constant during training. However, a fixed size is unsuitable for all medical images. To address the above issue, we propose a new dual transformer encoder model which consists of two transformer encoders with different hidden sizes so that the model can be trained with two token sequences with different sizes. In addition, the vision transformer only considers the class token output by the last layer in the encoders when predicting the category, ignoring the information of other layers. We use a Layer-wise Class token Attention (LCA) classification module that leverages class tokens from all layers of encoders to predict categories. Extensive experiments show that our proposed model obtains better performance than other transformer-based methods, which proves the effectiveness of our model.

AB - Compared with convolutional neural networks, vision transformer with powerful global modeling abilities has achieved promising results in natural image classification and has been applied in the field of medical image analysis. Vision transformer divides the input image into a token sequence of fixed hidden size and keeps the hidden size constant during training. However, a fixed size is unsuitable for all medical images. To address the above issue, we propose a new dual transformer encoder model which consists of two transformer encoders with different hidden sizes so that the model can be trained with two token sequences with different sizes. In addition, the vision transformer only considers the class token output by the last layer in the encoders when predicting the category, ignoring the information of other layers. We use a Layer-wise Class token Attention (LCA) classification module that leverages class tokens from all layers of encoders to predict categories. Extensive experiments show that our proposed model obtains better performance than other transformer-based methods, which proves the effectiveness of our model.

KW - dual-encoder model

KW - medical image classification

KW - vision transformer

UR - http://www.scopus.com/inward/record.url?scp=85180777351&partnerID=8YFLogxK

U2 - 10.1109/ICIP49359.2023.10222303

DO - 10.1109/ICIP49359.2023.10222303

M3 - Conference contribution

AN - SCOPUS:85180777351

T3 - Proceedings - International Conference on Image Processing, ICIP

SP - 690

EP - 694

BT - 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings

PB - IEEE Computer Society

T2 - 30th IEEE International Conference on Image Processing, ICIP 2023

Y2 - 8 October 2023 through 11 October 2023

ER -
