Dual Transformer Encoder Model for Medical Image Classification

Fangyuan Yan*, Bin Yan, Mingtao Pei

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Citations (Scopus)

Abstract

Compared with convolutional neural networks, vision transformer with powerful global modeling abilities has achieved promising results in natural image classification and has been applied in the field of medical image analysis. Vision transformer divides the input image into a token sequence of fixed hidden size and keeps the hidden size constant during training. However, a fixed size is unsuitable for all medical images. To address the above issue, we propose a new dual transformer encoder model which consists of two transformer encoders with different hidden sizes so that the model can be trained with two token sequences with different sizes. In addition, the vision transformer only considers the class token output by the last layer in the encoders when predicting the category, ignoring the information of other layers. We use a Layer-wise Class token Attention (LCA) classification module that leverages class tokens from all layers of encoders to predict categories. Extensive experiments show that our proposed model obtains better performance than other transformer-based methods, which proves the effectiveness of our model.

Original languageEnglish
Title of host publication2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
PublisherIEEE Computer Society
Pages690-694
Number of pages5
ISBN (Electronic)9781728198354
DOIs
Publication statusPublished - 2023
Event30th IEEE International Conference on Image Processing, ICIP 2023 - Kuala Lumpur, Malaysia
Duration: 8 Oct 202311 Oct 2023

Publication series

NameProceedings - International Conference on Image Processing, ICIP
ISSN (Print)1522-4880

Conference

Conference30th IEEE International Conference on Image Processing, ICIP 2023
Country/TerritoryMalaysia
CityKuala Lumpur
Period8/10/2311/10/23

Keywords

  • dual-encoder model
  • medical image classification
  • vision transformer

Fingerprint

Dive into the research topics of 'Dual Transformer Encoder Model for Medical Image Classification'. Together they form a unique fingerprint.

Cite this