Dual Transformer Encoder Model for Medical Image Classification

Fangyuan Yan*, Bin Yan, Mingtao Pei

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Compared with convolutional neural networks, the vision transformer, with its powerful global modeling ability, has achieved promising results in natural image classification and has been applied in the field of medical image analysis. The vision transformer divides the input image into a token sequence of fixed hidden size and keeps that hidden size constant during training. However, a single fixed size is not suitable for all medical images. To address this issue, we propose a new dual transformer encoder model consisting of two transformer encoders with different hidden sizes, so that the model can be trained with two token sequences of different sizes. In addition, when predicting the category, the vision transformer considers only the class token output by the last layer of the encoder, ignoring the information in other layers. We use a Layer-wise Class token Attention (LCA) classification module that leverages the class tokens from all layers of the encoders to predict categories. Extensive experiments show that our proposed model obtains better performance than other transformer-based methods, which demonstrates its effectiveness.
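The layer-wise class token attention idea from the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the single attention head, the choice of the last layer's class token as the query, and all weight names are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layerwise_class_attention(class_tokens, w_q, w_k, w_v, w_cls):
    """Hypothetical sketch of an LCA-style classifier.

    class_tokens: (L, D) array, the class token taken from each of the
                  L encoder layers (rather than only the last layer).
    w_q, w_k, w_v: (D, D) projection weights; w_cls: (D, C) head weights.
    """
    q = class_tokens[-1] @ w_q            # query from the last layer's class token, (D,)
    k = class_tokens @ w_k                # keys from every layer's class token, (L, D)
    v = class_tokens @ w_v                # values from every layer's class token, (L, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # attention over layers, (L,)
    fused = attn @ v                      # layer-fused representation, (D,)
    return fused @ w_cls                  # class logits, (C,)

rng = np.random.default_rng(0)
L, D, C = 12, 64, 3                       # layers, hidden size, classes (toy values)
logits = layerwise_class_attention(
    rng.standard_normal((L, D)),
    rng.standard_normal((D, D)), rng.standard_normal((D, D)),
    rng.standard_normal((D, D)), rng.standard_normal((D, C)),
)
```

In the dual-encoder setting described above, one such module per encoder (one per hidden size) would aggregate that encoder's per-layer class tokens before classification.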

Original language: English
Title of host publication: 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
Publisher: IEEE Computer Society
Pages: 690-694
Number of pages: 5
ISBN (Electronic): 9781728198354
DOI
Publication status: Published - 2023
Event: 30th IEEE International Conference on Image Processing, ICIP 2023 - Kuala Lumpur, Malaysia
Duration: 8 Oct 2023 - 11 Oct 2023

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

