Dual Transformer Encoder Model for Medical Image Classification

Fangyuan Yan*, Bin Yan, Mingtao Pei

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Compared with convolutional neural networks, the vision transformer, with its powerful global modeling ability, has achieved promising results in natural image classification and has been applied in the field of medical image analysis. The vision transformer divides the input image into a token sequence of fixed hidden size and keeps that hidden size constant during training. However, a single fixed size is not suitable for all medical images. To address this issue, we propose a new dual transformer encoder model consisting of two transformer encoders with different hidden sizes, so that the model can be trained with two token sequences of different sizes. In addition, when predicting the category, the vision transformer considers only the class token output by the last layer of the encoder, ignoring the information in other layers. We use a Layer-wise Class token Attention (LCA) classification module that leverages the class tokens from all layers of the encoders to predict categories. Extensive experiments show that our proposed model obtains better performance than other transformer-based methods, which demonstrates its effectiveness.
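The layer-wise class token attention idea from the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the single attention head, the choice of the last layer's class token as the query, and all weight names are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layerwise_class_attention(class_tokens, w_q, w_k, w_v, w_cls):
    """Hypothetical sketch of an LCA-style classifier.

    class_tokens: (L, D) array, the class token taken from each of the
                  L encoder layers (rather than only the last layer).
    w_q, w_k, w_v: (D, D) projection weights; w_cls: (D, C) head weights.
    """
    q = class_tokens[-1] @ w_q            # query from the last layer's class token, (D,)
    k = class_tokens @ w_k                # keys from every layer's class token, (L, D)
    v = class_tokens @ w_v                # values from every layer's class token, (L, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # attention over layers, (L,)
    fused = attn @ v                      # layer-fused representation, (D,)
    return fused @ w_cls                  # class logits, (C,)

rng = np.random.default_rng(0)
L, D, C = 12, 64, 3                       # layers, hidden size, classes (toy values)
logits = layerwise_class_attention(
    rng.standard_normal((L, D)),
    rng.standard_normal((D, D)), rng.standard_normal((D, D)),
    rng.standard_normal((D, D)), rng.standard_normal((D, C)),
)
```

In the dual-encoder setting described above, one such module per encoder (one per hidden size) would aggregate that encoder's per-layer class tokens before classification.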

Original language: English
Title of host publication: 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
Publisher: IEEE Computer Society
Pages: 690-694
Number of pages: 5
ISBN (Electronic): 9781728198354
DOI
Publication status: Published - 2023
Event: 30th IEEE International Conference on Image Processing, ICIP 2023 - Kuala Lumpur, Malaysia
Duration: 8 Oct 2023 - 11 Oct 2023

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

