TY - JOUR
T1 - Sparse and Hierarchical Transformer for Survival Analysis on Whole Slide Images
AU - Yan, Rui
AU - Lv, Zhilong
AU - Yang, Zhidong
AU - Lin, Senlin
AU - Zheng, Chunhou
AU - Zhang, Fa
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Transformer-based methods provide a good opportunity for modeling the global context of gigapixel whole slide images (WSIs); however, two main problems remain in applying Transformers to the WSI-based survival analysis task. First, the training data for survival analysis is limited, which makes models prone to overfitting. This problem is even worse for Transformer-based models, which require large-scale data to train. Second, a WSI is of extremely high resolution (up to 150,000 × 150,000 pixels) and is typically organized as a multi-resolution pyramid. The vanilla Transformer cannot model the hierarchical structure of a WSI (such as patch cluster-level relationships), which makes it incapable of learning a hierarchical WSI representation. To address these problems, in this article we propose a novel Sparse and Hierarchical Transformer (SH-Transformer) for survival analysis. Specifically, we introduce sparse self-attention to alleviate the overfitting problem and propose a hierarchical Transformer structure to learn the hierarchical WSI representation. Experimental results on three WSI datasets show that the proposed framework outperforms state-of-the-art methods.
AB - Transformer-based methods provide a good opportunity for modeling the global context of gigapixel whole slide images (WSIs); however, two main problems remain in applying Transformers to the WSI-based survival analysis task. First, the training data for survival analysis is limited, which makes models prone to overfitting. This problem is even worse for Transformer-based models, which require large-scale data to train. Second, a WSI is of extremely high resolution (up to 150,000 × 150,000 pixels) and is typically organized as a multi-resolution pyramid. The vanilla Transformer cannot model the hierarchical structure of a WSI (such as patch cluster-level relationships), which makes it incapable of learning a hierarchical WSI representation. To address these problems, in this article we propose a novel Sparse and Hierarchical Transformer (SH-Transformer) for survival analysis. Specifically, we introduce sparse self-attention to alleviate the overfitting problem and propose a hierarchical Transformer structure to learn the hierarchical WSI representation. Experimental results on three WSI datasets show that the proposed framework outperforms state-of-the-art methods.
KW - Hierarchical representation
KW - pathological image analysis
KW - sparse transformer
KW - survival analysis
UR - http://www.scopus.com/inward/record.url?scp=85168719849&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2023.3307584
DO - 10.1109/JBHI.2023.3307584
M3 - Article
C2 - 37607153
AN - SCOPUS:85168719849
SN - 2168-2194
VL - 28
SP - 7
EP - 18
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 1
ER -