TY - JOUR
T1 - Tree enhanced deep adaptive network for cancer prediction with high dimension low sample size microarray data
AU - Wu, Yao
AU - Zhu, Donghua
AU - Wang, Xuefeng
N1 - Publisher Copyright: © 2023 Elsevier B.V.
PY - 2023/3
Y1 - 2023/3
N2 - Cancer prediction based on microarray data can facilitate the molecular exploration of cancers, so building more accurate cancer prediction models is essential. This study focuses on a deep learning-based cancer prediction model. However, using a deep neural network to predict cancer is difficult because of the complexity of the underlying biological patterns and the high-dimension, low-sample-size (HDLSS) nature of microarray data, which can cause over-fitting and large training gradient variance. Therefore, a tree-enhanced deep adaptive network (TEDAN) is proposed to address these issues. First, we use an ensemble of trees as a feature transformation method to alleviate over-fitting; it generates features with a lower dimension and more discriminative patterns. Second, a deep adaptive network (DAN) based on a self-attention mechanism is proposed to model the underlying biological interactions between different genes. Third, a low sample size training (LSST) method is proposed to further reduce the large training gradient variance. Experimental results on six public cancer prediction datasets demonstrate that TEDAN outperforms strong baseline models.
AB - Cancer prediction based on microarray data can facilitate the molecular exploration of cancers, so building more accurate cancer prediction models is essential. This study focuses on a deep learning-based cancer prediction model. However, using a deep neural network to predict cancer is difficult because of the complexity of the underlying biological patterns and the high-dimension, low-sample-size (HDLSS) nature of microarray data, which can cause over-fitting and large training gradient variance. Therefore, a tree-enhanced deep adaptive network (TEDAN) is proposed to address these issues. First, we use an ensemble of trees as a feature transformation method to alleviate over-fitting; it generates features with a lower dimension and more discriminative patterns. Second, a deep adaptive network (DAN) based on a self-attention mechanism is proposed to model the underlying biological interactions between different genes. Third, a low sample size training (LSST) method is proposed to further reduce the large training gradient variance. Experimental results on six public cancer prediction datasets demonstrate that TEDAN outperforms strong baseline models.
KW - Cancer prediction
KW - Deep learning
KW - Feature transformation
KW - High dimension low sample size (HDLSS)
KW - Tree-enhanced deep adaptive network (TEDAN)
UR - http://www.scopus.com/inward/record.url?scp=85147603302&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2023.110078
DO - 10.1016/j.asoc.2023.110078
M3 - Article
AN - SCOPUS:85147603302
SN - 1568-4946
VL - 136
JO - Applied Soft Computing
JF - Applied Soft Computing
M1 - 110078
ER -