TY - GEN
T1 - Prediction of Anti-Breast Cancer Drugs Activity Based on Bayesian Optimization Random Forest
AU - Zhao, Yiran
AU - Xu, Houbao
N1 - Publisher Copyright: © 2023 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2023
Y1 - 2023
N2 - Anti-breast cancer drugs can inhibit the over-expression of estrogen receptor alpha (ERα), which is closely linked to the development of breast cancer. As such, predicting the activity of these drugs is a crucial step in anti-breast cancer drug research. To improve prediction efficiency and accuracy, this paper predicts the activity of anti-breast cancer drugs by combining a random forest regression model with Bayesian optimization, which outperforms other methods in the automatic tuning of model hyperparameters. The activity and molecular descriptor data of 1974 compounds are preprocessed using correlation analysis and outlier elimination, and the data are then divided into training and test sets. The mean absolute error (MAE) of the model on the test set is 0.576. Additionally, the variable importance values of the molecular descriptors are identified. The results show that the proposed Bayesian optimization random forest model predicts better than the three other models compared, whose mean absolute errors are 0.607, 0.605 and 0.581, respectively.
AB - Anti-breast cancer drugs can inhibit the over-expression of estrogen receptor alpha (ERα), which is closely linked to the development of breast cancer. As such, predicting the activity of these drugs is a crucial step in anti-breast cancer drug research. To improve prediction efficiency and accuracy, this paper predicts the activity of anti-breast cancer drugs by combining a random forest regression model with Bayesian optimization, which outperforms other methods in the automatic tuning of model hyperparameters. The activity and molecular descriptor data of 1974 compounds are preprocessed using correlation analysis and outlier elimination, and the data are then divided into training and test sets. The mean absolute error (MAE) of the model on the test set is 0.576. Additionally, the variable importance values of the molecular descriptors are identified. The results show that the proposed Bayesian optimization random forest model predicts better than the three other models compared, whose mean absolute errors are 0.607, 0.605 and 0.581, respectively.
KW - Activity Prediction
KW - Anti-breast Cancer Drugs
KW - Bayesian Optimization
KW - Random Forest Regression
UR - http://www.scopus.com/inward/record.url?scp=85175569087&partnerID=8YFLogxK
U2 - 10.23919/CCC58697.2023.10241131
DO - 10.23919/CCC58697.2023.10241131
M3 - Conference contribution
AN - SCOPUS:85175569087
T3 - Chinese Control Conference, CCC
SP - 3471
EP - 3475
BT - 2023 42nd Chinese Control Conference, CCC 2023
PB - IEEE Computer Society
T2 - 42nd Chinese Control Conference, CCC 2023
Y2 - 24 July 2023 through 26 July 2023
ER -