TY - JOUR
T1 - Predicting the enthalpy of formation of energetic molecules via conventional machine learning and GNN
AU - Zhang, Di
AU - Chu, Qingzhao
AU - Chen, Dongping
N1 - Publisher Copyright: © 2024 The Royal Society of Chemistry.
PY - 2024/2/6
Y1 - 2024/2/6
N2 - Machine learning (ML) provides a promising method for efficiently and accurately predicting molecular properties. Using ML models to predict the enthalpy of formation of energetic molecules helps in fast screening of potential high-energy molecules, thereby accelerating the design of energetic materials. A persistent challenge is to determine the optimal featurization methods for molecular representation and use an appropriate ML model. Thus, in our study, we evaluate various featurization methods (CDS, ECFP, SOAP, GNF) and ML models (RF, MLP, GCN, MPNN), dividing them into two groups: conventional ML models and GNN models, to predict the enthalpy of formation of potential high-energy molecules. Our results demonstrate that CDS and SOAP have advantages over the ECFP, while the GNFs in GCN and MPNN models perform better. Furthermore, the MPNN model performs best among all models with a root mean square error (RMSE) as low as 8.42 kcal mol⁻¹, surpassing even the best performing CDS-MLP model in conventional ML models. Overall, this study provides a benchmark for ML in predicting enthalpy of formation and emphasizes the tremendous potential of GNN in property prediction.
UR - http://www.scopus.com/inward/record.url?scp=85185166194&partnerID=8YFLogxK
U2 - 10.1039/d3cp05490j
DO - 10.1039/d3cp05490j
M3 - Article
C2 - 38345363
AN - SCOPUS:85185166194
SN - 1463-9076
VL - 26
SP - 7029
EP - 7041
JO - Physical Chemistry Chemical Physics
JF - Physical Chemistry Chemical Physics
IS - 8
ER -