TY - JOUR
T1 - Accurate machine learning models based on small dataset of energetic materials through spatial matrix featurization methods
AU - Chen, Chao
AU - Liu, Danyang
AU - Deng, Siyan
AU - Zhong, Lixiang
AU - Chan, Serene Hay Yee
AU - Li, Shuzhou
AU - Hng, Huey Hoon
N1 - Publisher Copyright: © 2021 Science Press
PY - 2021/12
Y1 - 2021/12
N2 - A large database is desired for machine learning (ML) technology to make accurate predictions of materials’ physicochemical properties based on their molecular structure. When a large database is not available, developing a proper featurization method based on the physicochemical nature of the target properties can improve the predictive power of ML models with a smaller database. In this work, we show that two new featurization methods, volume occupation spatial matrix and heat contribution spatial matrix, can improve the accuracy in predicting energetic materials’ crystal density (ρcrystal) and solid phase enthalpy of formation (Hf,solid) using a database containing 451 energetic molecules. Their mean absolute errors are reduced from 0.048 g/cm³ and 24.67 kcal/mol to 0.035 g/cm³ and 9.66 kcal/mol, respectively. Leave-one-out cross-validation shows that the newly developed ML models can be used to determine the performance of most kinds of energetic materials except cubanes. Our ML models are applied to predict ρcrystal and Hf,solid of CHON-based molecules in the 150-million-sized PubChem database, screening out 56 candidates with competitive detonation performance and reasonable chemical structures. With further improvement in the future, spatial matrices have the potential to become multifunctional ML simulation tools that could provide even better predictions in wider fields of materials science.
AB - A large database is desired for machine learning (ML) technology to make accurate predictions of materials’ physicochemical properties based on their molecular structure. When a large database is not available, developing a proper featurization method based on the physicochemical nature of the target properties can improve the predictive power of ML models with a smaller database. In this work, we show that two new featurization methods, volume occupation spatial matrix and heat contribution spatial matrix, can improve the accuracy in predicting energetic materials’ crystal density (ρcrystal) and solid phase enthalpy of formation (Hf,solid) using a database containing 451 energetic molecules. Their mean absolute errors are reduced from 0.048 g/cm³ and 24.67 kcal/mol to 0.035 g/cm³ and 9.66 kcal/mol, respectively. Leave-one-out cross-validation shows that the newly developed ML models can be used to determine the performance of most kinds of energetic materials except cubanes. Our ML models are applied to predict ρcrystal and Hf,solid of CHON-based molecules in the 150-million-sized PubChem database, screening out 56 candidates with competitive detonation performance and reasonable chemical structures. With further improvement in the future, spatial matrices have the potential to become multifunctional ML simulation tools that could provide even better predictions in wider fields of materials science.
KW - Crystal density
KW - Energetic materials screening
KW - Formation enthalpy
KW - Small database machine learning
KW - Spatial matrix featurization method
KW - n-Body interactions
UR - http://www.scopus.com/inward/record.url?scp=85114690425&partnerID=8YFLogxK
U2 - 10.1016/j.jechem.2021.08.031
DO - 10.1016/j.jechem.2021.08.031
M3 - Article
AN - SCOPUS:85114690425
SN - 2095-4956
VL - 63
SP - 364
EP - 375
JO - Journal of Energy Chemistry
JF - Journal of Energy Chemistry
ER -