TY - JOUR
T1 - Machine learning predictions of thermochemical properties for aliphatic carbon and oxygen species
AU - Bruce, Frederick Nii Ofei
AU - Zhang, Di
AU - Bai, Xin
AU - Song, Siwei
AU - Wang, Fang
AU - Chu, Qingzhao
AU - Chen, Dongping
AU - Li, Yang
N1 - Publisher Copyright: © 2024
PY - 2025/3/15
Y1 - 2025/3/15
AB - In thermochemistry, predicting fundamental properties such as entropy and specific heat capacity remains underexplored, with most studies primarily focusing on the enthalpy of formation. This limits our understanding of the thermochemical landscape, particularly in combustion research, where precise thermochemical data is essential for optimizing fuel and propellant efficiency. Traditional methods, such as group additivity and quantum calculations, are often costly when dealing with large and complex molecular species, presenting a challenge in predicting their thermochemistry. Recent advancements in machine learning (ML) present a promising solution for efficiently predicting combustion-related thermochemical properties. Despite this potential, challenges persist in optimizing molecular representations and selecting appropriate models. This study aims to bridge this gap by introducing a carbon, hydrogen, and oxygen-containing species dataset. We systematically evaluate the performance of fourteen featurization methods and nine ML models, incorporating error estimations and hyperparameter tuning. Our results demonstrate that the Composite or Custom Descriptor Set (CDS) combined with the Random Forest (RF) model yields a chemical accuracy (95 % confidence interval) of 2.21 kcal/mol for enthalpy of formation at 298.15 K, 2.20 cal/(mol·K) for entropy at 298.15 K, and an average of 2.63 cal/(mol·K) for specific heat capacity across temperatures from 300 K to 1500 K. Such results highlight the effectiveness of using a single ML method to predict multiple thermochemical properties, underscoring the contribution of our study to the field of thermochemistry.
KW - Aliphatic compounds
KW - Combustion chemistry
KW - Machine learning
KW - Random Forest
KW - Thermochemical properties
KW - WUDILY-CHO dataset
UR - http://www.scopus.com/inward/record.url?scp=85211133050&partnerID=8YFLogxK
U2 - 10.1016/j.fuel.2024.133999
DO - 10.1016/j.fuel.2024.133999
M3 - Article
AN - SCOPUS:85211133050
SN - 0016-2361
VL - 384
JO - Fuel
JF - Fuel
M1 - 133999
ER -