TY - JOUR
T1 - Methodological framework for materials discovery using machine learning
AU - Lee, Eng Hock
AU - Jiang, Wei
AU - Alsalman, Hussain
AU - Low, Tony
AU - Cherkassky, Vladimir
N1 - Publisher Copyright: © 2022 American Physical Society.
PY - 2022/4
Y1 - 2022/4
AB - Traditionally, materials discovery has been guided by basic physical rules, and such rules embody the basic understanding of the physical characteristics of interest of the material. However, the discovery of physical rules remains a challenging task due to the inherent difficulty in recognizing patterns in the high-dimensional and highly nonuniformly distributed materials space. The standard data analytics approach using machine learning (ML) may fall short in producing meaningful results due to fundamental differences between the underlying assumptions and goals of ML vs materials discovery. ML is mainly focused on estimating complex black-box predictive models (that are nonlinear and multivariate), whereas in materials discovery, the goal is to come up with interpretable data-driven physical rules. Here, we attempt to tackle this problem by proposing a robust data analytics framework that allows us to derive basic physical rules from data. We introduce the concept of global and local modeling, utilizing both supervised and unsupervised learning, for highly nonuniformly distributed materials data. To enhance the model interpretation, we also introduce a model-independent interpretation technique to assist human experts in extracting useful physical rules. The proposed framework for extracting data-derived physical rules at the global and local level is illustrated using two case studies: (1) classification of van der Waals (vdW) and non-vdW (nvdW) materials and (2) classification of wide bandgap and non-wide bandgap vdW materials.
UR - http://www.scopus.com/inward/record.url?scp=85128738297&partnerID=8YFLogxK
U2 - 10.1103/PhysRevMaterials.6.043802
DO - 10.1103/PhysRevMaterials.6.043802
M3 - Article
AN - SCOPUS:85128738297
SN - 2475-9953
VL - 6
JO - Physical Review Materials
JF - Physical Review Materials
IS - 4
M1 - 043802
ER -