-
Coherent Collections of Rules Describing Exceptional Materials Identified with a Multi-Objective Optimization of Subgroups
Authors:
Lucas Foppa,
Matthias Scheffler
Abstract:
Using a modest amount of data from a large population, subgroup discovery (SGD) identifies outstanding subsets of data with respect to a certain property of interest of that population. The SGs are described by "rules". These are constraints on key descriptive parameters that characterize the material or the environment. These parameters and constraints are obtained by maximizing a quality function that establishes a tradeoff between SG size and utility, i.e., between generality and exceptionality. The utility function measures how outstanding a SG is. However, this approach does not give a unique solution: typically, many SGs have similar quality-function values. Here, we identify coherent collections of SGs of a "Pareto region" presenting various size-utility tradeoffs and define a SG similarity measure based on the Jaccard index, which allows us to hierarchically cluster these optimal SGs. These concepts are demonstrated by learning rules that describe perovskites with high bulk modulus. We show that SGs focusing on exceptional materials exhibit a high quality-function value but do not necessarily maximize it. We compare the mean shift with the cumulative Jensen-Shannon divergence ($D_{cJS}$) as utility functions and show that the SG rules obtained with $D_{cJS}$ are more focused than those obtained with the mean shift.
Submitted 27 March, 2024;
originally announced March 2024.
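As a rough illustration of the Jaccard-based SG similarity mentioned in the abstract above, the following Python sketch treats each SG as the set of data-point indices it selects. The function name, the toy subgroups, and the use of 1 - J as a distance are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def jaccard_similarity(sg_a, sg_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two subgroups given as index sets."""
    a, b = set(sg_a), set(sg_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty subgroups count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical subgroups: each one lists the indices of the materials it selects.
subgroups = {
    "sg1": {0, 1, 2, 3},
    "sg2": {2, 3, 4, 5},
    "sg3": {6, 7},
}

# Pairwise distances (1 - Jaccard index), keyed by the pair of subgroup names.
distances = {
    (n1, n2): 1.0 - jaccard_similarity(subgroups[n1], subgroups[n2])
    for n1, n2 in combinations(sorted(subgroups), 2)
}
```

A distance table of this kind is the sort of input that a standard agglomerative clustering routine could consume to group similar SGs.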
-
Roadmap on Data-Centric Materials Science
Authors:
Stefan Bauer,
Peter Benner,
Tristan Bereau,
Volker Blum,
Mario Boley,
Christian Carbogno,
C. Richard A. Catlow,
Gerhard Dehm,
Sebastian Eibl,
Ralph Ernstorfer,
Ádám Fekete,
Lucas Foppa,
Peter Fratzl,
Christoph Freysoldt,
Baptiste Gault,
Luca M. Ghiringhelli,
Sajal K. Giri,
Anton Gladyshev,
Pawan Goyal,
Jason Hattrick-Simpers,
Lara Kabalan,
Petr Karpov,
Mohammad S. Khorrami,
Christoph Koch,
Sebastian Kokott,
et al. (36 additional authors not shown)
Abstract:
Science is and always has been based on data, but the terms "data-centric" and the "4th paradigm of" materials research indicate a radical change in how information is retrieved and handled and in how research is performed. It signifies a transformative shift towards managing vast data collections, digital repositories, and innovative data analytics methods. The integration of Artificial Intelligence (AI) and its subset Machine Learning (ML) has become pivotal in addressing all these challenges. This Roadmap on Data-Centric Materials Science explores fundamental concepts and methodologies, illustrating diverse applications in electronic-structure theory, soft matter theory, microstructure research, and experimental techniques like photoemission, atom probe tomography, and electron microscopy. While the roadmap delves into specific areas within the broad interdisciplinary field of materials science, the provided examples elucidate key concepts applicable to a wider range of topics. The discussed instances offer insights into addressing the multifaceted challenges encountered in contemporary materials research.
Submitted 1 May, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
From Prediction to Action: Critical Role of Performance Estimation for Machine-Learning-Driven Materials Discovery
Authors:
Mario Boley,
Felix Luong,
Simon Teshuva,
Daniel F Schmidt,
Lucas Foppa,
Matthias Scheffler
Abstract:
Materials discovery driven by statistical property models is an iterative decision process, during which an initial data collection is extended with new data proposed by a model-informed acquisition function--with the goal to maximize a certain "reward" over time, such as the maximum property value discovered so far. While the materials science community has achieved much progress in developing property models that predict well on average with respect to the training distribution, this form of in-distribution performance measurement is not directly coupled with the discovery reward. This is because an iterative discovery process has a shifting reward distribution that is over-proportionally determined by the model performance for exceptional materials. We demonstrate this problem using the example of bulk modulus maximization among double perovskite oxides. We find that the in-distribution predictive performance suggests random forests as superior to Gaussian process regression, while the ranking is reversed in terms of the discovery rewards. We argue that the lack of proper performance estimation methods from pre-computed data collections is a fundamental problem for improving data-driven materials discovery, and we propose a novel such estimator that, in contrast to naïve reward estimation, successfully predicts Gaussian processes with the "expected improvement" acquisition function as the best out of four options in our demonstration study for double perovskites. Importantly, it does so without requiring the more than one thousand ab initio computations that were needed to confirm this prediction.
Submitted 6 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
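For readers unfamiliar with the "expected improvement" acquisition function named in the abstract above, the sketch below gives its textbook closed form for a maximization problem, assuming the model's prediction for a candidate is Gaussian with mean `mu` and standard deviation `sigma`. The function and argument names are illustrative assumptions, and this is not the performance estimator proposed in the paper.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian prediction N(mu, sigma^2).

    Maximization convention: improvement is max(y - best, 0).
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf


# Hypothetical candidates: (predicted mean, predicted std) of the target property.
candidates = [(150.0, 5.0), (140.0, 30.0), (155.0, 0.0)]
best_so_far = 152.0
scores = [expected_improvement(m, s, best_so_far) for m, s in candidates]
```

In this toy setting the uncertain candidate can score higher than one with a better predicted mean, which is the exploration behaviour the acquisition function is designed to reward.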
-
Towards a Multi-Objective Optimization of Subgroups for the Discovery of Materials with Exceptional Performance
Authors:
Lucas Foppa,
Matthias Scheffler
Abstract:
Artificial intelligence (AI) can accelerate the design of materials by identifying correlations and complex patterns in data. However, AI methods commonly attempt to describe the entire, immense materials space with a single model, while it is typical that different mechanisms govern the materials behaviors across the materials space. The subgroup-discovery (SGD) approach identifies local rules describing exceptional subsets of data with respect to a given target. Thus, SGD can focus on mechanisms leading to exceptional performance. However, the identification of appropriate SG rules requires a careful consideration of the generality-exceptionality tradeoff. Here, we discuss challenges to advance the SGD approach in materials science and analyse the tradeoff between exceptionality and generality based on a Pareto front of SGD solutions.
Submitted 17 November, 2023;
originally announced November 2023.
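The Pareto front of SGD solutions referred to in the abstract above can be pictured with a small sketch. It assumes each candidate SG is summarized by a (size, utility) pair and that both quantities are to be maximized; the function name and the numbers are hypothetical.

```python
def pareto_front(points):
    """Return the (size, utility) pairs not dominated by any other pair.

    A pair dominates another if it is at least as good in both objectives
    and strictly better in at least one (both objectives are maximized).
    """
    front = []
    for i, (s_i, u_i) in enumerate(points):
        dominated = any(
            s_j >= s_i and u_j >= u_i and (s_j > s_i or u_j > u_i)
            for j, (s_j, u_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((s_i, u_i))
    return front


# Hypothetical candidate SGs as (relative size, utility) pairs.
candidate_sgs = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
front = pareto_front(candidate_sgs)
```

Here the pair (0.4, 0.4) is dropped because (0.5, 0.5) is better in both objectives, while the remaining three represent different generality-exceptionality tradeoffs.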
-
Hierarchical symbolic regression for identifying key physical parameters correlated with bulk properties of perovskites
Authors:
Lucas Foppa,
Thomas A. R. Purcell,
Sergey V. Levchenko,
Matthias Scheffler,
Luca M. Ghiringhelli
Abstract:
Symbolic regression identifies key physical parameters describing materials properties by uncovering correlations as nonlinear analytical expressions. However, the pool of expressions grows rapidly with complexity, compromising its efficiency. We tackle this challenge by a hierarchical approach: identified expressions are used as input parameters for obtaining more complex expressions. Crucially, this framework can transfer knowledge among properties, highlighting physical relationships. We demonstrate this strategy by using the Sure-Independence-Screening-and-Sparsifying-Operator (SISSO) approach to identify expressions correlated with the lattice constant and cohesive energy, which are then used to model the bulk modulus of ABO3 perovskites.
Submitted 25 February, 2022;
originally announced February 2022.
-
Identifying outstanding transition-metal-alloy heterogeneous catalysts for the oxygen reduction and evolution reactions via subgroup discovery
Authors:
Lucas Foppa,
Luca M. Ghiringhelli
Abstract:
In order to estimate the reactivity of a large number of potentially complex heterogeneous catalysts while searching for novel and more efficient materials, physical as well as data-centric models have been developed for a faster evaluation of adsorption energies compared to first-principles calculations. However, global models designed to describe as many materials as possible might overlook the very few compounds that have the appropriate adsorption properties to be suitable for a given catalytic process. Here, the subgroup-discovery (SGD) local artificial-intelligence approach is used to identify the key descriptive parameters and constraints on their values, the so-called SG rules, which particularly describe transition-metal surfaces with outstanding adsorption properties for the oxygen reduction and evolution reactions. We start from a data set of 95 oxygen adsorption energy values evaluated by density-functional-theory calculations for several monometallic surfaces along with 16 atomic, bulk and surface properties as candidate descriptive parameters. From this data set, SGD identifies constraints on the most relevant parameters describing materials and adsorption sites that (i) result in O adsorption energies within the Sabatier-optimal range required for the oxygen reduction reaction and (ii) present the largest deviations from the linear scaling relations between O and OH adsorption energies, which limit the performance in the oxygen evolution reaction. The SG rules not only reflect the local underlying physicochemical phenomena that result in the desired adsorption properties but also guide the challenging design of alloy catalysts.
Submitted 25 June, 2021;
originally announced June 2021.
-
Materials genes of heterogeneous catalysis from clean experiments and artificial intelligence
Authors:
Lucas Foppa,
Luca M. Ghiringhelli,
Frank Girgsdies,
Maike Hashagen,
Pierre Kube,
Michael Hävecker,
Spencer J. Carey,
Andrey Tarasov,
Peter Kraus,
Frank Rosowski,
Robert Schlögl,
Annette Trunschke,
Matthias Scheffler
Abstract:
Heterogeneous catalysis is an example of a complex materials function, governed by an intricate interplay of several processes, e.g., the different surface chemical reactions, and the dynamic re-structuring of the catalyst material at reaction conditions. Modelling the full catalytic progression via first-principles statistical mechanics is impractical, if not impossible. Instead, we show here how a tailored artificial-intelligence approach can be applied, even to a small number of materials, to model catalysis and determine the key descriptive parameters ("materials genes") reflecting the processes that trigger, facilitate, or hinder catalyst performance. We start from a consistent experimental set of "clean data", containing nine vanadium-based oxidation catalysts. These materials were synthesized, fully characterized, and tested according to standardized protocols. By applying the symbolic-regression SISSO approach, we identify correlations between the few most relevant materials properties and their reactivity. This approach highlights the underlying physicochemical processes, and accelerates catalyst design.
Submitted 16 February, 2021;
originally announced February 2021.