In our research, we develop novel data-driven techniques to solve real materials design problems across scales in an actionable way. We use machine learning models as navigation systems for chemical space. We do this in close collaboration with experimental partners, working on several themes that are key to progress in the field.
Plenty of chemical data is being produced and published, but most of it is not used. The experiments performed in most labs are not (optimally) informed by the scientific record; often they are not even informed by the experiments performed by previous group members.
Machine learning techniques, in particular large language models, can help leverage this information and make it more accessible. They can also help us capture subtle (tacit) aspects that conventional machine learning approaches, which operate on representations of "idealized" structures, cannot capture.
Better inductive biases and representations
We know many things about the world. We take basic physical and chemical (empirical) laws for granted. However, most of our models do not know about them.
Because such knowledge can make models more robust and predictive, we develop novel ways to incorporate relevant inductive biases into our models.
Much of this work happens at the level of the model inputs. That is, we craft representations of molecules and materials that faithfully carry more of the relevant information. This also includes developing representations that allow us to bridge length scales.
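As a purely illustrative sketch of an inductive bias built into a representation, the snippet below encodes a molecule as element fractions. The function name `composition_vector` and the four-element vocabulary are assumptions made for this example, not a representation we use in practice. The point is only that the output cannot depend on the order in which atoms are listed, so a model consuming it never has to learn that invariance from data.

```python
from collections import Counter


def composition_vector(atoms, elements=("H", "C", "N", "O")):
    """Return the fraction of each element in `atoms`.

    Hypothetical toy representation: the result is invariant to the
    order of the atom list. Elements outside `elements` still count
    toward the total but get no entry of their own.
    """
    if not atoms:
        raise ValueError("atoms must not be empty")
    counts = Counter(atoms)
    total = len(atoms)
    return [counts[e] / total for e in elements]


# Ethanol (C2H6O), written in two different atom orderings.
ethanol_a = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ethanol_b = ["H", "O", "H", "C", "H", "H", "C", "H", "H"]

vec_a = composition_vector(ethanol_a)
vec_b = composition_vector(ethanol_b)
```

Both orderings map to the same vector. The price of this simple choice is that all structural information is discarded, which is exactly the kind of trade-off that more faithful representations try to avoid.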
Learning beyond conventional objectives
Most models are trained by finding weights that minimize the mismatch between the predictions and the ground truth. While this might make models good at predicting values that correlate with the "ground truth" in a specific case, it does not guarantee that the models are right for the right reasons.
To counteract this, we will incorporate more than just feedback on a quantitative error into the training of our models. To do so, we will leverage principles from human-computer interaction.
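A toy sketch can make the distinction concrete. Below, the target depends only on the first feature, while the second feature is a spurious copy of it. Plain error minimization cannot tell the two apart and spreads the weight across both. Adding a penalty on the second weight stands in for hypothetical expert feedback that this feature should not be used. The data, the function names, and the penalty term are all assumptions chosen for illustration; this is not the training scheme we use.

```python
# Toy data: feature 0 is the true cause (y = 2 * x0); feature 1 is a
# spurious copy of feature 0, so the data cannot tell them apart.
X = [[x, x] for x in (0.0, 1.0, 2.0, 3.0)]
y = [2.0 * row[0] for row in X]


def predict(w, row):
    """Linear model: weighted sum of the features."""
    return sum(wi * xi for wi, xi in zip(w, row))


def mse(w):
    """Mean squared mismatch between predictions and targets."""
    return sum((predict(w, row) - t) ** 2 for row, t in zip(X, y)) / len(X)


def fit(penalty=0.0, lr=0.05, steps=5000):
    """Gradient descent on MSE plus penalty * w[1]**2.

    penalty = 0 is the conventional objective. penalty > 0 encodes the
    hypothetical feedback "do not rely on feature 1".
    """
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for row, t in zip(X, y):
            err = predict(w, row) - t
            for j in range(2):
                grad[j] += 2.0 * err * row[j] / len(X)
        grad[1] += 2.0 * penalty * w[1]  # gradient of the feedback term
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


w_plain = fit()              # conventional objective only
w_guided = fit(penalty=1.0)  # objective plus qualitative feedback
```

On this toy problem, both fits reach an error close to zero, but `w_plain` ends up near `[1, 1]` (half of the prediction comes from the spurious feature) while `w_guided` ends up near `[2, 0]`. The error value alone does not reveal which model is right for the right reasons.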