## Timetable

*Timetable subject to change.*
| Date | Activity | Speaker |
|---|---|---|
| **April 13 (Day 1)** | | |
| 12:00 - 13:20 | Registration and Lunch | |
| 13:20 - 13:30 | Welcome | |
| 13:30 - 15:00 | Intro to Machine Learning | Reinhard Maurer |
| 15:00 - 15:30 | Coffee | |
| 15:30 - 17:00 | Practicals | Reinhard Maurer |
| 17:30 - 20:00 | Posters | |
| **April 14 (Day 2)** | | |
| 8:30 - 9:30 | Arrival, Coffee and Pastries | |
| 9:30 - 11:00 | Descriptors/Unsupervised | Nong Arthrith |
| 11:00 - 11:30 | Coffee | |
| 11:30 - 12:30 | Practicals | Nong Arthrith |
| 12:30 - 13:30 | Lunch | |
| 13:30 - 15:00 | Bayesian Optimisation | Austin Mroz |
| 15:00 - 15:30 | Coffee | |
| 15:30 - 17:00 | Practicals | Austin Mroz |
| 17:30 - 18:30 | Research Seminar | Ruby Sedgwick - Xyme |
| **April 15 (Day 3)** | | |
| 8:30 - 9:30 | Arrival, Coffee and Pastries | |
| 9:30 - 11:00 | Neural Networks | Keith Butler |
| 11:00 - 11:30 | Coffee | |
| 11:30 - 12:30 | Practicals | Keith Butler |
| 12:30 - 13:30 | Lunch | |
| 13:30 - 15:00 | Graph Neural Networks | Alex Ganose |
| 15:00 - 15:30 | Coffee | |
| 15:30 - 17:00 | Practicals | Alex Ganose |
| 17:30 - 20:00 | BBQ | |
| **April 16 (Day 4)** | | |
| 8:30 - 9:30 | Arrival, Coffee and Pastries | |
| 9:30 - 11:00 | Introduction to Machine Learning Interatomic Potentials | Ioan-Bogdan Magdau |
| 11:00 - 11:30 | Coffee | |
| 11:30 - 12:30 | Practicals | Ioan-Bogdan Magdau & Alin Elena |
| 12:30 - 13:30 | Lunch | |
| 13:30 - 15:00 | MLIPs for materials and molecules | Ioan-Bogdan Magdau |
| 15:00 - 15:30 | Coffee | |
| 15:30 - 17:00 | Practicals | Ioan-Bogdan Magdau & Alin Elena |
| 17:30 - 18:30 | Research Seminar | Venkat Kapil - UCL |
| **April 17 (Day 5)** | | |
| 8:30 - 9:30 | Arrival, Coffee and Pastries | |
| 9:30 - 11:00 | Generative Models (LLM) | Keith Butler |
| 11:00 - 11:30 | Coffee | |
| 11:30 - 12:30 | Practicals | Keith Butler |
| 12:30 - | Lunch and Departure | |
## Intro to ML and Descriptors

### Outline and Topics covered

#### Lecture
- ML and its applications, both in general and specifically in computational physics research
- Its role within research/science
- Basic definitions and terminology
- Mathematical basics of optimisation and loss functions
- Automatic differentiation, training, stochastic optimisation
- Accuracy of models (testing, validation, quantifying accuracy)
- Overview and short summary of types/classes of ML models
- Multivariate linear regression (in detail)
- Gaussian processes (in detail)
- Kernel methods (in detail)
- Reinforcement learning (dynamic programming, Bellman equation)
- Classification and decision trees
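Of the models covered in detail, multivariate linear regression is the simplest to make concrete: the least-squares weights have a closed-form solution. A minimal NumPy sketch on synthetic data (all names and values below are illustrative, not course material):

```python
import numpy as np

# Synthetic data: y = X w_true + noise, with w_true chosen for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

# Add a bias column and solve the least-squares problem
# w = argmin ||Xb w - y||^2 via a numerically stable solver
Xb = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
```

The fitted `w[:3]` recovers `w_true` up to the noise level; the last entry is the bias, close to zero here.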
#### Workshop
- Data generation, curation, and analysis/visualisation
- Linear regression, Bayesian linear regression, Gaussian process regression
- Hyperparameter optimisation
- Uncertainty quantification
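These workshop threads connect naturally: a Gaussian process regressor optimises its hyperparameters by maximising the marginal likelihood during fitting, and returns a predictive standard deviation for uncertainty quantification. A sketch using scikit-learn (assumed available; not the actual workshop notebook):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1D dataset: noisy sine observations
rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(30, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=30)

# RBF kernel plus a white-noise term; both hyperparameters are
# optimised against the log marginal likelihood inside fit()
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# Predictive mean and standard deviation (uncertainty quantification)
X_test = np.linspace(0, 6, 50).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
```

The `std` array is largest away from the training points, which is the behaviour the uncertainty-quantification session builds on.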
#### Lecture
- A typical machine learning workflow
- Model optimisation (cross-validation techniques)
- Uncertainty of models (UQ)
- Data representation and data cleanup: is my dataset any good?
- Featurisation of molecules and materials
- Mathematical requirements on descriptors, and nomenclature
- Global descriptors
- Fragment, fingerprint, clique, and functional-group-based descriptors
- Basics of atom-centred descriptors
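One classic global descriptor that satisfies the invariance requirements above is the sorted-eigenvalue Coulomb matrix: because only nuclear charges and interatomic distances enter, it is rotation- and translation-invariant, and sorting the eigenvalues adds permutation invariance. A self-contained sketch (illustrative only; the water geometry is approximate):

```python
import numpy as np

def coulomb_matrix_eigenvalues(Z, R):
    """Global descriptor: sorted eigenvalues of the Coulomb matrix.

    Z: (n,) nuclear charges; R: (n, 3) positions in Angstrom.
    Off-diagonal entries are Z_i Z_j / |R_i - R_j|; the diagonal
    uses the conventional 0.5 * Z_i**2.4 self-interaction term.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    # Sorting makes the descriptor invariant to atom ordering
    return np.sort(np.linalg.eigvalsh(M))[::-1]

# Water molecule: the descriptor is identical for any atom ordering
Z = [8, 1, 1]
R = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
d1 = coulomb_matrix_eigenvalues(Z, R)
d2 = coulomb_matrix_eigenvalues([1, 8, 1], [R[1], R[0], R[2]])
```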
#### Workshop
- Preparing and analysing a dataset of different molecules and a dataset of MD trajectories (differences in exploring composition and configuration space)
- Generating different global and local (atom-centred, fingerprint, fragment-based) descriptors and analysing their suitability and expressiveness (basic PCA, discriminators, and decision trees)
- Figuring out which descriptors work best for which datasets
- Training, optimising, and benchmarking various ML models with the descriptors
- Workshops based on simple rdkit, scikit-learn, ASE, dscribe, and JAX functionality
### Expected Outcomes

1. Know the role of machine learning in the computational physical sciences
2. Understand the basic terminology of machine learning (“Slang busting”)
3. Have an overview of methodologies and how they connect
4. Understand how to approach a typical machine learning workflow
5. Know how to prepare and analyse datasets
6. Be able to validate and optimise models with cross-validation
7. Know basic approaches to featurisation and representation in chemistry
8. Know how to evaluate and assess prediction errors and uncertainties
## Unsupervised ML

### Outline and Topics covered
- Exploratory data analysis
- Curse of dimensionality in chemical problems
- Dimensionality reduction – principal component analysis, manifold learning
- Clustering – k-means, k-nearest neighbours, hierarchical clustering
### Expected Outcomes
- Understand the role of unsupervised learning in typical machine learning workflows
- Understand and apply linear and non-linear dimensionality reduction on data
- Evaluate and apply clustering algorithms on data
- Design and apply unsupervised strategies for chemical data
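A minimal end-to-end example of these two steps, linear dimensionality reduction followed by clustering, might look as follows with scikit-learn (synthetic data; not the course notebook):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic high-dimensional data with three hidden groups,
# standing in for a chemical dataset with many features
X, labels_true = make_blobs(n_samples=300, n_features=20,
                            centers=3, random_state=0)

# PCA: project onto the 2 directions of largest variance
X2 = PCA(n_components=2).fit_transform(X)

# Cluster in the reduced space with k-means
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X2)
```

Reducing before clustering sidesteps the curse of dimensionality: distances in the 2D projection are far more informative than in the raw 20D space.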
## Graph Neural Networks

### Outline and Topics covered
- Directed and undirected graphs – concepts and examples
- Machine learning with graphs
- Convolutions on images and graphs
- Principles of graph neural networks
- Message passing and locality
- Oversmoothing and oversquashing
- Message passing graph neural networks for crystals and molecules
- Equivariance
- Universal GNNs
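The core message-passing idea can be written in a few lines: each node sums features from its neighbours and passes the result through a learned transformation. A toy NumPy sketch (random weights and plain sum aggregation; real GNNs use learned, often equivariant, layers):

```python
import numpy as np

def message_passing_layer(H, A, W_self, W_msg):
    """One message-passing step.

    Each node aggregates (sums) its neighbours' features via the
    adjacency matrix, then self and message terms are linearly
    transformed and combined through a nonlinearity.
    H: (n_nodes, d) node features; A: (n_nodes, n_nodes) adjacency.
    """
    messages = A @ H  # sum over neighbours: this is the locality
    return np.tanh(H @ W_self + messages @ W_msg)

# Toy 4-node path graph: 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
W_self = rng.normal(size=(8, 8)) / np.sqrt(8)
W_msg = rng.normal(size=(8, 8)) / np.sqrt(8)

# Each stacked layer grows a node's receptive field by one hop;
# stacking too many is what leads to oversmoothing
H1 = message_passing_layer(H, A, W_self, W_msg)
H2 = message_passing_layer(H1, A, W_self, W_msg)
```

A useful sanity check is permutation equivariance: relabelling the nodes permutes the output features in exactly the same way.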
### Expected Outcomes
- Explain the use of graph neural networks in chemistry and materials science
- Identify the main design choices when building GNNs
- Compare the capabilities and performance of state-of-the-art GNNs
- Implement a basic GNN from scratch
- Train and run a GNN on molecular data
## MLIPs (Machine Learning Interatomic Potentials)

### Outline and Topics covered
- Brief intro to MLIPs: context and overview
- Anatomy of a potential: locality, E0s, forces
- Atomic descriptors: symmetry, smoothness, completeness (links to Reinhard's and Kim's materials)
- Model architectures: linear, kernel, message-passing NNs (MACE) (links to Reinhard's, Keith's, and Alex's materials)
- From RMSE to MD stability and accuracy
- MLIPs in practice: example battery electrolytes
- Iterative training, committees, and active learning
- Foundation models and fine-tuning
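The committee/active-learning idea is model-agnostic and can be illustrated without any MLIP machinery: fit several models on bootstrap resamples of the data and use their disagreement as an error estimate. A generic sketch (polynomial stand-ins for potentials; not the MACE workflow):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "energy" data: scalar descriptor x -> energy-like target
x = rng.uniform(-2, 2, size=40)
y = x**3 - x + 0.05 * rng.normal(size=40)

# Committee: fit several cubic polynomials on bootstrap resamples
committee = []
for _ in range(10):
    idx = rng.integers(0, len(x), size=len(x))
    committee.append(np.polyfit(x[idx], y[idx], deg=3))

# Committee mean is the prediction; the spread is the error estimate.
# In active learning, points with large spread are labelled next.
x_new = np.linspace(-3, 3, 50)
preds = np.array([np.polyval(c, x_new) for c in committee])
mean, spread = preds.mean(axis=0), preds.std(axis=0)
```

The spread grows outside the sampled window [-2, 2], flagging extrapolation, which is exactly the regime where an MLIP driving MD becomes unreliable.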
### Expected Outcomes
- Learn how to fit an MLIP (MACE)
- Evaluate models with fixed test sets and landscape exploration (MD stability/accuracy)
- Learn how to improve the training set using iterative training
- Use committee error estimates and active learning (covered in the intro)
- Use foundation models and fine-tuning on new systems
## Bayesian Optimisation

### BO Basics
- What is BO, and why is it well suited to chemistry?
- What are the components of a BO algorithm? (Discussion of surrogate models and acquisition functions)
- Crash course on Gaussian processes as surrogate models
- Crash course on acquisition functions, with an emphasis on those most common in chemistry applications
#### Workshop
- BO implementation for a simple mathematical function
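A bare-bones version of that exercise, a BO loop with a Gaussian-process surrogate and the expected-improvement acquisition function, might look like this (scikit-learn and SciPy assumed available; the objective and all settings are illustrative):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimisation: expected improvement over
    the current best observation at each candidate point."""
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

f = lambda x: (x - 2.0) ** 2  # toy objective, minimum at x = 2

# A few random evaluations to start, then alternate:
# fit surrogate -> maximise acquisition -> evaluate objective
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(3, 1))
y = f(X).ravel()
grid = np.linspace(-5, 5, 200).reshape(-1, 1)

for _ in range(12):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                  alpha=1e-6, normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, f(x_next))

best_x = X[np.argmin(y), 0]
```

EI trades off exploitation (low predicted `mu`) against exploration (high `sigma`), which is why the loop needs so few objective evaluations.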
### BO for chemistry
- Considering what we covered in Lecture 1, what do we need to change or include to make this useful for chemistry?
- Role of chemical descriptors
- How do we formulate problems for BO in chemistry?
- Overview of complex, state-of-the-art BO implementations for chemistry.
- Overview of chemistry-specific BO tools
#### Workshop
- Working in pairs, optimise a Suzuki-Miyaura cross-coupling reaction using the fewest resources
- Both a code-based and a GUI-based implementation of the materials are provided for students with less Python experience
### Expected Outcomes
- Understand the basic principles of BO
- Know the components that comprise a BO algorithm (surrogate models, acquisition functions, etc.)
- Have an overview of some more complex BO algorithms (multi-objective, multi-fidelity)
- Understand the capabilities and limitations of BO in chemistry
- Know how to start to formulate a chemical problem for BO
## Generative ML

### Outline and Topics covered
- The concept of generative models
- Latent variables
- KL divergence
- Autoencoders and variational autoencoders
- The transformer architecture and the attention mechanism
- Large language models and autoregressive generation
### Expected Outcomes
- Understand how generative models link underlying variables to observations
- Understand and implement dimensionality reduction with an autoencoder
- Understand and implement regularised latent spaces with a variational autoencoder
- Understand and implement a simple version of self-attention
- Apply a large language model for materials chemistry
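The attention mechanism listed above reduces to a few matrix operations. A single-head scaled dot-product self-attention sketch in NumPy (random weights; no masking or multi-head machinery):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    Each position attends to every position: attention weights are
    softmax(Q K^T / sqrt(d_k)) and the output is a weighted sum of
    value vectors. X: (seq_len, d_model).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys axis, stabilised by subtracting the max
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))   # 5 "tokens", model dimension 16
Wq, Wk, Wv = [rng.normal(size=(16, 8)) for _ in range(3)]
out, attn = self_attention(X, Wq, Wk, Wv)
```

Each row of `attn` is a probability distribution over the input positions, which is the quantity autoregressive LLMs compute (with a causal mask) at every layer.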