Timetable (subject to change)

Date Activity Speaker
April 13 Day 1
12:00 - 13:20 Registration and Lunch
13:20 - 13:30 Welcome
13:30 - 15:00 Intro to Machine Learning Reinhard Maurer
15:00 - 15:30 Coffee
15:30 - 17:00 Practicals Reinhard Maurer
17:30 - 20:00 Posters
April 14 Day 2
8:30 - 9:30 Arrival, Coffee and Pastries
9:30 - 11:00 Descriptors/Unsupervised Nong Artrith
11:00 - 11:30 Coffee
11:30 - 12:30 Practicals Nong Artrith
12:30 - 13:30 Lunch
13:30 - 15:00 Bayesian Optimisation Austin Mroz
15:00 - 15:30 Coffee
15:30 - 17:00 Practicals Austin Mroz
17:30 - 18:30 Research Seminar Ruby Sedgwick - Xyme
April 15 Day 3
8:30 - 9:30 Arrival, Coffee and Pastries
9:30 - 11:00 Neural Networks Keith Butler
11:00 - 11:30 Coffee
11:30 - 12:30 Practicals Keith Butler
12:30 - 13:30 Lunch
13:30 - 15:00 Graph Neural Networks Alex Ganose
15:00 - 15:30 Coffee
15:30 - 17:00 Practicals Alex Ganose
17:30 - 20:00 BBQ
April 16 Day 4
8:30 - 9:30 Arrival, Coffee and Pastries
9:30 - 11:00 Introduction to Machine Learning Interatomic Potentials Ioan-Bogdan Magdau
11:00 - 11:30 Coffee
11:30 - 12:30 Practicals Ioan-Bogdan Magdau & Alin Elena
12:30 - 13:30 Lunch
13:30 - 15:00 MLIPs for materials and molecules Ioan-Bogdan Magdau
15:00 - 15:30 Coffee
15:30 - 17:00 Practicals Ioan Magdau & Alin Elena
17:30 - 18:30 Research Seminar Venkat Kapil - UCL
April 17 Day 5
8:30 - 9:30 Arrival, Coffee and Pastries
9:30 - 11:00 Generative Models (LLM) Keith Butler
11:00 - 11:30 Coffee
11:30 - 12:30 Practicals Keith Butler
12:30 - Lunch and departure

Intro to ML and Descriptors:

Outline and Topics covered:

Lecture

  • ML and its applications, both in general and specifically in computational physics research
  • Its role within research/science
  • Basic definitions and terminology
  • Mathematical basics on optimisation, loss functions
  • Automatic differentiation, training, stochastic optimisation
  • Accuracy of Models (testing, validation, quantifying accuracy)
  • Overview and short summary of types/classes of ML models
  • multivariate linear regression (in detail)
  • Gaussian Processes (in detail)
  • Kernel methods (in detail)
  • Reinforcement learning (dynamic programming, Bellman eq.)
  • classification, decision trees

Workshop

  • Data generation, curation, and analysis/visualisation
  • Linear regression, Bayesian Linear Regression, Gaussian Regression
  • Hyperparameter optimisation
  • Uncertainty quantification
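
To give a flavour of the workshop material, here is a minimal illustrative sketch (not the course code) of Gaussian process regression with an RBF kernel, written from scratch in numpy; the function names and toy sine data are our own.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and standard deviation of a GP at the test points."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test)
    Kss = rbf_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Toy data: noiseless sine samples; the posterior std quantifies uncertainty.
x = np.linspace(0, 2 * np.pi, 8)
y = np.sin(x)
mean, std = gp_predict(x, y, np.array([np.pi / 2]))
```

The same predictive-variance machinery underlies the uncertainty quantification exercise, and the kernel hyperparameters (lengthscale, variance, noise) are exactly what the hyperparameter-optimisation session tunes.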

Lecture:

  • A typical machine learning workflow
  • Model optimisation (cross validation techniques)
  • Uncertainty of Models (UQ)
  • Data representation and data cleanup. Is my data set any good?
  • Featurisation of molecules and materials
  • mathematical requirements on descriptors and nomenclature
  • global descriptors
  • fragment, fingerprints, cliques, functional group-based descriptors
  • basics of atom-centred descriptors

Workshop:

  • preparing and analysing a dataset of different molecules and a dataset of MD trajectories (differences in exploring composition and configuration space)
  • generating different global, local (atom centred, fingerprints, fragment-based) descriptors and analysing their suitability and expressiveness (basic PCA, discriminators and decision trees)
  • figuring out what descriptors work best for what datasets
  • training, optimising and benchmarking various ML models with the descriptors
  • workshops based on simple rdkit, scikit-learn, ASE, dscribe, JAX functionality
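
As a taste of what a global descriptor looks like, here is an illustrative numpy-only sketch (the workshop itself uses rdkit/dscribe/ASE): the sorted eigenvalues of a Coulomb matrix, a classic permutation-, rotation- and translation-invariant molecular representation. The geometry and helper names are our own.

```python
import numpy as np

def coulomb_matrix(charges, positions):
    """M_ij = Z_i Z_j / |r_i - r_j| off-diagonal, 0.5 Z_i^2.4 on the diagonal."""
    z = np.asarray(charges, dtype=float)
    r = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        m = np.outer(z, z) / dist
    np.fill_diagonal(m, 0.5 * z**2.4)
    return m

def global_descriptor(charges, positions):
    """Eigenvalues sorted by magnitude: invariant to atom ordering."""
    eig = np.linalg.eigvalsh(coulomb_matrix(charges, positions))
    return eig[np.argsort(-np.abs(eig))]

# Water (O at the origin, two H atoms); charges are atomic numbers.
water = global_descriptor([8, 1, 1], [[0.0, 0.0, 0.0],
                                      [0.96, 0.0, 0.0],
                                      [-0.24, 0.93, 0.0]])
# Permuting the atom order yields the same descriptor.
water_perm = global_descriptor([1, 8, 1], [[0.96, 0.0, 0.0],
                                           [0.0, 0.0, 0.0],
                                           [-0.24, 0.93, 0.0]])
```

Eigenvalue sorting is what provides the permutation invariance demanded of descriptors in the lecture; smoothness and completeness are separate questions explored in the practicals.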

Expected Outcomes:

 1. Know the role of machine learning in the computational physical sciences
 2. Understand the basic terminology of machine learning (“Slang busting”)
 3. Have an overview of methodologies and how they connect
 4. Understand how to approach a typical machine learning workflow
 5. Know how to prepare and analyse datasets
 6. Be able to validate and optimise models with cross-validation
 7. Know basic approaches to featurisation and representation in chemistry
 8. Know how to evaluate and assess prediction errors and uncertainties

Unsupervised ML:

Outline and Topics covered:

  • Exploratory data analysis
  • Curse of dimensionality in chemical problems
  • Dimensionality reduction – principal component analysis, manifold learning
  • Clustering – k-means, k-nearest neighbours, hierarchical clustering
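
As an illustrative sketch of the clustering topic (our own toy example, not course material), here is k-means from scratch via Lloyd's algorithm, run on two well-separated synthetic blobs:

```python
import numpy as np

def kmeans(x, k, n_iter=50):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean update."""
    # Deterministic init: k points spread evenly through the dataset.
    centres = x[np.linspace(0, len(x) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = np.argmin(np.linalg.norm(x[:, None] - centres[None], axis=-1), axis=1)
        # Update step: each centre moves to the mean of its assigned points.
        centres = np.array([x[labels == j].mean(axis=0) for j in range(k)])
    return labels, centres

rng = np.random.default_rng(1)
blob_a = rng.normal([0, 0], 0.3, size=(50, 2))
blob_b = rng.normal([5, 5], 0.3, size=(50, 2))
data = np.vstack([blob_a, blob_b])
labels, centres = kmeans(data, k=2)
```

On clean, well-separated data like this the algorithm recovers the two blobs; the workshop explores how the picture degrades in high-dimensional chemical spaces (the curse of dimensionality above).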

Expected Outcomes:

  • Understand the role of unsupervised learning in typical machine learning workflows
  • Understand and apply linear and non-linear dimensionality reduction on data
  • Evaluate and apply clustering algorithms on data
  • Design and apply unsupervised strategies for chemical data

Graph Neural Networks:

Outline and Topics covered:

  • Directed and undirected graphs – concepts and examples
  • Machine learning with graphs
  • Convolutions on images and graphs
  • Principles of graph neural networks
  • Message passing and locality
  • Oversmoothing and oversquashing
  • Message passing graph neural networks for crystals and molecules
  • Equivariance
  • Universal GNNs
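
To make "message passing and locality" concrete, here is an illustrative numpy sketch (our own toy example, with random untrained weights) of a single message-passing step on a small undirected graph:

```python
import numpy as np

def message_passing_step(h, adjacency, w_self, w_msg):
    """h'_i = ReLU(W_self h_i + W_msg sum_{j in N(i)} h_j)."""
    messages = adjacency @ h              # sum of neighbour features per node
    out = h @ w_self.T + messages @ w_msg.T
    return np.maximum(out, 0.0)           # ReLU nonlinearity

rng = np.random.default_rng(0)
n_nodes, dim = 4, 8
# Adjacency of a 4-node path graph 0-1-2-3 (undirected, no self-loops).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
h = rng.normal(size=(n_nodes, dim))
w_self = rng.normal(size=(dim, dim))
w_msg = rng.normal(size=(dim, dim))
h_out = message_passing_step(h, A, w_self, w_msg)
```

One step only mixes information between direct neighbours; stacking k such layers grows each node's receptive field to its k-hop neighbourhood, which is where the oversmoothing and oversquashing issues in the lecture come from.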

Expected Outcomes:

  • Explain the use of graph neural networks in chemistry and materials science
  • Identify the main design choices when building GNNs
  • Compare the capabilities and performance of state-of-the-art GNNs
  • Implement a basic GNN from scratch
  • Train and run a GNN on molecular data

MLIPs:

Outline and Topics covered:

  • Brief Intro to MLIPs: Context and Overview
  • Anatomy of a potential: locality, E0s, forces
  • Atomic Descriptors: Symmetry, Smoothness, Completeness (link to Reinhard and Kim materials)
  • Model Architectures: Linear, Kernel, Message-passing NNs (MACE) (link to Reinhard, Keith and Alex)
  • From RMSE to MD Stability and Accuracy
  • MLIPs in practice: example battery electrolytes
  • Iterative training, committees and active learning
  • Foundational models and fine tuning

Expected Outcomes:

  • Learn how to fit an MLIP (MACE)
  • Fixed test sets and landscape exploration (MD stability/accuracy)
  • Learn how to improve the training set using iterative training
  • Committee error estimates and active learning (covered in Intro)
  • Using foundational models and fine tuning on new systems
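
The committee error estimate can be sketched in a few lines (an illustrative toy, not the MACE workflow): fit several models on bootstrap resamples of the training data and use the spread of their predictions as an uncertainty. Here each committee member is a cubic polynomial fit, but an MLIP committee works the same way in spirit.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, size=40)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.05, size=40)

def fit_committee(x, y, n_members=8, degree=3):
    """Each member is fit to a bootstrap resample of the training data."""
    members = []
    for _ in range(n_members):
        idx = rng.choice(len(x), size=len(x), replace=True)
        members.append(np.polynomial.Polynomial.fit(x[idx], y[idx], degree))
    return members

def committee_predict(members, x):
    """Mean prediction and committee spread (used as an error estimate)."""
    preds = np.array([m(x) for m in members])
    return preds.mean(axis=0), preds.std(axis=0)

members = fit_committee(x_train, y_train)
# The spread grows outside the training region (extrapolation) - this is
# the signal active learning uses to decide which structures to label next.
_, std_in = committee_predict(members, np.array([0.0]))
_, std_out = committee_predict(members, np.array([3.0]))
```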

BO Basics

  • What is BO, and why is it well-suited to chemistry?
  • What are the components of a BO algorithm? (Surrogate model and acquisition function discussion)
  • Crash course on Gaussian Processes as surrogate models
  • Crash course on acquisition functions with an emphasis on most common for chemistry applications

Workshop

  • BO implementation for a simple mathematical function
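
A minimal version of this exercise might look as follows (an illustrative sketch, not the course solution): a from-scratch GP surrogate with an RBF kernel, the expected improvement acquisition function, and a loop that maximises EI over a dense candidate grid.

```python
import numpy as np
from math import erf, sqrt

def rbf(xa, xb, l=0.3):
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / l**2)

def gp_posterior(x_obs, y_obs, x_cand, noise=1e-4):
    """GP surrogate: posterior mean and std at candidate points."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_cand)
    mean = Ks.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """EI = (mu - f_best) Phi(z) + sigma phi(z), z = (mu - f_best) / sigma."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.array([erf(v / sqrt(2)) for v in z]))
    pdf = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return (mean - best) * cdf + std * pdf

def objective(x):                        # function to maximise (unknown to BO)
    return -(x - 0.7) ** 2

x_cand = np.linspace(0, 1, 201)
x_obs = np.array([0.1, 0.9])             # two initial observations
y_obs = objective(x_obs)
for _ in range(10):                      # BO loop: fit surrogate, query best EI
    mean, std = gp_posterior(x_obs, y_obs, x_cand)
    x_next = x_cand[np.argmax(expected_improvement(mean, std, y_obs.max()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))
best_x = x_obs[y_obs.argmax()]
```

The surrogate model and acquisition function are exactly the two components discussed in the lecture; swapping the toy objective for an experiment or simulation is what turns this loop into the chemistry applications of the second session.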

BO for chemistry

  • Considering what we covered in Lecture 1, what do we need to change or include to make this useful for chemistry?
  • Role of chemical descriptors
  • How do we formulate problems for BO in chemistry?
  • Overview of complex, state-of-the-art BO implementations for chemistry.
  • Overview of chemistry-specific BO tools

Workshop

  • Working in pairs, optimise a Suzuki-Miyaura cross-coupling reaction using the fewest resources.
  • There are both a code-based and a GUI-based implementation of the materials; the latter is aimed at students who may have less Python experience.

Expected Outcomes

  • Understand the basic principles of BO
  • Know the components that comprise a BO algorithm (surrogate models, acquisition functions, etc.)
  • Have an overview of some more complex BO algorithms (multi-objective, multi-fidelity)
  • Understand the capabilities and limitations of BO in chemistry
  • Know how to start to formulate a chemical problem for BO

Generative ML

Outline and Topics covered:

  • The concept of generative models
  • Latent variables
  • KL divergence
  • Autoencoders and variational autoencoders
  • The transformer architecture and the attention mechanism
  • Large language models and autoregressive generation

Expected Outcomes:

  • Understand how generative models link underlying variables to observations
  • Understand and implement dimensionality reduction with an autoencoder
  • Understand and implement regularised latent spaces with a variational autoencoder
  • Understand and implement a simple version of self-attention
  • Apply a large language model for materials chemistry
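
The "simple version of self-attention" outcome can be sketched in a few lines of numpy (an illustrative toy with random, untrained projection matrices, not the course implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)    # each row is a distribution over tokens
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))   # a "sequence" of 5 token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each output token is a weighted mixture of all value vectors, with weights set by query-key similarity; stacking such layers (plus masking for autoregressive generation) is the core of the transformer architecture covered in the lecture.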