Timetable
Timetable: subject to change
Date/Time | Activity | Speaker |
---|---|---|
April 13 (Day 1) | | |
12:00 - 13:20 | Registration and Lunch | |
13:20 - 13:30 | Welcome | |
13:30 - 15:00 | Intro to Machine Learning | Reinhard Maurer |
15:00 - 15:30 | Coffee | |
15:30 - 17:00 | Practicals | Reinhard Maurer |
17:30 - 20:00 | Posters | |
April 14 (Day 2) | | |
8:30 - 9:30 | Arrival, Coffee and Pastries | |
9:30 - 11:00 | Descriptors/Unsupervised | Nong Artrith |
11:00 - 11:30 | Coffee | |
11:30 - 12:30 | Practicals | Nong Artrith |
12:30 - 13:30 | Lunch | |
13:30 - 15:00 | Bayesian Optimisation | Austin Mroz |
15:00 - 15:30 | Coffee | |
15:30 - 17:00 | Practicals | Austin Mroz |
17:30 - 18:30 | Research Seminar | Ruby Sedgwick (Xyme) |
April 15 (Day 3) | | |
8:30 - 9:30 | Arrival, Coffee and Pastries | |
9:30 - 11:00 | Neural Networks | Keith Butler |
11:00 - 11:30 | Coffee | |
11:30 - 12:30 | Practicals | Keith Butler |
12:30 - 13:30 | Lunch | |
13:30 - 15:00 | Graph Neural Networks | Alex Ganose |
15:00 - 15:30 | Coffee | |
15:30 - 17:00 | Practicals | Alex Ganose |
17:30 - 20:00 | BBQ | |
April 16 (Day 4) | | |
8:30 - 9:30 | Arrival, Coffee and Pastries | |
9:30 - 11:00 | Introduction to Machine Learning Interatomic Potentials | Ioan-Bogdan Magdau |
11:00 - 11:30 | Coffee | |
11:30 - 12:30 | Practicals | Ioan-Bogdan Magdau & Alin Elena |
12:30 - 13:30 | Lunch | |
13:30 - 15:00 | MLIPs for materials and molecules | Ioan-Bogdan Magdau |
15:00 - 15:30 | Coffee | |
15:30 - 17:00 | Practicals | Ioan-Bogdan Magdau & Alin Elena |
17:30 - 18:30 | Research Seminar | Venkat Kapil (UCL) |
April 17 (Day 5) | | |
8:30 - 9:30 | Arrival, Coffee and Pastries | |
9:30 - 11:00 | Generative Models (LLM) | Keith Butler |
11:00 - 11:30 | Coffee | |
11:30 - 12:30 | Practicals | Keith Butler |
12:30 onwards | Lunch and departure | |
Intro to ML and Descriptors:
Outline and Topics covered:
Lecture
- Machine learning (ML) and its applications, in general and specifically in computational physics research
- The role of ML within research and science
- Basic definitions and terminology
- Mathematical basics of optimisation and loss functions
- Automatic differentiation, training, stochastic optimisation
- Accuracy of models (testing, validation, quantifying accuracy)
- Overview and short summary of types/classes of ML models
- Multivariate linear regression (in detail)
- Gaussian processes (in detail)
- Kernel methods (in detail)
- Reinforcement learning (dynamic programming, Bellman equation)
- Classification, decision trees
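A minimal sketch tying together the optimisation, loss-function and linear-regression topics above (not taken from the course materials): it fits a multivariate linear model by minibatch stochastic gradient descent on the mean squared error and compares the result with the closed-form least-squares solution. It assumes NumPy; the function names, learning rate, batch size and synthetic data are illustrative choices. Gradients are derived by hand here, whereas automatic differentiation (for example with JAX, which the workshops list) would compute them for you.

```python
import numpy as np


def mse_loss(w, b, X, y):
    """Mean squared error of the linear model y_hat = X @ w + b."""
    residual = X @ w + b - y
    return float(np.mean(residual**2))


def sgd_linear_regression(X, y, lr=0.05, epochs=200, batch_size=16, seed=0):
    """Fit weights w and bias b by minibatch stochastic gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle the data every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w + b - y[idx]
            # Hand-derived gradients of the minibatch MSE with respect to w and b
            grad_w = 2.0 * X[idx].T @ residual / len(idx)
            grad_b = 2.0 * float(np.mean(residual))
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Synthetic data (hypothetical): y = 1.5*x0 - 2.0*x1 + 0.5 + small Gaussian noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = X @ true_w + 0.5 + 0.1 * rng.normal(size=200)

w, b = sgd_linear_regression(X, y)

# Closed-form least-squares fit (with an explicit bias column) for comparison
X_aug = np.column_stack([X, np.ones(len(X))])
w_ls, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print("SGD:", w, b, "least squares:", w_ls, "MSE:", mse_loss(w, b, X, y))
```

With a constant learning rate, SGD fluctuates around the least-squares optimum rather than converging to it exactly; decaying the learning rate over the epochs reduces that fluctuation.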
Workshop
- Data generation, curation, and analysis/visualisation
- Linear regression, Bayesian linear regression, Gaussian process regression
- Hyperparameter optimisation
- Uncertainty quantification
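As an illustration of Gaussian process regression, hyperparameter optimisation and uncertainty quantification in one place, here is a minimal NumPy sketch. It assumes a zero-mean GP with a squared-exponential kernel, a fixed noise variance of 1e-2 and a toy sin(x) dataset; the length scale is chosen by a simple grid search over the log marginal likelihood. The workshop may well use a library such as scikit-learn and gradient-based hyperparameter optimisation instead.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between the rows of A and B."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_fit(X, y, length_scale, variance=1.0, noise=1e-2):
    """Cholesky factor of the noisy training covariance and the weights alpha = K^-1 y."""
    K = rbf_kernel(X, X, length_scale, variance) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha


def log_marginal_likelihood(X, y, length_scale, variance=1.0, noise=1e-2):
    """log p(y | X, hyperparameters) for a zero-mean GP with Gaussian noise."""
    L, alpha = gp_fit(X, y, length_scale, variance, noise)
    return float(-0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(X) * np.log(2.0 * np.pi))


def gp_predict(X, y, X_test, length_scale, variance=1.0, noise=1e-2):
    """Posterior mean and standard deviation of the latent function at X_test."""
    L, alpha = gp_fit(X, y, length_scale, variance, noise)
    K_s = rbf_kernel(X, X_test, length_scale, variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)  # k(x, x) = variance for the RBF kernel
    return mean, np.sqrt(np.maximum(var, 0.0))


# Hypothetical toy data: noiseless samples of sin(x) on [0, 5]
X_train = np.linspace(0.0, 5.0, 8)[:, None]
y_train = np.sin(X_train).ravel()

# Hyperparameter optimisation by a simple grid search over the length scale
grid = [0.3, 0.5, 1.0, 1.5, 2.0]
best_ls = max(grid, key=lambda ls: log_marginal_likelihood(X_train, y_train, ls))

X_test = np.array([[0.0], [2.5], [10.0]])  # 10.0 lies far outside the training range
mean, std = gp_predict(X_train, y_train, X_test, best_ls)
print("length scale:", best_ls, "mean:", mean, "std:", std)
```

The predictive standard deviation is small at training inputs and returns towards the prior value far from the data, which is the behaviour that makes GPs useful for uncertainty quantification.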
Lecture:
- A typical machine learning workflow
- Model optimisation (cross-validation techniques)
- Uncertainty of models (uncertainty quantification, UQ)
- Data representation and data cleanup: is my dataset any good?
- Featurisation of molecules and materials
- Mathematical requirements on descriptors and nomenclature
- Global descriptors
- Fragment-, fingerprint-, clique- and functional-group-based descriptors
- Basics of atom-centred descriptors
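To make the mathematical requirements on descriptors concrete, the following sketch computes one classic global descriptor, the Coulomb matrix, and turns it into a permutation-invariant vector by sorting its eigenvalues. It is an illustrative NumPy example with assumed inputs (an approximate water geometry in Angstrom), not necessarily a descriptor used in the lecture. It checks that relabelling, rotating and translating the molecule leaves the descriptor unchanged, while the raw matrix does change when atoms are relabelled.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| off it."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid 0/0; the diagonal is overwritten below
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z**2.4)
    return M


def eigenvalue_descriptor(Z, R):
    """Global descriptor: Coulomb-matrix eigenvalues sorted in descending order."""
    return np.sort(np.linalg.eigvalsh(coulomb_matrix(Z, R)))[::-1]


# Approximate water geometry (Angstrom, illustrative): O, H, H
Z = [8, 1, 1]
R = np.array([[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]])

# The same molecule with atoms re-ordered, rotated about z and translated
perm = [2, 0, 1]
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
Z_new = [Z[i] for i in perm]
R_new = R[perm] @ rot.T + np.array([1.0, -2.0, 0.5])

d_ref = eigenvalue_descriptor(Z, R)
d_new = eigenvalue_descriptor(Z_new, R_new)
print(d_ref, d_new)
```

Sorting the eigenvalues buys permutation invariance at the cost of completeness: different structures can share the same spectrum, which illustrates the trade-off between invariance and completeness.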
Workshop:
- Preparing and analysing a dataset of different molecules and a dataset of MD trajectories (differences between exploring composition space and configuration space)
- Generating global and local (atom-centred, fingerprint, fragment-based) descriptors and analysing their suitability and expressiveness (basic PCA, discriminators and decision trees)
- Figuring out which descriptors work best for which datasets
- Training, optimising and benchmarking various ML models with the descriptors
- Workshops based on basic RDKit, scikit-learn, ASE, DScribe and JAX functionality
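The model-optimisation step can be sketched as k-fold cross-validation selecting the regularisation strength of a ridge regression model. This is a NumPy-only sketch under stated assumptions: the "descriptor matrix" is random stand-in data, the model has no intercept, and the candidate regularisation strengths are arbitrary. scikit-learn, which the workshops list, provides ready-made equivalents of these pieces.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression weights (no intercept, so X and y are assumed centred)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def kfold_cv_rmse(X, y, lam, k=5, seed=0):
    """Mean validation RMSE over k folds for a given regularisation strength lam."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        residual = X[val_idx] @ w - y[val_idx]
        scores.append(np.sqrt(np.mean(residual**2)))
    return float(np.mean(scores))


# Hypothetical stand-in for a descriptor matrix: 100 samples, 30 features,
# of which only the first 3 actually influence the target
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))
w_true = np.zeros(30)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.3 * rng.normal(size=100)

# Hold out a test set that plays no part in model selection
X_trval, y_trval, X_test, y_test = X[:80], y[:80], X[80:], y[80:]

lambdas = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]
cv_scores = {lam: kfold_cv_rmse(X_trval, y_trval, lam) for lam in lambdas}
best_lam = min(cv_scores, key=cv_scores.get)

w = ridge_fit(X_trval, y_trval, best_lam)
test_rmse = float(np.sqrt(np.mean((X_test @ w - y_test) ** 2)))
print(cv_scores, best_lam, test_rmse)
```

Using the same seed for every candidate keeps the folds identical, so the scores are compared on equal terms; the held-out test set is used only once, after model selection.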
Expected Outcomes:
1. Know the role of machine learning in the computational physical sciences
2. Understand the basic terminology of machine learning (“Slang busting”)
3. Have an overview of methodologies and how they connect
4. Understand how to approach a typical machine learning workflow
5. Know how to prepare and analyse datasets
6. Be able to validate and optimise models with cross-validation
7. Know basic approaches to featurisation and representation in chemistry
8. Know how to evaluate and assess prediction errors and uncertainties
Unsupervised ML:
Outline and Topics covered:
- Exploratory data analysis
- Curse of dimensionality in chemical problems
- Dimensionality reduction – principal component analysis, manifold learning
- Clustering – k-means, k-nearest neighbours, hierarchical clustering
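A compact sketch of dimensionality reduction followed by clustering, assuming NumPy and synthetic data with two well-separated groups. PCA is computed from the singular value decomposition of the centred data, and k-means is Lloyd's algorithm with a simple farthest-point initialisation (a deterministic stand-in for the randomised k-means++ seeding). Manifold learning and hierarchical clustering are not shown.

```python
import numpy as np


def pca(X, n_components=2):
    """Project centred data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance_ratio = S**2 / np.sum(S**2)
    return Xc @ Vt[:n_components].T, explained_variance_ratio[:n_components]


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-point initialisation of the centroids."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next centroid: the point farthest from all centroids chosen so far
        d = np.min(np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=-1), axis=1)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = np.argmin(dists, axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two groups of 50 points in 10 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 10)),
               rng.normal(5.0, 1.0, size=(50, 10))])

X_2d, evr = pca(X, n_components=2)
labels, centroids = kmeans(X_2d, k=2)
print("explained variance ratio:", evr, "cluster sizes:", np.bincount(labels))
```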
Expected Outcomes:
- Understand the role of unsupervised learning in typical machine learning workflows
- Understand and apply linear and non-linear dimensionality reduction on data
- Evaluate and apply clustering algorithms on data
- Design and apply unsupervised strategies for chemical data
Graph Neural Networks:
Outline and Topics covered:
- Directed and undirected graphs – concepts and examples
- Machine learning with graphs
- Convolutions on images and graphs
- Principles of graph neural networks
- Message passing and locality
- Oversmoothing and oversquashing
- Message passing graph neural networks for crystals and molecules
- Equivariance
- Universal GNNs
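The message-passing idea can be illustrated with a single sum-aggregation layer in NumPy. This is a minimal sketch with hypothetical random weights and one-hot element features for a water molecule, not a specific published architecture. It checks permutation equivariance: relabelling the nodes permutes the outputs in the same way, and a summed readout is invariant. Equivariance with respect to rotations, as in equivariant GNNs for 3D structures, needs geometric features and is not covered here.

```python
import numpy as np


def message_passing_layer(H, edges, W_self, W_msg):
    """One message-passing step with sum aggregation:
    h_i' = tanh(W_self h_i + sum over edges (j -> i) of W_msg h_j).
    H has shape (n_nodes, d_in); edges is a list of (sender, receiver) pairs."""
    messages = np.zeros((H.shape[0], W_msg.shape[0]))
    for sender, receiver in edges:
        messages[receiver] += W_msg @ H[sender]  # messages only travel along edges
    return np.tanh(H @ W_self.T + messages)


def readout(H):
    """Graph-level output: summing node features gives a permutation-invariant result."""
    return H.sum(axis=0)


# Water as a graph: node 0 = O, nodes 1-2 = H; one-hot element features [is_O, is_H]
H = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
bonds = [(0, 1), (0, 2)]
edges = bonds + [(r, s) for s, r in bonds]  # undirected graph: edges in both directions

rng = np.random.default_rng(0)
W_self = rng.normal(size=(4, 2))  # hypothetical learnable weights (random here)
W_msg = rng.normal(size=(4, 2))
out = message_passing_layer(H, edges, W_self, W_msg)

# Relabel the nodes: new node i is old node perm[i]
perm = np.array([2, 0, 1])
inv = np.argsort(perm)  # maps old index -> new index
edges_perm = [(int(inv[s]), int(inv[r])) for s, r in edges]
out_perm = message_passing_layer(H[perm], edges_perm, W_self, W_msg)
print(out, out_perm, sep="\n")
```

Because both hydrogens have identical features and neighbourhoods, they also receive identical outputs, a small example of how locality and symmetry shape what a GNN can distinguish.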
Expected Outcomes:
- Explain the use of graph neural networks in chemistry and materials science
- Identify the main design choices when building GNNs
- Compare the capabilities and performance of state-of-the-art GNNs
- Implement a basic GNN from scratch
- Train and run a GNN on molecular data
MLIPs (Machine Learning Interatomic Potentials):
Outline and Topics covered:
- Brief Intro to MLIPs: Context and Overview
- Anatomy of a potential: locality, E0s (per-element reference energies), forces
- Atomic descriptors: symmetry, smoothness, completeness (links to the Reinhard and Kim materials)
- Model architectures: linear, kernel, message-passing NNs (MACE) (links to the Reinhard, Keith and Alex materials)
- From RMSE to MD stability and accuracy
- MLIPs in practice: battery electrolytes as an example
- Iterative training, committees and active learning
- Foundation models and fine-tuning
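The anatomy of a potential (locality, E0s, forces) can be made concrete with a toy model in which the total energy is a sum of per-element reference energies plus short-ranged pair terms that vanish smoothly at a cutoff, and the forces are the negative gradient of that energy. Everything here is hypothetical: the E0 values, pair function and cutoff are placeholders in arbitrary units, and real MLIPs such as MACE learn many-body atomic energies rather than a fixed pair term. The sketch checks the analytic forces against finite differences.

```python
import numpy as np

# Hypothetical parameters for a toy pair potential (placeholders, not fitted values)
E0 = {1: -0.5, 8: -75.0}  # per-element reference energies, keyed by atomic number
A = 2.0                    # strength of the pair term
R_CUT = 3.0                # cutoff radius: atoms farther apart than this do not interact


def cutoff(r):
    """Cosine cutoff: 1 at r = 0, decays smoothly to 0 (with zero slope) at R_CUT."""
    return 0.5 * (np.cos(np.pi * r / R_CUT) + 1.0) if r < R_CUT else 0.0


def d_cutoff(r):
    """Derivative of the cosine cutoff with respect to r."""
    return -0.5 * np.pi / R_CUT * np.sin(np.pi * r / R_CUT) if r < R_CUT else 0.0


def energy_and_forces(Z, R):
    """E = sum_i E0[Z_i] + sum_{i<j} A exp(-r_ij) fc(r_ij); forces F = -dE/dR."""
    energy = float(sum(E0[z] for z in Z))
    forces = np.zeros_like(R)
    for i in range(len(Z)):
        for j in range(i + 1, len(Z)):
            r_vec = R[i] - R[j]
            r = float(np.linalg.norm(r_vec))
            if r >= R_CUT:
                continue  # locality: no contribution beyond the cutoff
            energy += A * np.exp(-r) * cutoff(r)
            dphi_dr = A * np.exp(-r) * (d_cutoff(r) - cutoff(r))
            f_i = -dphi_dr * r_vec / r  # force on atom i from the (i, j) pair
            forces[i] += f_i
            forces[j] -= f_i
    return energy, forces


def finite_difference_forces(Z, R, h=1e-5):
    """Central-difference check of the analytic forces."""
    F = np.zeros_like(R)
    for i in range(R.shape[0]):
        for k in range(3):
            R_plus, R_minus = R.copy(), R.copy()
            R_plus[i, k] += h
            R_minus[i, k] -= h
            F[i, k] = -(energy_and_forces(Z, R_plus)[0] - energy_and_forces(Z, R_minus)[0]) / (2.0 * h)
    return F


# Water plus a distant H atom placed well outside the cutoff (positions illustrative)
Z = [8, 1, 1, 1]
R = np.array([[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0], [10.0, 0.0, 0.0]])

energy, forces = energy_and_forces(Z, R)
print(energy, forces)
```

The distant atom only contributes its E0 and feels no force, which is what locality means in practice; the forces also sum to zero because the energy depends only on interatomic distances.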
Expected Outcomes:
- Learn how to fit an MLIP (MACE)
- Assess models with fixed test sets and landscape exploration (MD stability/accuracy)
- Learn how to improve the training set using iterative training
- Use committee error estimates and active learning (covered in Intro)
- Use foundation models and fine-tune them on new systems
Bayesian Optimisation (BO) Basics
- What is BO? And why is it well-suited to chemistry?
- What are the components of a BO algorithm? (Surrogate model and acquisition function discussion)
- Crash course on Gaussian Processes as surrogate models
- Crash course on acquisition functions, with an emphasis on those most commonly used in chemistry applications
Workshop
- BO implementation for a simple mathematical function
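A minimal sketch of such a BO loop, assuming NumPy, a hypothetical 1D objective, a discretised search space, a zero-mean GP surrogate with a fixed length scale of 0.5 (not optimised), and the expected-improvement acquisition function with an exploration parameter xi = 0.01. The workshop implementation may use different components or a dedicated BO library.

```python
import math

import numpy as np


def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two 1D arrays of inputs (unit signal variance)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)


def gp_posterior(x_obs, y_obs, x_query, length_scale=0.5, jitter=1e-6):
    """Zero-mean GP surrogate: posterior mean and standard deviation at x_query."""
    K = rbf(x_obs, x_obs, length_scale) + jitter * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_query, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.maximum(1.0 - np.sum(v**2, axis=0), 1e-12)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement (for maximisation) over the best observation so far."""
    improvement = mean - best - xi
    z = improvement / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + std * pdf


def objective(x):
    """Hypothetical 1D test function: global maximum near x = 2, local maximum near x = 4."""
    return np.exp(-(x - 2.0) ** 2) + 0.5 * np.exp(-((x - 4.0) ** 2) / 0.5)


candidates = np.linspace(0.0, 5.0, 201)  # discretised search space
sampled = [0, 100, 200]                  # initial design: x = 0.0, 2.5, 5.0

for _ in range(15):
    x_obs = candidates[sampled]
    y_obs = objective(x_obs)
    mean, std = gp_posterior(x_obs, y_obs, candidates)             # 1. fit the surrogate
    ei = expected_improvement(mean, std, best=float(y_obs.max()))  # 2. score candidates
    ei[sampled] = -np.inf                                          # never re-evaluate a point
    sampled.append(int(np.argmax(ei)))                             # 3. evaluate the best one

best_x = candidates[sampled[int(np.argmax(objective(candidates[sampled])))]]
print(f"best x found: {best_x:.3f} after {len(sampled)} evaluations")
```

Each iteration refits the surrogate on all observations, scores the unevaluated candidates with the acquisition function and evaluates the highest-scoring one; increasing xi shifts the balance towards exploration.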
BO for chemistry
- Considering what we covered in the first lecture (BO Basics), what do we need to change or add to make BO useful for chemistry?
- Role of chemical descriptors
- How do we formulate problems for BO in chemistry?
- Overview of complex, state-of-the-art BO implementations for chemistry.
- Overview of chemistry-specific BO tools
Workshop
- Working in pairs, optimise a Suzuki-Miyaura cross-coupling reaction using as few resources as possible.
- The materials are available both as a code-based implementation and as a GUI-based implementation for students with less Python experience.
Expected Outcomes
- Understand the basic principles of BO
- Know the components that comprise a BO algorithm (surrogate models, acquisition functions, etc.)
- Have an overview of some more complex BO algorithms (multi-objective, multi-fidelity)
- Understand the capabilities and limitations of BO in chemistry
- Know how to start to formulate a chemical problem for BO
Generative ML
Outline and Topics covered:
- The concept of generative models
- Latent variables
- KL divergence
- Autoencoders and variational autoencoders
- The transformer architecture and the attention mechanism
- Large language models and autoregressive generation
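The KL-divergence term that regularises a variational autoencoder has a closed form when the encoder outputs a diagonal Gaussian and the prior is a standard normal. The sketch below, assuming NumPy and a hypothetical encoder output, computes that closed form, draws latent samples with the reparameterisation trick, and checks the result against a Monte Carlo estimate. Training the encoder and decoder themselves is not shown.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def reparameterise(mu, log_var, rng):
    """Reparameterisation trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def log_gaussian(z, mu, log_var):
    """Log-density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(np.log(2.0 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var), axis=-1)


# Hypothetical encoder output for one data point with a 2D latent space
mu = np.array([0.5, -1.0])
log_var = np.array([0.2, -0.5])

kl_exact = kl_to_standard_normal(mu, log_var)

# Monte Carlo estimate of the same KL: average of log q(z) - log p(z) over samples z ~ q
rng = np.random.default_rng(0)
z = reparameterise(np.tile(mu, (200_000, 1)), log_var, rng)
kl_mc = np.mean(log_gaussian(z, mu, log_var) - log_gaussian(z, np.zeros(2), np.zeros(2)))
print(kl_exact, kl_mc)
```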
Expected Outcomes:
- Understand how generative models link underlying variables to observations
- Understand and implement dimensionality reduction with an autoencoder
- Understand and implement regularised latent spaces with a variational autoencoder
- Understand and implement a simple version of self-attention
- Apply a large language model to materials chemistry
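For the self-attention outcome, a minimal single-head sketch in NumPy: scaled dot-product attention with an optional causal mask, the ingredient that enables autoregressive generation in decoder-style language models. Sizes and weights are hypothetical random values; multi-head attention, positional encodings and the rest of the transformer block are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v, causal=False):
    """Single-head scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) similarity of every query with every key
    if causal:
        # Autoregressive mask: position t may only attend to positions <= t
        T = X.shape[0]
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row is a probability distribution over positions
    return weights @ V, weights


# Hypothetical sizes and random weights: 5 tokens, model width 8, head width 4
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v, causal=True)
print(out.shape, weights.round(2))
```

With the causal mask, changing a later token cannot alter the outputs at earlier positions, which is what allows a language model to generate text one token at a time.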