A 30-minute primer · no ML background required

AI for
Molecules & Materials

From curve-fitting to potentials that predict new matter.
Lecture 01 · Master's · AI+Science · 8 live demos
Motivation

Why ML for molecules?

01 · Chemical space is huge
10⁶⁰
Drug-like molecules. The universe has ~10⁸⁰ atoms.
02 · Quantum mechanics is slow
O(N⁴)
DFT scaling. Seconds per small molecule, weeks for a crystal.
03 · Experiments are expensive
$2.6B
Mean cost to bring one drug to market.
ML doesn't replace physics or experiments — it makes search through this space affordable.
The Data

Molecules are structured data.

String
CC(=O)OC1=CC=CC=C1C(=O)O
SMILES — aspirin
Fingerprint (bit-vector)
Hash substructures to 1024 bits
Graph
[Figure: molecular graph with C and O atoms as nodes]
Atoms = nodes, bonds = edges → GNNs
x ∈ ℝᵈ    →    f(x; θ) ≈ y   (pick a representation, then fit)
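A toy sketch of the "hash substructures to 1024 bits" idea. Assumption, stated loudly: character n-grams of the SMILES string stand in for real chemical substructures. Production fingerprints hash atom neighbourhoods from the molecular graph instead. The names `toy_fingerprint` and `tanimoto` are illustrative only.

```python
import hashlib

def toy_fingerprint(smiles, n_bits=1024, max_len=3):
    """Hash every character n-gram (length 1..max_len) of a SMILES string into a bit-vector.

    ASSUMPTION: n-grams are a crude stand-in for chemical substructures.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # sha256 is used because Python's built-in hash() changes between runs
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits

def tanimoto(a, b):
    """Shared on-bits divided by total on-bits: a common fingerprint similarity."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

aspirin = toy_fingerprint("CC(=O)OC1=CC=CC=C1C(=O)O")
ethanol = toy_fingerprint("CCO")
similarity = tanimoto(aspirin, ethanol)
```

The fixed-length vector is the x ∈ ℝᵈ that any standard regressor can consume, whatever the size of the molecule.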
Basics · interactive

Supervised learning, in one picture.

Drag the points. The line refits live.
ŷ = m·x + b
ℒ(m,b) = (1/n) Σᵢ (ŷᵢ − yᵢ)²
Learning = find parameters m, b that minimise the loss ℒ.
Every ML model is this sentence, with fancier parameters.
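A minimal sketch of the line fit, assuming a mean-squared-error loss and using the closed-form least-squares solution. The data points are made up for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares slope m and intercept b."""
    x_mean, y_mean = x.mean(), y.mean()
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - m * x_mean
    return m, b

def mse(m, b, x, y):
    """Mean squared error of the line m*x + b on the points (x, y)."""
    return np.mean((m * x + b - y) ** 2)

# Illustrative data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

m, b = fit_line(x, y)
best_loss = mse(m, b, x, y)
worse_loss = mse(m + 0.5, b, x, y)  # any other slope gives a higher loss
```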
Basics · interactive

Gradient descent walks downhill.

Click anywhere to start. Slide LR to see over/undershoot.
θₜ₊₁ = θₜ − η ∇θ ℒ(θₜ)
η too small → crawls.
η too large → diverges.
Non-convex loss → local minima everywhere.
In a real network, θ has millions of dimensions. The picture is a lie — but a useful one.
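The three learning-rate regimes can be sketched on the simplest possible loss. Assumption: ℒ(θ) = θ², chosen only because its gradient 2θ is easy to check by hand. The specific η values are illustrative.

```python
def gradient_descent(theta0, lr, steps=50):
    """Minimise L(theta) = theta**2 with plain gradient descent."""
    theta = theta0
    for _ in range(steps):
        grad = 2.0 * theta          # dL/dtheta
        theta = theta - lr * grad   # theta <- theta - eta * grad
    return theta

crawl = gradient_descent(5.0, lr=0.001)    # eta too small: barely moves
good = gradient_descent(5.0, lr=0.3)       # reasonable eta: reaches the minimum at 0
diverged = gradient_descent(5.0, lr=1.1)   # eta too large: each step overshoots further
```

On this loss every step multiplies θ by (1 − 2η), so the iterates shrink only when 0 < η < 1.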
Basics · interactive

A neuron is a weighted sum, then a squish.

Toggle activations to compare.
ReLU · sigmoid · tanh · GELU
h = σ( W·x + b )
Without the squish σ, stacking layers only produces linear functions. The non-linearity is where expressivity comes from.
ReLU is the modern default: cheap to compute, and its gradient does not saturate for positive inputs.
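A small sketch of why the squish matters, with hand-picked weights (an assumption made so the result can be checked by eye). Two stacked linear layers collapse to one linear map, while the same weights with ReLU compute |x₁|, which no linear map can.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def layer(x, W, b, activation):
    """h = sigma(W.x + b)"""
    return activation(W @ x + b)

# Hand-picked weights: hidden units compute x1 and -x1, the output adds them
W1, b1 = np.array([[1.0, 0.0], [-1.0, 0.0]]), np.zeros(2)
W2, b2 = np.array([[1.0, 1.0]]), np.zeros(1)
x = np.array([-2.0, 3.0])

# Without a non-linearity, two layers equal one linear layer
stacked_linear = layer(layer(x, W1, b1, identity), W2, b2, identity)
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)

# With ReLU the same weights give relu(x1) + relu(-x1) = |x1|
stacked_relu = layer(layer(x, W1, b1, relu), W2, b2, identity)
```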
Basics · interactive

A neural network, forward.

Drag the weights (lines). Input = [x₁, x₂]. Watch the output change.
Method · interactive

Training = backpropagation.

Step through one training iteration.
chain rule
∂ℒ/∂wᵢ = ∂ℒ/∂y · ∂y/∂h · ∂h/∂wᵢ
Forward: push input through the network, compute loss.
Backward: multiply local gradients, layer by layer, right to left.
Update: θ ← θ − η ∇θ ℒ.
Autograd does this for you. You almost never write ∂ by hand again.
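To see what autograd automates, here is one training iteration written by hand. Assumptions: a single tanh neuron followed by a scalar output weight, a squared-error loss, and made-up parameter values. The finite-difference check is only a sanity test, not part of training.

```python
import math

def forward(w1, b, w2, x, target):
    """Forward pass: h = tanh(w1*x + b), y = w2*h, loss = (y - target)^2."""
    h = math.tanh(w1 * x + b)
    y = w2 * h
    return h, y, (y - target) ** 2

def backward(w1, b, w2, x, target):
    """Backward pass: multiply local gradients from the loss back to each weight."""
    h, y, _ = forward(w1, b, w2, x, target)
    dL_dy = 2.0 * (y - target)
    dy_dh = w2
    dh_dz = 1.0 - h ** 2            # derivative of tanh
    return {
        "w2": dL_dy * h,
        "w1": dL_dy * dy_dh * dh_dz * x,
        "b": dL_dy * dy_dh * dh_dz,
    }

# Illustrative values
w1, b, w2, x, target = 0.5, 0.1, -0.3, 1.5, 1.0
grads = backward(w1, b, w2, x, target)

# Sanity check of dL/dw1 with a central finite difference
eps = 1e-6
numeric_w1 = (forward(w1 + eps, b, w2, x, target)[2]
              - forward(w1 - eps, b, w2, x, target)[2]) / (2.0 * eps)

# Update: theta <- theta - eta * grad
eta = 0.1
loss_before = forward(w1, b, w2, x, target)[2]
loss_after = forward(w1 - eta * grads["w1"], b - eta * grads["b"],
                     w2 - eta * grads["w2"], x, target)[2]
```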
Method · interactive

Model too small underfits; too large overfits.

Slide polynomial degree. Dashed = true function.
train loss · test loss
expected test error = bias² + variance + noise
Regularization: weight decay, dropout, data augmentation.
Golden rule: train / val / test split.
If val loss ≫ train loss, you're memorising.
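A rough sketch of the degree slider. Assumptions: the true function is sin(3x), noise has standard deviation 0.1, and the degrees 1, 3 and 12 are picked for illustration. Train loss can only go down as the degree grows. The held-out loss is what tells you when to stop.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # ASSUMPTION: illustrative ground truth, not taken from the demo
    return np.sin(3.0 * x)

x_train = np.linspace(-1.0, 1.0, 20)
y_train = true_fn(x_train) + 0.1 * rng.normal(size=x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = true_fn(x_test) + 0.1 * rng.normal(size=x_test.size)

def losses(degree):
    """Fit a polynomial on the training set; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

results = {degree: losses(degree) for degree in (1, 3, 12)}
for degree, (train, test) in results.items():
    print(f"degree {degree:2d}  train {train:.4f}  test {test:.4f}")
```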
Method · interactive

Uncertainty, not just prediction.

Click to add observations. Band = ±2σ posterior.
f(x) ~ 𝒢𝒫(μ(x), k(x,x'))
k(x,x') = exp(−‖x−x'‖² / (2ℓ²))   (RBF kernel)
Far from data → uncertainty grows. Active learning: pick the next experiment where σ is highest.
Classical, pre-deep-learning, still the workhorse for small chem/mat datasets.
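A minimal GP regression sketch with the RBF kernel above. Assumptions: zero prior mean, unit kernel amplitude, ℓ = 1, a small jitter for numerical stability, and three made-up observations. The last line is the active-learning rule: query where σ is largest.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """k(x, x') = exp(-|x - x'|^2 / (2 l^2)) for 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length ** 2))

def gp_posterior(x_train, y_train, x_query, length=1.0, jitter=1e-8):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_train, x_train, length) + jitter * np.eye(len(x_train))
    Ks = rbf(x_train, x_query, length)            # shape (n_train, n_query)
    mean = Ks.T @ np.linalg.solve(K, y_train)
    # Prior variance k(x, x) = 1, minus what the data explains
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Illustrative observations
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)
x_query = np.linspace(-5.0, 5.0, 101)

mean, sigma = gp_posterior(x_train, y_train, x_query)
next_x = x_query[np.argmax(sigma)]   # next experiment: highest uncertainty
```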
Method · interactive

Physics-informed neural networks.

Mix data loss vs. physics loss for d²u/dx² = −u.
Legend: truth sin(x) · network · training points
ℒ = λ_d·Σ(u_NN − u*)² + λ_p·Σ(𝒟[u_NN])²
The network must also satisfy the differential operator 𝒟 — gradients come from autograd. Data becomes a regulariser; physics becomes a loss.
Good when PDE is known & data is scarce. Not a silver bullet when the operator is stiff.
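A sketch of the composite loss for d²u/dx² = −u, so that 𝒟[u] = u″ + u. Assumptions, stated loudly: the second derivative uses central finite differences instead of autograd, and plain Python functions stand in for the network u_NN. The names `pinn_loss`, `lam_d` and `lam_p` are illustrative.

```python
import numpy as np

def pinn_loss(u, x_data, u_data, x_col, lam_d=1.0, lam_p=1.0, h=1e-3):
    """lam_d * sum (u - u*)^2  +  lam_p * sum (u'' + u)^2 at collocation points."""
    data_term = np.sum((u(x_data) - u_data) ** 2)
    # ASSUMPTION: finite differences replace autograd for the second derivative
    u_xx = (u(x_col + h) - 2.0 * u(x_col) + u(x_col - h)) / h ** 2
    residual = u_xx + u(x_col)
    physics_term = np.sum(residual ** 2)
    return lam_d * data_term + lam_p * physics_term

# Two training points near the origin, physics enforced on the whole domain
x_data = np.array([0.0, 0.1])
u_data = np.sin(x_data)
x_col = np.linspace(0.0, 2.0 * np.pi, 50)

loss_true = pinn_loss(np.sin, x_data, u_data, x_col)
# u(x) = x fits the two data points almost perfectly but violates the ODE
loss_wrong = pinn_loss(lambda x: x, x_data, u_data, x_col)
```

The second candidate shows what the physics term buys: two data points cannot tell x from sin(x), but the residual can.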
Science example

Case study — predicting molecular properties.

Graph Neural Network · message passing
[Diagram: atoms (C, N) → messages (h₁, h₂, h₃) → pool (Σ) → property (ŷ)]
QM9 benchmark: 134k small molecules, 12 properties (HOMO, LUMO, dipole, ...). Modern GNNs reach chemical accuracy (~1 kcal/mol) on energies.
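# PyTorch Geometric: a minimal two-layer GCN regressor
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class MolGNN(torch.nn.Module):
    def __init__(self, d=64):
        super().__init__()
        # 11 = assumed number of input features per atom (e.g. PyG's QM9 node features)
        self.c1 = GCNConv(11, d)
        self.c2 = GCNConv(d, d)
        # Linear head: one scalar property per molecule
        self.out = torch.nn.Linear(d, 1)

    def forward(self, x, edge_index, batch):
        # Two rounds of message passing along bonds
        h = self.c1(x, edge_index).relu()
        h = self.c2(h, edge_index).relu()
        # Average atom embeddings per molecule, then regress
        return self.out(global_mean_pool(h, batch))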
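What a message-passing layer does can be sketched without any GNN library. Assumptions: plain neighbour averaging with self-loops (a simplification, not the exact GCNConv normalisation), random weights, and a made-up 3-atom chain. The check at the end shows that relabelling the atoms leaves the prediction unchanged.

```python
import numpy as np

def message_passing_layer(h, adjacency, weight):
    """Each atom averages itself and its neighbours, then applies a linear map + ReLU."""
    with_self = adjacency + np.eye(len(adjacency))
    averaged = with_self / with_self.sum(axis=1, keepdims=True)
    return np.maximum(averaged @ h @ weight, 0.0)

def predict(h, adjacency, w1, w2, w_out):
    """Two message-passing rounds, mean pooling over atoms, linear readout."""
    h = message_passing_layer(h, adjacency, w1)
    h = message_passing_layer(h, adjacency, w2)
    pooled = h.mean(axis=0)
    return float(pooled @ w_out)

rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))            # 3 atoms, 4 made-up features each
adjacency = np.array([[0.0, 1.0, 0.0],        # chain: atom0 - atom1 - atom2
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
w1 = rng.normal(size=(4, 8))
w2 = rng.normal(size=(8, 8))
w_out = rng.normal(size=8)

y_hat = predict(features, adjacency, w1, w2, w_out)

# Relabel the atoms: same molecule, different node order
perm = [2, 0, 1]
y_perm = predict(features[perm], adjacency[np.ix_(perm, perm)], w1, w2, w_out)
```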
Science example

Case study — ML interatomic potentials.

Replace the inner loop of molecular dynamics:
E(r₁, …, r_N) = Σᵢ f_NN(local env of atom i)
Fᵢ = −∂E/∂rᵢ   (free, via autograd)
Trained on DFT forces & energies. Inference is 10³–10⁶× faster than DFT. Enables ns-scale MD on systems DFT can't touch.
[Plot: accuracy vs. cost per evaluation (log–log, schematic). Points: classical FF, MLIP (MACE, NequIP), DFT. MLIPs shift the Pareto front toward higher accuracy at lower cost.]
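The energy-then-forces structure can be sketched with a toy model. Assumptions, stated loudly: a Morse-like function of neighbour distances stands in for f_NN, every other atom counts as a neighbour (real MLIPs use a cutoff), and central finite differences stand in for autograd. Positions are made up.

```python
import numpy as np

def local_energy(distances):
    """Hypothetical stand-in for f_NN: a smooth function of neighbour distances."""
    return 0.5 * np.sum((1.0 - np.exp(-(distances - 1.0))) ** 2)

def total_energy(positions):
    """E = sum over atoms of the local-environment energy."""
    energy = 0.0
    for i in range(len(positions)):
        deltas = np.delete(positions, i, axis=0) - positions[i]
        energy += local_energy(np.linalg.norm(deltas, axis=1))
    return energy

def forces(positions, h=1e-5):
    """F_i = -dE/dr_i, here by central finite differences instead of autograd."""
    f = np.zeros_like(positions)
    for i in range(positions.shape[0]):
        for k in range(3):
            plus, minus = positions.copy(), positions.copy()
            plus[i, k] += h
            minus[i, k] -= h
            f[i, k] = -(total_energy(plus) - total_energy(minus)) / (2.0 * h)
    return f

positions = np.array([[0.0, 0.0, 0.0],
                      [1.2, 0.0, 0.0],
                      [0.0, 0.9, 0.3]])
F = forces(positions)

# Rotate the whole system about z: the energy should not change
angle = 0.7
rotation = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                     [np.sin(angle),  np.cos(angle), 0.0],
                     [0.0, 0.0, 1.0]])
rotated = positions @ rotation.T
```

Because the energy depends only on interatomic distances, it is invariant to rotation and translation, and the forces sum to zero.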
Limits

What to worry about.

Distribution shift
A model trained on small organics silently fails on metals.
test ∉ train
Data leakage
Random splits on molecules leak scaffolds. Use scaffold split or temporal split.
IID assumption ≠ reality
Symmetries
Energy must be invariant to rotation/translation/permutation. Architecture should bake this in.
E(3)-equivariance
Uncertainty
Point predictions hide disaster. Calibrated σ is table stakes for decision-making.
GPs · ensembles · conformal
Benchmark ≠ Reality
Leaderboard gains rarely translate to lab yield. Run a prospective study.
retrospective ≠ prospective
Interpretability
A chemist will ignore a black box. Attribution + physics constraints help earn trust.
saliency · SHAP · physics
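The scaffold-split advice from the data-leakage card can be sketched as a group split. Assumptions: each molecule already carries a scaffold label (in practice computed with a cheminformatics toolkit), and the ids and labels below are made-up placeholders. The function name `group_split` is illustrative.

```python
import random
from collections import defaultdict

def group_split(items, group_of, test_fraction=0.2, seed=0):
    """Split so that no group (e.g. scaffold) appears in both train and test."""
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    target = test_fraction * len(items)
    train, test = [], []
    for key in keys:
        # Fill the test set with whole groups until it reaches the target size
        bucket = test if len(test) < target else train
        bucket.extend(groups[key])
    return train, test

# Hypothetical (molecule_id, scaffold_label) pairs
molecules = [
    ("mol_0", "scaffold_A"), ("mol_1", "scaffold_A"), ("mol_2", "scaffold_A"),
    ("mol_3", "scaffold_A"), ("mol_4", "scaffold_B"), ("mol_5", "scaffold_B"),
    ("mol_6", "scaffold_B"), ("mol_7", "scaffold_C"), ("mol_8", "scaffold_C"),
    ("mol_9", "scaffold_D"),
]

train, test = group_split(molecules, group_of=lambda m: m[1])
train_scaffolds = {scaffold for _, scaffold in train}
test_scaffolds = {scaffold for _, scaffold in test}
```

Whole groups move together, so the test fraction is approximate. That is the price of a leak-free split.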
Wrap-up

Three things to remember.

1
Every ML model boils down to: fit parameters to minimise a loss. The rest is architecture and tricks.
2
Representation beats architecture. A graph & the right symmetries will outperform a bigger MLP.
3
For science, calibrated uncertainty > a slightly better RMSE.
reading → Deep Learning for Molecules & Materials (White)
tools → PyG · e3nn · MACE · SchNetPack
data → QM9 · MD17 · Materials Project
