A 30-minute primer · no ML background required

AI for
Molecules & Materials

From curve-fitting to potentials that predict new matter.

Lecture 01 · Master's · AI+Science · 8 live demos

Motivation

Why ML for molecules?

01 · Chemical space is huge

10⁶⁰

Drug-like molecules. The universe has ~10⁸⁰ atoms.

02 · Quantum mechanics is slow

O(N⁴)

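Formal scaling of hybrid-functional DFT with system size N (semi-local functionals are closer to O(N³)). Seconds per small molecule, weeks for a large crystal cell.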

03 · Experiments are expensive

$2.6B

Estimated mean cost to bring one drug to market.

ML doesn't replace physics or experiments — it makes search through this space affordable.

The Data

Molecules are structured data.

String

CC(=O)OC1=CC=CC=C1C(=O)O

SMILES — aspirin

Fingerprint (bit-vector)

Hash substructures to 1024 bits

Graph

Atoms = nodes, bonds = edges → GNNs

x ∈ ℝ^d → f(x; θ) ≈ y — pick a representation, then fit
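To make the fingerprint idea concrete, here is a minimal sketch that hashes SMILES substrings into 1024 bits. It is a toy under a loud assumption: real fingerprints hash atom neighbourhoods of the molecular graph, not raw characters. The names `toy_fingerprint` and `tanimoto` are illustrative, and the salicylic acid and ethanol strings are added only for comparison.

```python
import hashlib

def toy_fingerprint(smiles: str, n_bits: int = 1024, max_len: int = 3) -> list[int]:
    """Hash every substring of length 1..max_len into an n_bits bit-vector.

    Toy stand-in: a real fingerprint hashes substructures of the graph.
    """
    bits = [0] * n_bits
    for k in range(1, max_len + 1):
        for i in range(len(smiles) - k + 1):
            fragment = smiles[i:i + k]
            digest = hashlib.sha256(fragment.encode()).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits

def tanimoto(a: list[int], b: list[int]) -> float:
    """Shared on-bits divided by total on-bits (1.0 means identical bit sets)."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0

aspirin = toy_fingerprint("CC(=O)OC1=CC=CC=C1C(=O)O")
salicylic_acid = toy_fingerprint("OC1=CC=CC=C1C(=O)O")
ethanol = toy_fingerprint("CCO")

print("bits set for aspirin:", sum(aspirin), "of", len(aspirin))
print("aspirin vs salicylic acid:", round(tanimoto(aspirin, salicylic_acid), 2))
print("aspirin vs ethanol:", round(tanimoto(aspirin, ethanol), 2))
```

Once a molecule is a fixed-length vector x, any model f(x; θ) can be fitted to it.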

Basics · interactive

Supervised learning, in one picture.

Drag the points. The line refits live.

ŷ = m·x + b

ℒ(m, b) = (1/n) Σᵢ (ŷᵢ − yᵢ)²

Learning = find parameters m, b that minimise the loss ℒ.

Every ML model is this sentence, with fancier parameters.
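For a straight line the minimiser of ℒ has a closed form, so the "refit" can be sketched in a few lines. The data points below are made up for illustration, and `fit_line` and `mse` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Return the slope m and intercept b that minimise the mean squared error."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b

def mse(m, b, xs, ys):
    """Loss from the slide: mean of squared residuals."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Illustrative points, roughly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

m, b = fit_line(xs, ys)
print(f"slope {m:.2f}  intercept {b:.2f}  loss {mse(m, b, xs, ys):.4f}")
```

Nudging m or b away from the fitted values can only raise the loss, which is what "minimise" means here.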

Basics · interactive

Gradient descent walks downhill.

Click anywhere to start. Slide LR to see over/undershoot.

learning rate η = 0.08

θ_{t+1} = θ_t − η ∇_θ ℒ(θ_t)

η too small → crawls.
η too large → diverges.
Non-convex loss → local minima everywhere.

In a real network, θ has millions of dimensions. The picture is a lie — but a useful one.
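The three learning-rate regimes can be reproduced on the simplest possible loss, ℒ(θ) = θ². This is a sketch with an assumed 1-D loss and hand-picked values of η, not the loss surface used in the demo.

```python
def gradient_descent(grad, theta0, lr, steps):
    """Apply theta <- theta - lr * grad(theta) and return the whole path."""
    theta = theta0
    path = [theta]
    for _ in range(steps):
        theta = theta - lr * grad(theta)
        path.append(theta)
    return path

def grad(theta):
    """Gradient of the assumed loss L(theta) = theta**2."""
    return 2.0 * theta

finals = {}
for lr in (0.01, 0.4, 1.1):
    finals[lr] = gradient_descent(grad, theta0=1.0, lr=lr, steps=20)[-1]
    print(f"eta = {lr:<4}  theta after 20 steps = {finals[lr]:.3g}")
```

On this loss each step multiplies θ by (1 − 2η), so η = 0.01 barely moves, η = 0.4 shrinks θ quickly, and η = 1.1 grows it in magnitude at every step.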

Basics · interactive

A neuron is a weighted sum, then a squish.

Toggle activations to compare.

ReLU sigmoid tanh GELU

h = σ( W·x + b )

Without the squish σ, any stack of layers collapses into a single linear (affine) map. The non-linearity is where expressivity comes from.

ReLU is the modern default: cheap to compute, and its gradient does not saturate for positive inputs.
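A small sketch of the four activations, plus a check that two linear layers without σ are one linear layer in disguise. The scalar weights `w1, b1, w2, b2` are arbitrary illustrative values.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gelu(z):
    """Exact GELU: z times the standard normal CDF of z."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

# tanh is available directly as math.tanh

# Two scalar "layers" with arbitrary illustrative weights
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear(x):
    """Stack of two layers with no activation in between."""
    return w2 * (w1 * x + b1) + b2

def one_linear(x):
    """The single linear layer that the stack collapses into."""
    return (w2 * w1) * x + (w2 * b1 + b2)

def with_relu(x):
    """Same stack, but with a ReLU between the layers."""
    return w2 * relu(w1 * x + b1) + b2

for x in (-2.0, 0.0, 1.5):
    print(x, two_linear(x), one_linear(x), with_relu(x))
```

The first two columns always agree; the third departs from them wherever the ReLU clips.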

Basics · interactive

A neural network, forward.

Drag the weights (lines). Input = [x₁, x₂]. Watch the output change.

x₁ = 0.50 · x₂ = -0.30 · output ŷ = 0.00
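A forward pass is just the neuron formula applied layer by layer. The sketch below assumes a 2-2-1 network with tanh hidden units and made-up weights; the demo's actual architecture and weights are not given on the slide.

```python
import math

def forward(x, W1, b1, w2, b2):
    """Two inputs -> tanh hidden layer -> one linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

x = [0.5, -0.3]                   # inputs shown on the slide
W1 = [[1.0, -1.0], [0.5, 2.0]]    # hypothetical hidden-layer weights
b1 = [0.0, 0.1]                   # hypothetical hidden-layer biases
w2 = [1.0, -1.0]                  # hypothetical output weights
b2 = 0.0

y_hat = forward(x, W1, b1, w2, b2)
print(f"output = {y_hat:.3f}")
```

Changing any entry of `W1` or `w2` plays the role of dragging a line in the demo.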

Method · interactive

Training = backpropagation.

Step through one training iteration.

phase: forward

chain rule
∂ℒ/∂w_i = ∂ℒ/∂ŷ · ∂ŷ/∂h · ∂h/∂w_i

Forward: push input through the network, compute loss.
Backward: multiply local gradients, layer by layer, right to left.
Update: θ ← θ − η ∇_θ ℒ.

Autograd does this for you. You almost never write ∂ by hand again.
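What autograd automates can be written by hand for one tiny model. This sketch assumes h = tanh(w·x + b), ŷ = v·h and a squared-error loss, with illustrative numbers, and checks the chain-rule gradient against a finite difference.

```python
import math

def loss(w, b, v, x, y):
    """Forward pass: hidden unit, prediction, squared error."""
    h = math.tanh(w * x + b)
    return (v * h - y) ** 2

def grad_w(w, b, v, x, y):
    """Backward pass: product of the three local gradients."""
    h = math.tanh(w * x + b)
    y_hat = v * h
    dL_dyhat = 2.0 * (y_hat - y)
    dyhat_dh = v
    dh_dw = (1.0 - h * h) * x   # derivative of tanh is 1 - tanh^2
    return dL_dyhat * dyhat_dh * dh_dw

w, b, v, x, y = 0.7, -0.2, 1.5, 0.5, 1.0   # illustrative values

analytic = grad_w(w, b, v, x, y)
eps = 1e-6
numeric = (loss(w + eps, b, v, x, y) - loss(w - eps, b, v, x, y)) / (2 * eps)
print(f"chain rule: {analytic:.6f}  finite difference: {numeric:.6f}")

# One update step on w alone
eta = 0.1
w_new = w - eta * analytic
print(f"loss before: {loss(w, b, v, x, y):.4f}  after: {loss(w_new, b, v, x, y):.4f}")
```

The two gradient estimates should agree to several decimals, and one step along the negative gradient lowers the loss for this choice of η.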

Method · interactive

Model too small underfits; too large overfits.

Slide polynomial degree. Dashed = true function.

degree 3 · noise 0.15

E[ℒ_test] = bias² + variance + noise (expected squared error)

Regularization: weight decay, dropout, data augmentation.
Golden rule: keep separate train / val / test splits.
If val loss ≫ train loss, the model is memorising.
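The degree slider can be imitated with a plain least-squares polynomial fit. This is a sketch under assumptions: the true function sin(3x), the 12 training points and the random seed are invented here, and the fit uses the normal equations, which is fine at these small degrees but not a numerically robust general method.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    V = [[x ** k for k in range(degree + 1)] for x in xs]
    A = [[sum(row[i] * row[j] for row in V) for j in range(degree + 1)]
         for i in range(degree + 1)]
    b = [sum(row[i] * y for row, y in zip(V, ys)) for i in range(degree + 1)]
    return solve(A, b)

def mse(coeffs, xs, ys):
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

rng = random.Random(0)
noise = 0.15                                  # noise level from the slide

def true_f(x):
    return math.sin(3.0 * x)                  # assumed "true function"

x_train = [-1.0 + 2.0 * i / 11 for i in range(12)]
y_train = [true_f(x) + rng.gauss(0.0, noise) for x in x_train]
x_test = [-1.0 + 2.0 * (i + 0.5) / 200 for i in range(200)]
y_test = [true_f(x) + rng.gauss(0.0, noise) for x in x_test]

results = {}
for degree in (1, 3, 9):
    coeffs = polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train {results[degree][0]:.4f}  test {results[degree][1]:.4f}")
```

Train loss can only go down as the degree rises; the test loss is the number to watch, and it typically stops improving well before the train loss does.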

Method · interactive

Uncertainty, not just prediction.

Click to add observations. Band = ±2σ posterior.

length scale ℓ = 1.20 · noise σ = 0.08

f(x) ~ 𝒢𝒫(μ(x), k(x,x'))

k(x, x') = exp(−‖x − x'‖² / (2ℓ²)) (RBF kernel)

Far from data → uncertainty grows. Active learning: pick the next experiment where σ is highest.

Gaussian processes are classical, pre-deep-learning models, and still the workhorse for small chem/mat datasets.
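The posterior band can be computed directly from the kernel. The sketch below assumes a zero prior mean μ(x) = 0, three made-up observations of sin(x), and the slider values ℓ = 1.2 and σ = 0.08; the helper names are illustrative.

```python
import math

def rbf(x1, x2, length=1.2):
    return math.exp(-((x1 - x2) ** 2) / (2.0 * length ** 2))

def cholesky(A):
    """Lower-triangular L with L Lᵀ = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_T(L, z):
    """Back substitution: solve Lᵀ a = z."""
    n = len(z)
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a

def gp_posterior(x_train, y_train, x_query, length=1.2, noise=0.08):
    """Posterior mean and standard deviation of f at x_query (zero prior mean)."""
    K = [[rbf(a, b, length) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = solve_upper_T(L, solve_lower(L, y_train))      # K⁻¹ y
    k_star = [rbf(x_query, a, length) for a in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_query, x_query, length) - sum(t * t for t in v)
    return mean, math.sqrt(max(var, 0.0))

x_train = [0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]

mean_near, std_near = gp_posterior(x_train, y_train, 1.0)
mean_far, std_far = gp_posterior(x_train, y_train, 6.0)
print(f"at x = 1 (observed): mean {mean_near:.3f}, std {std_near:.3f}")
print(f"at x = 6 (far away): mean {mean_far:.3f}, std {std_far:.3f}")
```

An active-learning loop would evaluate the standard deviation on a grid of candidates and run the next experiment where it is largest.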

Method · interactive

Physics-informed neural networks.

Mix data loss vs. physics loss for d²u/dx² = −u.

λ_data = 0.50 · λ_phys = 0.50

Legend: truth sin(x) · network · training points

ℒ = λ_data·Σ(u_NN − u*)² + λ_phys·Σ(𝒟[u_NN])², with 𝒟[u] = d²u/dx² + u

The network must also satisfy the differential operator 𝒟 — gradients come from autograd. Data becomes a regulariser; physics becomes a loss.

Good when PDE is known & data is scarce. Not a silver bullet when the operator is stiff.
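The two-term loss can be evaluated for any candidate function, which shows why the physics term matters. In this sketch the candidates are plain Python functions rather than networks, and the second derivative comes from finite differences instead of autograd; both are simplifying assumptions, as are the two data points and the collocation grid.

```python
import math

def residual(u, x, h=1e-3):
    """D[u](x) = u''(x) + u(x), with u'' from a central finite difference."""
    second = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    return second + u(x)

def pinn_loss(u, data, collocation, lam_data=0.5, lam_phys=0.5):
    """Weighted sum of data misfit and squared physics residual."""
    data_term = sum((u(x) - y) ** 2 for x, y in data)
    phys_term = sum(residual(u, x) ** 2 for x in collocation)
    return lam_data * data_term + lam_phys * phys_term

# Two "measurements" of the true solution sin(x)
data = [(0.0, 0.0), (math.pi / 2, 1.0)]
# Points where the equation is enforced, no labels needed
collocation = [2.0 * math.pi * i / 20 for i in range(21)]

def line(x):
    """A candidate that passes through both data points but ignores the physics."""
    return (2.0 / math.pi) * x

loss_sin = pinn_loss(math.sin, data, collocation)
loss_line = pinn_loss(line, data, collocation)
data_only = pinn_loss(line, data, collocation, lam_data=1.0, lam_phys=0.0)

print(f"sin(x):            total loss {loss_sin:.2e}")
print(f"straight line:     total loss {loss_line:.2e}")
print(f"line, data only:   loss {data_only:.2e}")
```

With λ_phys = 0 the straight line looks perfect; the physics term is what rules it out between and beyond the data points.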

Science example

Case study — predicting molecular properties.

Graph Neural Network · message passing

QM9 benchmark: 134k small molecules, 12 properties (HOMO, LUMO, dipole, ...). Modern GNNs reach chemical accuracy (~1 kcal/mol) on energies.

# PyTorch Geometric: a two-layer GCN regressor in about 10 lines
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class MolGNN(torch.nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.c1 = GCNConv(11, d)          # 11 input features per atom
        self.c2 = GCNConv(d, d)
        self.out = torch.nn.Linear(d, 1)  # one scalar property per molecule

    def forward(self, x, edge_index, batch):
        # Two rounds of message passing along the bonds
        h = self.c1(x, edge_index).relu()
        h = self.c2(h, edge_index).relu()
        # Average atom embeddings per molecule, then regress
        return self.out(global_mean_pool(h, batch))
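Stripped of learned weights, one round of message passing plus a mean readout looks like the sketch below. It is a simplification: `GCNConv` also applies a learned linear map and degree normalisation, which are omitted here, and the one-hot atom features are an assumption for illustration.

```python
def message_pass(features, edges):
    """One round: each atom adds the sum of its neighbours' features to its own."""
    updated = [list(f) for f in features]
    for i, j in edges:                      # undirected bond i-j
        for k in range(len(features[0])):
            updated[i][k] += features[j][k]
            updated[j][k] += features[i][k]
    return updated

def mean_pool(features):
    """Readout: average the atom vectors into one molecule vector."""
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]

# Ethanol heavy atoms C-C-O, features are one-hot [is_carbon, is_oxygen]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1), (1, 2)]
molecule = mean_pool(message_pass(features, edges))

# Same molecule with the atoms listed in the order O, C, C
features_relabeled = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
edges_relabeled = [(0, 1), (1, 2)]
molecule_relabeled = mean_pool(message_pass(features_relabeled, edges_relabeled))

print(molecule, molecule_relabeled)
```

Both atom orderings give the same molecule vector, which is the permutation invariance a graph model provides by construction.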

Science example

Case study — ML interatomic potentials.

Replace the inner loop of molecular dynamics:

E(r₁, …, r_N) = Σ_i f_NN(local env of atom i)

F_i = −∂E/∂r_i, obtained from the same model via autograd

Trained on DFT forces & energies. Inference is 10³–10⁶× faster than DFT. Enables ns-scale MD on systems DFT can't touch.
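The structure "energy as a sum of local terms, forces as its negative gradient" can be sketched without a neural network. Here a Lennard-Jones pair term in reduced units stands in for the learned f_NN, and finite differences stand in for autograd; both substitutions are assumptions made to keep the example dependency-free.

```python
import math

def pair_energy(r):
    """Lennard-Jones in reduced units (stand-in for a learned f_NN)."""
    return 4.0 * (r ** -12 - r ** -6)

def total_energy(positions):
    """E = sum over atoms of a local term: half the pair energy with each neighbour."""
    energy = 0.0
    for i, ri in enumerate(positions):
        for j, rj in enumerate(positions):
            if i != j:
                energy += 0.5 * pair_energy(math.dist(ri, rj))
    return energy

def forces(positions, h=1e-5):
    """F_i = -dE/dr_i by central finite differences (autograd in practice)."""
    result = []
    for i in range(len(positions)):
        f_i = []
        for axis in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][axis] += h
            minus[i][axis] -= h
            f_i.append(-(total_energy(plus) - total_energy(minus)) / (2 * h))
        result.append(f_i)
    return result

positions = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0]]
shifted = [[x + 3.0, y - 2.0, z + 5.0] for x, y, z in positions]

F = forces(positions)
net_force = [sum(f[axis] for f in F) for axis in range(3)]

print("energy:", total_energy(positions))
print("energy after translation:", total_energy(shifted))
print("net force:", net_force)
```

Because the energy depends only on interatomic distances, translating the system leaves it unchanged and the forces sum to zero, up to finite-difference error.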

Accuracy vs. cost (log–log, schematic)

Limits

What to worry about.

Distribution shift

A model trained on small organics silently fails on metals.

test ∉ train

Data leakage

Random splits on molecules leak scaffolds. Use scaffold split or temporal split.

IID assumption ≠ reality
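A scaffold split is a grouped split: whole scaffold families go to one side. The sketch below assumes the scaffold labels are already computed (in practice they come from a cheminformatics toolkit); the molecule IDs, labels and the "largest groups to train" heuristic are illustrative choices.

```python
from collections import defaultdict

def group_split(items, groups, test_fraction=0.2):
    """Assign whole groups to train or test so no group straddles the split."""
    by_group = defaultdict(list)
    for item, group in zip(items, groups):
        by_group[group].append(item)

    n_test_target = round(test_fraction * len(items))
    n_train_target = len(items) - n_test_target

    # Largest groups fill the training set first; groups that no longer fit go to test
    ordered = sorted(by_group.values(), key=len, reverse=True)
    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= n_train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test

# Hypothetical molecule IDs with precomputed scaffold labels
molecules = [f"m{i}" for i in range(10)]
scaffolds = ["benzene"] * 4 + ["pyridine"] * 3 + ["indole"] * 2 + ["furan"]

train, test = group_split(molecules, scaffolds, test_fraction=0.2)
print("train:", train)
print("test:", test)
```

The test set then contains only scaffolds the model never saw, which is closer to how the model will be used on new chemistry.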

Symmetries

Energy must be invariant to rotation, translation, and permutation of identical atoms; forces must rotate with the system. The architecture should bake this in.

E(3)-equivariance

Uncertainty

Point predictions hide disaster. Calibrated σ is table stakes for decision-making.

GPs · ensembles · conformal

Benchmark ≠ Reality

Leaderboard gains rarely translate to lab yield. Run a prospective study.

retrospective ≠ prospective

Interpretability

A chemist will ignore a black box. Attribution + physics constraints help earn trust.

saliency · SHAP · physics

Wrap-up

Three things to remember.

Every ML model fits parameters to minimise a loss. The rest is architecture and tricks.

Representation beats architecture. A graph with the right symmetries usually outperforms a bigger MLP.

For science, calibrated uncertainty > a slightly better RMSE.

reading → Deep Learning for Molecules & Materials (White)
tools → PyG · e3nn · MACE · SchNetPack
data → QM9 · MD17 · Materials Project
