Deep learning for ¹H NMR

Read couplings out of a blurry spectrum.

At low field, ¹H spectra blend into second-order multiplets. Scroll to sweep the spectrometer from 90 MHz to 600 MHz.

The problem

Cheap spectrometers see overlapping peaks. We recover the real parameters.

A 90 MHz benchtop ¹H NMR spectrum is rarely first-order: chemical-shift differences are comparable to coupling constants, so multiplets overlap and naive peak-picking fails. Spinhance trains a neural network to invert a normalized low-field spectrum back to the underlying spin system — chemical shifts δ (ppm), scalar couplings J (Hz), and proton degeneracies. Once you have those field-independent parameters, the spectrum can be re-simulated exactly at any field strength.
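To make the first-order criterion concrete: a shift difference Δδ in ppm corresponds to Δν = Δδ × ν₀ in Hz, where ν₀ is the spectrometer frequency in MHz, while J stays fixed in Hz. The sketch below computes Δν/J at both fields. The helper name and the example values (Δδ = 0.15 ppm, J = 7 Hz) are made up for illustration, and the "Δν/J above roughly 10" cutoff is a commonly quoted rule of thumb, not a threshold taken from this project.

```python
def shift_to_coupling_ratio(delta_ppm: float, j_hz: float, field_mhz: float) -> float:
    """Return dv/J, where dv (Hz) = shift difference (ppm) x spectrometer frequency (MHz)."""
    delta_nu_hz = delta_ppm * field_mhz
    return delta_nu_hz / abs(j_hz)

# Illustrative values only (not from the dataset).
DELTA_PPM = 0.15
J_HZ = 7.0
RULE_OF_THUMB = 10.0  # assumed cutoff for "approximately first-order"

ratios = {field: shift_to_coupling_ratio(DELTA_PPM, J_HZ, field) for field in (90.0, 600.0)}
for field, ratio in ratios.items():
    regime = "roughly first-order" if ratio > RULE_OF_THUMB else "second-order"
    print(f"{field:.0f} MHz: dv/J = {ratio:.2f} ({regime})")
```

With these values the ratio is about 1.9 at 90 MHz and about 12.9 at 600 MHz, so the same pair of protons moves from the strongly coupled regime to a nearly first-order one.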

The pipeline

Four stages, from molecule to model.

We scope the problem to molecules with exactly eight hard-equivalent (chemically and magnetically equivalent) spin groups, and build a fully synthetic, physically plausible training set end to end.

TASK 1 · GENERATE

Screen molecules

Filter large public databases (ChEMBL / PubChem) with RDKit, assigning chemically & magnetically equivalent proton groups, and keep molecules with exactly 8 spin groups.

TASK 2 · SPIN SYSTEM

Heuristic shift + J

Embed each SMILES in 3D, then estimate shifts and ²J/³J/⁴J couplings with Karplus relations and tabulated rules. The result is a spin-system "shift+J" graph that could have come from a real molecule.
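For the vicinal (³J) case, the Karplus relation estimates the coupling from the H-C-C-H dihedral angle φ as ³J(φ) = A cos²φ + B cos φ + C. The sketch below is a minimal version of that formula. The coefficients are illustrative placeholders, not the project's parameterization, and the function name is hypothetical.

```python
import math

# Illustrative coefficients in Hz. Real parameterizations depend on the
# substituents and are not specified in the text above.
A_HZ, B_HZ, C_HZ = 7.8, -1.0, 1.4

def karplus_3j(dihedral_deg: float, a: float = A_HZ, b: float = B_HZ, c: float = C_HZ) -> float:
    """Vicinal coupling (Hz) from the H-C-C-H dihedral angle (degrees)."""
    cos_phi = math.cos(math.radians(dihedral_deg))
    return a * cos_phi ** 2 + b * cos_phi + c

for angle in (0, 60, 90, 180):
    print(f"phi = {angle:3d} deg -> 3J = {karplus_3j(angle):.2f} Hz")
```

The curve has the familiar shape: the coupling is largest for anti (180°) and eclipsed (0°) arrangements and smallest near 90°, which is why a 3D embedding is needed before couplings can be estimated.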

TASK 3 · SIMULATE

Exact spin sim

Diagonalize the spin Hamiltonian to produce accurate spectra at 90 and 600 MHz, using a fast pure-Python engine validated against MestReNova. The hero animation is that engine, live.
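For two coupled spin-1/2 nuclei the Hamiltonian is block diagonal and the only nontrivial block is 2×2, so the diagonalization has a closed form (the textbook AB quartet). The sketch below uses that special case to show how the same δ and J produce different line patterns at 90 and 600 MHz. It is not the project's engine, which handles eight spin groups; the function names and example parameters are assumptions.

```python
import math

def ab_quartet(delta_a_ppm, delta_b_ppm, j_hz, field_mhz):
    """Return [(frequency_hz, relative_intensity), ...] for a two-spin AB system.

    Assumes the shift difference and J are not both zero.
    """
    nu_a = delta_a_ppm * field_mhz      # ppm -> Hz at this field
    nu_b = delta_b_ppm * field_mhz
    center = 0.5 * (nu_a + nu_b)
    j = abs(j_hz)                       # the two-spin pattern does not depend on the sign of J
    d = math.hypot(nu_a - nu_b, j)      # sqrt(dv^2 + J^2), from the 2x2 block
    outer, inner = 1.0 - j / d, 1.0 + j / d
    return [
        (center - 0.5 * (d + j), outer),
        (center - 0.5 * (d - j), inner),
        (center + 0.5 * (d - j), inner),
        (center + 0.5 * (d + j), outer),
    ]

def roof_ratio(lines):
    """Inner-to-outer intensity ratio; 1.0 is the ideal first-order limit."""
    return lines[1][1] / lines[0][1]

# Made-up parameters, not taken from the dataset.
low = ab_quartet(7.30, 7.20, 8.0, 90.0)
high = ab_quartet(7.30, 7.20, 8.0, 600.0)
print(f"90 MHz roof ratio:  {roof_ratio(low):.2f}")
print(f"600 MHz roof ratio: {roof_ratio(high):.2f}")
```

With these values the inner lines are about five times taller than the outer ones at 90 MHz, but only about 1.3 times taller at 600 MHz, where the pattern approaches two clean doublets.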

TASK 4 · MODEL

Spectrum → matrix

Train a network mapping a normalized 2¹⁴-point spectrum back to the 8×9 shift+J+degeneracy block — invariant to the arbitrary labeling of groups (S₈ permutation).
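One simple way to score a prediction without caring about label order is to take the minimum error over all relabelings of the groups. The sketch below does this by brute force on three groups for brevity. With eight groups there are 8! = 40,320 relabelings, which is still enumerable, though an assignment-based matching is usually more practical. This is an illustration of the invariance, not the project's loss; all names and values are hypothetical.

```python
from itertools import permutations

def permute_block(shifts, couplings, perm):
    """Relabel groups so that new group i is old group perm[i]."""
    n = len(perm)
    new_shifts = [shifts[perm[i]] for i in range(n)]
    new_couplings = [[couplings[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    return new_shifts, new_couplings

def block_error(shifts_a, coup_a, shifts_b, coup_b):
    """Sum of absolute errors over shifts and the upper triangle of J.

    The unweighted sum mixes ppm and Hz; a real loss would weight the two terms.
    """
    n = len(shifts_a)
    err = sum(abs(shifts_a[i] - shifts_b[i]) for i in range(n))
    err += sum(abs(coup_a[i][j] - coup_b[i][j]) for i in range(n) for j in range(i + 1, n))
    return err

def permutation_invariant_error(pred, target):
    """Minimum block_error over every relabeling of the prediction."""
    pred_shifts, pred_coup = pred
    tgt_shifts, tgt_coup = target
    return min(
        block_error(*permute_block(pred_shifts, pred_coup, perm), tgt_shifts, tgt_coup)
        for perm in permutations(range(len(tgt_shifts)))
    )

# Toy target: three groups with shifts in ppm and a symmetric J matrix in Hz.
target = ([1.2, 3.4, 7.1], [[0.0, 7.0, 0.0], [7.0, 0.0, 1.5], [0.0, 1.5, 0.0]])
# A "prediction" that is exactly the target with its labels shuffled.
pred = permute_block(target[0], target[1], (2, 0, 1))

naive = block_error(pred[0], pred[1], target[0], target[1])
invariant = permutation_invariant_error(pred, target)
print(naive, invariant)
```

The naive error is large even though the prediction describes the same spin system, while the permutation-invariant error is zero.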

The screening dataset

The 8-spin library.

Each molecule has exactly eight magnetically distinct ¹H spin groups (A–H). Hover the table or the 3D structure to see which atoms belong to which group.

Spin groups table (interactive): columns Group, Equivalence, Protons, shown with the molecule's SMILES.

The representation

Molecular spin systems are labeled, undirected graphs.

The spin graph is the model's learned intermediate representation: a labeled, undirected graph that sits between the raw spectrum and the output matrix. Nodes carry a chemical shift δ (ppm) and a proton degeneracy n. Edges carry a scalar coupling J (Hz) and a soft-equivalence flag marking protons that are chemically equivalent but magnetically inequivalent. The graph is stored as a symmetric 8×8 matrix with shifts on the diagonal, plus an 8×1 degeneracy vector, and is defined only up to permutation of the 8 labels (S₈). The structure on the left and the matrix on the right are two views of the molecule currently shown in the hero animation.

Panels: 3D structure · spin graph

Showing the molecule currently in the hero animation. Node labels: chemical shift δ (ppm) / degeneracy n. Edge thickness and labels: coupling J (Hz); only |J| > 0.5 Hz shown.
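The matrix view of the spin graph can be sketched as follows: shifts go on the diagonal, couplings fill the symmetric off-diagonal entries, and the degeneracy vector is appended to give an 8×9 block. Placing the degeneracy in the last column is an assumption, the soft-equivalence flag is omitted, and the node and edge values are toy numbers rather than a molecule from the dataset.

```python
def build_block(nodes, edges):
    """Build an n x (n + 1) block from a spin graph.

    nodes: list of (shift_ppm, n_protons), one per spin group.
    edges: dict mapping (i, j) index pairs to a coupling in Hz.
    """
    n = len(nodes)
    block = [[0.0] * (n + 1) for _ in range(n)]
    for i, (shift_ppm, n_protons) in enumerate(nodes):
        block[i][i] = shift_ppm           # shift on the diagonal
        block[i][n] = float(n_protons)    # degeneracy in the last column (assumed layout)
    for (i, j), j_hz in edges.items():
        block[i][j] = j_hz                # undirected edge -> symmetric entries
        block[j][i] = j_hz
    return block

def drawable_edges(block, threshold_hz=0.5):
    """Edges with |J| above the display threshold, as (i, j, J) triples."""
    n = len(block)
    return [
        (i, j, block[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if abs(block[i][j]) > threshold_hz
    ]

# Toy 8-group spin graph (values are illustrative only).
nodes = [(0.9, 3), (1.3, 2), (1.6, 2), (2.1, 2), (3.4, 1), (3.7, 1), (7.2, 2), (7.3, 2)]
edges = {(0, 1): 7.4, (1, 2): 7.0, (2, 3): 6.8, (3, 4): 0.3, (4, 5): -12.0, (6, 7): 8.0}

block = build_block(nodes, edges)
shown = drawable_edges(block)
print(len(block), len(block[0]), len(shown))
```

Here the 0.3 Hz coupling between groups 3 and 4 is stored in the block but falls below the 0.5 Hz display threshold, so it would not be drawn as an edge.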

Where we are

How the model came together — architecture, data, and scale.

Each stage introduced a different innovation: first in architecture and loss, then in the dataset itself, and finally in scaling data and model capacity together. Metrics are reported on a leakage-controlled held-out test set that the model never saw. The learning curves, a full run comparison, and per-molecule test predictions are below.

Note (corrected dataset): the dataset was regenerated to give diastereotopic protons independent shifts (they were previously merged), and the whole recipe ladder is being retrained on this corrected data across all three tiers. The quantitative model metrics below are being regenerated and will update as the retrained checkpoints land — treat current numbers as pending.

Explore the models

Interactive views into development & testing.

Training curves and the model comparison report validation-split metrics (used for model selection). The held-out test split, never seen in training or selection, is reported separately below and in the molecule explorer. New to the recipes (025–030)? Read how each model works.

Learning curves (validation)

Model comparison (validation)

Held-out test-set evaluation (test)

Compare across recipe (64k) or model size (026) on the held-out split.

Held-out test-molecule explorer (test)

Pick a model to see its predictions on a shared set of held-out PubChem test molecules. These come from the leakage-controlled global 10% split that no model trained on or was selected against (the CNN baseline was trained on ChEMBL, so PubChem is also out-of-distribution for it). The rendered trace re-simulates the model’s predicted spin system through the exact quantum simulator, so a close overlay means the recovered parameters reproduce the input spectrum. Predicted values more than 0.1 ppm (shifts) or 1.5 Hz (couplings) off ground truth are shown in red.
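The red highlighting amounts to a per-value tolerance check. The sketch below applies the two thresholds stated above, assuming the predicted and ground-truth values have already been aligned to the same group labels. The function name and the example numbers are hypothetical.

```python
SHIFT_TOL_PPM = 0.1   # shift threshold stated above
J_TOL_HZ = 1.5        # coupling threshold stated above

def flag_outliers(pred, truth, tol):
    """Return the indices whose absolute error exceeds tol."""
    return [i for i, (p, t) in enumerate(zip(pred, truth)) if abs(p - t) > tol]

# Made-up values, already aligned to the same labeling.
pred_shifts, true_shifts = [7.21, 3.55, 1.20], [7.20, 3.40, 1.22]
pred_js, true_js = [7.1, 2.0], [7.4, 4.0]

red_shifts = flag_outliers(pred_shifts, true_shifts, SHIFT_TOL_PPM)
red_js = flag_outliers(pred_js, true_js, J_TOL_HZ)
print(red_shifts, red_js)
```

In this example only the second shift (0.15 ppm off) and the second coupling (2.0 Hz off) would be drawn in red.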

The team

Built at the 2026 Scripps Hackathon.

Lucas Abounader
Shenvi Lab · simulation & model

Sam Mansfield
Seiple Lab · generation & model

Yiming Zhang
Shenvi Lab · spin-system generation

RDKit · NumPy / SciPy · PyTorch · spin-Hamiltonian simulation