Every Spinhance model solves the same inverse problem — read a blurry 90 MHz ¹H spectrum and recover the molecule's spin system: each proton group's chemical shift (δ), its scalar couplings (J), and how many equivalent protons it holds. They all share one neural-network backbone and differ only in two things: what extra hints we feed in, and how we shape the training loss. This page explains each, plainly.
A 1-D CNN scans the 16,384-point spectrum and turns local stretches of it into features — the rough "shapes" present at each chemical shift.
Self-attention lets every part of the spectrum inform every other part, so a multiplet at 7 ppm can be interpreted in light of one at 2 ppm.
Eight learned "spin-group" queries — one per proton group — attend back into the spectrum through a Transformer decoder, each gathering the evidence for its own group.
Per-group heads read off the shift and degeneracy; a symmetric pair-wise edge head reads off the coupling between every pair of groups (and, in some recipes, whether two groups are equivalent).
This set-structured design matches the shape of the answer (eight unordered groups and a symmetric coupling matrix) far better than a plain CNN with fixed output slots, which is why the backbone beats the earlier CNN baseline several-fold on every metric. Recipes 025–030 all use this exact backbone.
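The symmetric edge head can be illustrated with a small sketch. The page does not say how symmetry is enforced, so this shows one common option under that assumption: score each pair in both argument orders and average. The names `symmetric_edges` and `toy_pair_fn` are hypothetical, and the toy scoring function merely stands in for a learned head.

```python
def symmetric_edges(groups, pair_fn):
    """Build an n x n matrix of pair scores that is symmetric by construction.

    Each off-diagonal entry averages pair_fn over both argument orders,
    so swapping two groups cannot change their predicted coupling.
    The diagonal is left at zero (assumption: no self-coupling entry).
    """
    n = len(groups)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = 0.5 * (pair_fn(groups[i], groups[j]) + pair_fn(groups[j], groups[i]))
            matrix[i][j] = score
            matrix[j][i] = score
    return matrix


def toy_pair_fn(a, b):
    # Deliberately order-dependent stand-in for a learned pair scorer.
    return 2.0 * a[0] + b[0]


# Eight one-dimensional "group embeddings", one per spin-group query.
groups = [[float(i)] for i in range(8)]
J = symmetric_edges(groups, toy_pair_fn)
print(J[1][3], J[3][1])
```

Averaging over both orders keeps the output symmetric no matter what the underlying scorer does, which is the property a coupling matrix needs.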
Predict the spin-system matrix straight from the spectrum, with the chemical-shift error weighted 2× because shifts are the hardest and most valuable target. No extra inputs, no special handling — this is the control every other recipe is measured against.
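As a rough illustration of the 2× shift weighting, the sketch below combines per-target loss terms into one number. Only the shift weight of 2 comes from this page. The term names, the unit weights on the other terms, and the example values are assumptions made for the example.

```python
# Only the 2.0 on "shift" is taken from the text; the other weights are assumed.
LOSS_WEIGHTS = {"shift": 2.0, "coupling": 1.0, "degeneracy": 1.0}


def total_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of per-target loss terms, given as a name -> value mapping."""
    return sum(weights[name] * value for name, value in terms.items())


# Made-up per-batch loss values, purely for illustration.
example_terms = {"shift": 0.05, "coupling": 1.0, "degeneracy": 0.2}
print(total_loss(example_terms))
```

Doubling the shift term means a given reduction in shift error lowers the total twice as much as the same reduction would at unit weight.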
64k held-out: shift 0.047 ppm · J 1.07 Hz · F1 0.911 · deg 0.944
The data is lopsided: most groups are common (CH, CH₃) and most group-pairs aren't coupled, so a model can score well while being lazy on the rare cases. Focal loss down-weights the examples already predicted confidently, redirecting effort toward rare degeneracies (6H, 9H tert-butyls) and the sparse real couplings. Only the loss changes; the inputs are the same as in 025.
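The down-weighting is easiest to see in the standard focal-loss formula, which scales cross-entropy by `(1 - p)^gamma`, where `p` is the probability the model gives the correct class. The sketch below is a minimal single-example version. The focusing parameter `gamma=2.0` is an assumption, since this page does not state the value used, and the optional class-balancing factor is omitted.

```python
import math


def cross_entropy(p_correct):
    """Cross-entropy for one example, given the probability of the true class."""
    return -math.log(p_correct)


def focal_loss(p_correct, gamma=2.0):
    """Cross-entropy scaled by (1 - p_correct)**gamma.

    Confident correct predictions (p_correct near 1) are scaled towards zero,
    while poorly predicted examples keep most of their loss.
    """
    return (1.0 - p_correct) ** gamma * cross_entropy(p_correct)


for p in (0.95, 0.50, 0.10):
    print(f"p={p:.2f}  CE={cross_entropy(p):.4f}  focal={focal_loss(p):.4f}")
```

With `gamma=0` the function reduces to plain cross-entropy. Larger values push more of the training signal onto the examples the model still gets wrong.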
64k held-out: shift 0.037 ppm · J 0.97 Hz · F1 0.902 · deg 0.929
A bedrock NMR fact: a peak's area is proportional to the number of protons, which is exactly the degeneracy we want. But a local convolution sees peak shapes, not areas; it can't integrate. So we feed a second input channel, the spectrum's running integral, letting the model read relative areas directly and untangle the 2H-vs-1H confusion that shape alone can't resolve.
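A minimal sketch of the second channel follows, assuming the running integral is a plain cumulative sum scaled so that it ends at 1. The scaling and the function names are assumptions, as the page only says "running integral". The toy spectrum has two peaks whose areas are in a 1:2 ratio.

```python
from itertools import accumulate


def running_integral(spectrum):
    """Cumulative sum of intensities, scaled so the last value is 1."""
    cumulative = list(accumulate(spectrum))
    total = cumulative[-1]
    if total == 0:
        return [0.0] * len(spectrum)
    return [c / total for c in cumulative]


def two_channel_input(spectrum):
    """Stack the raw spectrum and its running integral as two input channels."""
    return [list(spectrum), running_integral(spectrum)]


# Toy 12-point spectrum: first peak has area 4, second peak has area 8.
spectrum = [0, 0, 1, 2, 1, 0, 0, 0, 2, 4, 2, 0]
channels = two_channel_input(spectrum)
integral = channels[1]

first_step = integral[5]                  # plateau after the first peak
second_step = integral[-1] - integral[5]  # rise across the second peak
print(first_step, second_step)
```

The step heights of the integral curve carry the relative proton counts directly, so a local filter that sees both channels can compare areas without summing over the peak itself.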
64k held-out: shift 0.048 ppm · J 1.07 Hz · F1 0.907 · deg 0.941
This recipe combines the peak channel and soft-equivalence of 026 with the focal loss of 027, testing whether sharpening the hard cases stacks on top of the structural priors. It posts the best degeneracy balanced accuracy of the five (0.955).
64k held-out: shift 0.039 ppm · J 1.01 Hz · F1 0.889 · deg 0.955
All four ideas at once: the peak channel and soft-equivalence (026), the focal loss (027), and the cumulative-integral channel (028). Each targets a different weakness (peak localization, symmetry, class imbalance, and proton-counting), so combining them should give the strongest model. Currently training at all three sizes.
64k / 500k / 3M: training now

| tier | model capacity | training molecules |
|---|---|---|
| 64k | light · ~10M params | 64,000 (fast turnaround) |
| 500k | med · ~57M params | 500,000 PubChem |
| 3M | xl · ~137M params | full 3.13M PubChem |
Every model — every recipe at every size — is scored on the same leakage-controlled held-out test set (a global 10% of PubChem that no model trained on), so the comparisons are honest and directly readable.
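One way to keep a global held-out set consistent across tiers is to decide membership from a hash of each molecule's identifier, so that the same molecule lands on the same side of the split whichever training subset is drawn. The page does not describe how the actual split was made, so the sketch below is only an illustration under that assumption. The function name and the `CID-…` identifiers are hypothetical.

```python
import hashlib


def is_held_out(molecule_id, test_fraction=0.10):
    """Deterministically assign a molecule to the held-out set from its ID hash."""
    digest = hashlib.sha256(molecule_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < test_fraction


# Hypothetical identifiers standing in for PubChem entries.
all_ids = [f"CID-{i}" for i in range(10_000)]
held_out = {m for m in all_ids if is_held_out(m)}
train_pool = [m for m in all_ids if m not in held_out]

# Any training tier drawn from train_pool cannot overlap the held-out set.
small_tier = train_pool[:640]
print(len(held_out), len(train_pool), len(set(small_tier) & held_out))
```

Because membership depends only on the identifier, the held-out set stays fixed as the training pool grows from the smallest tier to the largest.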