How to choose a featurization with a VAMP-2 score#

Goal#

Decide which features to build your MSM on - and how many TICA dimensions and what TICA lag to keep - objectively rather than by intuition. The cross-validated VAMP-2 score (vampScore()) rates how much of the slow dynamics a featurization resolves; you score your candidates and keep the best one. This is the variational feature-selection idea of McGibbon and Pande.

Minimal example#

Starting from your sims (a simlist()), score two candidate featurizations of the same trajectories and compare. Higher is better:

from htmd.ui import Metric, MetricSelfDistance, MetricDihedral, vampScore

# Candidate A: Ca-Ca contact map
metrA = Metric(sims)
metrA.set(MetricSelfDistance("protein and name CA", metric="contacts", periodic="chains"))
dataA = metrA.project()
dataA.fstep = 0.1

# Candidate B: backbone dihedrals
metrB = Metric(sims)
metrB.set(MetricDihedral())
dataB = metrB.project()
dataB.fstep = 0.1

for label, data in [("contacts", dataA), ("dihedrals", dataB)]:
    mean, std = vampScore(data, lag=2, dim=4, units="ns")
    print(f"{label}: VAMP-2 = {mean:.2f} +/- {std:.2f}")

Keep the featurization with the highest cross-validated score.

How to read the score#

  • Higher is better - a higher VAMP-2 score means the features resolve more of the slow dynamics.

  • About 1.0 is the floor - a featurization that captures no slow process (pure noise) scores around 1.0, the trivial stationary process.

  • Compare like with like - scores are only comparable when computed at the same lag and dim. Changing either changes the absolute score.

  • It is cross-validated - the score is averaged over nfolds folds, so a featurization that looks good only by overfitting the training data is penalised on the held-out folds. Use the mean and watch the spread (std).

Common variations#

Choose how many dimensions to keep. The score keeps rising as you add dimensions (it sums the squared singular values), so look for diminishing returns rather than a maximum:

for d in (2, 3, 4, 6, 8):
    mean, std = vampScore(data, lag=2, dim=d, units="ns")
    print(f"dim={d}: VAMP-2 = {mean:.2f} +/- {std:.2f}")

Check robustness to the TICA lag:

for lag in (0.5, 1, 2, 5):
    mean, std = vampScore(data, lag=lag, dim=4, units="ns")
    print(f"lag={lag} ns: VAMP-2 = {mean:.2f} +/- {std:.2f}")

Get the per-fold scores instead of the mean and standard deviation (for example to plot your own error bars):

scores = vampScore(data, lag=2, dim=4, units="ns", return_scores=True)

If you have only a few long trajectories, cross-validating over whole trajectories is not possible; split each trajectory into blocks of blocksize frames instead:

mean, std = vampScore(data, lag=2, dim=4, units="ns", blocksize=1000)

Gotchas#

  • Only compare at matched lag and dim. A featurization scored at dim=6 will out-score one at dim=3 purely because of the extra dimensions, not because it is better.

  • Do not simply maximise over dim. Because VAMP-2 sums squared singular values, it rises monotonically with dim. Read the shape of the curve (where the gains flatten), not the peak.

  • A tie is a real and useful answer. Featurizations that both capture the slow process well score about the same - the score is telling you the choice does not matter for the kinetics, so decide on other grounds such as clustering robustness or cost. For example, on the villin headpiece a Cα contact map and raw Cα distances score about equally; the contact map is preferred there for its clustering robustness, not because it resolves more dynamics.

  • Needs at least 2 trajectories for the default whole-trajectory cross-validation. With a single long trajectory, pass blocksize so it can be split into independent blocks.

  • Scoring is not validation. A high VAMP-2 score says the features are promising; it does not replace building the MSM and checking the implied timescales and the Chapman-Kolmogorov test.

See also#